
LLMs, RAG and a Smart Golden Repository

In today’s world, where data has become an integral driver for businesses, streamlining and effectively managing data flows has become a top priority. There are multiple challenges in doing so. The sheer volume of data is one such challenge; the variety of data we are witnessing is a completely different challenge altogether. Solving the problem of large data volumes gave rise to technologies like Big Data and data lakes, while processing and generating insights from multiple data points in real time is a different, and costly, challenge of its own. This is particularly true for banks and financial services companies, which are still in a phase of digital transformation and have multiple sources of data distributed across various departments. In the financial sector, which is inundated with extensive textual material like corporate reports, regulatory documents, 10-Ks and 10-Qs, broker analyses, and various other reports, the potential impact of Generative AI is even more pronounced. Historically, well-paid financial experts have devoted considerable time navigating lengthy documents and comprehensive reports to collect, comprehend, condense, and convey insights.

Fortunately, we are witnessing a revolution in the field of AI today with the advent of Generative AI. These advancements offer the potential to let professionals in finance, including equity research analysts, risk managers, private equity associates, and sustainability researchers, shift their attention away from routine data processing and towards more pivotal responsibilities. This includes tasks like analyzing significant insights, drawing conclusions, and making well-informed, strategic decisions at an accelerated pace.

At Societe Generale, we started exploring Generative AI as early as 2020, when OpenAI had just come out with GPT-3. As part of our digital transformation initiative, the idea was to improve the capability of operations while at the same time reducing costs. The solution we came up with was a golden repository of data that is data neutral (be it textual, audio, video, etc.) as well as scalable enough to manage large datasets with a smart, real-time inference mechanism.

 

RAG Framework and Knowledge Base

The approach involved building a RAG (Retrieval Augmented Generation) pipeline from scratch. RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into the LLMs' generative process, in contrast to plain Generative AI, which only involves LLMs and LLM servers. A RAG pipeline has three basic components (a minimal sketch follows the list):

  1. Knowledge Base: This is the golden repository into which all the data is fed. It is generally a vector database that stores data as vector objects, in combination with an embedding-generating LLM. The vectors are simply the numbers we get when we index the data using an embedding generator, i.e. an LLM model that generates embeddings from a dataset. Data isolation and scalability are achieved using classes and tenants: each class can have more than 50,000 tenants, and each tenant can support more than 100,000 objects.
  2. Retriever: The retriever engine matches the user query against the vector database, and the top results are passed to the generator. The matching is performed by comparing the percentage similarity between the query and the vector objects stored in the vector database.
  3. Generator: This is the LLM model that generates the response based on the search results from the vector database, which are passed to it as context. A prompt, configurable by either the developer or the end user, is added on top of the context before it is passed to the LLM model.

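To make these three components concrete, here is a minimal, self-contained Python sketch of the pattern. It is illustrative only: the embed_fn and llm_fn callables are hypothetical placeholders standing in for a real embedding model and a real LLM endpoint, and the in-memory list stands in for a proper vector database with classes and tenants.

```python
import numpy as np

# --- Knowledge Base: an in-memory stand-in for a vector database ---
class KnowledgeBase:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn          # hypothetical embedding model
        self.vectors, self.chunks = [], []

    def add(self, chunk: str):
        # store the vector embedding alongside the raw text chunk
        self.vectors.append(np.asarray(self.embed_fn(chunk), dtype=float))
        self.chunks.append(chunk)

# --- Retriever: rank stored chunks by cosine similarity to the query ---
def retrieve(kb: KnowledgeBase, query: str, top_k: int = 3) -> list[str]:
    q = np.asarray(kb.embed_fn(query), dtype=float)
    scores = [
        float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for v in kb.vectors
    ]
    ranked = sorted(zip(scores, kb.chunks), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

# --- Generator: the LLM answers using only the retrieved context ---
def answer(llm_fn, query: str, context: list[str]) -> str:
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    )
    return llm_fn(prompt)                  # hypothetical call to an LLM endpoint
```

In production the retrieval step is delegated to the vector database itself, which performs the same similarity ranking at scale; the sketch only makes the division of labour between the three components visible.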
 

Transition to Golden Repository

Clearly, the LLM model is being utilized only for its comprehension capability, which is controlled and calibrated by the RAG framework. To expand the Knowledge Base by feeding in different data streams, we control the feeding mechanism in the vector database. Here the choice of LLM development framework becomes quite critical. There are many choices for these frameworks, like LangChain, LlamaIndex, Hugging Face Hub, etc. It is an interesting point to note that quite a few of them support cross-compatibility and interoperability as well. In fact, it is these frameworks that are currently setting the standards in terms of architecture and features for development in the Generative AI domain. In my view, LlamaIndex currently provides the most customizable as well as general framework for configuring the data pipeline at the development level.

Let us understand this in a bit more detail. The data fed to RAG is first parsed into documents. LlamaIndex also supports libraries like Unstructured, which do the heavy lifting of ETL tasks for multiple data streams ranging from text to images, returning a streamlined output in the form of documents. For data streams like audio and video, we can leverage open-source speech-to-text models like OpenAI Whisper to convert speech into text that is easily parsed by LlamaIndex. These models are multilingual and dialect neutral, which is critical in ensuring the data quality of the knowledge base. The documents are then broken into chunks and passed to the embedding generator, which generates a vector embedding for each chunk. These vector embeddings are then stored in the vector database.

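A sketch of such an ingestion path is shown below. It assumes a recent llama-index release (where the core modules live under llama_index.core) and the open-source openai-whisper package; the file names, directory, and chunking parameters are illustrative placeholders, and the embedding model is whatever your LlamaIndex settings default to.

```python
import whisper  # open-source speech-to-text (pip install openai-whisper)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# 1. Convert audio/video streams into text so they can be parsed like documents
stt = whisper.load_model("base")
transcript = stt.transcribe("data/earnings_call.mp3")["text"]
with open("data/earnings_call.txt", "w") as f:
    f.write(transcript)

# 2. Parse the raw files into documents, chunk them, embed each chunk,
#    and store the embeddings in the vector index
documents = SimpleDirectoryReader("data/").load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

# 3. Expose the index as a retriever + generator pipeline
query_engine = index.as_query_engine(similarity_top_k=3)
```

From here, the query engine plays the retriever and generator roles described earlier, and the same index can keep absorbing new data streams as they are parsed and chunked.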
 

Advantages of using RAG approach

For a domain as regulated, dynamic, and nuanced as financial services, directly using LLMs presents significant limitations that impede their full utility. Choosing the right AI framework and solution approach is instrumental in harnessing their true potential for financial services. The RAG approach, transitioning into a golden repository, gives us the following advantages:

  • Comprehensiveness and timeliness: Most LLMs are pretrained on past data, and incorporating present data would require fine-tuning them. Regular fine-tuning and retraining of these models requires significant infrastructure investment, which is costly and often infeasible. RAG utilizes only the comprehension capabilities of the LLM rather than its pretrained data: all the data relevant to us is managed at the knowledge-base level, which can be updated as easily as a normal database. Different response models are easily configurable at the retriever level to manage the context size and thereby control the comprehensiveness of the final response from the LLM.
  • Transparency and trustworthiness: The final response we get from LLMs in a RAG framework maintains transparency by also returning the source of the response, which points back to the original data stream (see the sketch after this list).
  • Credibility and accuracy: We can easily compare the final response to a query from RAG with the cited source to check its credibility and accuracy. Hallucination is mitigated by passing only the relevant context, selected through efficient vector matching between the query and the vector database behind the scenes.

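To illustrate the transparency point, the hedged sketch below reuses the hypothetical query_engine from the ingestion example: each answer comes back with the source chunks and similarity scores that grounded it, so the response can be traced to the original data stream.

```python
# Query the knowledge base; the response carries its supporting chunks
response = query_engine.query(
    "What liquidity risks are highlighted in the latest annual report?"
)

print(response.response)  # the generated answer

# Each source node points back to the document chunk it came from,
# so the answer can be checked against the underlying data stream.
for node in response.source_nodes:
    print(node.node.metadata.get("file_name"), node.score)
```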
 

Conclusion

Simplification by means of a golden repository for multiple data streams broadens the problem statement in the search for general solutions, and RAG is one approach to achieving this. A powerful LLM without relevant context fares poorly against an average LLM with a better knowledge base. The approach is by no means perfect, since the field is ever evolving. The idea is to arrive at general solutions by assimilating multiple use cases into one general problem statement, so as to best utilize the potential of LLMs. Enhancing the capabilities of LLMs with smart frameworks is key: it saves both time and resources without compromising on the desired results.

 
