The era of Large Language Models (LLMs) has transformed the landscape of artificial intelligence, especially with the emergence of ChatGPT, which has amazed users with its conversational and memory capabilities.

However, despite being trained on vast corpora of data, these models have certain limitations when answering questions about specific domains or recent events.

This is where the concept of Retrieval Augmented Generation (RAG) comes into play.

Fine-Tuning vs. Retrieval Augmented Generation

Fine-tuning and RAG are two powerful techniques for specializing an LLM, but they operate differently. Fine-tuning directly adjusts the model's parameters to make it more specific, while RAG automatically enriches the questions posed to the LLM by adding relevant context.

The Limitations of Fine-Tuning

  • Large data volume required: Fine-tuning requires high-quality and diverse data in significant quantities to achieve good performance.
  • Costly infrastructure: The fine-tuning process demands powerful infrastructure, especially GPUs.
  • Data updates: Regularly updating the data used for fine-tuning is difficult and expensive.
  • Hallucinations: The issue of hallucinations (incorrect or fabricated answers generated by the model) persists.

RAG, on the other hand, overcomes these limitations by automatically adding relevant context to questions.

What Is RAG?

RAG combines two main techniques: retrieval and text generation. This method automatically adds relevant context to questions posed to the LLM, thereby increasing the accuracy and relevance of its responses.

How Does RAG Work?

  1. Retrieval: When a question is asked, the RAG system starts by searching for the most relevant documents that might contain the answer. This search is performed using similarity techniques between the question and document segments (chunks).
  2. Generation: Once the relevant documents are retrieved, the LLM uses this contextual information to generate a precise response.
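The two steps above can be sketched in a few lines of Python. Here, word overlap stands in for real vector similarity, and the final prompt is what would be sent to the LLM; the document contents and function names are purely illustrative.

```python
import re

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Step 1 - Retrieval: rank documents by word overlap with the question
    (a toy stand-in for real embedding similarity)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def overlap(doc: str) -> int:
        return len(q_words & set(re.findall(r"\w+", doc.lower())))
    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Step 2 - Generation: the LLM answers from the question plus the
    retrieved context, assembled into a single prompt."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

docs = [
    "Aqsone is a consultancy specialized in data science.",
    "The cafeteria opens at noon.",
]
prompt = build_prompt("What is Aqsone?", retrieve("What is Aqsone?", docs))
print(prompt)
```

With the relevant document injected as context, the LLM can answer a question it was never trained on.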

Example of a Response Without RAG

Consider a concrete example illustrating how RAG works. In the diagram below, a user asks a question about our company, Aqsone. Since the LLM was not trained on data related to the company, it is unable to respond or provides an answer based on a hallucination.

Example of a Response With RAG

In this new diagram, RAG is employed. Relevant documents are provided to the RAG system, which searches them for information related to the question. Using similarity between the question and the documents, it identifies the document(s) that may contain the elements needed to answer it (Retrieval). The question, combined with this context, is then submitted to the LLM, which uses it to generate the answer (Generation).

Going Deeper

To delve further into the technique behind RAG, consider the following diagram and steps:

  • Document integration:
    • Documents are transformed into chunks, meaning they are split into smaller text segments.
    • These chunks are converted into vectors using an embedding model.
    • The vectors are stored in a database.
  • Processing and answering the user's question:
    • The user’s question is also transformed into a vector.
    • A similarity analysis is performed between this vector and the vector database to return X chunks of context.
    • The question and the text associated with the X chunks of context are sent to the LLM, which can then answer the question.
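The pipeline above can be illustrated end to end in plain Python. A bag-of-words count replaces the trained embedding model, and a simple list replaces the vector database; both are assumptions made to keep the sketch self-contained.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: bag-of-words counts over a fixed vocabulary
    (a real system would call a trained embedding model here)."""
    words = tokenize(text)
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1. Document integration: split into chunks, embed, and store the vectors.
document = ("Leave requests are submitted in the HR portal. "
            "The cafeteria opens at noon every weekday.")
chunks = [s.strip() for s in document.split(". ") if s]
vocab = sorted(set(tokenize(document)))
store = [(chunk, embed(chunk, vocab)) for chunk in chunks]  # the "vector database"

# 2. Question processing: embed the question, return the top-X chunks
#    by similarity; these chunks are then sent to the LLM with the question.
question = "How do I submit a leave request?"
q_vec = embed(question, vocab)
top_chunks = sorted(store, key=lambda cv: cosine(q_vec, cv[1]), reverse=True)[:1]
print(top_chunks[0][0])
```

In production, the store would be a dedicated vector database and X (the number of chunks returned) would be tuned to balance context length against recall.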

Advanced RAG Techniques

RAG can be optimized using various advanced techniques, such as:

  • Reranking: Reordering retrieved chunks by relevance to improve response quality.
  • Embedding Model Fine-Tuning: Specializing the embedding model for specific input texts.
  • Multi-Query: Formulating multiple variations of a question to increase the likelihood of retrieving relevant information.
  • Self-Query: Using automatically constructed queries to refine the retrieval of context.
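As an illustration of one of these techniques, here is a minimal Multi-Query sketch: the question variants are hand-written stand-ins for rephrasings a real system would ask the LLM to generate, and word overlap again stands in for vector similarity.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, chunk: str) -> int:
    """Toy relevance score: shared words between query and chunk."""
    return len(tokens(query) & tokens(chunk))

chunks = [
    "Holiday requests go through the HR portal.",
    "Expense reports are filed monthly.",
    "Vacation days accrue at two per month.",
]

# Multi-Query: several phrasings of the same question (in a real system,
# the LLM would generate these variants automatically).
variants = [
    "How do I request a holiday?",
    "How do I book vacation days?",
]

# Retrieve for every variant, keep each chunk's best score across variants,
# then rank the union of results.
best = {chunk: max(score(v, chunk) for v in variants) for chunk in chunks}
ranked = sorted(chunks, key=lambda c: best[c], reverse=True)
print(ranked[0])
```

The point of the technique: a chunk phrased around "vacation days" would be missed by the "holiday" query alone, but one of the variants recovers it.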

Use Cases and Demonstrations

Retrieval Augmented Generation (RAG) has powerful applications across various domains. Here are a few concrete examples where RAG can provide significant added value:

IT Support Chatbots

Description: These chatbots are designed to answer frequently asked questions and resolve user issues using a knowledge base derived from past questions/answers and tickets.

Example: An employee attempts to connect to a secure network but fails. They ask the IT support chatbot what to do. The RAG system searches past tickets and FAQs, finds a documented solution for this type of problem, and provides clear steps to resolve the issue.

Onboarding Assistance Chatbots

Description: These chatbots assist new employees by providing the necessary information and resources for their integration into the company, based on internal documentation.

Example: A new employee asks how to submit a leave request. The chatbot retrieves the relevant segments from the HR documentation that describe the leave request procedure and provides a detailed response with step-by-step instructions.

Discover our use case for the HR Assistant Chatbot.

Industrial Diagnostics Chatbots

Description: These chatbots support maintenance teams by providing rapid diagnostics for equipment malfunctions using sensor data and similar failure histories.

Example: A technician observes abnormal vibrations in a machine and consults the chatbot, which identifies a potential bearing issue based on past cases. The chatbot then provides diagnostic steps and repair recommendations, helping to resolve the problem more efficiently.

Contract Analysis Chatbots

Description: These chatbots help users examine and understand contract clauses, and can also identify potentially abusive clauses.

Example: A user submits a contract to the chatbot and asks if any clauses might be abusive. The system analyzes the document, flags specific clauses like excessive late penalties or unusual termination conditions, and explains the risks associated with these clauses.

Conclusion

Retrieval Augmented Generation represents a significant advancement in the field of AI, offering effective solutions to the limitations of traditional LLMs. By combining retrieval and generation, RAG enhances the accuracy, relevance, and specialization of responses, making interactions with AI systems more useful and pertinent.
