Retrieval-augmented generation (RAG) vs. fine-tuning
Both RAG and fine-tuning aim to improve large language models (LLMs). RAG does this without modifying the underlying model, while fine-tuning adjusts the LLM’s weights. Often, you can customize a model by using both fine-tuning and a RAG architecture.
Building on top of large language models
An LLM is a type of artificial intelligence (AI) that uses machine learning (ML) techniques to understand and produce human language. These ML models can generate, summarize, translate, rewrite, classify, categorize, and analyze text—and more. The most popular use for these models at an enterprise level is to create a question-answering system, like a chatbot.
LLM foundation models are trained with general knowledge to support a broad range of use cases. However, they likely aren’t equipped with domain-specific knowledge that’s unique to your organization. RAG and fine-tuning are 2 ways to adjust and inform the LLM with your own data so it produces the output you need.
For example, let’s say you’re building a chatbot to interact with customers. In this scenario, the chatbot is a representative of your company, so you’ll want it to act like a high-performing employee. You’ll want the chatbot to understand nuances about your company, like the products you sell and the policies you uphold. Just as you’d train an employee by giving them documents to study and scripts to follow, you train a chatbot by using RAG and fine-tuning to build upon the foundation of knowledge it arrives with.
What is RAG and how does it work?
RAG supplements the data within an LLM by retrieving information from sources of your choosing, such as data repositories, collections of text, and pre-existing documentation. After retrieving that data, a RAG architecture incorporates it into the LLM’s context so the model can generate an answer based on the blended sources.
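At its core, this is a retrieve-then-generate loop: embed the user’s question, find the most similar chunks of your source material, and prepend them to the prompt. Below is a minimal sketch of that loop in Python; the bag-of-words similarity is a stand-in for a real embedding model, and the assembled prompt would then be sent to the LLM of your choice.

```python
from collections import Counter
import math

# Toy corpus; in practice these chunks come from your own repositories.
CHUNKS = [
    "Returns are accepted within 30 days of purchase with a receipt.",
    "Our support line is open Monday through Friday, 9am to 5pm.",
    "Shipping is free on orders over $50 in the continental US.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector. A real pipeline
    would call an embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Blend the retrieved context into the LLM's prompt."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is your return policy?"))
```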
RAG is most useful for supplementing your model with information that’s regularly updated. Giving an LLM a line of communication to your chosen external sources makes its output more accurate. And because you can engineer RAG to cite its sources, it’s easy to trace how an output was formulated, which creates more transparency and builds trust.
Back to our example: If you were to build a chatbot that answers questions like, “What is your return policy?”, you could use a RAG architecture. You could connect your LLM to a document that details your company’s return policy and direct the chatbot to pull information from it. You could even instruct the chatbot to cite its source and provide a link for further reading. And if your return-policy document were to change, the RAG model would pull the most recent information and serve it to the user.
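To support citations like that, each retrieved chunk can carry metadata pointing back to its source, and the prompt can instruct the model to reference it. The sketch below extends the earlier example; the document text and URL are placeholders for your actual return-policy page.

```python
# Hypothetical document store: each chunk keeps a pointer to its source.
DOCS = [
    ("Returns are accepted within 30 days of purchase with a receipt.",
     "https://example.com/policies/returns"),  # placeholder URL
]

def build_cited_prompt(question: str) -> str:
    """Number each chunk, attach its source, and ask the model to cite."""
    context = "\n".join(
        f"[{i + 1}] {text} (source: {url})"
        for i, (text, url) in enumerate(DOCS)
    )
    return (
        "Answer the question using only the numbered context below, and "
        "cite the source link for each statement you make.\n"
        f"{context}\n\nQuestion: {question}"
    )

print(build_cited_prompt("What is your return policy?"))
```

Because the store is consulted at query time, updating the return-policy document changes the chatbot’s answers immediately, with no retraining.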
Use cases for RAG
RAG can source and organize information in a way that makes it simple for people to interact with data. With a RAG architecture, models can fetch insights and provide an LLM with context from both on-premises and cloud-based data sources. This means external data, internal documents, and even social media feeds can be used to answer questions, provide context, and inform decision making.
For example, you can create a RAG architecture that, when queried, provides specific answers regarding company policies, procedures, and documentation. This saves time that would otherwise be spent searching for and interpreting a document manually.
What is fine-tuning?
Think of fine-tuning as a way to communicate intent to the LLM so the model can tailor its output to fit your goals. Fine-tuning is the process of training a pretrained model further with a smaller, more targeted data set so it can more effectively perform domain-specific tasks. The knowledge from this additional training data is encoded into the model’s weights.
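As a concrete illustration, here is a minimal sketch of that process using the open source Hugging Face transformers library. The base model name and the two-example data set are placeholders; a real fine-tune needs a much larger, carefully curated data set and appropriate hardware.

```python
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilgpt2"  # placeholder; substitute your base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy domain-specific examples; real fine-tuning needs far more data.
texts = [
    "Q: What does PT stand for? A: Physical therapy.",
    "Q: Where are PT services offered? A: At our rehabilitation clinic.",
]

class TextDataset(Dataset):
    def __init__(self, texts):
        self.enc = [tokenizer(t, truncation=True, padding="max_length",
                              max_length=64, return_tensors="pt")
                    for t in texts]

    def __len__(self):
        return len(self.enc)

    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        mask = self.enc[i]["attention_mask"].squeeze(0)
        labels = ids.clone()
        labels[mask == 0] = -100  # ignore padding when computing loss
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=TextDataset(texts),
)
trainer.train()  # the new knowledge ends up encoded in the model's weights
```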
Let’s return to our chatbot example. Say you want your chatbot to interact with patients in a medical context. It’s important that the model understands medical terminology related to your work. Using fine-tuning techniques, you can ensure that when a patient asks the chatbot about “PT services,” it will understand that as “physical therapy services” and direct them to the right resources.
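The targeted data for a case like this can be simple prompt-response pairs. The records below are hypothetical, but they show the kind of examples that teach the model your domain’s vocabulary:

```python
# Hypothetical instruction-tuning records for the medical chatbot.
training_examples = [
    {"prompt": "Do you offer PT services?",
     "response": "Yes, we offer physical therapy (PT) services. "
                 "You can book an appointment through our patient portal."},
    {"prompt": "What does PT stand for at your clinic?",
     "response": "PT refers to physical therapy."},
]
```

Dozens to thousands of examples like these, covering the terminology and the resources you want the model to point to, give the fine-tuning process enough signal to generalize.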
Use cases for fine-tuning
Fine-tuning is most useful for training your model to interpret the information it has access to. For instance, you can train a model to understand the nuances and terminologies of your specific industry, such as acronyms and organizational values.
Fine-tuning is also useful for image-classification tasks. For example, if you’re working with magnetic resonance imaging (MRI), you can use fine-tuning to train your model to identify abnormalities.
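A common way to do this is transfer learning: take a vision model pretrained on general images, replace its classification head, and train only the new head on your labeled scans. The sketch below uses PyTorch and torchvision; the two-class setup and the random stand-in batch are placeholders for real labeled MRI data.

```python
import torch
from torch import nn
from torchvision import models

# Start from a backbone pretrained on general images.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers so only the new head is trained.
for p in model.parameters():
    p.requires_grad = False

# Replace the classification head: 2 placeholder classes,
# e.g., "abnormality present" vs. "no abnormality".
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One hypothetical training step; real code iterates over a
# DataLoader of labeled MRI slices.
images = torch.randn(4, 3, 224, 224)  # stand-in image batch
labels = torch.tensor([0, 1, 0, 1])   # stand-in labels
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```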
Fine-tuning can also help your organization apply the right tone when communicating with others, especially in a customer-support context. It lets you train a chatbot to analyze the sentiment or emotion of the person it’s interacting with. Further, you can train the model to respond in a way that serves the user while upholding your organization’s values.
Considerations for choosing RAG vs. fine-tuning
Understanding the differences between RAG and fine-tuning can help you make strategic decisions about which AI resource to deploy to suit your needs. Here are some basic questions to consider:
What’s your team’s skill set?
Customizing a model with RAG requires coding and architectural skills. Compared to traditional fine-tuning methods, RAG provides a more accessible and straightforward way to get feedback, troubleshoot, and fix applications. Fine-tuning a model requires experience with natural language processing (NLP), deep learning, model configuration, data preprocessing, and evaluation. Overall, it can be more technical and time-consuming.
Is your data static or dynamic?
Fine-tuning teaches a model to learn common patterns that don’t change over time. Because it’s based on static snapshots of training data sets, the model’s information can become outdated and require retraining. Conversely, RAG directs the LLM to retrieve specific, real-time information from your chosen sources. This means your model pulls the most up-to-date data to inform your application, promoting accurate and relevant output.
What’s your budget?
Fine-tuning is a deep learning technique that traditionally requires a lot of data and computational resources. To inform a model with fine-tuning, you typically need to label data and run training on costly, high-end hardware. Additionally, the performance of the fine-tuned model depends on the quality of your data, and obtaining high-quality data can be expensive.
Comparatively, RAG tends to be more cost efficient than fine-tuning. To set up RAG, you build pipeline systems to connect your data to your LLM. This direct connection cuts down on resource costs by using existing data to inform your LLM, rather than spending time, energy, and resources to generate new data.
How Red Hat can help
Red Hat’s open source solutions and AI partner ecosystem can help you integrate RAG and fine-tuning into your large language model operations (LLMOps) process.
Experiment with fine-tuning using InstructLab
Created by Red Hat and IBM, InstructLab is an open source community project for contributing to LLMs used in generative AI (gen AI) applications. It provides a framework that uses synthetic data to make LLM fine-tuning more accessible.
Create your own foundation model with Red Hat Enterprise Linux AI
When your enterprise is ready to build applications with gen AI, Red Hat® Enterprise Linux® AI provides the foundation model platform needed to address your use cases with your data, faster.
Red Hat Enterprise Linux AI unites the Granite family of open source-licensed LLMs and the InstructLab model alignment tools in a single server environment. This makes it more accessible for domain experts without a data science background to fine-tune and contribute to an AI model that can scale across the hybrid cloud.
Red Hat Enterprise Linux AI is also backed by the benefits of a Red Hat subscription, which includes trusted enterprise product distribution, 24x7 production support, extended model lifecycle support, and Open Source Assurance legal protections.
Scale your applications with Red Hat OpenShift AI
Once you train your model with Red Hat Enterprise Linux AI, you can scale it for production through Red Hat OpenShift® AI.
Red Hat OpenShift AI is a flexible, scalable machine learning operations (MLOps) platform with tools to help you build, deploy, and manage AI-enabled applications. It provides the underlying workload infrastructure, such as an LLM to create embeddings, the retrieval mechanisms required to produce outputs, and access to a vector database.
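Red Hat OpenShift AI supplies those pieces as managed infrastructure. As a neutral illustration of the pattern it describes (embed documents, store them in a vector database, retrieve the nearest matches at query time), here is a sketch using the open source chromadb client, one of several vector stores that fill this role; the documents and collection name are placeholders.

```python
import chromadb

# In-memory client for illustration; a production deployment runs the
# vector database as a service alongside model serving.
client = chromadb.Client()
collection = client.create_collection("policies")  # placeholder name

# chromadb embeds the documents with its default embedding model.
collection.add(
    documents=[
        "Returns are accepted within 30 days of purchase.",
        "Shipping is free on orders over $50.",
    ],
    ids=["returns-1", "shipping-1"],
)

# At query time, retrieve the chunks nearest to the user's question;
# these become the context the LLM uses to generate its answer.
results = collection.query(query_texts=["What is your return policy?"],
                           n_results=1)
print(results["documents"])
```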