RAG vs. fine-tuning

Both retrieval-augmented generation (RAG) and fine-tuning aim to improve the output of large language models (LLMs). RAG does this without modifying the underlying LLM, while fine-tuning adjusts the model’s weights and parameters. In many cases, you can customize a model by combining fine-tuning with a RAG architecture.

An LLM is a type of artificial intelligence (AI) that uses machine learning (ML) techniques to understand and produce human language. These ML models can generate, summarize, translate, rewrite, classify, categorize, and analyze text—and more. The most popular use for these models at an enterprise level is to create a question-answering system, like a chatbot.

LLM foundation models are trained with general knowledge to support a broad range of use cases. However, they likely aren’t equipped with domain-specific knowledge that’s unique to your organization. RAG and fine-tuning are 2 ways to adjust and inform the LLM with the data you want so it produces the output you want.

For example, let’s say you’re building a chatbot to interact with customers. In this scenario, the chatbot is a representative of your company, so you’ll want it to act like a high-performing employee. You’ll want the chatbot to understand nuances about your company, like the products you sell and the policies you uphold. Just as you’d train an employee by giving them documents to study and scripts to follow, you train a chatbot by using RAG and fine-tuning to build upon the foundation of knowledge it arrives with. 

RAG supplements the data within an LLM by retrieving information from sources of your choosing, such as data repositories, collections of text, and pre-existing documentation. After retrieving the data, a RAG architecture adds it to the LLM’s context so the model can generate an answer grounded in the combined sources.

RAG is most useful for supplementing your model with information that’s regularly updated. Because RAG gives the LLM a line of communication to your chosen external sources, its output can be more accurate. And because you can engineer RAG to cite its sources, it’s easy to trace how an output is formulated, which creates more transparency and builds trust.

Back to our example: If you were to build a chatbot that answers questions like, “What is your return policy?”, you could use a RAG architecture. You could connect your LLM to a document that details your company’s return policy and direct the chatbot to pull information from it. You could even instruct the chatbot to cite its source and provide a link for further reading. And if your return-policy document were to change, the RAG model would pull the most recent information and serve it to the user.
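
To make that flow concrete, here’s a minimal Python sketch of the pattern, assuming a toy in-memory document store. The `retrieve()` helper and the `llm_generate()` stub are hypothetical placeholders rather than any particular product’s API; a production pipeline would typically use an embedding model and a vector database instead.

```python
# Minimal RAG sketch for the return-policy chatbot example.
# The document store, retriever, and llm_generate() stub are hypothetical
# placeholders; swap in your own vector store and model client.

from typing import List

DOCUMENTS = [
    {"id": "returns-001", "text": "Items may be returned within 30 days of delivery for a full refund."},
    {"id": "shipping-004", "text": "Standard shipping takes 3-5 business days within the continental US."},
]

def retrieve(query: str, k: int = 1) -> List[dict]:
    """Naive keyword-overlap retrieval; production systems typically use embeddings."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc["text"].lower().split()))
    return sorted(DOCUMENTS, key=score, reverse=True)[:k]

def llm_generate(prompt: str) -> str:
    # Stub so the sketch runs end to end; replace with a real model call.
    return f"(model response would be generated from this prompt)\n{prompt}"

def answer(query: str) -> str:
    context = retrieve(query)
    sources = "\n".join(f"[{d['id']}] {d['text']}" for d in context)
    prompt = (
        "Answer the customer's question using only the sources below, "
        "and cite the source id you used.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)

print(answer("What is your return policy?"))
```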

Learn more about RAG

 

Use cases for RAG

RAG can source and organize information in a way that makes it simple for people to interact with data. With a RAG architecture, models can fetch insights and provide an LLM with context from both on-premises and cloud-based data sources. This means external data, internal documents, and even social media feeds can be used to answer questions, provide context, and inform decision-making.

For example, you can create a RAG architecture that, when queried, provides specific answers regarding company policies, procedures, and documentation. This saves time that would otherwise be spent searching for and interpreting a document manually.
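
The retrieval step behind this kind of lookup usually compares vector embeddings of the question and the documents. The sketch below illustrates the idea with a toy `embed()` function standing in for a real embedding model; the policy snippets and scoring are illustrative only.

```python
# Sketch of embedding-based retrieval over internal documents, assuming a
# hypothetical embed() function that maps text to a fixed-length vector
# (in practice this comes from an embedding model or service).

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: hash words into a small vector so the sketch runs.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

policies = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be submitted within 30 days of purchase.",
]
index = np.stack([embed(p) for p in policies])   # precomputed document vectors

def top_match(question: str) -> str:
    scores = index @ embed(question)             # cosine similarity (vectors are normalized)
    return policies[int(np.argmax(scores))]

print(top_match("How do I file an expense report?"))
```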

Learn how RAG is used in software engineering

Fine-tuning, by contrast, is a way to communicate intent to the LLM so the model can tailor its output to fit your goals. It’s the process of training a pretrained model further with a smaller, more targeted data set so it can more effectively perform domain-specific tasks. Unlike retrieved context, this additional knowledge is encoded directly in the model’s weights.

Low-rank adaptation (LoRA) and quantized LoRA (QLoRA) are parameter-efficient fine-tuning (PEFT) techniques that can help users optimize costs and compute resources.
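
As a rough illustration, here’s what configuring a LoRA adapter can look like with the open source Hugging Face peft library; the checkpoint name, target modules, and hyperparameters are placeholder assumptions, not recommendations.

```python
# Sketch of LoRA fine-tuning with the Hugging Face peft library, assuming a
# causal-LM checkpoint; the model name, target modules, and hyperparameters
# are illustrative and depend on your base model and task.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "your-org/base-model"   # placeholder checkpoint name
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; varies by architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a small fraction of weights are trainable

# From here, train as usual (for example with transformers.Trainer) on your
# domain-specific data set; the base weights stay frozen and only the small
# LoRA adapter matrices are updated.
```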

Let’s return to our chatbot example. Say you want your chatbot to interact with patients in a medical context. It’s important that the model understands medical terminology related to your work. Using fine-tuning techniques, you can ensure that when a patient asks the chatbot about “PT services,” it will understand that as “physical therapy services” and direct them to the right resources.

Use cases for fine-tuning

Fine-tuning is most useful for training your model to interpret the information it has access to. For instance, you can train a model to understand the nuances and terminology of your specific industry, such as acronyms, jargon, and your organization’s values.

Fine-tuning is also useful for image-classification tasks. For example, if you’re working with magnetic resonance imaging (MRI), you can use fine-tuning to train your predictive AI model to identify abnormalities.
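
One common approach is transfer learning on a pretrained vision backbone. The sketch below uses torchvision’s ResNet-18 as an example; the two-class setup and frozen backbone are illustrative assumptions, and a real medical-imaging model would require careful data handling and validation.

```python
# Sketch of transfer learning for a 2-class image classifier (e.g. normal vs.
# abnormal scans), using a pretrained torchvision backbone. The data pipeline
# is omitted and the class count is an assumption for illustration.

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new classification head trains.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a head for the two target classes.
model.fc = nn.Linear(model.fc.in_features, 2)

# Train model.fc on your labeled images with a standard PyTorch training loop;
# optionally unfreeze later layers for further fine-tuning once it converges.
```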

Explore predictive AI use cases

Fine-tuning can also help your organization apply the right tone when communicating with others, especially in a customer-support context. It lets you train a chatbot to analyze the sentiment or emotion of the person it’s interacting with. Further, you can train your generative AI model to respond in a way that serves the user while upholding your organization’s values.

Explore generative AI use cases

Understanding the differences between RAG and fine-tuning can help you make strategic decisions about which approach best suits your needs. Here are some basic questions to consider:

What’s your team’s skill set?

Customizing a model with RAG requires coding and architectural skills. Compared to traditional fine-tuning methods, RAG provides a more accessible and straightforward way to get feedback, troubleshoot, and fix applications. Fine-tuning a model requires experience with natural language processing (NLP), deep learning, model configuration, data preprocessing, and evaluation. Overall, it can be more technical and time-consuming.

Is your data static or dynamic?

Fine-tuning teaches a model to learn common patterns that don’t change over time. Because it’s based on static snapshots of training data sets, the model’s information can become outdated and require retraining. Conversely, RAG directs the LLM to retrieve specific, real-time information from your chosen sources. This means your model pulls the most up-to-date data to inform your application, promoting accurate and relevant output.

What’s your budget?

RAG is typically considered more cost-efficient than fine-tuning. To implement a RAG architecture, you build pipeline systems that connect your data to your LLM. Because this approach draws on existing data to inform the LLM, it avoids the significant resources fine-tuning requires for specialized data labeling and the intensive computational power needed for repeated model training.

While fine-tuning is historically considered the more expensive option, developments like vLLM are helping to close the budget gap. vLLM is an inference server and engine that improves the cost efficiency of serving fine-tuned models. 
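
For a sense of what that looks like in practice, here’s a minimal sketch using vLLM’s offline Python API; the model name is a placeholder, and a production deployment would more often expose the model through vLLM’s OpenAI-compatible server.

```python
# Minimal sketch of offline inference with vLLM; the model name is a
# placeholder, and in production you would more likely run vLLM as an
# OpenAI-compatible server rather than embedding it like this.

from vllm import LLM, SamplingParams

llm = LLM(model="your-org/fine-tuned-model")          # placeholder checkpoint
sampling = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["What is your return policy?"], sampling)
for out in outputs:
    print(out.outputs[0].text)                        # generated completion
```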

Learn more about vLLM

Red Hat’s open source solutions and AI partner ecosystem can help you implement RAG and fine-tuning into your large language model operations (LLMOps) process.

Red Hat® Enterprise Linux® AI provides a platform for running LLMs in individual server environments. The solution includes Red Hat AI Inference Server, delivering fast, cost-effective inference across the hybrid cloud by maximizing throughput and minimizing latency.

Red Hat Enterprise Linux AI is also backed by the benefits of a Red Hat subscription, which includes trusted enterprise product distribution, 24x7 production support, extended model lifecycle support, and Open Source Assurance legal protections.

Scale your applications with Red Hat OpenShift AI

Once you train your model with Red Hat Enterprise Linux AI, you can scale it for production through Red Hat OpenShift® AI.

Red Hat OpenShift AI is a flexible, scalable machine learning operations (MLOps) platform with tools to help you build, deploy, and manage AI-enabled applications. It provides the underlying workload infrastructure, such as an LLM to create embeddings, the retrieval mechanisms required to produce outputs, and access to a vector database. 
