LLMs vs SLMs


Large language models (LLMs) and small language models (SLMs) are both types of artificial intelligence (AI) systems trained to interpret human language, including programming languages. The key differences between them are usually the size of the data sets they’re trained on, the processes used to train them on those data sets, and the cost and benefit of getting started for different use cases.

As their names suggest, both LLMs and SLMs are trained on data sets consisting of language, which distinguishes them from models trained on images (e.g., DALL·E) or videos (e.g., Sora). A few examples of language-based data sets include webpage text, developer code, emails, and manuals.

One of the most well-known applications of both SLMs and LLMs is generative AI (gen AI), which can generate—hence the name—unscripted responses to many different, unpredictable queries. LLMs in particular have become well known among the general public thanks to the GPT-4 foundation model and ChatGPT, a conversational chatbot with trillions of parameters, trained on massive data sets to respond to a wide range of human queries. Though gen AI is popular, there are also non-generative applications of LLMs and SLMs, such as predictive AI.


The scope of GPT-4/ChatGPT neatly demonstrates one common difference between LLMs and SLMs: the data sets they’re trained on.

LLMs are usually intended to emulate human intelligence at a very broad level, and thus are trained on a wide range of large data sets. In the case of GPT-4/ChatGPT, that includes the entire public internet(!) up to a certain date. This breadth is how ChatGPT gained renown for interpreting and responding to such a wide range of queries from general users. However, it’s also why the model has sometimes drawn attention for potentially incorrect responses, colloquially referred to as “hallucinations”—it lacks the fine-tuning and domain-specific training needed to respond accurately to every industry-specific or niche query.

SLMs, on the other hand, are typically trained on smaller data sets tailored to specific industry domains (i.e., areas of expertise). For example, a healthcare provider could use an SLM-powered chatbot trained on medical data sets to inject domain-specific knowledge into a user’s non-expert question about their health, enriching the quality of both the question and the response. In this case, the SLM-powered chatbot doesn’t need to be trained on the entire internet—every blog post, fictional novel, and poem ever written—because that content is irrelevant to the healthcare use case.

In short, SLMs typically excel in specific domains, but struggle compared to LLMs when it comes to general knowledge and overall contextual understanding.


The size and scope of training data sets aren’t the only factors that differentiate SLMs from LLMs; importantly, a model can be considered an SLM even if it’s trained on the same data sets as an LLM. That’s because the training parameters and overall training process—not just the amount of data—help define each model. In other words, what matters isn’t just how much data a model is trained on, but also what it’s designed to learn from that data.

 

Parameters

In machine learning, parameters are internal variables that determine what predictions a model will make. In other words, parameters are how a model decides what to do with the raw material of its data set. During training, an AI model continuously adjusts its parameters to improve its predictions—think of it like turning a knob on a radio to find the right station. Beyond the total number of parameters, other factors in this immensely complicated process include how parameters are layered within a model, how they’re weighted against each other, and how they’re optimized for pattern recognition rather than simple memorization.
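
To make the knob-turning analogy concrete, here’s a toy illustration in Python (a sketch for intuition, not anything from a real language model): a model with a single parameter, adjusted step by step by gradient descent until its predictions fit the data. Real language models do the same thing with billions or trillions of these knobs at once.

    # A toy model with one parameter: y = w * x.
    # Training nudges w so predictions match the data,
    # like turning a knob toward the right station.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs x and targets y (here, y = 2x)
    w = 0.0               # the model's single parameter, initially untrained
    learning_rate = 0.05

    for epoch in range(100):
        for x, target in data:
            prediction = w * x
            error = prediction - target
            # Gradient of the squared error with respect to w is 2 * error * x;
            # step the parameter in the direction that reduces the error.
            w -= learning_rate * 2 * error * x

    print(f"learned parameter w = {w:.3f}")  # converges toward 2.0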

There’s no clear industry definition for how many parameters make a model an SLM versus an LLM. What’s most relevant instead is that SLMs typically contain far fewer parameters than LLMs because their use cases focus on narrower knowledge domains. GPT-4/ChatGPT, for example, was purportedly built with trillions of parameters so it could respond to almost any user input. It’s worth noting, though, that GPT-4 is a uniquely large example of an LLM. There are many smaller LLMs (not quite SLMs), like IBM’s open source Granite models, which range in size from 3 to 35 billion parameters. SLMs typically have fewer parameters still (though sometimes ranging into the billions) because their expected applications are much narrower.
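
For a hands-on sense of what “model size” means, counting a model’s parameters takes one line with the Hugging Face transformers library. The model below, distilgpt2, is just a convenient small stand-in (roughly 82 million parameters), not one of the models discussed in this article:

    # Count the trainable parameters of a pretrained model to get
    # a concrete feel for "model size." distilgpt2 (~82M parameters)
    # stands in here for a small model; swap in any checkpoint you like.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("distilgpt2")
    total = sum(p.numel() for p in model.parameters())
    print(f"distilgpt2: {total / 1e6:.0f}M parameters")  # roughly 82M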

 

Fine-tuning

Fine-tuning, another aspect of model training that can differentiate SLMs and LLMs, is the process of adapting and updating a pretrained model with new data. Typically, the purpose of fine-tuning is to customize a pretrained model for a specific use case: the model is trained further on new data sets, and its existing parameters are adjusted until it produces acceptable results in the new context. In general, fine-tuning gets harder, slower, and more resource intensive as a model’s parameter count grows, meaning LLMs require a heavier lift than SLMs.
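
As a rough sketch of what that workflow looks like in practice, here is a minimal fine-tuning loop using the Hugging Face transformers and datasets libraries. The model name and the two-sentence “corpus” are placeholders, not examples from this article; a real project would use a domain-specific data set and far more careful training settings.

    # Minimal fine-tuning sketch: adapt a small pretrained causal LM
    # to new text. Requires: pip install transformers datasets torch
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "distilgpt2"  # placeholder; any small causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Stand-in for a domain-specific corpus (e.g., medical or legal text).
    texts = ["Example domain sentence one.", "Example domain sentence two."]
    dataset = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # updates the pretrained parameters on the new data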

Beyond parameters and fine-tuning, the type and complexity of the training process also usually differ between SLMs and LLMs. Understanding the details of model training, like “self-attention mechanisms” or “encoder-decoder model schemes,” requires a high level of data science expertise. The basic difference is that SLM training usually favors approaches that are more resource efficient and more tightly focused on specific use cases than LLM training.
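
For readers who want a peek under the hood anyway, a single scaled dot-product self-attention step—the building block named above—can be written in a few lines of NumPy. This is a simplified illustration using random matrices, not any particular model’s implementation:

    # Scaled dot-product self-attention over a toy sequence.
    # Each of the 4 "tokens" attends to every other token and mixes
    # their values, weighted by the similarity of queries and keys.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d = 4, 8                      # 4 tokens, 8-dimensional embeddings
    Q = rng.normal(size=(seq_len, d))      # queries
    K = rng.normal(size=(seq_len, d))      # keys
    V = rng.normal(size=(seq_len, d))      # values

    scores = Q @ K.T / np.sqrt(d)          # similarity of every token pair
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    output = weights @ V                   # each token: weighted mix of values
    print(output.shape)                    # (4, 8)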

 

Bias

Although every AI model undergoes some degree of fine-tuning, the scope of most LLMs makes it impossible to tune them to every possible inference. LLMs are also typically trained on openly accessible data sets like the internet, whereas SLMs often train on industry- or company-specific data sets. This can introduce biases, such as the underrepresentation or misrepresentation of certain groups and ideas, or factual inaccuracies. Because LLMs and SLMs are language models, they can also inherit language biases related to dialect, geographical location, and grammar.

In short, any language model can inherit bias, but LLMs in particular, given their scope, introduce more opportunities for bias. With SLMs, which are trained on smaller data sets, you can more easily mitigate the biases that will inevitably occur.

Training any model for a business use case, whether LLM or SLM, is a resource-intensive process. Training LLMs, however, is especially resource intensive: in the case of GPT-4, a reported 25,000 NVIDIA A100 GPUs ran simultaneously and continuously for 90 to 100 days. Again, GPT-4 sits at the largest end of the LLM spectrum; other LLMs, like Granite, didn’t require as many resources. Training an SLM still likely requires significant compute resources, but far fewer than an LLM demands.

 

Resource requirements for training vs inference

It’s also important to note the difference between model training and model inference. As discussed above, training is the first step in developing an AI model. Inference is the process by which a trained AI model makes predictions on new data. For example, when a user asks ChatGPT a question, the model generates a response for that user—that process of generating a prediction is an inference.

Some pretrained LLMs, like the Granite family of models, can run inference using the resources of a single high-powered workstation (e.g., Granite models can fit on one V100-32GB GPU), although many require multiple parallel processing units to generate responses. Furthermore, the more concurrent users accessing an LLM, the slower the model runs inference. SLMs, on the other hand, are usually designed so they can run inference with just the resources of a smartphone or other mobile device.
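
To see the training/inference distinction in code, here is a minimal inference-only example using the Hugging Face pipeline API. No parameters are updated; the pretrained model simply runs a forward pass on the prompt. The placeholder model (distilgpt2, not one discussed in this article) runs comfortably on a laptop CPU:

    # Inference with an already-trained model: no parameter updates,
    # just a forward pass that turns a prompt into a prediction.
    from transformers import pipeline

    generator = pipeline("text-generation", model="distilgpt2")  # placeholder model
    result = generator("Small language models are", max_new_tokens=20)
    print(result[0]["generated_text"])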

There’s no single answer to the question “Which model is better?” Instead, the right choice depends on your organization’s plans, resources, expertise, timetable, and other factors. It’s also important to decide whether your use case necessitates training a model from scratch or fine-tuning a pretrained model. Common considerations when choosing between LLMs and SLMs include:

 

Cost

In general, LLMs require far more resources to train, fine-tune, and run inference. Importantly, though, training is an intermittent rather than continuous investment: computing resources are only needed while a model is actively being trained. Running inference, by contrast, represents an ongoing cost, and the need can grow as use of the model scales to more and more users. In most cases, this requires cloud computing resources at scale, a significant on-premises resource investment, or both.

SLMs are frequently evaluated for low-latency use cases, like edge computing. That’s because they can often run with just the resources available on a single mobile device without needing a constant, strong connection to more significant resources.

From the Red Hat blog: Tips for making LLMs less expensive 

 

Expertise

Many popular pretrained LLMs, like Granite, Llama, and GPT-4, offer a more “plug-and-play” option for getting started with AI. These are often preferable for organizations looking to begin experimenting with AI, since they don’t need to be designed and trained from scratch by data scientists. SLMs, on the other hand, typically require specialized expertise in both data science and industry knowledge domains to accurately fine-tune them on niche data sets.

 

Security

One potential risk of LLMs is the exposure of sensitive data through application programming interfaces (APIs). Specifically, fine-tuning an LLM on your organization’s data requires careful attention to compliance and company policy. SLMs may present a lower risk of data leakage because they offer a greater degree of control.

Red Hat AI offers generative and predictive AI capabilities, along with MLOps support, for building flexible, trusted AI solutions at scale across hybrid cloud environments. It helps accelerate AI adoption, abstract the complexities of delivering AI solutions, and bring flexibility to develop and deploy wherever your data resides.

In combination with Red Hat’s open hybrid cloud infrastructure, Red Hat AI lets organizations build tailored AI solutions for the enterprise, manage model and application lifecycles, adapt to hardware acceleration requirements, and deploy, run, and operate alongside critical workloads on a single platform.

Explore the Red Hat AI portfolio 

 

Machine learning and AI for beginners

If you’re new to exploring ML and AI models, you can try InstructLab, a community-driven solution for training LLMs. Here, you can experiment and contribute directly to your AI model’s development for free.

Check out InstructLab 

 

Easily access IBM’s Granite family models

If you’re ready to go a step further, Red Hat® Enterprise Linux® AI is a foundation model platform where you can develop, test, and run Granite family LLMs for enterprise applications. Granite is a family of AI models released under an open source license and fully supported and indemnified by Red Hat. This open source approach encourages generative AI innovation while maintaining trust and security.

Learn more about Red Hat Enterprise Linux AI 

 

Scale for the enterprise

Red Hat® OpenShift® AI is a platform that can support your models at scale across hybrid cloud environments. You can train, prompt-tune, fine-tune, and serve AI models for your unique use case and with your own data.

Together, these products provide a unified solution that allows data scientists and developers to collaborate so teams can bring models from experiments to production faster.

Learn more about Red Hat OpenShift AI 

 

Grow with partners

Additionally, Red Hat’s partner integrations open the door to a growing ecosystem of trusted AI tools built to work with open source platforms.

Check out our AI partners
