What are small language models (SLMs)?
A small language model (SLM) is a smaller version of a large language model (LLM) that has more specialized knowledge, is faster to customize, and is more efficient to run.
SLMs are trained to have domain-specific knowledge, unlike LLMs, which have broad general knowledge. Because of their smaller size, SLMs require fewer computational resources for training and deployment, which reduces infrastructure costs and enables faster fine-tuning. This lightweight nature makes SLMs ideal for edge devices and mobile applications.
SLMs vs LLMs
SLMs and LLMs are both types of artificial intelligence (AI) systems that are trained to interpret human language, including programming languages. The key differences between LLMs and SLMs are usually the size of the data sets they’re trained on, the processes used to train them on those data sets, and the cost and benefit of getting started for different use cases.
As their names suggest, both LLMs and SLMs are trained on data sets consisting of language, which distinguishes them from models trained on images (e.g., DALL·E) or videos (e.g., Sora). A few examples of language-based data sets include webpage text, developer code, emails, and manuals.
One of the most well-known applications of both SLMs and LLMs is generative AI (gen AI), which can generate (hence the name) unscripted responses to many different, unpredictable queries. LLMs in particular have become well known among the general public thanks to the GPT-4 foundation model and ChatGPT, a conversational chatbot trained on massive data sets to respond to a wide range of human queries. Though gen AI is popular, there are also non-generative applications of LLMs and SLMs, such as predictive AI.
LLMs and SLMs are usually trained on different data sets
The scope of GPT-4/ChatGPT illustrates one common difference between LLMs and SLMs: the data sets they’re trained on.
LLMs are usually intended to emulate human intelligence at a very broad level, and thus are trained on a wide range of large data sets. In the case of GPT-4/ChatGPT, that includes much of the public internet up to a certain cutoff date. This is why ChatGPT has become known for interpreting and responding to such a wide range of queries from general users. However, it is also why the model sometimes draws attention for incorrect responses, colloquially referred to as “hallucinations”: it lacks the fine-tuning and domain-specific training needed to respond accurately to every industry-specific or niche query.
SLMs, on the other hand, are typically trained on smaller data sets tailored to specific industry domains (i.e., areas of expertise). For example, a healthcare provider could use an SLM-powered chatbot trained on medical data sets to add domain-specific knowledge to a user’s non-expert question about their health, enriching the quality of both the question and the response. In this case, the SLM-powered chatbot doesn’t need to be trained on the entire internet, including every blog post, novel, and poem ever written, because that content is irrelevant to the healthcare use case.
In short, SLMs typically excel in specific domains, but struggle compared to LLMs when it comes to general knowledge and overall contextual understanding.
LLMs and SLMs require different resources
Training any model for a business use case, whether LLM or SLM, is resource intensive, but training LLMs is especially so. GPT-4 reportedly required 25,000 NVIDIA A100 GPUs running simultaneously and continuously for 90 to 100 days, and it represents the largest end of the LLM spectrum. Other LLMs, like Granite, didn’t require as many resources. Training an SLM still likely requires significant compute resources, but far fewer than an LLM requires.
Resource requirements for training vs inference
It’s also important to note the difference between model training and model inference. As discussed above, training is the first step in developing an AI model. Inference is the process a trained AI model follows to make predictions on new data. For example, when a user asks ChatGPT a question, the model generates a response; that act of producing a prediction from new input is an inference.
Some pretrained LLMs, like the Granite family of models, can run inference using the resources of a single high-powered workstation (for example, Granite models can fit on one V100 32GB GPU), although many require multiple parallel processing units to generate responses. Furthermore, the more concurrent users accessing an LLM, the slower it runs inference. SLMs, on the other hand, are usually designed so that inference can run within the resources of a smartphone or other mobile device.
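To make the distinction concrete, here is a minimal sketch of running inference locally in Python with the Hugging Face transformers library. The model name is only a stand-in for a compact model that fits on modest hardware, not a reference to any model discussed above.

```python
# A minimal sketch of local inference with a compact model, using the
# Hugging Face transformers library. "distilgpt2" is just a stand-in;
# substitute an SLM suited to your hardware and use case.
from transformers import pipeline

# The model downloads once; after that, inference runs entirely locally.
generator = pipeline("text-generation", model="distilgpt2")

# Each call is one inference: the trained model produces a prediction
# (here, a text continuation) for input it has never seen before.
result = generator("Small language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```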
Benefits of SLMs
There’s no single answer to the question of which model is better. The right choice depends on your organization’s plans, resources, expertise, timetable, and other factors. It’s also important to decide whether your use case requires training a model from scratch or fine-tuning a pretrained one. Common considerations when choosing between LLMs and SLMs include:
Cost
In general, LLMs require far more resources than SLMs to train, fine-tune, and run inference. Importantly, training is a less frequent investment: computing resources are needed only while a model is being trained, which is an intermittent task, not a continuous one. Running inference, however, represents an ongoing cost, and the need can grow as use of the model scales to more and more users. In most cases, this requires cloud computing resources at scale, a significant on-premises resource investment, or both.
SLMs are frequently evaluated for low-latency use cases, like edge computing. That’s because they can often run with just the resources available on a single mobile device without needing a constant, strong connection to more significant resources.
Expertise
Many popular pretrained LLMs, like Granite, Llama, and GPT-4, offer a more “plug-and-play” option for getting started with AI. These are often preferable for organizations looking to begin experimenting with AI, since they don’t need to be designed and trained from scratch by data scientists. SLMs, on the other hand, typically require specialized expertise in both data science and the relevant industry domain to fine-tune accurately on niche data sets.
Security
One potential risk of LLMs is the exposure of sensitive data through application programming interfaces (APIs). Specifically, fine-tuning an LLM on your organization’s data requires careful attention to compliance and company policy. SLMs may present a lower risk of data leakage because they offer a greater degree of control: they can be trained and run entirely within infrastructure your organization manages.
Limitations of SLMs
As businesses integrate SLMs into their workflows, it’s important to be aware of the limitations they present.
Bias
SLMs are trained on smaller data sets, so the biases that inevitably occur are easier to mitigate than they are in LLMs. However, as with language models of any size, training data can still introduce biases, such as the underrepresentation or misrepresentation of certain groups and ideas, or factual inaccuracies. Language models can also inherit biases related to dialect, geographic location, and grammar.
Teams should pay extra attention to the quality of training data in order to limit biased outputs.
Narrow scope of knowledge
SLMs have a smaller pool of information to pull from as they generate responses. This makes them excellent for specific tasks, but less suitable for tasks that require a wide scope of general knowledge.
Teams might consider creating a collection of purpose-built SLMs to use alongside one or more LLMs. This approach becomes especially interesting if teams can pair models with existing applications, creating an interconnected workflow of multiple language models working in tandem, as in the routing sketch below.
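The sketch below routes each query to a hypothetical domain SLM and falls back to a general LLM. The model names and the keyword matching are placeholder assumptions; a production router would more likely classify queries with an embedding model or a lightweight classifier.

```python
# A minimal sketch of routing queries across purpose-built SLMs with an
# LLM fallback. All model names here are hypothetical placeholders.

DOMAIN_MODELS = {
    "billing": "billing-slm",    # hypothetical SLM fine-tuned on billing data
    "shipping": "shipping-slm",  # hypothetical SLM fine-tuned on logistics data
}
FALLBACK_MODEL = "general-llm"   # hypothetical general-purpose LLM

def route(query: str) -> str:
    """Pick a domain SLM when the query matches its domain, else the LLM."""
    for domain, model in DOMAIN_MODELS.items():
        if domain in query.lower():
            return model
    return FALLBACK_MODEL

print(route("Why was my billing statement higher this month?"))  # billing-slm
print(route("Write a haiku about autumn."))                      # general-llm
```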
SLM use cases
The adaptability of SLMs makes them beneficial for a variety of use cases.
Chatbots
Train an SLM on specialized materials to power a chatbot. For example, a customer service chatbot might be trained on company-specific knowledge so it can answer questions and direct users to relevant information, as in the sketch below.
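Here is a minimal sketch of the grounding step for such a chatbot. The document store and the topic matching are hypothetical placeholders; in practice, the assembled prompt would be sent to an SLM fine-tuned on the same company materials.

```python
# A minimal sketch of grounding a chatbot answer in company-specific
# material. COMPANY_DOCS and the topic matching are hypothetical.

COMPANY_DOCS = {
    "returns": "Items may be returned within 30 days with a receipt.",
    "warranty": "All products carry a one-year limited warranty.",
}

def build_prompt(question: str) -> str:
    # Look up company knowledge relevant to the question and prepend it,
    # so the model answers from curated material instead of guessing.
    context = " ".join(text for topic, text in COMPANY_DOCS.items()
                       if topic in question.lower())
    return f"Context: {context}\nCustomer question: {question}\nAnswer:"

print(build_prompt("What is your returns policy?"))
```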
Agentic AI
Integrate SLMs into an agentic AI workflow so they can complete tasks on behalf of a user, as in the sketch below.
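The sketch below shows the shape of a single agentic step under stated assumptions: the decide() function is a stand-in for an SLM prompted to emit a structured JSON action, and the tool is a stubbed business system.

```python
# A minimal sketch of one agentic step: a model chooses a tool, the code
# executes it, and the result goes back to the user. decide() is a
# hypothetical stand-in for a real SLM call.
import json

def check_order_status(order_id: str) -> str:
    return f"Order {order_id} shipped yesterday."  # stubbed business system

TOOLS = {"check_order_status": check_order_status}

def decide(request: str) -> str:
    # Stand-in for an SLM prompted to return a JSON action.
    return json.dumps({"tool": "check_order_status",
                       "args": {"order_id": "42"}})

def run_agent(request: str) -> str:
    action = json.loads(decide(request))
    tool = TOOLS[action["tool"]]
    return tool(**action["args"])

print(run_agent("Where is my order 42?"))
```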
Generative AI
SLMs can perform generative tasks such as drafting new text, translating existing text, and summarizing copy; the sketch below shows one such task.
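For example, here is a minimal summarization sketch using a small distilled model from the Hugging Face Hub. The model name is one possible choice for illustration, not a recommendation.

```python
# A minimal sketch of a generative task (summarization) with a compact
# model. "sshleifer/distilbart-cnn-12-6" is a small distilled model used
# here only as an example; substitute an SLM suited to your domain.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
text = ("Small language models are trained on focused data sets, which makes "
        "them cheaper to fine-tune and light enough to run on modest hardware, "
        "at the cost of narrower general knowledge than large models.")
summary = summarizer(text, max_length=30, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```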
How can Red Hat help?
Red Hat AI offers generative and predictive AI capabilities, along with MLOps support, for building flexible, trusted AI solutions at scale across hybrid cloud environments. It helps accelerate AI adoption, abstract the complexities of delivering AI solutions, and bring flexibility to develop and deploy wherever your data resides.
In combination with Red Hat’s open hybrid cloud infrastructure, Red Hat AI lets organizations build tailored AI solutions for the enterprise, manage model and application lifecycles, adapt to hardware acceleration requirements, and deploy, run, and operate alongside critical workloads on a single platform.
Machine learning and AI for beginners
If you’re new to exploring ML and AI models, you can try InstructLab, a community-driven solution for training LLMs. Here, you can experiment and contribute directly to your AI model’s development for free.
Easily access IBM’s Granite family models
If you’re ready to go a step further, Red Hat® Enterprise Linux® AI is a foundation model platform where you can develop, test, and run Granite family LLMs for enterprise applications. Granite is a family of open-source licensed AI models that are fully supported and indemnified by Red Hat. Its open source approach encourages generative AI innovation while maintaining trust and security.
Scale for the enterprise
Red Hat® OpenShift® AI is a platform that can support your models at scale across hybrid cloud environments. You can train, prompt-tune, fine-tune, and serve AI models for your unique use case and with your own data.
Together, these products provide a unified solution that allows data scientists and developers to collaborate so teams can bring models from experiments to production faster.
Grow with partners
Additionally, Red Hat’s partner integrations open the door to a growing ecosystem of trusted AI tools built to work with open source platforms.