What are foundation models for AI?


A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks. 

Until recently, artificial intelligence (AI) systems were specialized tools, meaning that an ML model would be trained for a specific application or single use case. The term foundation model (also known as a base model) entered our lexicon when experts began noticing two trends within the field of machine learning:

  1. A small number of deep learning architectures were being used to achieve results for a wide variety of tasks.
  2. New capabilities can emerge from an AI model even though they were not part of its original training. 

Foundation models, such as IBM's Granite models, have been programmed to function with a general contextual understanding of patterns, structures, and representations. This foundational comprehension of how to communicate and identify patterns creates a baseline of knowledge that can be further modified, or fine-tuned, to perform domain-specific tasks for just about any industry.


Two defining characteristics that enable foundation models to function are transfer learning and scale. Transfer learning refers to the ability of a model to apply information about one situation to another and build upon its internal “knowledge.”
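As a rough illustration of transfer learning, the sketch below (all names and data are hypothetical, and the "pretrained" network is just a frozen random projection) reuses a feature extractor's weights unchanged and trains only a small task-specific head on new data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor. In practice this would be
# a large network trained on broad data; here it is a frozen random
# projection, purely for illustration.
W_pretrained = rng.normal(size=(8, 4))

def extract_features(x):
    # Frozen "pretrained" layers: their weights are reused, never updated.
    return np.tanh(x @ W_pretrained)

# New domain-specific data for the downstream task (hypothetical).
X = rng.normal(size=(64, 8))
true_head = rng.normal(size=4)                   # hidden "true" task rule
y = (extract_features(X) @ true_head > 0).astype(float)

# Fine-tuning: train only a small task-specific head (logistic regression).
w_head = np.zeros(4)
b_head = 0.0
lr = 0.5
for _ in range(200):
    feats = extract_features(X)
    p = 1 / (1 + np.exp(-(feats @ w_head + b_head)))  # sigmoid
    w_head -= lr * feats.T @ (p - y) / len(y)         # update head only;
    b_head -= lr * (p - y).mean()                     # W_pretrained stays frozen

p = 1 / (1 + np.exp(-(extract_features(X) @ w_head + b_head)))
accuracy = ((p > 0.5) == y).mean()
```

The key point is that only the small head (5 parameters here) is trained; the model's prior "knowledge" is carried over intact, which is why fine-tuning is so much cheaper than training from scratch.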

Scale refers to hardware, specifically graphics processing units (GPUs), which allow the model to perform many computations simultaneously (known as parallel processing). GPUs are critical for training and deploying deep learning models, including foundation models, because they can quickly process data and perform complex statistical calculations.

Deep learning and foundation models
Many foundation models, especially those used in natural language processing (NLP), computer vision, and audio processing, are pretrained using deep learning techniques. Deep learning is a technology that underpins many (but not all) foundation models and has been a driving force behind many of the advancements in the field. Deep learning, also known as deep neural learning or deep neural networking, teaches computers to learn through observation, imitating the way humans gain knowledge. 

Transformers and foundation models
While not all foundation models use transformers, the transformer architecture has proven to be a popular way to build foundation models that involve text, such as ChatGPT, BERT, and DALL-E 2. Transformers enhance ML models by allowing them to capture contextual relationships and dependencies between elements in a sequence of data. Transformers are a type of artificial neural network (ANN) widely used in NLP models; however, they are typically not used in models built solely for computer vision or speech processing.
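To make "capturing contextual relationships" concrete, here is a minimal NumPy sketch (not any production framework's API) of scaled dot-product attention, the core transformer operation in which every position's output is a similarity-weighted mix of the whole sequence:

```python
import numpy as np

def attention(Q, K, V):
    # Each output position is a weighted mix of all values V; the weights
    # come from query-key similarity, which is how a transformer relates
    # elements anywhere in the sequence to one another.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(1)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))  # a toy sequence of 5 token vectors

# A real transformer derives Q, K, V from separate learned projections of X;
# passing X directly for all three is an illustrative simplification.
output, weights = attention(X, X, X)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors, with the mix determined by context rather than by position alone.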


After a foundation model has been trained, it can draw on the knowledge gained from huge pools of data to help solve problems, a skill that can provide valuable insights and contributions to organizations in many ways. Some of the general tasks a foundation model can perform include:

Natural language processing (NLP)
Recognizing context, grammar, and linguistic structures, a foundation model trained in NLP can generate and extract information from the data it is trained on. Further fine-tuning an NLP model by training it to associate text with sentiment (positive, negative, neutral) could prove useful for companies looking to analyze written messages such as customer feedback, online reviews, or social media posts. NLP is a broader field that encompasses the development and application of large language models (LLMs).
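The text-to-sentiment idea can be sketched at toy scale. The example below uses a deliberately tiny bag-of-words logistic regression rather than a real fine-tuned LLM, and the "customer feedback" phrases are invented for illustration:

```python
import numpy as np

# Toy labeled data standing in for customer feedback (1 = positive,
# 0 = negative). A real fine-tuning corpus would be far larger, and the
# model would be a pretrained LLM rather than bag-of-words counts.
texts = [
    ("great product fast shipping", 1),
    ("love it works perfectly", 1),
    ("terrible quality very disappointed", 0),
    ("broken on arrival awful support", 0),
]

vocab = sorted({w for t, _ in texts for w in t.split()})

def to_vec(text):
    # Bag-of-words vector: how often each vocabulary word appears.
    words = text.split()
    return np.array([words.count(w) for w in vocab], dtype=float)

X = np.stack([to_vec(t) for t, _ in texts])
y = np.array([label for _, label in texts], dtype=float)

# Simple logistic regression: learn a weight per word.
w = np.zeros(len(vocab))
b = 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * (p - y).mean()

def sentiment(text):
    p = 1 / (1 + np.exp(-(to_vec(text) @ w + b)))
    return "positive" if p > 0.5 else "negative"
```

Even this crude version shows the shape of the task: map text to a numeric representation, learn an association with labels, then classify unseen messages.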

Computer vision
When the model can recognize basic shapes and features, it can begin to identify patterns. Further fine-tuning a computer vision model can lead to automated content moderation, facial recognition, and image classification. Models can also generate new images based on learned patterns. 
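The "recognize basic shapes and features" step can be illustrated with a single convolution filter, the basic pattern-matching operation behind computer vision models. This is a hand-rolled sketch, not a library API, and (as in most deep learning frameworks) the kernel is applied without flipping:

```python
import numpy as np

def convolve2d(image, kernel):
    # Minimal "valid" 2-D convolution: slide the kernel over the image
    # and record how strongly each patch matches the kernel's pattern.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image: dark left half, bright right half (a vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel-style kernel that responds strongly to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = convolve2d(image, kernel)
```

The response is strongest exactly where the dark-to-bright transition sits and zero in the flat regions; stacking many learned filters like this one is how convolutional models build from edges up to complex shapes.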

Audio/speech processing
When a model can recognize phonetic elements, it can derive meaning from our voices which can lead to more efficient and inclusive communication. Virtual assistants, multilingual support, voice commands, and features like transcription promote accessibility and productivity. 

With additional fine-tuning, organizations can design further specialized machine learning systems to address industry-specific needs such as fraud detection for financial institutions, gene sequencing for healthcare, chatbots for customer service, and much more.


Foundation models provide accessibility and a level of sophistication within the realm of AI that many organizations do not have the resources to attain on their own. By adopting and building upon foundation models, companies can overcome common hurdles such as:

Limited access to quality data: Foundation models provide a model built on data that most organizations don’t have access to.

Model performance/accuracy: Foundation models provide a baseline level of accuracy that might take months or even years of effort for an organization to build on its own. 

Time to value: Training a machine learning model can take a long time and requires many resources. Foundation models provide a baseline of pretraining that organizations can then fine-tune to achieve a bespoke result. 

Limited talent: Foundation models provide a way for organizations to make use of AI/ML without having to invest heavily in data science resources. 

Expense management: Using a foundation model reduces the need for the expensive hardware required for initial training. While there is still a cost associated with serving and fine-tuning the finalized model, it is only a fraction of what it would cost to train the foundation model itself.

While there are many exciting applications for foundation models, there are also a number of potential challenges to be mindful of.

Cost
Foundation models require significant resources to develop, train, and deploy. The initial training phase requires vast amounts of general data, can consume tens of thousands of GPUs, and typically requires a team of machine learning engineers and data scientists. 

Interpretability
“Black box” refers to when an AI program performs a task within its neural network and doesn’t show its work. This creates a scenario where no one, including the data scientists and engineers who created the algorithm, can explain exactly how the model arrived at a specific output. The lack of interpretability in black box models can have harmful consequences when they are used for high-stakes decision making, especially in industries like healthcare, criminal justice, and finance. This black box effect can occur with any neural network-based model, not just foundation models. 
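One common way to probe a black box without opening it is perturbation analysis: nudge each input slightly and observe how the output moves. The sketch below is purely illustrative; the "model" is a small random stand-in network, not any real system:

```python
import numpy as np

rng = np.random.default_rng(2)

# A stand-in "black box": a small random neural network whose internals
# we deliberately treat as opaque. Hypothetical weights, for illustration.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def black_box(x):
    return float(np.tanh(x @ W1) @ W2)

# Perturbation analysis: change one input at a time, hold the rest fixed,
# and measure the output shift. This estimates input sensitivity without
# inspecting weights, but it still cannot explain *why* the model reacts
# the way it does.
x = np.array([0.5, -1.0, 0.2, 0.8])
baseline = black_box(x)
eps = 1e-4
sensitivity = np.zeros(4)
for i in range(4):
    x_pert = x.copy()
    x_pert[i] += eps
    sensitivity[i] = (black_box(x_pert) - baseline) / eps

most_influential = int(np.argmax(np.abs(sensitivity)))
```

Techniques like this give partial, local insight (which input mattered most for this one prediction), which is exactly why interpretability for large foundation models remains an open challenge rather than a solved problem.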

Privacy and security 
Foundation models require access to a lot of information, and sometimes that includes customer information or proprietary business data. This is something to be especially cautious about if the model is deployed or accessed by third-party providers.

Accuracy and bias 
If a deep learning model is trained on data that is statistically biased, or doesn’t provide an accurate representation of the population, the output can be flawed. Unfortunately, existing human bias is often transferred to artificial intelligence, creating risk of discriminatory algorithms and biased outputs. As organizations continue to leverage AI for improved productivity and performance, it’s critical that strategies are put in place to minimize bias. This begins with inclusive design processes and a more thoughtful consideration of representative diversity within the collected data. 


Red Hat® AI is our portfolio of AI products built on solutions our customers already trust. 

Red Hat AI can help organizations:

  • Adopt and innovate with AI quickly.
  • Break down the complexities of delivering AI solutions.
  • Deploy anywhere.

Explore Red Hat AI 

Easy access to IBM’s Granite family models

If you’re ready to experiment with foundation models, but aren’t sure what your business use cases are yet, start out with Red Hat® Enterprise Linux® AI. This foundation model platform helps develop, test, and run Granite family LLMs for enterprise applications.

Developers get quick access to a single server environment, complete with LLMs and AI tooling. It provides everything needed to tune models and build gen AI applications.

Red Hat AI also offers additional model alignment mechanisms to improve your LLM with a solution called InstructLab. Red Hat and IBM created InstructLab to introduce an open source, community-driven approach to enhancing LLM capabilities.

Explore Red Hat Enterprise Linux AI 
