What are foundation models for AI?


A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks. 

Until recently, artificial intelligence (AI) systems were specialized tools, meaning that an ML model would be trained for a specific application or single use case. The term foundation model (also known as a base model) entered our lexicon when experts began noticing two trends within the field of machine learning:

  1. A small number of deep learning architectures were being used to achieve results for a wide variety of tasks.
  2. New capabilities can emerge from an AI model even though they were not part of its original training. 

Foundation models, such as IBM's Granite models, have been programmed to function with a general contextual understanding of patterns, structures, and representations. This foundational comprehension of how to communicate and identify patterns creates a baseline of knowledge that can be further modified, or fine-tuned, to perform domain-specific tasks for just about any industry.


Two defining characteristics that enable foundation models to function are transfer learning and scale. Transfer learning refers to the ability of a model to apply information about one situation to another and build upon its internal “knowledge.”
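As a rough illustration of transfer learning, the sketch below (all names and data are hypothetical, and the "pretrained" network is just a frozen random projection) reuses a feature extractor's weights unchanged and trains only a small task-specific head on new data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor. In practice this would be
# a large network trained on broad data; here it is a frozen random
# projection, purely for illustration.
W_pretrained = rng.normal(size=(8, 4))

def extract_features(x):
    # Frozen "pretrained" layers: their weights are reused, never updated.
    return np.tanh(x @ W_pretrained)

# New domain-specific data for the downstream task (hypothetical).
X = rng.normal(size=(64, 8))
true_head = rng.normal(size=4)                   # hidden "true" task rule
y = (extract_features(X) @ true_head > 0).astype(float)

# Fine-tuning: train only a small task-specific head (logistic regression).
w_head = np.zeros(4)
b_head = 0.0
lr = 0.5
for _ in range(200):
    feats = extract_features(X)
    p = 1 / (1 + np.exp(-(feats @ w_head + b_head)))  # sigmoid
    w_head -= lr * feats.T @ (p - y) / len(y)         # update head only;
    b_head -= lr * (p - y).mean()                     # W_pretrained stays frozen

p = 1 / (1 + np.exp(-(extract_features(X) @ w_head + b_head)))
accuracy = ((p > 0.5) == y).mean()
```

The key point is that only the small head (5 parameters here) is trained; the model's prior "knowledge" is carried over intact, which is why fine-tuning is so much cheaper than training from scratch.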

Scale refers to hardware, specifically graphics processing units (GPUs), which allow the model to perform many computations simultaneously (known as parallel processing). GPUs are critical for training and deploying deep learning models, including foundation models, because they can quickly process data and perform complex statistical calculations.

Deep learning and foundation models
Many foundation models, especially those used in natural language processing (NLP), computer vision, and audio processing, are pretrained using deep learning techniques. Deep learning is a technology that underpins many (but not all) foundation models and has been a driving force behind many of the advancements in the field. Deep learning, also known as deep neural learning or deep neural networking, teaches computers to learn through observation, imitating the way humans gain knowledge. 

Transformers and foundation models
While not all foundation models use transformers, the transformer architecture has proven to be a popular way to build foundation models that involve text, such as ChatGPT, BERT, and DALL-E 2. Transformers enhance ML models by allowing them to capture contextual relationships and dependencies between elements in a sequence of data. Transformers are a type of artificial neural network (ANN) widely used in NLP models; however, they are typically not used in models built solely for computer vision or speech processing.
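To make "capturing contextual relationships" concrete, here is a minimal NumPy sketch (not any production framework's API) of scaled dot-product attention, the core transformer operation in which every position's output is a similarity-weighted mix of the whole sequence:

```python
import numpy as np

def attention(Q, K, V):
    # Each output position is a weighted mix of all values V; the weights
    # come from query-key similarity, which is how a transformer relates
    # elements anywhere in the sequence to one another.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(1)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))  # a toy sequence of 5 token vectors

# A real transformer derives Q, K, V from separate learned projections of X;
# passing X directly for all three is an illustrative simplification.
output, weights = attention(X, X, X)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors, with the mix determined by context rather than by position alone.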


After a foundation model has been trained, it can draw on the knowledge gained from huge pools of data to help solve problems, a skill that can provide valuable insights and contributions to organizations in many ways. Some of the general tasks a foundation model can perform include:

Natural language processing (NLP)
Recognizing context, grammar, and linguistic structures, a foundation model trained in NLP can generate and extract information from the data it is trained on. Further fine-tuning an NLP model by training it to associate text with sentiment (positive, negative, neutral) could prove useful for companies looking to analyze written messages such as customer feedback, online reviews, or social media posts. NLP is a broader field that encompasses the development and application of large language models (LLMs).
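The text-to-sentiment idea can be sketched at toy scale. The example below uses a deliberately tiny bag-of-words logistic regression rather than a real fine-tuned LLM, and the "customer feedback" phrases are invented for illustration:

```python
import numpy as np

# Toy labeled data standing in for customer feedback (1 = positive,
# 0 = negative). A real fine-tuning corpus would be far larger, and the
# model would be a pretrained LLM rather than bag-of-words counts.
texts = [
    ("great product fast shipping", 1),
    ("love it works perfectly", 1),
    ("terrible quality very disappointed", 0),
    ("broken on arrival awful support", 0),
]

vocab = sorted({w for t, _ in texts for w in t.split()})

def to_vec(text):
    # Bag-of-words vector: how often each vocabulary word appears.
    words = text.split()
    return np.array([words.count(w) for w in vocab], dtype=float)

X = np.stack([to_vec(t) for t, _ in texts])
y = np.array([label for _, label in texts], dtype=float)

# Simple logistic regression: learn a weight per word.
w = np.zeros(len(vocab))
b = 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * (p - y).mean()

def sentiment(text):
    p = 1 / (1 + np.exp(-(to_vec(text) @ w + b)))
    return "positive" if p > 0.5 else "negative"
```

Even this crude version shows the shape of the task: map text to a numeric representation, learn an association with labels, then classify unseen messages.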

Computer vision
When the model can recognize basic shapes and features, it can begin to identify patterns. Further fine-tuning a computer vision model can lead to automated content moderation, facial recognition, and image classification. Models can also generate new images based on learned patterns. 
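The "recognize basic shapes and features" step can be illustrated with a single convolution filter, the basic pattern-matching operation behind computer vision models. This is a hand-rolled sketch, not a library API, and (as in most deep learning frameworks) the kernel is applied without flipping:

```python
import numpy as np

def convolve2d(image, kernel):
    # Minimal "valid" 2-D convolution: slide the kernel over the image
    # and record how strongly each patch matches the kernel's pattern.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image: dark left half, bright right half (a vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel-style kernel that responds strongly to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = convolve2d(image, kernel)
```

The response is strongest exactly where the dark-to-bright transition sits and zero in the flat regions; stacking many learned filters like this one is how convolutional models build from edges up to complex shapes.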

Audio/speech processing
When a model can recognize phonetic elements, it can derive meaning from our voices which can lead to more efficient and inclusive communication. Virtual assistants, multilingual support, voice commands, and features like transcription promote accessibility and productivity. 

With additional fine-tuning, organizations can design further specialized machine learning systems to address industry-specific needs such as fraud detection for financial institutions, gene sequencing for healthcare, chatbots for customer service, and much more.


Foundation models provide accessibility and a level of sophistication within the realm of AI that many organizations do not have the resources to attain on their own. By adopting and building upon foundation models, companies can overcome common hurdles such as:

Limited access to quality data: Foundation models provide a model built on data that most organizations don’t have access to.

Model performance/accuracy: Foundation models provide a baseline level of accuracy that might take months or even years of effort for an organization to build on its own. 

Time to value: Training a machine learning model can take a long time and requires many resources. Foundation models provide a baseline of pretraining that organizations can then fine-tune to achieve a bespoke result. 

Limited talent: Foundation models provide a way for organizations to make use of AI/ML without having to invest heavily in data science resources. 

Expense management: Using a foundation model reduces the need for the expensive hardware required for initial training. While there is still a cost associated with serving and fine-tuning the finalized model, it is only a fraction of what it would cost to train the foundation model itself.

While there are many exciting applications for foundation models, there are also a number of potential challenges to be mindful of.

Cost
Foundation models require significant resources to develop, train, and deploy. The initial training phase requires vast amounts of general data, can consume tens of thousands of GPUs, and typically requires a team of machine learning engineers and data scientists. 

Interpretability
“Black box” refers to when an AI program performs a task within its neural network and doesn’t show its work. This creates a scenario where no one, including the data scientists and engineers who created the algorithm, can explain exactly how the model arrived at a specific output. The lack of interpretability in black box models can have harmful consequences when they are used for high-stakes decision making, especially in industries like healthcare, criminal justice, and finance. This black box effect can occur with any neural network-based model, not just foundation models. 
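One common way to probe a black box without opening it is perturbation analysis: nudge each input slightly and observe how the output moves. The sketch below is purely illustrative; the "model" is a small random stand-in network, not any real system:

```python
import numpy as np

rng = np.random.default_rng(2)

# A stand-in "black box": a small random neural network whose internals
# we deliberately treat as opaque. Hypothetical weights, for illustration.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def black_box(x):
    return float(np.tanh(x @ W1) @ W2)

# Perturbation analysis: change one input at a time, hold the rest fixed,
# and measure the output shift. This estimates input sensitivity without
# inspecting weights, but it still cannot explain *why* the model reacts
# the way it does.
x = np.array([0.5, -1.0, 0.2, 0.8])
baseline = black_box(x)
eps = 1e-4
sensitivity = np.zeros(4)
for i in range(4):
    x_pert = x.copy()
    x_pert[i] += eps
    sensitivity[i] = (black_box(x_pert) - baseline) / eps

most_influential = int(np.argmax(np.abs(sensitivity)))
```

Techniques like this give partial, local insight (which input mattered most for this one prediction), which is exactly why interpretability for large foundation models remains an open challenge rather than a solved problem.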

Privacy and security 
Foundation models require access to a lot of information, and sometimes that includes customer information or proprietary business data. This is something to be especially cautious about if the model is deployed or accessed by third-party providers.

Accuracy and bias 
If a deep learning model is trained on data that is statistically biased, or doesn’t provide an accurate representation of the population, the output can be flawed. Unfortunately, existing human bias is often transferred to artificial intelligence, creating risk of discriminatory algorithms and biased outputs. As organizations continue to leverage AI for improved productivity and performance, it’s critical that strategies are put in place to minimize bias. This begins with inclusive design processes and a more thoughtful consideration of representative diversity within the collected data. 


Red Hat® AI is our portfolio of AI products built on solutions our customers already trust. 

Red Hat AI can help organizations:

  • Adopt and innovate with AI quickly.
  • Break down the complexities of delivering AI solutions.
  • Deploy anywhere.

Explore Red Hat AI 

Easy access to IBM’s Granite family models

If you’re ready to experiment with foundation models, but aren’t sure what your business use cases are yet, start out with Red Hat® Enterprise Linux® AI. This foundation model platform helps develop, test, and run Granite family LLMs for enterprise applications.

Developers get quick access to a single server environment, complete with LLMs and AI tooling. It provides everything needed to tune models and build gen AI applications.

Red Hat AI also offers additional model alignment mechanisms to improve your LLM with a solution called InstructLab. Red Hat and IBM created InstructLab to introduce an open source, community-driven approach to enhancing LLM capabilities.

Explore Red Hat Enterprise Linux AI 
