What are large language models?

Updated August 15, 2025•8-minute read

A large language model (LLM) is a type of artificial intelligence that uses machine learning techniques to understand and generate human language. LLMs can be incredibly valuable for companies and organizations looking to automate and enhance various aspects of communication and data processing.

LLMs are neural network-based models that draw on natural language processing (NLP) techniques to process input and generate output. NLP is a field of artificial intelligence (AI) focused on enabling computers to understand, interpret, and generate human language, which in turn allows LLMs to perform tasks such as text analysis, sentiment analysis, language translation, and speech recognition.

The full lifecycle of an LLM involves several stages, including:

Data preparation. Collecting, cleaning, and organizing raw data for the LLM to train on. This step involves data cleaning (removing duplicates and errors), filtering (removing biased, obscene, or copyrighted content), and tokenization (breaking the text into small units, called tokens, that the model can process).
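As a rough illustration of these three steps, the sketch below deduplicates a handful of documents, drops any that contain a term from a placeholder blocklist, and maps the remaining words to integer token IDs. It is a minimal sketch under simplifying assumptions: the `BLOCKLIST` set and `prepare` function are hypothetical, and the word-level tokenizer is chosen for readability, whereas production LLMs typically use subword tokenizers such as byte-pair encoding (BPE).

```python
import re

# Hypothetical placeholder for a real content filter.
BLOCKLIST = {"badword"}

def prepare(documents):
    """Toy data-preparation pipeline: clean, filter, and tokenize."""
    seen = set()
    kept = []
    for doc in documents:
        text = " ".join(doc.split())  # normalize whitespace
        key = text.lower()
        if not text or key in seen:
            continue  # cleaning: drop empty docs and case-insensitive duplicates
        seen.add(key)
        words = re.findall(r"\w+|[^\w\s]", key)  # words and punctuation marks
        if any(w in BLOCKLIST for w in words):
            continue  # filtering: drop documents containing blocked terms
        kept.append(words)
    # Tokenization: give each distinct token an integer ID in order of appearance.
    vocab = {}
    for words in kept:
        for w in words:
            vocab.setdefault(w, len(vocab))
    token_ids = [[vocab[w] for w in words] for words in kept]
    return vocab, token_ids

vocab, token_ids = prepare(["The cat sat.", "the cat  sat.", "A badword here", "The dog sat."])
print(token_ids)  # [[0, 1, 2, 3], [0, 4, 2, 3]]
```

The second document is dropped as a duplicate and the third by the filter, so only two token sequences remain.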

Training. LLMs form an understanding of language by building knowledge through training. The first stage in training an LLM is called pretraining, which uses a method called self-supervised learning (SSL). SSL is a type of unsupervised learning: the model is given raw data sets (hundreds of billions to trillions of words and phrases) and learns by predicting parts of the data from other parts, such as the next word in a sentence, so the labels come from the text itself rather than from human annotators.
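The key idea of self-supervised learning is that the training labels come from the raw text itself. The sketch below shows this with a deliberately tiny stand-in: it turns a token sequence into (context, next token) pairs and counts them to estimate next-token probabilities. The function names are hypothetical, and a bigram count table replaces the neural network; a real LLM is a transformer that conditions on the full preceding context and is trained with gradient descent, but its labels are derived from the text in the same way.

```python
from collections import Counter, defaultdict

def make_training_pairs(tokens):
    """Self-supervised labeling: each token's target is simply the token after it."""
    return list(zip(tokens[:-1], tokens[1:]))

def train_bigram(tokens):
    """Estimate P(next token | current token) by counting training pairs."""
    counts = defaultdict(Counter)
    for context, target in make_training_pairs(tokens):
        counts[context][target] += 1
    return {
        context: {tok: n / sum(c.values()) for tok, n in c.items()}
        for context, c in counts.items()
    }

tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)
print(model["the"])  # {'cat': 0.5, 'mat': 0.5}
```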

Next, an LLM continues its training journey with fine-tuning and alignment. This is often done using methods such as:

Supervised learning: You give the model a dataset where all the input data is labeled with the correct answer. Its job is to study the relationship between the input data and its correct label. For an LLM, this can mean training on example prompts paired with good responses, so it learns to produce the expected output for a given input.
Reinforcement learning: You give the model a goal and a set of rules, but no labeled data. Its job is to learn by interacting and getting “rewarded” or “penalized” for its actions. Reinforcement learning helps the model learn which actions, or responses, to favor; when aligning LLMs, the reward often reflects human preferences.
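To make the reinforcement learning idea concrete, here is a toy policy-gradient (REINFORCE) loop. The “model” is just a probability distribution over three fixed candidate replies, and a hand-written `reward` function scores them; both are hypothetical stand-ins. In real LLM alignment, such as reinforcement learning from human feedback (RLHF), the reward typically comes from a learned reward model and the updates apply to the network's parameters, but the basic signal is the same: rewarded outputs become more likely.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward(response):
    # Hypothetical reward: 1 for the helpful reply, 0 for anything else.
    return 1.0 if response == "Happy to help!" else 0.0

def reinforce(candidates, steps=500, lr=0.1, seed=0):
    """Toy REINFORCE: shift probability toward responses that earn reward."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(candidates)), weights=probs)[0]  # act
        r = reward(candidates[i])  # receive a reward (or none)
        # Gradient of log(probs[i]) w.r.t. logit j is (1 if j == i else 0) - probs[j].
        for j in range(len(logits)):
            logits[j] += lr * r * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)

probs = reinforce(["Happy to help!", "Figure it out yourself.", "No."])
print([round(p, 3) for p in probs])
```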

During training, the computer draws information from the data, creates connections, and "learns" about language. The end result is a model that is able to capture intricate relationships between words and sentences.

Inference. Once the model is trained, it enters the inference phase. At this point, the LLM can process live data to make real-time predictions. This is when an inference server becomes critical.

Running within the cloud infrastructure, an inference server acts as a bridge between the hardware and the user-facing application. Its role is to serve the model efficiently by managing incoming requests and hardware resources so that processing happens as quickly as possible.
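One common way an inference server keeps processing fast is batching: running several requests through the model together. The toy scheduler below illustrates continuous batching, where a finished request leaves the batch right away and a waiting request takes its slot on the next step, rather than waiting for the whole batch to finish. It is a simplified, hypothetical sketch (each step stands in for generating one token per request), not the scheduling code of any particular server; real servers also track GPU memory and other constraints.

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Return the decoding step at which each request finishes.

    `requests` maps a request ID to the number of tokens it needs to generate.
    """
    waiting = deque(requests.items())
    running = {}      # request ID -> tokens still to generate
    finished_at = {}  # request ID -> step on which it completed
    step = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots before each step.
        while waiting and len(running) < max_batch:
            req_id, remaining = waiting.popleft()
            running[req_id] = remaining
        step += 1
        for req_id in list(running):
            running[req_id] -= 1  # stand-in for generating one token
            if running[req_id] <= 0:
                del running[req_id]  # free the slot immediately
                finished_at[req_id] = step
    return finished_at

print(continuous_batching({"a": 1, "b": 3, "c": 2}))  # {'a': 1, 'b': 3, 'c': 3}
```

With static batching, request c would not start until both a and b had finished, so it would complete on step 5 instead of step 3.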

A leading tool in this space is vLLM, a memory-efficient inference server and engine designed to improve the speed and throughput of large language models in a hybrid cloud setting.

LLMs require lots of resources

Because they perform enormous numbers of calculations (for example, computing probabilities over possible next words) to find connections in language, LLMs require significant computational resources. One of the main resources they draw computing power from is the graphics processing unit (GPU). A GPU is a specialized piece of hardware designed to handle complex parallel processing tasks, making it well suited for ML and deep learning models that require lots of calculations, like an LLM.
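A quick back-of-the-envelope calculation shows why. Just storing a model's weights takes roughly (number of parameters) × (bytes per parameter). The sketch below applies that formula to a hypothetical 7-billion-parameter model at several common numeric precisions. It counts only the weights, so real deployments need additional memory for activations and for the key/value cache used during inference.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory to hold model weights, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Bytes per parameter for some common precisions.
PRECISIONS = {"FP32": 4, "FP16/BF16": 2, "INT8": 1, "4-bit": 0.5}

for name, nbytes in PRECISIONS.items():
    # Hypothetical 7-billion-parameter model.
    print(f"{name}: {weight_memory_gb(7e9, nbytes):.1f} GB")
# FP32: 28.0 GB, FP16/BF16: 14.0 GB, INT8: 7.0 GB, 4-bit: 3.5 GB
```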

Certain techniques, such as quantization, can compress your models so they run faster and use less memory, usually with only a small loss in accuracy. If you are tight on resources for fine-tuning, LoRA (low-rank adaptation) and QLoRA (LoRA applied to a quantized model) are resource-efficient techniques that can help users save time and compute resources.
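LoRA saves resources by freezing the original weight matrix W and training only two small matrices, B and A, whose product forms a low-rank update, so the layer computes W·x + (alpha / r)·B·A·x. The sketch below counts trainable parameters for a hypothetical 4096 × 4096 weight matrix and shows the adapted forward pass on tiny hand-picked matrices. It is illustrative only and omits QLoRA's additional step of storing the frozen base weights in a quantized 4-bit format.

```python
def full_finetune_params(d, k):
    """Trainable parameters when updating a full d x k weight matrix."""
    return d * k

def lora_params(d, k, r):
    """Trainable parameters for a LoRA update B @ A, with B (d x r) and A (r x k)."""
    return r * (d + k)

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

d = k = 4096
print(full_finetune_params(d, k))  # 16777216
print(lora_params(d, k, r=8))      # 65536 (about 0.4% of the full matrix)

W = [[1, 0], [0, 1]]  # frozen 2 x 2 weights
A = [[1, 1]]          # 1 x 2 (rank r = 1)
B = [[1], [0]]        # 2 x 1
print(lora_forward(W, A, B, x=[2, 3], alpha=1, r=1))  # [7.0, 3.0]
```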

vLLM is an inference server that helps LLMs use GPUs more efficiently. It uses techniques such as continuous batching, PagedAttention, and quantization to make better use of GPU memory and compute.
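The idea behind PagedAttention is borrowed from virtual memory paging: instead of reserving one large contiguous region of GPU memory per request for its key/value (KV) cache, memory is divided into fixed-size blocks that are handed out as a sequence grows and returned when it finishes. The hypothetical `BlockAllocator` class below is a simplified sketch of that bookkeeping only; it is not vLLM's implementation, which manages actual GPU tensors and supports features such as sharing blocks between sequences.

```python
class BlockAllocator:
    """Toy KV-cache block manager in the spirit of PagedAttention."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size          # tokens per block
        self.free = list(range(num_blocks))   # IDs of unused physical blocks
        self.tables = {}                      # sequence ID -> its block table
        self.lengths = {}                     # sequence ID -> tokens stored

    def append_token(self, seq_id):
        """Record one new token, grabbing a fresh block only when needed."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # no block yet, or the last block is full
            if not self.free:
                raise RuntimeError("out of KV-cache blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = BlockAllocator(num_blocks=4, block_size=4)
for _ in range(10):
    alloc.append_token("seq-1")
print(alloc.tables["seq-1"], len(alloc.free))  # 3 blocks used, 1 free
```

Because blocks are allocated on demand, a 10-token sequence occupies only three 4-token blocks, and the unused block stays available for other requests.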

LLMs and transformers

GPUs are also instrumental in accelerating the training and operation of transformers, a type of neural network architecture originally designed for NLP tasks and now implemented by most LLMs. Transformers are fundamental building blocks for popular LLM foundation models, such as those behind ChatGPT, Claude, and Gemini.

A transformer architecture enhances the capability of a machine learning model by efficiently capturing contextual relationships and dependencies between elements in a sequence of data, such as words in a sentence. It achieves this with self-attention mechanisms, which enable the model to weigh the importance of each element in the sequence relative to the others, improving its understanding and performance. Self-attention relies on parameters: the numerical weights the model learns during training. These parameters encode what the model has learned from the enormous amount of data that deep learning algorithms must process.
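The core computation is scaled dot-product attention, softmax(Q·K^T / √d_k)·V, where each token has a query (Q), key (K), and value (V) vector of length d_k. The minimal sketch below implements that formula in plain Python. In a real transformer, Q, K, and V are produced by multiplying token embeddings by learned weight matrices (part of the model's parameters) and many attention heads run in parallel; here the vectors are supplied directly to keep the example small.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors (one row per token).

    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights

Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = attention(Q, K, V)
print(weights[0])  # token 0 puts more weight on itself than on token 1
```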

Transformer architecture involves millions or billions of parameters, which enable it to capture intricate language patterns and nuances. In fact, the term “large” in “large language model” refers to the extensive number of parameters that make up an LLM.
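A common rule of thumb shows how quickly parameters add up. In a standard decoder-only transformer with model width d, each layer has about 4·d² parameters in its attention projections (query, key, value, and output) and about 8·d² in a feed-forward block whose hidden size is 4·d. The sketch below applies that approximation to a hypothetical configuration. It ignores biases and normalization layers, and many modern architectures use variants (for example, different feed-forward sizes) that change the constants.

```python
def approx_transformer_params(n_layers, d_model, vocab_size=0):
    """Rough parameter count for a standard decoder-only transformer.

    Per layer: ~4 * d_model**2 (attention) + ~8 * d_model**2 (feed-forward).
    Token embeddings add vocab_size * d_model (assuming the output layer
    reuses them). Biases and normalization layers are ignored.
    """
    per_layer = 12 * d_model ** 2
    return n_layers * per_layer + vocab_size * d_model

# Hypothetical configuration: 32 layers, width 4096, 32,000-token vocabulary.
total = approx_transformer_params(n_layers=32, d_model=4096, vocab_size=32_000)
print(f"{total:,}")  # 6,573,522,944 (roughly 6.6 billion)
```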

LLMs and deep learning

The transformers and parameters that guide the self-supervised learning process of an LLM are part of a broader field referred to as deep learning. Deep learning is an artificial intelligence technique that teaches computers to process data using algorithms inspired by the human brain. Also known as deep neural learning or deep neural networking, deep learning allows computers to learn through observation, imitating the way humans gain knowledge.

The human brain contains many interconnected neurons, which act as information messengers when the brain is processing information (or data). These neurons use electrical impulses and chemical signals to communicate with one another and transmit information between different areas of the brain.

Artificial neural networks (ANNs), the underlying architecture behind deep learning, are based on this biological phenomenon but are formed by artificial neurons: software modules called nodes. These nodes use mathematical calculations (instead of chemical signals as in the brain) to communicate and transmit information within the model.
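Concretely, a single node computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function; a layer is many such nodes reading the same inputs, and deep networks stack many layers. The sketch below uses ReLU as the activation (one common choice) and hand-picked weights for illustration; in a trained network, the weights and biases are the parameters learned from data.

```python
def relu(z):
    """Activation function: pass positive values through, zero out negatives."""
    return max(0.0, z)

def neuron(inputs, weights, bias):
    """One artificial neuron (node): weighted sum plus bias, then activation."""
    return relu(sum(x * w for x, w in zip(inputs, weights)) + bias)

def layer(inputs, weight_rows, biases):
    """A layer of nodes; its outputs become the next layer's inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

hidden = layer([1.0, 2.0], weight_rows=[[0.5, -0.25], [-1.0, -1.0]], biases=[0.1, 0.0])
output = layer(hidden, weight_rows=[[2.0, 1.0]], biases=[0.0])
print(hidden, output)
```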

Modern LLMs can understand and use language in ways that were historically out of reach for computers. These machine learning models can generate text, summarize content, translate, rewrite, classify, categorize, analyze, and more. All of these abilities give humans a powerful toolset to augment creativity, improve productivity, and solve difficult problems.

Some of the most common uses for LLMs in a business setting include:

Automation and efficiency

LLMs can supplement, or entirely take over, language-related tasks such as customer support, data analysis, and content generation. This automation can reduce operational costs while freeing up human resources for more strategic tasks.

Generating insight

LLMs can quickly analyze large volumes of text data, such as content gathered from social media, reviews, and research papers. This helps businesses better understand market trends and customer feedback, which can in turn inform business decisions.

Creating a better customer experience

LLMs help businesses deliver highly personalized content to their customers, driving engagement and improving the user experience. This may look like implementing a chatbot to provide round-the-clock customer support, tailoring marketing messages to specific user personas, or facilitating language translation and cross-cultural communication.

While there are many potential advantages to using an LLM in a business setting, there are also potential limitations to consider:

Cost

LLMs require significant resources to develop, train, and deploy. This is why many LLMs are built from foundation models, which are pretrained with NLP abilities and provide a baseline understanding of language on which more complex LLMs can be built. Open source-licensed LLMs are free to use, making them ideal for organizations that otherwise couldn't afford to develop an LLM on their own.

Speed

LLM prompts can be complex and nonuniform, and processing them typically requires extensive computational resources and storage. An open source AI framework like llm-d allows developers to use techniques like distributed inference to meet the growing demands of larger and more sophisticated models, such as reasoning models.

Distributed inference and llm-d process AI workloads by spreading the work of inference across a fleet of hardware using a modular architecture. This helps the model serve inference requests faster.
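The basic idea of spreading inference work across hardware can be illustrated with a simple scheduler that sends each incoming request to whichever worker currently has the least outstanding work. This is a hypothetical least-loaded routing sketch, not llm-d's actual scheduling logic, and the per-request token estimates are assumed inputs.

```python
import heapq

def route(requests, num_workers):
    """Assign each (request_id, estimated_tokens) pair to the least-loaded worker.

    Returns a dict mapping worker index -> list of request IDs.
    """
    heap = [(0, w) for w in range(num_workers)]  # (outstanding tokens, worker)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for req_id, tokens in requests:
        load, w = heapq.heappop(heap)  # worker with the smallest current load
        assignment[w].append(req_id)
        heapq.heappush(heap, (load + tokens, w))
    return assignment

print(route([("a", 100), ("b", 10), ("c", 10), ("d", 50)], num_workers=2))
# {0: ['a'], 1: ['b', 'c', 'd']}
```

The long request "a" occupies one worker while the three shorter requests share the other, so neither worker sits idle behind a single large job.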

Privacy and security

LLMs require access to a lot of information, and sometimes that includes customer information or proprietary business data. This is something to be especially cautious about if the model is deployed or accessed by third-party providers.

Accuracy and bias

If a deep learning model is trained on data that is statistically biased, or that doesn’t accurately represent the population, its output can be flawed. Unfortunately, existing human bias is often transferred to artificial intelligence, creating the risk of discriminatory algorithms and biased outputs. As organizations continue to use AI for improved productivity and performance, it’s critical that strategies are put in place to minimize bias. This begins with inclusive design processes and more thoughtful consideration of representative diversity within the collected data.

Advantages and limitations of LLMs

Large language models (LLMs) offer significant advantages in natural language understanding and generation, enabling versatile content creation, boosting developer productivity through code assistance, and performing tasks like summarization and translation. They excel at data analysis, provide scalable solutions, and enhance personalization. However, key limitations include the tendency for hallucinations and factual inaccuracies, a lack of real-time knowledge, and struggles with complex reasoning. They also present challenges regarding inherent biases, high computational costs, the "black box" problem (lack of transparency), and data privacy/security risks, alongside potential for non-deterministic behavior and over-reliance.

Governance and ethical considerations in the use of AI

Governance and ethical considerations present significant challenges for organizations utilizing LLMs, primarily due to their powerful capabilities and potential for harm. Ethically, a core concern is bias, as LLMs learn from vast datasets that can reflect and amplify societal prejudices, leading to discriminatory outputs. Hallucinations are another issue, where LLMs can convincingly present false information; ethical deployment demands mechanisms to minimize misinformation through disclaimers and factual accuracy checks, especially in critical fields like healthcare or finance.

Additional considerations include:

The "black box" nature of many LLMs hindering transparency and explainability
The risk of misuse to generate toxic or illegal content
Concerns over intellectual property (IP) and copyright
Privacy and data leakage risks

AI governance

AI governance is crucial for the responsible development and oversight of LLMs, ensuring they align with organizational values and legal requirements. As AI regulations rapidly evolve, organizations must prioritize compliance with data privacy laws (like GDPR and HIPAA) and new AI-specific mandates, which often dictate strong risk management, data governance, human oversight, and robust cybersecurity for AI systems. Establishing clear accountability frameworks is also essential, defining who is responsible for LLM performance and impacts from development to deployment, with "human-in-the-loop" strategies vital for critical decisions.

If you want your LLMs to return outputs based on external data, you have several options:

Retrieval-Augmented Generation (RAG) is an architecture that adds to the knowledge base of an LLM by integrating data from your chosen knowledge sources. This can include data repositories, collections of text, or pre-existing documentation.
Agentic AI combines automation with the creative capabilities of an LLM. Agents communicate with tools through orchestration, using flows or graphs depending on the framework. This approach allows the LLM to “reason” and determine the best way to answer a question, such as deciding whether the query can be answered with available information or whether an external search is necessary.
Model Context Protocol (MCP) is a way for agentic AI to connect with external sources. MCP is an open source protocol that can supplement RAG and go a step further by enabling two-way connection and communication between AI applications and external services.
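A minimal sketch of the retrieval step in RAG, plus the kind of decision an agent might make, follows. It scores stored documents against the question, prepends the best match to the prompt, and returns None when nothing is relevant enough, signaling that an agent could try another tool such as an external search. The function names and the 0.3 threshold are hypothetical, and simple word-count cosine similarity stands in for the embedding model and vector database a production RAG system would typically use.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, documents, top_k=1, min_score=0.3):
    """Retrieve the most relevant documents and add them to the prompt.

    Returns None when no document is similar enough, so a caller (for
    example, an agent) can fall back to another source.
    """
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    hits = [d for d in ranked[:top_k] if cosine(q, vectorize(d)) >= min_score]
    if not hits:
        return None
    context = "\n".join(hits)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

docs = [
    "vLLM is an inference server for LLMs.",
    "Granite is a family of LLMs.",
    "The cafeteria opens at noon.",
]
print(build_prompt("What is vLLM?", docs))
print(build_prompt("How tall is Mount Everest?", docs))  # None: try an external search
```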

Large language models (LLMs) and small language models (SLMs) are both types of artificial intelligence (AI) systems that are trained to interpret human language, including programming languages. The key differences between them are usually their size (number of parameters), the size of the data sets they’re trained on, the processes used to train them on those data sets, and the cost/benefit of getting started for different use cases.

A place to start with LLMs

If you’re ready to experiment with AI models, we provide support for LLMs, foundation models, generative models, and machine learning models.

A good place to start is Red Hat® Enterprise Linux® AI: our foundation model platform that helps you develop, test, and run Granite family LLMs for enterprise applications. The AI platform gives developers quick access to a single-server environment, complete with LLMs and AI tooling. It provides everything needed to tune models and build gen AI applications.
