What is generative AI?

Published May 12, 2026•9-minute read

Generative AI is a kind of artificial intelligence technology that relies on deep learning models to create new content.

Generative AI applications can produce writing, pictures, code, and more. This happens during AI inference, the operational phase of AI, in which a trained model applies what it learned during training to new, real-world inputs. Common use cases for generative AI include chatbots, image creation and editing, software code assistance, and scientific research.

People are putting generative AI to use in professional settings to quickly visualize creative ideas and efficiently handle boring and time-consuming tasks. In areas like medical research and product design, generative AI can help professionals do their jobs better and significantly faster. However, generative AI also introduces new risks which users should understand and work to mitigate.

If you’ve enjoyed a surprisingly coherent conversation with ChatGPT, or watched Midjourney render a realistic picture from a description you just made up, you know generative AI can feel like magic. What makes this sorcery possible?

Beneath the AI apps you use, deep learning models are recreating patterns they’ve learned from a vast amount of training data. Then they work within human-constructed parameters to make something new based on what they’ve learned.
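As a deliberately simplified illustration of learning patterns from training data and then producing something new, the sketch below counts which word follows which in a tiny, made-up corpus and samples new word sequences from those counts. Real generative models use deep neural networks and much longer context; this word-pair (bigram) model and its function names are illustrative assumptions, not how any particular app works.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Record which word follows which: a toy stand-in for patterns a model learns."""
    words = text.split()
    follows = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)
    return follows

def generate(follows, start, length, seed=0):
    """Build a new sequence by repeatedly sampling a word seen after the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # no known continuation, so stop early
            break
        out.append(rng.choice(options))
    return " ".join(out)

# Hypothetical training text.
corpus = "the cat sat on the mat and the cat ran to the door"
model = train_bigram(corpus)
print(generate(model, "the", 6))
```

The output can be a word sequence that never appears in the corpus, yet every adjacent pair was seen during training: new content with the statistical character of the original data.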

Deep learning models do not store a copy of their training data, but rather an encoded version of it, with similar data points arranged close together. This representation can then be decoded to construct new, original data with similar characteristics.
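The idea that similar data points sit close together in an encoded representation can be shown with a minimal sketch. The three-dimensional vectors below are hypothetical and hand-picked for illustration (real models learn vectors with hundreds or thousands of dimensions), and cosine similarity is just one common way to measure closeness.

```python
import math

# Hypothetical, hand-made encodings; a real model learns these from data.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.15],
    "truck":  [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other entry whose encoding is closest to `word`."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )

print(nearest("cat"))  # prints: kitten
```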

Building a custom generative AI app requires a model, as well as adjustments such as human-supervised fine-tuning or a layer of data specific to a use case.

Most of today’s popular generative AI apps respond to user prompts. Describe what you want in natural language and the app returns whatever you asked for—like magic.

Generative AI’s breakthroughs in writing and images have captured news headlines and people’s imaginations. Here are a few of the early use cases for this rapidly advancing technology.

Writing. Large language models (LLMs) are a type of generative AI used to process and generate text. LLMs are trained to learn the statistical relationships among words, grammar, and context to produce output that mimics human writing.

Image generation. Generative AI image tools can synthesize high-quality pictures in response to prompts for countless subjects and styles. Some AI tools, such as Generative Fill in Adobe Photoshop, can add new elements to existing works.

Speech and music generation. Using written text and sample audio of a person’s voice, AI vocal tools can create narration or singing that mimics the sound of a real human. Other tools can create artificial music from prompts or samples.

Video generation. New services are experimenting with various generative AI techniques to create motion graphics. For example, some can match audio to a still image and animate the subject’s mouth and facial expressions so the subject appears to talk.

Code generation and completion. Some generative AI tools can take a written prompt and output computer code on request to assist software developers.

Data augmentation. Generative AI can create a large amount of synthetic data when using real data is impossible or not preferable. For example, synthetic data can be useful if you want to train a model to understand healthcare data without including any personally identifiable information. It can also be used to stretch a small or incomplete data set into a larger set of synthetic data for training or testing purposes.

Agentic AI. Agentic AI and generative AI work collaboratively. Agentic AI systems may use gen AI to converse with a user, independently create content as part of a greater goal, or communicate with external tools. In other words, gen AI is a critical part of agentic AI's "cognitive process."
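To make the data augmentation use case more concrete, here is a minimal sketch of stretching a small data set into a larger synthetic one. It assumes hypothetical, already de-identified records and fits an independent normal distribution to each column. That simplification ignores correlations between columns, which real generative approaches try to capture.

```python
import random
import statistics

# Hypothetical "real" records: (age, systolic blood pressure), no identifiers.
real = [(34, 118), (45, 130), (52, 141), (29, 112), (61, 150), (38, 125)]

def fit_columns(rows):
    """Estimate a mean and standard deviation for each column independently."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def synthesize(params, n, seed=42):
    """Draw n synthetic rows, sampling each column from its fitted normal distribution."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]

params = fit_columns(real)
synthetic = synthesize(params, 1000)  # 6 real rows stretched into 1,000 synthetic rows
print(len(synthetic), [round(statistics.mean(c), 1) for c in zip(*synthetic)])
```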

Deep learning, which makes generative AI possible, is a machine learning technique for analyzing and interpreting large amounts of data. Also known as deep neural learning or deep neural networking, this process teaches computers to learn through observation, imitating the way humans gain knowledge. Deep learning is a critical concept in applying computers to the problem of understanding human language, or natural language processing (NLP).

It may help to think of deep learning as a type of flow chart, starting with an input layer and ending with an output layer. Sandwiched between these two layers are the “hidden layers,” which process information at different levels and adjust their behavior as the model is trained on data. Deep learning models can have hundreds of hidden layers, each of which plays a part in discovering relationships and patterns within the data set.

Data enters the model through the input layer, which is composed of several nodes, and is then passed forward to the next layer. The path the data takes through each layer depends on the calculations set in place for each node, typically a weighted sum of the node’s inputs followed by an activation function. As the data moves through the layers, it picks up observations along the way that ultimately produce the output, or final analysis, of the data.
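The layer-by-layer flow described above can be sketched in a few lines of plain Python. The weights here are hypothetical placeholders (in a real model they are learned during training), and the sigmoid function is only one of several common activation choices.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """Each node: weighted sum of all inputs plus a bias, then an activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical weights for a 3-input -> 2-hidden-node -> 1-output network.
hidden_w = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]]
hidden_b = [0.0, 0.1]
output_w = [[1.5, -1.0]]
output_b = [0.05]

x = [1.0, 0.5, -1.0]                     # input layer
h = dense_layer(x, hidden_w, hidden_b)   # hidden layer
y = dense_layer(h, output_w, output_b)   # output layer
print(h, y)
```

Stacking more `dense_layer` calls gives a deeper network; the flow of data stays the same.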

One technology that has sped the advancement of deep learning is the GPU, or graphics processing unit. GPUs were originally architected to accelerate the rendering of video game graphics. But as an efficient way to perform calculations in parallel, GPUs have proven to be well suited for deep learning workloads.

Advances in the size and speed of deep learning models led directly to the current wave of generative AI apps.

A neural network is a way of processing information that mimics biological neural systems like the connections in our own brains. It’s how AI can forge connections among seemingly unrelated sets of information. The concept of a neural network is closely related to deep learning.

How does a deep learning model use the neural network concept to connect data points? Start with how the human brain works. Our brains contain many interconnected neurons, which act as information messengers when the brain is processing incoming data. These neurons use electrical impulses and chemical signals to communicate with one another and transmit information between different areas of the brain.

An artificial neural network (ANN) is based on this biological phenomenon, but it is formed by artificial neurons made from software modules called nodes. These nodes use mathematical calculations (instead of chemical signals, as in the brain) to communicate and transmit information. An ANN, sometimes also called a simulated neural network (SNN), processes data by clustering data points and making predictions.

Different neural network techniques are suited for different kinds of data. A recurrent neural network (RNN) is designed for sequential data. For example, it can process language by reading words one at a time, in order, while carrying forward a memory of what came before.
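A minimal sketch of the recurrence at the heart of an RNN follows. It assumes scalar inputs and hypothetical hand-set weights (real RNNs use learned vectors and matrices); the point is that each new hidden state depends on the previous one, so the order of the sequence matters.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias):
    """Read items one at a time; the hidden state h carries memory of earlier items."""
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h + bias)  # new state mixes input and old state
        states.append(h)
    return states

# Hypothetical numeric stand-ins for a sequence of words.
tokens = [1.0, 0.0, 0.0, 1.0]
print(rnn_forward(tokens, w_in=0.8, w_rec=0.5, bias=0.0))
```

Even at steps where the input is 0.0, the hidden state is nonzero because it remembers the earlier input. This strictly step-by-step loop is also why RNNs are hard to parallelize.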

Building on the idea of the RNN, transformers are a specific kind of neural network architecture that can process language faster. Instead of ingesting each word in sequential order, as an RNN does, a transformer uses a mechanism called attention to learn the relationships among all the words in a sentence at once, which lets much of the computation run in parallel.
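The core transformer operation, scaled dot-product self-attention, can be sketched as follows. This simplified version assumes hypothetical two-dimensional word vectors and skips the learned query, key, and value projection matrices that real transformers apply; it uses the input vectors directly for all three roles.

```python
import math

def softmax(scores):
    """Turn scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each position scores every position, then takes a weighted average of them.

    The outer loop iterations are independent of one another, so unlike an RNN
    they can all be computed in parallel.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs

# Hypothetical vectors for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(sentence):
    print([round(x, 3) for x in row])
```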

Creating a neural network requires software frameworks capable of executing massive amounts of matrix math. Open source libraries like PyTorch serve as the primary engine for this process. PyTorch allows developers to piece together code to create an AI “brain” that can learn to identify patterns in data. In more technical terms, PyTorch lets developers use modular layers to build and train complex neural networks for advanced machine learning.
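To show what a framework like PyTorch automates, here is a plain-Python sketch (it does not use PyTorch) of training a one-node model by gradient descent. The data and learning rate are hypothetical; the gradients are derived by hand here, whereas PyTorch computes them automatically and scales the same idea to millions or billions of parameters on GPUs.

```python
# Hypothetical training data generated from y = 2x + 1; the model must discover w=2, b=1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0          # start from uninformed parameters
learning_rate = 0.02

for epoch in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Step each parameter a little in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```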

A foundation model is a deep learning model trained on a huge amount of generic data. Once trained, foundation models can be refined for specialized use cases. As the name suggests, these models can form the foundation for many different applications.

Creating a new foundation model today is a substantial project. The process requires enormous amounts of training data, typically collected from scrapes of the internet, digital libraries of books, databases of scholarly articles, stock image collections, or other large data sets. Training a model on this much data takes immense infrastructure, including building or leasing a cloud of GPUs. The largest foundation models to date are reported to have cost hundreds of millions of dollars to build.

Because training a foundation model from scratch takes so much effort, it’s common to start from a model trained by a third party and then customize it. Customization techniques include fine-tuning, prompt-tuning, and adding customer-specific or domain-specific data. The provenance of the base model’s training data matters too: IBM's Granite family of foundation models, for example, is trained on curated data and provides transparency into the data used for training.

Fine-tuning is the process of training a pretrained model further with a more tailored data set so it can effectively perform unique tasks. This additional training modifies the model’s parameters, creating a new version of the model.
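A minimal sketch of the idea, using a hypothetical one-node linear model: fine-tuning simply continues training from the pretrained parameter values on a small, task-specific data set. The "pretrained" values and the task data are made up for illustration.

```python
def train(params, data, steps, learning_rate=0.05):
    """Gradient descent on mean squared error for a one-node model y = w*x + b."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b)

# Hypothetical "pretrained" parameters, standing in for training on a large general data set.
pretrained = (2.0, 1.0)

# Small, hypothetical task-specific data set where the relationship is shifted: y = 2x + 3.
task_data = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.0)]

# Training continues from the pretrained values; the result is a new set of
# parameters, while the original tuple is left unchanged.
fine_tuned = train(pretrained, task_data, steps=1000)
print("pretrained:", pretrained, "fine-tuned:", tuple(round(p, 2) for p in fine_tuned))
```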

Fine-tuning typically requires significantly less data and time than the initial training. However, the process of traditional fine-tuning is still compute-intensive.

Parameter-efficient fine-tuning (PEFT) is a set of techniques that adjusts only a portion of the parameters within an LLM to save resources. You can think of it as an evolution of traditional fine-tuning.

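LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) are both PEFT techniques for training AI models. Both help fine-tune LLMs more efficiently, but they differ in how they manipulate the model and use memory. LoRA freezes the original weights and trains a pair of small low-rank matrices whose product is added to them. QLoRA does the same, but first quantizes the frozen base model’s weights to 4-bit precision to reduce memory use further.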
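The arithmetic behind LoRA’s savings, and the idea behind QLoRA’s quantized base model, can be sketched in plain Python. The matrix shapes are hypothetical and tiny so they are easy to inspect. The quantization function is a simplified symmetric integer scheme standing in for the idea; QLoRA itself uses a 4-bit NormalFloat (NF4) data type.

```python
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_param_counts(d, k, r):
    """Trainable parameters: full update of a d x k matrix vs. LoRA factors B (d x r) and A (r x k)."""
    return d * k, r * (d + k)

rng = random.Random(0)
d, k, r = 4, 4, 1  # tiny hypothetical shapes

W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]  # frozen pretrained weights
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zero at start

# Effective weights are W + B @ A. Only A and B would receive updates during
# fine-tuning; because B starts at zero, the adapted model starts identical to W.
delta = matmul(B, A)
W_effective = [[W[i][j] + delta[i][j] for j in range(k)] for i in range(d)]

def quantize_4bit(matrix):
    """Simplified symmetric 4-bit quantization: integers -7..7 plus one float scale."""
    scale = max(abs(v) for row in matrix for v in row) / 7
    q = [[round(v / scale) for v in row] for row in matrix]
    return q, scale

W_q, scale = quantize_4bit(W)  # QLoRA-style idea: keep the frozen base compact

# For a hypothetical 4096 x 4096 layer with rank 8: full vs. LoRA trainable parameters.
print(lora_param_counts(4096, 4096, 8))
```

For that layer, full fine-tuning would update about 16.8 million parameters, while LoRA trains 65,536, under 0.4% of the total.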

Retrieval-augmented generation (RAG) is a method for getting better answers from a generative AI application by linking an LLM to an external resource.

Implementing a RAG architecture in an LLM-based question-answering system (like a chatbot) provides a line of communication between an LLM and your chosen additional knowledge sources. This allows the LLM to cross-reference and supplement its internal knowledge, providing a more reliable and accurate output for the user making a query.
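The retrieve-then-augment flow can be sketched as follows. The documents are hypothetical, ranking by shared words is a simple stand-in for the vector search that production RAG systems typically use, and the final call to an LLM is deliberately left out: the sketch stops at the augmented prompt that would be sent to the model.

```python
import re

# Hypothetical knowledge base, for example internal documents.
documents = [
    "Our support desk is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of receiving the returned item.",
    "The premium plan includes priority support and 1 TB of storage.",
]

def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, top_k=1):
    """Rank documents by how many words they share with the query."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:top_k]

def build_prompt(query, docs):
    """Augment the user's question with retrieved context before it reaches the LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?", documents))
```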

As generative AI models become more sophisticated, they tend to grow larger. Some LLMs contain hundreds of billions of parameters. Parameters shape an LLM’s understanding of language, and models with more parameters can generally perform more complex tasks with greater accuracy. However, more parameters require more memory and processing power.
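A quick back-of-the-envelope calculation shows why parameter count drives hardware needs. The sketch assumes a hypothetical 70-billion-parameter model and counts only the memory to hold its weights; activations and the key-value cache used during inference add more on top.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory needed just to store a model's weights, in GB (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Hypothetical 70-billion-parameter model at common numeric precisions.
for label, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1), ("4-bit", 0.5)]:
    print(f"{label:>10}: {weight_memory_gb(70e9, nbytes):,.0f} GB")
```

At 16-bit precision the weights alone need about 140 GB, more than a single typical GPU holds, which is why serving large models efficiently is a challenge.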

Rather than adding more GPUs (which can be costly), you can use open source projects like vLLM and llm-d to make inference more efficient on your existing hardware.

vLLM is an inference server that speeds up the output of gen AI applications by making better use of GPU memory. llm-d is a Kubernetes-native, open source framework that speeds up distributed inference at scale. Both are designed to solve the challenge of serving large generative AI models by focusing on optimizing performance.
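Much of vLLM’s memory efficiency comes from a technique called PagedAttention, which stores each request’s key-value (KV) cache in fixed-size blocks allocated on demand instead of reserving one large contiguous region per request up front. The toy allocator below illustrates only that idea; it is not vLLM code, and the class name, block size, and request IDs are hypothetical.

```python
class PagedKVCache:
    """Toy block allocator: each sequence receives fixed-size blocks only as it grows."""

    def __init__(self, total_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(total_blocks))
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.lengths = {}       # sequence id -> number of tokens stored

    def append_token(self, seq_id):
        """Store one more token, grabbing a new block only when the current one is full."""
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the pool for other requests."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(total_blocks=8, block_size=4)
for _ in range(5):
    cache.append_token("req-A")  # 5 tokens fit in 2 blocks
for _ in range(3):
    cache.append_token("req-B")  # 3 tokens fit in 1 block
print(cache.block_tables, len(cache.free_blocks))
```

Because memory is handed out in small blocks, at most one partially filled block per request goes unused, leaving more room to batch additional requests on the same GPU.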

Having come a long way in a short time, generative AI technology has attracted more than its share of hype, both positive and negative. The benefits and downsides of this technology are still emerging. Here we provide a brief look at some prominent concerns about generative AI.

Enabling harm. There are immediate and obvious risks of bad actors using generative AI tools for malicious goals, such as large-scale disinformation campaigns on social media, or nonconsensual deepfake images that target real people.

Reinforcing harmful societal bias. Generative AI tools have been shown to regurgitate the human biases that are present in training data, including harmful stereotypes and hate speech.

Supplying wrong information. Generative AI tools can produce made-up and plainly wrong information and scenes, sometimes called “hallucinations.” Some generated content mistakes are harmless, such as a nonsense response to a chat question, or an image of a human hand with too many fingers. But there have been serious cases of AI gone wrong, such as a chatbot that gave harmful advice to people with questions about eating disorders.

Security and legal risks. Generative AI systems can pose security risks, including from users entering sensitive information into apps that were not designed to be secure. Generative AI responses may introduce legal risks by reproducing copyrighted content or appropriating a real person’s voice or identity without their consent. Additionally, some generative AI tools may have usage restrictions.

Unexplainable outputs. Sometimes an AI model is too complex for a human to understand or interpret; this is called a black-box model. Black-box models can create harmful consequences when used for high-stakes decision making, especially in high-risk industries like healthcare, transportation, security, military, legal, aerospace, criminal justice, or finance. To help solve this, explainable AI (XAI) techniques can be applied throughout the machine learning lifecycle to make outputs more transparent and understandable to humans.
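One simple, model-agnostic XAI technique is occlusion (or perturbation) analysis: replace one input at a time with a neutral baseline and measure how much the output changes. The sketch below treats a hypothetical scoring function as a black box we can only query. For a linear model like this one the method recovers each term exactly; with nonlinear models, features interact, and more careful methods such as SHAP are often used.

```python
def loan_score(features):
    """Hypothetical opaque model: we may call it but pretend we cannot read its logic."""
    return 0.5 * features["income"] - 0.8 * features["debt"] + 0.1 * features["age"]

def occlusion_attribution(model, features, baseline):
    """Estimate each feature's influence by swapping it for a baseline value."""
    original = model(features)
    attributions = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = baseline[name]
        attributions[name] = original - model(perturbed)  # how much this feature moved the output
    return attributions

applicant = {"income": 6.0, "debt": 4.0, "age": 3.0}  # hypothetical, pre-scaled values
baseline = {"income": 0.0, "debt": 0.0, "age": 0.0}
print(occlusion_attribution(loan_score, applicant, baseline))
```

The result shows that debt pulled the score down by about as much as income pushed it up, an explanation a human reviewer can check.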

Red Hat® AI is built for fast, flexible, and efficient inference through its vLLM-powered server. It reliably connects models to your data to unify the customization and development of specialized agents on a single platform. Built on an open source foundation, our products give you full control of AI workflows end to end, at any scale.

The Red Hat AI portfolio includes Red Hat AI Inference, an inference stack that provides the operational control to run any model on any accelerator across the hybrid cloud. Get fast, efficient, and cost-effective inference at scale.
