AI moves fast, and the language around it moves even faster. This glossary covers the terms you actually need, from agent orchestration and MCP to RAG, function calling, and evals, ordered by what matters most in practice.
AI has developed its own language.
Not just the obvious terms like LLM or prompt, but a growing layer of insider vocabulary: agents, RAG, MCP, function calling, fine-tuning, evals, context engineering, and dozens more. The problem is that these terms are often used loosely, inconsistently, and sometimes in ways that are flat-out wrong.
That creates a gap. People want to understand how modern AI actually works, but the language around it is fragmented, technical, and filled with jargon that gets repeated faster than it gets explained.
This glossary is meant to close that gap.
It starts with the terms that matter most in today’s AI products, especially agents, tools, orchestration, and retrieval, then moves into the deeper model, training, evaluation, and infrastructure concepts underneath them. The goal is simple: make the vocabulary of AI clear enough that you can actually follow the conversation.
1. Agents, Tools, and Orchestration
AI Agent
An AI agent is a system that uses a model plus tools, memory, and decision logic to pursue a goal and take actions. A chatbot that only answers questions is not really an agent. A system that can read your inbox, decide what matters, call tools, and follow through on a task is.
Agent Orchestration
Agent orchestration is the coordination layer that manages how models, tools, memory, and workflows work together. It is what keeps an agent from being just a smart text generator and turns it into a system that can actually complete tasks.
Agent Loop
The agent loop is the repeating cycle of observing the situation, reasoning about what to do next, taking an action, and updating context. This loop is the core operating pattern of most agent systems.
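The loop can be sketched in a few lines. Everything below is a stand-in, not any particular framework: the `decide` function plays the role of the model's reasoning, and the `tools` dict plays the role of real integrations.

```python
# A minimal sketch of the agent loop: observe -> reason -> act -> update.
# `decide` and `tools` are hypothetical stand-ins for a model and real tools.

def run_agent(goal, decide, tools, max_steps=5):
    """Repeat observe/reason/act until the decision logic says we're done."""
    context = [f"goal: {goal}"]
    for _ in range(max_steps):
        action, arg = decide(context)           # reason about what to do next
        if action == "finish":
            return arg
        observation = tools[action](arg)        # take an action via a tool
        context.append(f"{action}({arg}) -> {observation}")  # update context
    return None

# Toy decision logic: look something up once, then finish with the result.
def decide(context):
    if len(context) == 1:
        return ("lookup", "capital of France")
    return ("finish", context[-1].split("-> ")[1])

tools = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}
print(run_agent("answer a question", decide, tools))  # -> Paris
```

The important part is the shape, not the toy logic: each pass through the loop adds what just happened back into the context, so the next reasoning step sees the updated state of the task.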
Tool Use
Tool use means the model can call external systems instead of relying only on its own internal knowledge. This could mean searching the web, querying a calendar, writing to a CRM, or sending an email.
Function Calling
Function calling is a structured way for a model to choose a tool and pass arguments to it in a format software can reliably execute. It is one of the key bridges between language models and real product functionality.
Tool Schema
A tool schema is the formal definition of what a tool does and what inputs it accepts. It tells the model how a tool should be called.
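Here is what the two ideas look like together. The schema below uses the JSON-Schema-like shape several APIs have converged on, but the field names and the `get_weather` tool are invented for illustration, not any specific vendor's format.

```python
import json

# A hypothetical tool schema: name, description, and typed parameters.
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# A function call as a model might emit it: a tool name plus JSON arguments.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

def execute(call_json, registry):
    """Parse the model's structured call and dispatch it to real code."""
    call = json.loads(call_json)
    return registry[call["name"]](**call["arguments"])

# Canned stub standing in for a real weather integration.
registry = {"get_weather": lambda city, unit="celsius": f"12 {unit} in {city}"}
print(execute(model_output, registry))  # -> 12 celsius in Oslo
```

The schema is what the model reads; the registry is what the software runs. Function calling is the handshake between the two.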
Tool Router
A tool router is the logic that decides which tool or toolset should be used for a task. In simple systems this is rule-based. In more advanced systems a model may help make the decision.
Planner
A planner is the part of an agent system that breaks a task into steps. If a user asks something complex, the planner decides how to approach it rather than trying to do everything in one jump.
Executor
The executor is the part of the system that carries out a planned step, often by calling tools, running prompts, or triggering sub-agents.
Controller
The controller governs the overall flow of the agent. It decides when to continue, when to stop, when to retry, and when to escalate to a human.
Workflow
A workflow is a defined sequence of steps used to complete a task. Some workflows are fixed. Others are dynamic and adapt based on what happens along the way.
Multi-Agent System
A multi-agent system is a setup where multiple specialized agents collaborate. One agent might handle planning, another retrieval, another calendar operations, and another messaging.
Sub-Agent
A sub-agent is a specialized agent invoked by another agent for a narrower responsibility. It is like delegating a task to a specialist.
Human in the Loop (HITL)
Human in the loop means a human reviews, approves, or corrects certain AI actions. This is especially important for sensitive tasks like sending emails, making purchases, or changing important records.
Computer Use
Computer use refers to an agent's ability to interact with graphical interfaces the way a person would: clicking buttons, filling out forms, or navigating websites.
Toolformer Pattern
This is the general pattern where models learn or are designed to decide when and how to use tools during task execution.
Context Bridge
A context bridge is the mechanism that passes relevant information between the model and external systems. It helps the model stay aware of the state of the world outside its own context window.
Model Context Protocol (MCP)
MCP is an open protocol for connecting models to external tools and data sources in a standardized way. It is basically an interoperability layer for model-powered systems.
MCP Server
An MCP server exposes tools or resources over the MCP standard so a model-enabled client can use them.
MCP Client
An MCP client is the application layer that connects a model runtime to one or more MCP servers.
Agentic RAG
Agentic RAG is a retrieval-augmented system where the agent dynamically decides what information to retrieve, when to retrieve it, and how to use it across multiple steps.
Context Engineering
Context engineering is the design of how instructions, tools, memory, retrieved information, and system state are assembled so an agent can perform reliably. It is one of the most important ideas in modern AI product building.
Harness
In AI, a harness usually refers to an evaluation or testing framework used to run prompts, tasks, and measurements against models or agents in a repeatable way.
2. Prompting and Interaction Terms
Prompt
A prompt is the input given to a model. It can include instructions, examples, context, and formatting constraints.
Prompt Engineering
Prompt engineering is the practice of designing prompts to improve output quality, reliability, structure, or behavior.
System Prompt
The system prompt contains high-priority instructions that define the model's role, rules, and behavioral constraints.
User Prompt
The user prompt is the direct input from the user.
Assistant Message
The assistant message is the model's reply in a conversational exchange.
Prompt Template
A prompt template is a reusable prompt structure with variable placeholders.
Few-Shot Prompting
Few-shot prompting means giving the model a small number of examples to show it what kind of answer or behavior you want.
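A few-shot prompt is usually built from a template plus a handful of labeled examples placed before the real query. The sentiment task and examples below are invented for illustration.

```python
# Assembling a few-shot prompt: two labeled examples show the model the
# task format, then the real query is appended in the same shape.

examples = [
    ("I love this product!", "positive"),
    ("Terrible support experience.", "negative"),
]

def few_shot_prompt(query):
    shots = "\n\n".join(f"Review: {text}\nSentiment: {label}"
                        for text, label in examples)
    return f"{shots}\n\nReview: {query}\nSentiment:"

print(few_shot_prompt("Shipping was fast and easy."))
```

Ending the prompt mid-pattern, right after "Sentiment:", nudges the model to complete it with a label in the same format as the examples.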
Zero-Shot Prompting
Zero-shot prompting means asking the model to perform a task without giving examples.
Chain-of-Thought Prompting
Chain-of-thought prompting encourages the model to reason through intermediate steps. In practice, teams often want the benefits of structured reasoning without exposing all of it directly to users.
Role Prompting
Role prompting assigns the model a role or perspective, such as "act like a recruiter" or "act like a product strategist."
Structured Prompting
Structured prompting means organizing a prompt into clear sections, such as task, context, constraints, tools, and output format.
Delimiter
A delimiter is a marker used to separate parts of a prompt, such as XML tags, headings, or clear section breaks.
Output Schema
An output schema is a predefined response structure the model is expected to follow.
Structured Outputs
Structured outputs are responses constrained to a defined schema so the result is predictable and machine-readable.
JSON Mode
JSON mode is a generation mode that pushes the model to return valid JSON.
Grounding
Grounding means tying the model's answer to trusted context, documents, or external sources rather than letting it guess freely.
Hallucination
A hallucination is when a model generates false, fabricated, or unsupported information as if it were true.
Prompt Injection
Prompt injection is when malicious or unintended instructions are hidden inside user input or retrieved content to manipulate the model's behavior.
Jailbreak
A jailbreak is a prompt designed to bypass safety policies or behavioral constraints.
Stop Sequence
A stop sequence is a token pattern that tells the model when to stop generating.
Streaming
Streaming means returning output incrementally as the model generates it instead of waiting for the full answer.
Temperature
Temperature controls how random or conservative the output is. Lower temperature is usually more predictable. Higher temperature is usually more varied.
Top-k Sampling
Top-k sampling limits the next-token choices to the top k most likely options before selecting one.
Top-p Sampling
Top-p sampling limits the next-token choices to the smallest set of tokens whose combined probability passes a threshold.
Greedy Decoding
Greedy decoding always picks the single most likely next token.
Decoding
Decoding is the process of turning model probability distributions into actual output tokens.
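The last few terms fit together in one picture: softmax (scaled by temperature) turns logits into probabilities, and a decoding strategy picks the next token from them. The vocabulary and logits below are made up; real models work over tens of thousands of tokens.

```python
import math
import random

vocab  = ["the", "cat", "sat", "on", "mat"]
logits = [2.0, 1.0, 0.5, 0.2, -1.0]   # invented raw scores for each token

def softmax(scores, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Always pick the single most likely token."""
    return vocab[probs.index(max(probs))]

def top_k(probs, k=2):
    """Sample from only the k most likely tokens."""
    ranked = sorted(zip(vocab, probs), key=lambda x: -x[1])[:k]
    tokens, weights = zip(*ranked)
    return random.choices(tokens, weights=weights)[0]

def top_p(probs, p=0.9):
    """Sample from the smallest set whose combined probability reaches p."""
    ranked = sorted(zip(vocab, probs), key=lambda x: -x[1])
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights)[0]

probs = softmax(logits, temperature=0.7)
print(greedy(probs))   # -> the (deterministic: highest probability)
print(top_k(probs))    # stochastic: varies between the top 2 tokens
```

Greedy decoding is deterministic; top-k and top-p are stochastic, which is why the same prompt can produce different outputs across runs.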
Determinism
Determinism refers to how consistently the same input produces the same output.
Stochastic Output
Stochastic output means the answer may vary across runs because of randomness in sampling.
Max Tokens
Max tokens is the limit on how many tokens the model can generate in a response.
Token Budget
The token budget is the total amount of input and output text that can fit within a request.
Context Window
The context window is the maximum amount of text a model can consider at once.
Context Truncation
Context truncation happens when part of the prompt or conversation is cut off to fit the context window.
Context Compaction
Context compaction means summarizing or compressing information so more relevant context fits into the model's window.
3. Retrieval, Search, and Knowledge Systems
Retrieval-Augmented Generation (RAG)
RAG is an approach where a model retrieves external information and uses it to generate a more grounded answer.
Retriever
The retriever is the component that finds relevant chunks of information for a query.
Re-Ranker
A re-ranker reorders retrieved results to improve relevance before passing them to the model.
Knowledge Base
A knowledge base is the collection of indexed information an AI system can retrieve from.
Vector Database
A vector database stores embeddings and supports semantic search across them.
Vector Search
Vector search finds semantically similar items based on embeddings rather than exact keyword matches.
Embedding Model
An embedding model converts text, images, or other data into vectors that capture meaning.
Embedding
An embedding is the numeric vector representation of content.
Semantic Search
Semantic search finds results based on meaning rather than exact word overlap.
Similarity Score
A similarity score measures how close two vectors or pieces of content are.
Cosine Similarity
Cosine similarity is a common metric used to compare the directional similarity of vectors.
Nearest Neighbor Search
Nearest neighbor search finds the most similar vectors to a query vector.
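Cosine similarity and exact nearest neighbor search fit in a few lines. The 3-dimensional "embeddings" below are invented; real embeddings have hundreds or thousands of dimensions, but the math is identical.

```python
import math

def cosine_similarity(a, b):
    """Directional similarity of two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy index of embeddings; in practice this lives in a vector database.
index = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.1, 0.9],
}

def nearest_neighbor(query_vec):
    """Exact (brute-force) search: compare against every stored vector."""
    return max(index, key=lambda k: cosine_similarity(query_vec, index[k]))

print(nearest_neighbor([0.7, 0.3, 0.1]))  # -> puppy
```

Brute-force search compares the query against every vector, which is why large indexes switch to approximate nearest neighbor methods that trade a little exactness for speed.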
Approximate Nearest Neighbor (ANN)
ANN is a faster form of nearest neighbor search that trades a bit of exactness for speed.
Hybrid Search
Hybrid search combines semantic search with keyword or lexical search.
Chunking
Chunking is the process of splitting documents into smaller pieces before embedding or retrieval.
Chunk Size
Chunk size is the amount of text included in each chunk.
Chunk Overlap
Chunk overlap repeats some text between adjacent chunks to preserve continuity.
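Chunk size and overlap show up directly as the two parameters of a chunking function. This is a character-based sketch; production systems often chunk by tokens, sentences, or document structure instead.

```python
def chunk_text(text, chunk_size=20, overlap=5):
    """Split text into chunks of `chunk_size` characters, repeating the
    last `overlap` characters of each chunk at the start of the next."""
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

text = "Retrieval systems split long documents into pieces before embedding."
chunks = chunk_text(text)
print(chunks[0][-5:] == chunks[1][:5])  # True: adjacent chunks share 5 chars
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.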
Metadata Filtering
Metadata filtering narrows retrieval using attributes like date, author, source, or file type.
Ground Truth
Ground truth is the verified reference data used to judge whether retrieval or output is correct.
Context Hydration
Context hydration means loading the most relevant retrieved information into the prompt at the right time.
4. LLM and Transformer Basics
Artificial Intelligence (AI)
AI is the broad field focused on building systems that perform tasks associated with human intelligence.
Machine Learning (ML)
Machine learning is the part of AI where systems learn patterns from data instead of being explicitly programmed with every rule.
Deep Learning
Deep learning is a branch of machine learning that uses multi-layer neural networks.
Foundation Model
A foundation model is a large pretrained model that can be adapted to many tasks.
Large Language Model (LLM)
An LLM is a model trained on large text datasets to understand and generate language.
Generative AI
Generative AI refers to systems that create new content such as text, images, code, audio, or video.
Transformer
The transformer is the neural network architecture behind most modern language models.
Neural Network
A neural network is a model made of interconnected computational units that transform inputs into outputs.
Model
A model is a trained system that maps inputs to predictions or generated outputs.
Base Model
A base model is the pretrained model before additional alignment or instruction tuning.
Instruction-Tuned Model
An instruction-tuned model has been further trained to follow prompts and tasks more reliably.
Chat Model
A chat model is a model tuned for conversational interaction.
Multimodal Model
A multimodal model can work with more than one data type, such as text and images.
Autoregressive Model
An autoregressive model predicts the next token based on previous tokens.
Next-Token Prediction
Next-token prediction is the core training objective used by many language models.
Token
A token is a chunk of text processed by the model.
Tokenizer
The tokenizer converts raw text into tokens and back again.
Vocabulary
The vocabulary is the full set of tokens recognized by the tokenizer.
Attention
Attention is the mechanism that lets a model focus on relevant parts of the input.
Attention Mask
An attention mask controls which tokens the model is allowed to pay attention to.
Positional Encoding
Positional encoding gives the model information about token order.
Hidden State
A hidden state is an internal vector representation created as the model processes input.
Parameter
A parameter is a learned weight inside the model.
Logit
A logit is the raw score the model assigns to each possible next token before normalization.
Log Probability (logprob)
Log probability is the logarithm of the probability the model assigned to a token or sequence.
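The chain from logits to probabilities to logprobs is short enough to show directly. The logits below are invented scores over a tiny made-up vocabulary; softmax turns them into probabilities, and taking the log gives the logprobs many APIs expose for scoring and debugging.

```python
import math

logits = {"paris": 5.0, "london": 2.0, "rome": 1.0}  # invented raw scores

# Softmax: exponentiate and normalize so the scores sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {t: math.exp(v) / total for t, v in logits.items()}
logprobs = {t: math.log(p) for t, p in probs.items()}

print(max(probs, key=probs.get))  # -> paris (highest logit wins)
print(logprobs["paris"] > logprobs["london"])  # True: likelier tokens
                                               # have logprobs closer to 0
```

Logprobs are always negative (or zero), and a value near zero means the model was confident in that token.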
5. Training, Alignment, and Adaptation
Training
Training is the process of adjusting model parameters using data.
Inference
Inference is the process of running a trained model on new input to get an output.
Dataset
A dataset is the collection of examples used for training, validation, or testing.
Feature
A feature is an input variable or representation used by a model.
Label
A label is the target output a supervised model is trained to predict.
Supervised Learning
Supervised learning uses labeled examples with known outputs.
Unsupervised Learning
Unsupervised learning finds structure in data without labels.
Semi-Supervised Learning
Semi-supervised learning combines labeled and unlabeled data.
Self-Supervised Learning
Self-supervised learning creates training signals from the data itself.
Reinforcement Learning (RL)
Reinforcement learning optimizes behavior based on reward from interacting with an environment.
Classification
Classification predicts a category.
Regression
Regression predicts a continuous value.
Clustering
Clustering groups similar data points without predefined labels.
Dimensionality Reduction
Dimensionality reduction compresses data into fewer variables while preserving useful structure.
Pretraining
Pretraining is the initial large-scale training stage on broad data.
Fine-Tuning
Fine-tuning is additional training on task-specific data.
Supervised Fine-Tuning (SFT)
SFT fine-tunes a model using prompt-response examples with known good answers.
Post-Training
Post-training refers to optimization steps after the main pretraining phase.
Transfer Learning
Transfer learning reuses knowledge learned in one domain or task for another.
Domain Adaptation
Domain adaptation improves model performance for a specific domain such as finance, law, or medicine.
Distillation
Distillation trains a smaller model to imitate a larger one.
Alignment
Alignment is the effort to make model behavior match intended goals, values, or rules.
RLHF
RLHF stands for Reinforcement Learning from Human Feedback, where human preference data is used to improve model behavior.
RLAIF
RLAIF stands for Reinforcement Learning from AI Feedback, where AI-generated judgments are used in the training loop.
Constitutional AI
Constitutional AI is an alignment approach where a model critiques and revises outputs according to explicit principles.
Preference Model
A preference model scores which outputs humans are more likely to prefer.
Reward Model
A reward model estimates how good a given output is for reinforcement learning.
Policy Model
The policy model is the model being optimized to choose actions or outputs.
Hyperparameter
A hyperparameter is a setting chosen by developers rather than learned during training.
Learning Rate
The learning rate controls how large each training update is.
Epoch
An epoch is one full pass through the training dataset.
Checkpoint
A checkpoint is a saved snapshot of model weights.
Generalization
Generalization is how well a model performs on unseen data.
Overfitting
Overfitting happens when a model learns the training data too specifically and performs poorly on new data.
Underfitting
Underfitting happens when a model fails to learn enough from the training data.
6. Efficient Tuning and Model Optimization
PEFT
PEFT stands for Parameter-Efficient Fine-Tuning, a family of methods that adapts a model while changing only a small subset of its weights.
LoRA
LoRA stands for Low-Rank Adaptation, a PEFT method that learns small update matrices instead of changing the full model.
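The arithmetic behind the idea fits in a toy example. Instead of updating a full d x d weight matrix W, LoRA learns a d x r matrix A and an r x d matrix B with r much smaller than d, and uses W + A @ B as the effective weights. The matrices and dimensions below are invented and tiny; this is the shape of the trick, not a training implementation.

```python
# LoRA sketch with pure-Python matrices (no ML libraries).

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, r = 4, 1  # full dimension vs. low rank

# Frozen pretrained weights (identity matrix for illustration).
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

A = [[0.1] for _ in range(d)]   # d x r, trainable
B = [[0.2, 0.0, 0.0, 0.0]]      # r x d, trainable

W_adapted = matadd(W, matmul(A, B))  # effective weights after adaptation
print(W_adapted[0][0])  # 1.0 shifted by the low-rank update (0.1 * 0.2)

# Full fine-tuning would update d*d = 16 values; LoRA trains d*r + r*d = 8.
# At realistic sizes (d in the thousands, r around 8-64) the savings
# are dramatic, which is why LoRA is the most widely used PEFT method.
```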
Adapter
An adapter is a lightweight trainable module inserted into a model for task-specific adaptation.
Quantization
Quantization reduces numeric precision to make models smaller and faster.
Post-Training Quantization (PTQ)
PTQ quantizes a trained model without fully retraining it.
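A sketch of symmetric 8-bit quantization, the idea behind many PTQ schemes: map floats onto integers in [-127, 127] with a single scale factor, then dequantize and see how much precision was lost. The weights are made-up values.

```python
def quantize(weights):
    """Map floats to int8-range integers using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.513, -1.27, 0.034, 0.98]   # toy weight values
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(q)                     # small integers instead of 32-bit floats
print(max_err <= scale / 2)  # True: rounding error stays within half a step
```

Storing each weight as one byte instead of four is where the memory savings come from; the price is the small rounding error measured above.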
Pruning
Pruning removes less important weights or connections to reduce size or cost.
Sparsity
Sparsity refers to a model where many weights or activations are zero or near zero.
Compression
Compression is the general process of reducing model or data size.
Model Parallelism
Model parallelism splits a model across multiple devices.
Data Parallelism
Data parallelism replicates the model across devices and splits the training data between them.
Mixed Precision
Mixed precision uses different numeric precisions together for better efficiency.
KV Cache
The KV cache stores the attention keys and values computed for earlier tokens so they do not have to be recomputed at every generation step, which makes responses faster.
Context Length Scaling
Context length scaling refers to techniques that help models handle longer inputs more effectively.
Inference Optimization
Inference optimization includes techniques that reduce runtime cost or latency.
Serving Stack
The serving stack is the software layer that hosts and delivers model inference.
Model Serving
Model serving means running the model in production so applications can call it reliably.
GPU
A GPU is a highly parallel processor commonly used for training and inference.
TPU
A TPU is Google's specialized hardware for machine learning workloads.
Accelerator
An accelerator is any hardware optimized for AI or high-performance compute.
Memory Footprint
Memory footprint is the amount of memory the system needs during execution.
Latency
Latency is how long it takes to get a response.
Throughput
Throughput is how much inference work can be completed in a given amount of time.
Batching
Batching means processing multiple requests together for efficiency.
Deployment
Deployment is the act of releasing a model or AI system into production.
Production
Production is the live environment where real users depend on the system.
Canary Release
A canary release rolls out a change to a small subset of traffic first.
A/B Test
An A/B test compares two versions of a model, workflow, or experience to see which performs better.
Rollback
A rollback reverts to an earlier version after a bad change.
Versioning
Versioning tracks different versions of models, prompts, workflows, or datasets.
Rate Limit
A rate limit caps how many requests can be made in a given time period.
Quota
A quota is the allowed amount of usage, such as tokens, requests, compute, or storage.
7. Evaluation, Reliability, and Observability
Evaluation (Eval)
An eval is a test used to measure whether a model or AI system performs as intended.
Benchmark
A benchmark is a standardized task set used to compare systems or models.
Offline Eval
An offline eval is run on stored datasets instead of live traffic.
Online Eval
An online eval is performed in production or with live users.
Human Eval
A human eval uses people to judge quality, correctness, or preference.
Model Grader
A model grader is a model used to score or critique another model's output.
Pass Rate
Pass rate is the percentage of cases that meet a defined success standard.
Precision
Precision measures how many predicted positives or retrieved results are actually correct.
Recall
Recall measures how many of the truly relevant items were successfully found.
Accuracy
Accuracy measures the fraction of overall predictions that are correct.
F1 Score
F1 score balances precision and recall into a single metric.
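Precision, recall, and F1 come straight from true/false positive and negative counts. The numbers below are made up: imagine a retriever that returned 8 results of which 6 were relevant (2 false positives) and missed 4 relevant items (false negatives).

```python
def precision(tp, fp):
    """Of everything returned, what fraction was correct?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything relevant, what fraction was found?"""
    return tp / (tp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

p = precision(tp=6, fp=2)   # 0.75
r = recall(tp=6, fn=4)      # 0.6
print(round(f1(p, r), 3))   # 0.667
```

The harmonic mean punishes imbalance: a system with 1.0 precision but 0.1 recall scores far worse on F1 than one with 0.5 of each.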
Reliability
Reliability is how consistently a system behaves as expected.
Regression
A regression is a performance drop introduced by a change.
Failure Mode
A failure mode is a repeatable way a system breaks.
Trace
A trace is a record of the steps, tool calls, and decisions taken during execution.
Observability
Observability is the ability to inspect system behavior through logs, metrics, traces, and events.
Guardrail
A guardrail is a control layer that constrains outputs, tool use, or behavior.
Fallback
A fallback is a backup behavior used when the preferred path fails.
Latency SLA
A latency SLA is the target or guarantee for response time.
8. Safety, Governance, and Deployment
Safety Policy
A safety policy defines what the system is allowed or not allowed to generate or do.
Moderation
Moderation is the detection and handling of harmful or policy-violating content.
Content Filter
A content filter screens unsafe or disallowed inputs or outputs.
Red Teaming
Red teaming is adversarial testing designed to expose weaknesses or abuse paths.
Prompt Security
Prompt security is the practice of defending prompts, tools, and context against attacks such as injection or data exfiltration.
Data Leakage
Data leakage is the unintended exposure of sensitive or private information.
PII
PII stands for Personally Identifiable Information, meaning data that can identify a person.
Privacy-Preserving AI
Privacy-preserving AI uses methods that reduce exposure or misuse of sensitive data.
Access Control
Access control governs who or what can use a model, tool, or dataset.
Policy Enforcement
Policy enforcement automatically applies rules during model operation.
Alignment Risk
Alignment risk is the risk that a model optimizes for something other than the intended objective or values.
Governance
Governance is the broader system of policies, controls, accountability, and oversight used to manage AI responsibly.
9. Core AI and ML Foundations
Artificial Neural Network
Another name for a neural network, especially in more formal ML language.
Foundation Model Stack
The set of components around a foundation model, including serving, safety, orchestration, retrieval, and evaluation layers.
Model Stack
The broader application stack that includes the model plus all the systems around it.
Feature Engineering
Feature engineering is the process of selecting or transforming inputs for classical machine learning models.
Training Split
The training split is the portion of the data used to fit the model.
Validation Split
The validation split is used during development to tune the model and compare options.
Test Split
The test split is used at the end to estimate real-world performance.
Loss Function
The loss function measures how wrong the model's predictions are during training.
Optimization
Optimization is the process of updating model parameters to reduce loss.
Gradient
A gradient tells the model how to adjust its parameters to reduce error.
Backpropagation
Backpropagation is the algorithm used to send error information backward through a neural network during training.
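Loss, gradient, epoch, and learning rate all appear in even the smallest training loop. The example below fits a one-parameter model y = w * x to toy data by gradient descent on mean squared error; real backpropagation computes these gradients automatically through every layer of a network.

```python
# Toy data following the true relationship y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # the single learnable parameter
learning_rate = 0.05  # how large each update step is

for epoch in range(100):  # one full pass over the data per epoch
    # Gradient of mean squared error loss with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # step against the gradient to reduce loss

print(round(w, 3))  # converges to roughly 2.0, the true slope
```

Each update moves w a small step in the direction that reduces the loss; over many epochs the parameter settles near the value that best explains the data.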
Weights
Weights are the learned numeric values that shape model behavior.
Bias Term
A bias term is an additional learned value that helps the model shift outputs.
Activation Function
An activation function determines how a neural network node transforms its input.
Regularization
Regularization is a technique used to reduce overfitting.
Data Drift
Data drift happens when the input data in production changes from what the model saw during training.
Concept Drift
Concept drift happens when the relationship between inputs and outputs changes over time.
Distribution Shift
Distribution shift is the general change between training-time and real-world data distributions.
Calibration
Calibration is how well a model's confidence matches actual correctness.
Ensemble
An ensemble combines multiple models or methods to improve performance or robustness.
Baseline
A baseline is the simple reference system you compare improvements against.
Oracle
In evaluation, an oracle is the ideal or best-possible reference outcome.
Latent Space
Latent space is the internal representational space where models encode patterns and meaning.
Representation Learning
Representation learning is the process of learning useful internal encodings of data automatically.
Foundation Model Adaptation
Foundation model adaptation is the broader set of methods used to tailor a base model to a specific use case.
Why this glossary matters
Most people think AI is mainly about the model.
That is outdated.
In practice, the most important part of modern AI products is not just the model. It is the system around the model: the orchestration layer, the retrieval layer, the tool layer, the evaluation layer, and the safety layer.
That is why terms like agent orchestration, MCP, RAG, function calling, context engineering, and evals matter so much right now. They are the language of turning raw model capability into something useful, reliable, and commercially valuable.
If you understand those terms first, the rest of the field starts to make a lot more sense.