The Complete AI Agents Glossary: Essential Terminology and Concepts
The AI agent ecosystem has developed its own vocabulary at a rapid pace. Terms like “ReAct,” “chain-of-thought,” and “tool calling” are tossed around in documentation and discussions, often without clear definitions. This glossary provides a comprehensive reference for the terminology you’ll encounter when building, deploying, and reasoning about AI agents.
Whether you’re new to agent development or a seasoned practitioner looking for precise definitions, this guide organizes concepts from foundational to advanced, with practical examples and connections between related terms.
Core Concepts
Agent
An AI system that can perceive its environment, make decisions, and take actions to achieve goals. Unlike simple chatbots that only respond to queries, agents can use tools, maintain state across interactions, and execute multi-step plans autonomously. Modern AI agents typically use large language models (LLMs) as their reasoning engine.
Example: An agent that researches a topic by searching the web, reading documents, synthesizing information, and producing a summary report.
Related terms: Autonomous agent, Agentic system, AI assistant
Agentic Workflow
A task execution pattern where an LLM operates with a degree of autonomy, making decisions about which actions to take and in what order. Agentic workflows contrast with deterministic pipelines where every step is predefined. The agent determines the path through the workflow based on intermediate results.
Example: A coding agent that reads an error message, hypothesizes a fix, implements it, runs tests, and iterates until tests pass—with no human specifying each step.
Autonomous Agent
An agent capable of operating independently over extended periods without human intervention. Fully autonomous agents can set their own subgoals, recover from errors, and adapt to unexpected situations. Most current agents are semi-autonomous, requiring human oversight for critical decisions.
Example: An autonomous research agent that continuously monitors news sources, identifies relevant stories, and generates reports without human prompts.
Large Language Model (LLM)
A neural network trained on vast text corpora that can understand and generate human language. LLMs serve as the “brain” of most modern AI agents, providing reasoning capabilities, world knowledge, and natural language understanding. Examples include GPT-4, Claude, Llama, and Gemini.
Context Window
The maximum amount of text (measured in tokens) that an LLM can process in a single request. Context windows range from 4K tokens to over 200K tokens depending on the model. The context window constrains how much conversation history, retrieved documents, and instructions can be included in each agent interaction.
Example: With a 100K token context window, an agent might include the last 50 conversation turns, 10 retrieved documents, system instructions, and the current user query.
Reasoning and Planning
Chain-of-Thought (CoT)
A prompting technique where the model is encouraged to show its reasoning step-by-step before arriving at a final answer. CoT improves performance on complex reasoning tasks by making intermediate steps explicit and allowing error correction during reasoning.
Example prompt: “Think through this step-by-step: What’s the best way to sort a list of 1 million integers?”
ReAct (Reasoning + Acting)
An agent architecture that interleaves reasoning traces with action execution. The agent thinks about what to do (Reason), takes an action (Act), observes the result, and repeats. ReAct patterns make agent behavior more interpretable and often more effective than pure action sequences.
Pattern: Thought → Action → Observation → Thought → Action → …
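A minimal sketch of this loop in Python makes the pattern concrete. Here llm and run_tool are hypothetical placeholders for a model call (assumed to return a parsed thought/action) and a tool executor; real frameworks handle this wiring for you.
# Minimal ReAct loop (sketch). `llm` and `run_tool` are hypothetical
# placeholders for a model call and a tool executor.
def react_loop(task, llm, run_tool, max_steps=10):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # assume a dict: thought, action, input/answer
        transcript += f"Thought: {step['thought']}\n"       # Reason
        if step["action"] == "finish":
            return step["answer"]
        observation = run_tool(step["action"], step["input"])  # Act
        transcript += (f"Action: {step['action']}[{step['input']}]\n"
                       f"Observation: {observation}\n")        # Observe, loop
    return "Step limit reached without a final answer."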
Planning
The process of breaking down a complex goal into a sequence of achievable subgoals and actions. Planning can be done upfront (creating a complete plan before execution) or interleaved with execution (planning the next step based on current state). Advanced agents use dynamic replanning to adapt when initial plans fail.
Reflection
A technique where agents analyze their own outputs, reasoning, or actions to identify errors and improve. Reflection enables self-correction without external feedback. Some architectures use separate “critic” models to evaluate the primary agent’s work.
Example: An agent generates code, then reviews it for bugs and security issues before presenting the final version.
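As a rough illustration, a reflection loop can be as simple as asking the model to critique and revise its own draft. The llm function below is a hypothetical stand-in for any model call, and the stopping condition is deliberately naive.
# Generate-critique-revise loop (sketch). `llm` is a hypothetical model call.
def reflect(task, llm, rounds=2):
    draft = llm(f"Complete this task:\n{task}")
    for _ in range(rounds):
        critique = llm(f"Review this answer for errors or issues:\n{draft}")
        if "no issues" in critique.lower():  # naive stopping condition
            break
        draft = llm(f"Task: {task}\nDraft: {draft}\n"
                    f"Critique: {critique}\nRevise the draft.")
    return draft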
Tree of Thoughts (ToT)
An extension of chain-of-thought where the agent explores multiple reasoning paths simultaneously, evaluating and pruning branches to find optimal solutions. ToT is particularly useful for problems with multiple valid approaches or high uncertainty.
Self-Consistency
A technique where the model generates multiple reasoning paths and selects the answer that appears most frequently. Self-consistency improves reliability by reducing dependence on any single reasoning chain.
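In code, self-consistency amounts to sampling several answers and taking a majority vote. This sketch assumes a hypothetical llm call that accepts a temperature parameter to encourage diverse reasoning paths.
# Self-consistency (sketch): sample several answers, return the most common.
from collections import Counter

def self_consistent_answer(question, llm, samples=5):
    answers = [llm(question, temperature=0.8) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]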
Tools and Actions
Tool Calling (Function Calling)
The mechanism by which agents invoke external functions or APIs to extend their capabilities beyond text generation. The LLM generates structured output specifying which tool to call and with what parameters. The framework executes the tool and returns results to the model. Learn how to build your own in Creating Custom Tools for LangChain Agents.
Example tools: Web search, code execution, database queries, file operations, API calls
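The core of tool calling is a dispatch step: parse the model's structured output, look up the named function, and execute it. This sketch uses only the standard library; the registry and the add tool are toy examples.
# Tool dispatch (sketch): parse the model's structured call, look up the
# matching function in a registry, and execute it.
import json

def dispatch(tool_call_json, tools):
    call = json.loads(tool_call_json)  # e.g. {"name": ..., "arguments": {...}}
    result = tools[call["name"]](**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# Toy registry and call:
tools = {"add": lambda a, b: a + b}
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}', tools))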
Tool
A discrete capability made available to an agent, typically defined with a name, description, and parameter schema. Tools bridge the gap between the LLM’s text-based reasoning and real-world actions. Well-designed tools have clear purposes and atomic functionality.
# Example tool definition
{
  "name": "search_web",
  "description": "Search the web for current information",
  "parameters": {
    "query": {"type": "string", "description": "Search query"}
  }
}
Action Space
The complete set of actions available to an agent, including all tools and their possible parameters. A larger action space provides more capability but increases the challenge of selecting appropriate actions. Constraining the action space can improve agent reliability.
Grounding
Connecting agent outputs to verifiable external sources. Grounded agents cite their sources, retrieve factual information rather than relying solely on training data, and can verify claims against authoritative sources. Grounding reduces hallucination and increases trustworthiness.
Observation
The result returned to an agent after executing an action. Observations inform the next reasoning step and may include structured data, text responses, error messages, or environmental state changes. The quality of observations significantly impacts agent decision-making.
Memory and State
Short-Term Memory (Working Memory)
Information held in the current context window, including recent conversation turns, active task state, and retrieved documents. Short-term memory is immediately accessible but limited by context window size. Strategies such as summarization and pruning help make the most of this limited space. For a deeper exploration, see Understanding Agent Memory Systems.
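A simple pruning strategy drops the oldest turns until the history fits the budget while always preserving the system prompt. In this sketch, count_tokens is a hypothetical tokenizer call.
# Context pruning (sketch): keep the system prompt, drop the oldest turns
# until the history fits the token budget. `count_tokens` is hypothetical.
def prune_history(messages, count_tokens, budget):
    system, turns = messages[0], messages[1:]
    while turns and count_tokens([system] + turns) > budget:
        turns.pop(0)  # discard the oldest turn first
    return [system] + turns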
Long-Term Memory
Persistent storage of information that survives beyond individual conversations. Long-term memory typically uses vector databases for semantic retrieval, allowing agents to recall relevant facts, user preferences, and domain knowledge across sessions.
Episodic Memory
Storage of complete experiences or interaction episodes, including the situation, actions taken, and outcomes. Episodic memory enables agents to learn from past successes and failures by recalling similar situations.
Vector Database (Vector Store)
A database optimized for storing and retrieving high-dimensional vectors (embeddings). Vector databases power semantic search, enabling retrieval based on meaning rather than keyword matching. Common vector databases include Pinecone, Weaviate, Chroma, and Qdrant.
Embedding
A numerical representation of text (or other data) as a vector in high-dimensional space. Semantically similar content produces similar embeddings, enabling similarity search. Embedding models (like OpenAI’s text-embedding-3 or sentence-transformers) convert text to embeddings.
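Cosine similarity is the standard way to compare embeddings. The vectors below are toy three-dimensional examples; real embedding models produce hundreds or thousands of dimensions.
# Cosine similarity, the core operation behind semantic search.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

print(cosine_similarity([0.1, 0.9, 0.2], [0.15, 0.85, 0.1]))  # close to 1.0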
Checkpointing
Saving agent state at specific points during execution, enabling resumption after interruption, debugging of past states, and human review of intermediate steps. Checkpointing is essential for reliable production agent systems.
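At its simplest, checkpointing means serializing state to durable storage after each step. A minimal sketch using JSON files follows; production systems typically use databases and versioned snapshots instead.
# Minimal checkpointing (sketch): persist state after each step so an
# interrupted run can resume where it left off.
import json
from pathlib import Path

def save_checkpoint(state, path="checkpoint.json"):
    Path(path).write_text(json.dumps(state))

def load_checkpoint(path="checkpoint.json"):
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else None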
Multi-Agent Systems
Multi-Agent System
An architecture where multiple specialized agents collaborate to solve complex problems. Agents may have different roles, tools, or expertise. Coordination patterns include hierarchical control, peer-to-peer communication, and shared workspaces. For implementation patterns, see our guide on Multi-Agent Collaboration Patterns.
Agent Orchestration
The coordination layer that manages multiple agents, routing tasks, aggregating results, and handling dependencies between agent activities. Orchestration can be centralized (a manager agent coordinates others) or decentralized (agents negotiate directly).
Supervisor Agent
An agent responsible for delegating tasks to worker agents, monitoring progress, and synthesizing results. Supervisor patterns enable hierarchical decomposition of complex tasks.
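A rough sketch of the pattern, assuming a hypothetical llm call that returns (worker, subtask) pairs and a registry of worker functions; real supervisors also monitor progress and handle failures.
# Supervisor pattern (sketch): delegate subtasks to specialist workers,
# then synthesize. `llm` and the workers are hypothetical placeholders.
def supervise(goal, llm, workers):
    # Assume the model returns a list of (worker_name, subtask) pairs.
    plan = llm(f"Split this goal into subtasks for {list(workers)}: {goal}")
    results = [workers[name](subtask) for name, subtask in plan]  # delegate
    return llm(f"Goal: {goal}\nResults: {results}\nSynthesize a final answer.")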
Swarm Intelligence
An approach where many simple agents with limited individual capability produce sophisticated collective behavior through local interactions. Inspired by biological systems like ant colonies. Swarm approaches can be more robust and scalable than single-agent systems.
Agent Communication Protocol
The conventions by which agents exchange information, make requests, and share results. Protocols may be natural language, structured messages, or specialized formats. Clear protocols are essential for reliable multi-agent coordination.
Retrieval and Knowledge
RAG (Retrieval-Augmented Generation)
A pattern that enhances LLM responses by retrieving relevant documents before generation. RAG reduces hallucination, provides access to current information, and grounds responses in authoritative sources. The retrieval step typically uses vector similarity search. For a hands-on implementation guide, see Building a RAG Agent with LangChain.
Flow: Query → Retrieve relevant documents → Augment prompt with documents → Generate response
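The flow above maps to a few lines of code. This sketch assumes hypothetical embed and llm calls and ranks documents by cosine similarity in memory; production systems delegate the retrieval step to a vector database.
# RAG (sketch): retrieve the most similar documents, then generate.
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def rag_answer(query, docs, embed, llm, k=3):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, embed(d)), reverse=True)
    context = "\n\n".join(ranked[:k])  # augment the prompt with top-k docs
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")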
Knowledge Base
A structured or semi-structured repository of information that agents can query. Knowledge bases may include documentation, FAQs, product catalogs, or domain-specific data. Unlike training data, knowledge bases can be updated without retraining the model.
Semantic Search
Search based on meaning rather than exact keyword matching. Semantic search uses embeddings to find content that is conceptually similar to the query, even if different words are used. Essential for effective RAG and long-term memory retrieval.
Chunking
The process of dividing documents into smaller segments for embedding and retrieval. Chunk size affects retrieval precision (smaller chunks) versus context (larger chunks). Common strategies include fixed-size chunks, paragraph-based splitting, and semantic chunking.
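Fixed-size chunking with overlap is the simplest of these strategies; the overlap keeps sentences that straddle a boundary retrievable from either side.
# Fixed-size chunking with overlap (sketch).
def chunk_text(text, size=500, overlap=50):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # overlap preserves context across boundaries
    return chunks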
Reranking
A second-stage retrieval process that reorders initial search results using a more sophisticated model. Reranking improves retrieval quality by considering query-document relevance more carefully than initial embedding similarity.
Prompting and Instructions
System Prompt
Instructions provided to the model that establish its role, capabilities, constraints, and behavior guidelines. System prompts persist across conversation turns and shape the agent’s overall personality and approach.
Few-Shot Prompting
Including examples of desired input-output pairs in the prompt to guide model behavior. Few-shot learning enables rapid task adaptation without fine-tuning. The examples demonstrate the expected format and reasoning style.
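Assembling a few-shot prompt is mostly string formatting. A minimal sketch with a toy sentiment-classification task:
# Few-shot prompt assembly (sketch): examples demonstrate the format
# before the real input appears.
def few_shot_prompt(instruction, examples, query):
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great product!", "positive"), ("Broke in a day.", "negative")],
    "Exceeded my expectations.",
)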
Zero-Shot Prompting
Asking the model to perform a task without providing examples, relying on instructions and the model’s training to produce appropriate outputs.
Prompt Engineering
The craft of designing prompts that elicit desired behavior from language models. Prompt engineering encompasses instruction clarity, example selection, format specification, and constraint communication.
Structured Output
Model responses in a specific format (JSON, XML, etc.) that can be reliably parsed by code. Structured outputs enable programmatic processing of model responses and tool calling.
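Because models occasionally emit malformed output, parsing should validate before acting. A minimal sketch using only the standard library; real systems often validate against a full JSON schema instead.
# Parse and validate structured output (sketch): fail loudly when the
# model's JSON is malformed or incomplete.
import json

def parse_response(raw, required):
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = required - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return data

parse_response('{"name": "search_web", "query": "agent glossary"}',
               {"name", "query"})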
Evaluation and Safety
Hallucination
When a model generates plausible-sounding but factually incorrect information. Hallucinations are a persistent challenge for LLM-based agents. Mitigation strategies include RAG, grounding, and verification steps.
Guardrails
Constraints and checks that prevent agents from taking harmful, unauthorized, or undesirable actions. Guardrails may include content filters, action allowlists, rate limits, and human approval requirements.
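An action allowlist is one of the simplest guardrails to implement: reject any tool call that is not explicitly permitted. The action names below are illustrative.
# Action allowlist guardrail (sketch): block anything not explicitly allowed.
ALLOWED_ACTIONS = {"search_web", "read_file"}  # illustrative names

def check_action(action):
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Action '{action}' is not allowlisted.")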
Human-in-the-Loop (HITL)
An architecture where humans review and approve certain agent decisions before execution. HITL provides safety and control at the cost of latency and scalability. Common for high-stakes actions like financial transactions or code deployment.
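A minimal approval gate can be as simple as pausing for confirmation before executing; production systems typically queue the request for asynchronous review instead. The execute function here is a hypothetical action executor.
# Human-in-the-loop gate (sketch): require approval before a high-stakes
# action runs. `execute` is a hypothetical action executor.
def execute_with_approval(action, execute):
    if input(f"Approve '{action}'? [y/N] ").strip().lower() != "y":
        return "Action rejected by reviewer."
    return execute(action)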
Agent Evaluation
The process of measuring agent performance on defined tasks or benchmarks. Evaluation may assess accuracy, efficiency, safety, and user satisfaction. Robust evaluation is challenging due to the open-ended nature of agent tasks.
Red Teaming
Adversarial testing where evaluators attempt to elicit harmful or undesired behavior from agents. Red teaming identifies vulnerabilities before deployment.
Frameworks and Infrastructure
Agent Framework
A software library providing building blocks for agent development, including LLM integration, tool definitions, memory management, and orchestration. Major frameworks include LangChain, LlamaIndex, AutoGen, and CrewAI. For detailed comparisons and guidance on choosing between them, see our Complete Guide to AI Agent Frameworks.
Observability
The ability to understand agent behavior through logging, tracing, and monitoring. Observability tools capture decision-making processes, tool calls, and performance metrics. Essential for debugging and improving production agents.
Tracing
Recording the sequence of operations, decisions, and data flows during agent execution. Traces enable replay, debugging, and performance analysis.
LangSmith / LangFuse / Phoenix
Observability platforms specifically designed for LLM applications. These tools provide tracing, evaluation, and monitoring capabilities tailored to agent workflows.
Advanced Patterns
Constitutional AI
An approach where AI systems are trained to follow an explicit set of principles, or “constitution,” that guides their behavior. Constitutional AI aims to create agents that can explain their reasoning in terms of underlying principles.
Mixture of Experts (MoE)
An architecture using multiple specialized sub-models where a gating mechanism routes inputs to the most relevant experts. MoE can provide efficiency by activating only relevant model capacity.
Fine-Tuning
Additional training of a base model on specific data to improve performance on particular tasks. Fine-tuned models may be more effective and efficient for specialized agent applications.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human preferences guide model improvement. RLHF helps align model behavior with human values and expectations.
Prompt Chaining
Connecting multiple LLM calls where the output of one call becomes input to the next. Prompt chains decompose complex tasks into manageable steps. Contrast with agents, which dynamically determine call sequences.
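A two-step chain is enough to show the shape: the first call's output becomes the second call's input. Here llm is again a hypothetical model call.
# Prompt chain (sketch): the output of one call feeds the next.
def summarize_then_translate(text, llm):
    summary = llm(f"Summarize this in two sentences:\n{text}")
    return llm(f"Translate this into French:\n{summary}")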
Emerging Concepts
Model Context Protocol (MCP)
A standardized protocol for connecting LLMs to external tools and data sources. MCP aims to provide a universal interface for agent-tool integration.
Computer Use
Agent capability to interact with computer interfaces (clicking, typing, navigating) as a human would. Computer use enables agents to operate existing software without API integration.
Cognitive Architecture
The overall design of an agent’s mental processes, including perception, memory, reasoning, and action selection. Cognitive architectures draw inspiration from human cognition to create more capable and robust agents.
Conclusion
This glossary provides a foundation for understanding AI agent terminology, but the field evolves rapidly. New patterns, architectures, and concepts emerge regularly as researchers and practitioners push the boundaries of what agents can accomplish.
Use this reference as a starting point, and keep in mind that precise definitions may vary between frameworks and communities. The most important skill is understanding the underlying concepts—specific terminology will continue to shift as the field matures.
For hands-on exploration of these concepts, check out our Complete Guide to AI Agent Frameworks or dive into our tutorial on Building Your First AI Agent with LangGraph.