Intro
We’ve all heard the script and the hype. But I wanted to understand what it actually is and how it’s structured.
Because to me it sounds like a program that delegates logic flow to an LLM and then interacts with APIs.
What is agentic AI?
Agentic AI is a system that uses an LLM plus tools, memory, and a control loop to pursue goals over multiple steps with some “autonomy”, rather than just answering a single prompt.
// Emphasis on the control loop: without it, the system is passive and just waits for instructions.
The LLM acts as a controller (planner/decider) wrapped in a runtime that can call external tools/APIs, maintain state, and iteratively plan–act–observe–adjust until a goal is achieved.
Core properties
- Goal-driven: Operates on tasks like “draft a report,” “resolve this issue,” “file these tickets,” not just “answer this question.”
- Multi-step: Decomposes goals into sub-tasks and executes them over several tool calls / iterations.
- Tool-using: Calls APIs, databases, scripts, and services to perform actual work.
- Stateful: Maintains short-term state and often long-term memory to inform decisions.
- Autonomous within a boundary: Makes its own next-step decisions inside policies, constraints, and guardrails.
High-level mental model
- Non-agentic LLM app: the model answers a prompt, maybe with a tool call, then stops.
- Agentic AI system: an LLM-powered software worker that uses APIs and tools, running in a loop until the job is done.
Typical architecture
Core loop
The loop is what gives the program its “autonomy”. It goes something like this:
- Receive or update the current goal and context.
- Ask the LLM: “Given this goal and current state, what should the agent do next?”
- Interpret the LLM output:
- If it is a tool call, execute the tool and capture the result.
- If it is a direct answer / finalization, return or store the result and possibly terminate.
- Append the tool result to the context (state, memory).
- Repeat until a stop condition is met (goal reached, time/step limit, external interruption, failure policy).
Take a look at this pseudocode:
let done = false;
while (!done) {
  // Reload the task state so the LLM sees the latest tool results.
  const state = loadState(taskId);
  const llmInput = buildPrompt(goal, state, toolsSpec, policies);
  const llmOutput = callLLM(llmInput);
  const action = parseAction(llmOutput); // e.g., {"type": "tool", "name": "searchTickets", "args": {...}}
  if (action.type === "tool") {
    // Act, then record the observation for the next iteration.
    const result = callTool(action.name, action.args);
    appendToState(taskId, { action, result });
  } else if (action.type === "finish") {
    saveResult(taskId, action.output);
    done = true;
  } else {
    // Anything that is neither a tool call nor a finish is treated as an error.
    handlePolicyViolationOrError(action);
  }
}
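The pseudocode above leaves out the step limit mentioned in the stop conditions. Here is a minimal runnable sketch of the same loop with a step budget. Everything in it is an assumption for illustration: `runAgent`, `scriptedLLM`, and `demoTools` are hypothetical names, and the "LLM" is a scripted function standing in for a real model call.

```typescript
// Hypothetical action shape, mirroring the pseudocode above.
type Action =
  | { type: "tool"; name: string; args: Record<string, unknown> }
  | { type: "finish"; output: string };
type Step = { action: Action; result: unknown };
type LLM = (goal: string, steps: Step[]) => Action;
type ToolFn = (args: Record<string, unknown>) => unknown;
type AgentOutcome = { status: "done" | "step_limit"; output: string | null; steps: Step[] };

// Runs the plan-act-observe loop until the LLM finishes or the step budget runs out.
function runAgent(goal: string, llm: LLM, tools: Map<string, ToolFn>, maxSteps = 5): AgentOutcome {
  const steps: Step[] = [];
  while (steps.length < maxSteps) {
    const action = llm(goal, steps);
    if (action.type === "finish") {
      return { status: "done", output: action.output, steps };
    }
    const tool = tools.get(action.name);
    // An unknown tool is recorded as an error so the LLM can observe it and adjust.
    const result = tool ? tool(action.args) : { error: `unknown tool: ${action.name}` };
    steps.push({ action, result });
  }
  return { status: "step_limit", output: null, steps };
}

// Stand-in for a real model call: search once, then finish.
const scriptedLLM: LLM = (_goal, steps) =>
  steps.length === 0
    ? { type: "tool", name: "searchTickets", args: { query: "login" } }
    : { type: "finish", output: `done after ${steps.length} tool call(s)` };

const demoTools = new Map<string, ToolFn>([["searchTickets", () => ["T-1", "T-2"]]]);
const demoOutcome = runAgent("summarize login tickets", scriptedLLM, demoTools);
console.log(demoOutcome.status, demoOutcome.output);
```

Returning a `step_limit` status instead of throwing lets the caller decide whether to escalate to a human or retry with a larger budget.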
Key components
LLM “brain”
Primary responsibilities:
- Interpret goals and user input.
- Plan multi-step strategies (high-level and/or low-level).
- Select tools to call and synthesize tool arguments.
- Interpret tool outputs and decide the next step.
- Produce final outputs.
Implementation patterns:
- System prompt encoding role, objectives, constraints, and tool usage patterns.
- Output formatting enforced by JSON / function-calling / tool-calling schemas.
- Sometimes multiple LLMs or multiple roles (planner, executor, critic).
Tooling / APIs
Tools are API capabilities the agent can invoke to act on the world or fetch information, for example:
- Data access: SQL/NoSQL DB queries, search APIs, vector stores, logs.
- Business systems: CRM, ticketing, HR, ERP, billing APIs.
- Infrastructure: Kubernetes, cloud provider SDKs, CI/CD, monitoring.
- General utilities: web search, HTTP client, filesystem, code execution, email, chat.
Each tool typically has:
- Name (unique identifier).
- Input schema (JSON / typed arguments).
- Output schema (structured result).
- Description (used in tool selection prompt).
- Policy / permission metadata.
Example tool spec (TypeScript-ish):
const tools = [
  {
    name: "getUserTickets",
    description: "Fetches open support tickets for a given user.",
    inputSchema: {
      type: "object",
      properties: {
        userId: { type: "string" },
        status: { type: "string", enum: ["open", "pending", "closed"] }
      },
      required: ["userId"]
    }
  },
  {
    name: "postSlackMessage",
    description: "Send a Slack message to a channel or user.",
    inputSchema: {
      type: "object",
      properties: {
        channel: { type: "string" },
        text: { type: "string" }
      },
      required: ["channel", "text"]
    }
  }
];
The LLM is prompted with these specs and asked to select and call tools by emitting structured JSON.
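Since the model only emits JSON, the runtime still has to check that JSON against the spec before running anything. A minimal sketch of that check follows, assuming a trimmed-down spec shape. `validateArgs` is a hypothetical helper that only handles required fields, string types, and enums. It is not a full JSON Schema validator.

```typescript
// A trimmed-down spec shape matching the example above (not a full JSON Schema type).
type PropSpec = { type: "string"; enum?: string[] };
type ToolSpec = {
  name: string;
  inputSchema: { properties: Record<string, PropSpec>; required: string[] };
};

const getUserTickets: ToolSpec = {
  name: "getUserTickets",
  inputSchema: {
    properties: {
      userId: { type: "string" },
      status: { type: "string", enum: ["open", "pending", "closed"] },
    },
    required: ["userId"],
  },
};

// Returns a list of problems; an empty list means the call may proceed.
function validateArgs(spec: ToolSpec, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of spec.inputSchema.required) {
    if (!(key in args)) errors.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = spec.inputSchema.properties[key];
    if (!prop) {
      errors.push(`unexpected argument: ${key}`);
    } else if (typeof value !== "string") {
      errors.push(`${key} must be a string`);
    } else if (prop.enum && !prop.enum.includes(value)) {
      errors.push(`${key} must be one of: ${prop.enum.join(", ")}`);
    }
  }
  return errors;
}

console.log(validateArgs(getUserTickets, { userId: "u1", status: "open" }));
console.log(validateArgs(getUserTickets, { status: "archived" }));
```

Feeding the error list back to the LLM as the tool result gives it a chance to correct the call on the next iteration.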
Memory and state
Common tiers:
- Short-term task state:
- The current conversation, actions taken in this run, intermediate results.
- Stored in a task/session record (DB, cache, or in-memory if short-lived).
- Long-term knowledge and history:
- User preferences, past tasks, learned patterns, domain documents.
- Often stored in relational DBs, object stores, and/or vector DBs.
- Episodic memory patterns:
- After a task completes, the agent writes a summary of what happened (goal, approach, pitfalls, final result) that the agent can later retrieve for similar tasks.
Example memory operations:
- retrieveRelevantDocs(goal, currentState) → list<doc> (via embeddings/search).
- saveEpisodeSummary(taskId, summaryEmbedding, metadata) for future retrieval.
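To make the retrieval side concrete, here is a small in-memory sketch of episodic memory ranked by cosine similarity. The `EpisodeMemory` class and the toy two-dimensional embeddings are assumptions. A real system would get embeddings from a model and most likely store them in a vector DB.

```typescript
type Episode = { taskId: string; summary: string; embedding: number[] };

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class EpisodeMemory {
  private episodes: Episode[] = [];

  saveEpisodeSummary(episode: Episode): void {
    this.episodes.push(episode);
  }

  // Returns the topK stored episodes most similar to the query embedding.
  retrieveRelevant(queryEmbedding: number[], topK: number): Episode[] {
    return [...this.episodes]
      .sort(
        (x, y) =>
          cosineSimilarity(queryEmbedding, y.embedding) -
          cosineSimilarity(queryEmbedding, x.embedding),
      )
      .slice(0, topK);
  }
}

const memory = new EpisodeMemory();
// Toy embeddings, hand-written for the demo.
memory.saveEpisodeSummary({ taskId: "t1", summary: "Fixed login outage", embedding: [0.9, 0.1] });
memory.saveEpisodeSummary({ taskId: "t2", summary: "Drafted billing report", embedding: [0.0, 1.0] });
const closest = memory.retrieveRelevant([1, 0], 1);
console.log(closest.map((e) => e.summary));
```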
Planner and executor separation
A common architectural refinement is to split:
- Planner:
- Produces a high-level plan and sometimes updates it as execution progresses.
- Example output: a numbered list of steps, with which tools to use for each.
- Executor:
- Executes one step at a time, calling tools and dealing with low-level detail.
Example pattern:
[Planner LLM]
- Given goal + capabilities, output:
1. Step-by-step plan.
2. For each step, suggested tool and success criteria.
[Executor LLM]
- Take one step description + tools, perform it, record outcome.
- Ask: is this step complete? if yes, move to next; if not, refine/retry.
This separation can improve reliability and troubleshooting.
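A rough sketch of how the two roles might be wired together, with plain functions standing in for the two separately prompted LLM calls. `runPlan`, `demoPlanner`, and `flakyExecutor` are hypothetical names, and the retry limit is an assumption.

```typescript
type PlanStep = { description: string; tool: string };
type StepOutcome = { step: PlanStep; attempts: number; ok: boolean };

// Stand-ins for two separately prompted LLM calls.
type Planner = (goal: string) => PlanStep[];
type Executor = (step: PlanStep, attempt: number) => boolean; // true = success criteria met

// Executes the plan one step at a time, retrying a failed step up to maxAttempts.
function runPlan(goal: string, plan: Planner, execute: Executor, maxAttempts = 2): StepOutcome[] {
  const outcomes: StepOutcome[] = [];
  for (const step of plan(goal)) {
    let attempts = 0;
    let ok = false;
    while (!ok && attempts < maxAttempts) {
      attempts++;
      ok = execute(step, attempts);
    }
    outcomes.push({ step, attempts, ok });
    if (!ok) break; // stop here; a real system might ask the planner to re-plan
  }
  return outcomes;
}

const demoPlanner: Planner = () => [
  { description: "Fetch recent logs", tool: "getLogs" },
  { description: "File a ticket", tool: "createTicket" },
];
// Fails the first attempt at getLogs to exercise the retry path.
const flakyExecutor: Executor = (step, attempt) => !(step.tool === "getLogs" && attempt === 1);

const planOutcomes = runPlan("Investigate alert X", demoPlanner, flakyExecutor);
console.log(planOutcomes.map((o) => `${o.step.tool}: ok=${o.ok} attempts=${o.attempts}`));
```

Because each step's outcome is recorded separately, a failed run shows which step broke and after how many attempts, which is where the troubleshooting benefit comes from.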
Policies and guardrails
Policies constrain what the agent can do:
- Capability policies:
- Which tools can be used, with which arguments, under what conditions.
- Example: “Never call deleteUser without an approval token.”
// The sheer number of horror stories doesn't paint a good picture...
- Safety/ethics policies:
- Content filtering, privacy rules, compliance constraints.
- Operational limits:
- Max steps per task, timeouts, budget limits, rate limits.
Enforcement mechanisms:
- Pre-checks on tool calls (schema validation, allowlists).
- Post-checks on LLM outputs (e.g., classification, regex/JSON validation).
- Dedicated “policy check” passes, using either rule-based logic or another model.
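A pre-check on tool calls could look something like the sketch below. The policy tables and `preCheck` are hypothetical, `getUser` is an invented read-only tool, and the check only verifies that an approval token is present, not that it is valid.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type Decision = { allowed: true } | { allowed: false; reason: string };

// Hypothetical policy tables; getUser is an invented read-only tool.
const ALLOWED_TOOLS = new Set(["getUser", "deleteUser"]);
const NEEDS_APPROVAL = new Set(["deleteUser"]);

// Runs before the tool is executed. Only checks that a token is present;
// verifying the token itself is out of scope for this sketch.
function preCheck(call: ToolCall, approvalToken?: string): Decision {
  if (!ALLOWED_TOOLS.has(call.name)) {
    return { allowed: false, reason: `tool not on allowlist: ${call.name}` };
  }
  if (NEEDS_APPROVAL.has(call.name) && !approvalToken) {
    return { allowed: false, reason: `${call.name} requires an approval token` };
  }
  return { allowed: true };
}

console.log(preCheck({ name: "getUser", args: { id: "u1" } }));
console.log(preCheck({ name: "deleteUser", args: { id: "u1" } }));
console.log(preCheck({ name: "deleteUser", args: { id: "u1" } }, "approval-123"));
console.log(preCheck({ name: "dropDatabase", args: {} }));
```

The important design choice is that this runs in ordinary code outside the LLM, so a prompt cannot talk its way past it.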
Orchestrator / runtime
This is “normal” software that:
- Manages agent loops (iterations, retries, back-off).
- Tracks tasks, steps, and logs for observability.
- Handles concurrency and multi-agent orchestration (if you have multiple agents).
- Integrates with queues, schedulers, and your infra.
This is typically built in your language of choice (TypeScript, Python, Go, etc.), using:
- HTTP APIs to the LLM provider.
- Internal SDKs/clients for tools.
- Messaging/queueing systems for long-running tasks (e.g., Kafka, SQS, RabbitMQ).
- Databases for state and memory.
Single-agent vs multi-agent
Single-agent
- One agent with a broad role, e.g., “Customer Support Agent.”
- Same LLM instance (logically) handles planning, tool selection, and execution.
- Simpler to build, easier to reason about, good for many use cases.
Multi-agent
- Multiple specialized agents, each with a narrower role and possibly different tools or prompts, for example:
- Planner Agent: decomposes the high-level goal into sub-tasks.
- Research Agent: does retrieval, reading, summarizing.
- Coder Agent: writes or modifies code, runs tests.
- Reviewer Agent: critiques and improves outputs.
Communication patterns:
- Shared state in a central store, where each agent reads/writes.
// This must have some locking mechanism; I can't imagine the race-condition-like issues that would arise without one.
- Messages via queues (agent A posts a task, agent B consumes it).
- Coordinator that assigns work to the right agent based on metadata.
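For the shared-state pattern, one common way to avoid lost updates is optimistic concurrency: each write carries the version the writer last read, and stale writes are rejected. The sketch below is a single-process, in-memory illustration with a hypothetical `SharedStore` class. A real deployment would rely on a conditional write in the database or cache instead.

```typescript
type Versioned<T> = { version: number; value: T };

class SharedStore<T> {
  private record: Versioned<T>;

  constructor(initial: T) {
    this.record = { version: 0, value: initial };
  }

  read(): Versioned<T> {
    return { ...this.record };
  }

  // The write succeeds only if nobody else has written since expectedVersion was read.
  compareAndSet(expectedVersion: number, value: T): boolean {
    if (this.record.version !== expectedVersion) return false;
    this.record = { version: expectedVersion + 1, value };
    return true;
  }
}

const sharedNotes = new SharedStore<string[]>([]);
// Agents A and B both read the same version.
const seenByA = sharedNotes.read();
const seenByB = sharedNotes.read();
const aWrote = sharedNotes.compareAndSet(seenByA.version, [...seenByA.value, "A: logs fetched"]);
// B's write is based on a stale read, so it is rejected instead of overwriting A.
const bWrote = sharedNotes.compareAndSet(seenByB.version, [...seenByB.value, "B: ticket filed"]);
// B re-reads and retries on top of A's update.
const freshForB = sharedNotes.read();
const bRetried = sharedNotes.compareAndSet(freshForB.version, [...freshForB.value, "B: ticket filed"]);
console.log(aWrote, bWrote, bRetried, sharedNotes.read().value);
```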
“LLM + API” vs Agent
Fundamentally, an agentic system is an LLM with:
- A well-defined tool surface: typed APIs the LLM can call.
- A persistent control loop: run until goal completion, not just one turn.
- State and memory: so it can reason across multiple steps and past context.
- Policies: so it doesn’t call everything all the time or in unsafe ways.
However, the main differentiating features are:
- A simple integration:
- You decide when to call the LLM, when to call APIs, and how to combine them.
- You mainly use the LLM for text completion / classification / one-off tool calls.
- An agentic system:
- The LLM decides which tools to call and in what order to achieve a goal.
- The runtime keeps calling the LLM and tools in a loop until the goal is done or a limit is reached.
Design considerations for developers
Prompt and tool design
- Make tools narrow and well-described so the LLM can choose correctly.
- Use structured function/tool calling where possible to reduce parsing fragility.
- Include:
- Clear tool descriptions.
- Example calls in the prompt (few-shot).
- Constraints and “don’t do X” examples.
Observability and debugging
- Log:
- LLM prompts and outputs (sanitized for sensitive data).
- All tool calls and results.
- State transitions (which step, which agent, which decision).
- Provide:
- Traces per task with all intermediate steps.
- Reproducible runs or replay tools for debugging.
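As a sketch of what per-task tracing with sanitization might look like: the `Tracer` class and the `SENSITIVE_KEYS` list below are assumptions, and the redaction is shallow (top-level keys only), so nested secrets would need a recursive version.

```typescript
type TraceKind = "llm" | "tool" | "state";
type TraceEntry = { taskId: string; step: number; kind: TraceKind; payload: Record<string, unknown> };

// Hypothetical denylist of payload keys that must never reach the logs.
const SENSITIVE_KEYS = new Set(["apiKey", "password", "token"]);

// Shallow redaction: replaces the values of sensitive top-level keys.
function sanitize(payload: Record<string, unknown>): Record<string, unknown> {
  const clean: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    clean[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return clean;
}

class Tracer {
  private entries: TraceEntry[] = [];

  // Records one event, numbering steps per task starting at 1.
  record(taskId: string, kind: TraceKind, payload: Record<string, unknown>): void {
    const step = this.traceFor(taskId).length + 1;
    this.entries.push({ taskId, step, kind, payload: sanitize(payload) });
  }

  // Returns every recorded step for one task, in order.
  traceFor(taskId: string): TraceEntry[] {
    return this.entries.filter((e) => e.taskId === taskId);
  }
}

const tracer = new Tracer();
tracer.record("task-1", "llm", { prompt: "Investigate alert X", apiKey: "sk-123" });
tracer.record("task-1", "tool", { name: "getLogs", args: { service: "api" } });
tracer.record("task-2", "state", { status: "queued" });
console.log(JSON.stringify(tracer.traceFor("task-1"), null, 2));
```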
Reliability strategies
- Limit the action surface at first; expose only a small, safe set of tools.
- Add step limits and cost/time caps.
- Implement validation and guardrails around any destructive or expensive actions.
- Use “critique” or “reflection” passes for important tasks (e.g., reviewer agent).
Example architecture
Imagine a “DevOps Agent” that can investigate alerts and suggest mitigations:
- Tools defined as APIs and integrations: getAlerts, getLogs, runQuery, scaleService, createTicket.
- Flow:
  - Alert comes in → creates agent task with goal “Investigate alert X.”
  - Agent loop:
    - LLM inspects alert context and decides to call getLogs.
    - Runtime calls getLogs, stores result, feeds back to LLM.
    - LLM identifies probable cause, calls createTicket with details.
    - Depending on policy, may suggest but not execute scaleService automatically.
    - LLM emits finish with final summary + ticket link.
  - Human can review summary, logs, and proposed actions.
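The “suggest but not execute” policy in this example can be sketched as a triage step between the LLM's proposed actions and the runtime. The `AUTO_EXECUTE` set is an assumption about which of the tools above are safe to run unattended, and `triage` is a hypothetical helper. Anything not explicitly listed goes to a human.

```typescript
type ProposedAction = { tool: string; args: Record<string, unknown> };

// Assumed policy: read-only tools and ticket creation may run unattended.
const AUTO_EXECUTE = new Set(["getAlerts", "getLogs", "runQuery", "createTicket"]);

// Splits proposed actions into those the runtime may run and those needing human approval.
// Default-deny: a tool that is not in AUTO_EXECUTE is never run automatically.
function triage(actions: ProposedAction[]): { execute: ProposedAction[]; needsApproval: ProposedAction[] } {
  const execute: ProposedAction[] = [];
  const needsApproval: ProposedAction[] = [];
  for (const action of actions) {
    (AUTO_EXECUTE.has(action.tool) ? execute : needsApproval).push(action);
  }
  return { execute, needsApproval };
}

const triaged = triage([
  { tool: "getLogs", args: { service: "checkout" } },
  { tool: "createTicket", args: { title: "Checkout latency spike" } },
  { tool: "scaleService", args: { service: "checkout", replicas: 6 } },
]);
console.log("run now:", triaged.execute.map((a) => a.tool));
console.log("awaiting approval:", triaged.needsApproval.map((a) => a.tool));
```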
Risks
Since AI agents rely on a continuous loop of prompts, there are many risks inherent to this cycle.
“Autonomy = increased risk”
Underspecification
No explicit instructions
Long-Term Planning
Goal-Directedness
Directness of Impact
Governance
There’s a serious need to put policies in place to govern agentic AI.
This includes:
- Safeguards
- Process Controls
- Structures
Safeguards
Can we pause and/or shutdown requests and systems when certain tasks violate policy?
Will we be able to even catch these requests in time?
There also needs to be thoughtful consideration of when a human must be in the loop, so that the agent can stop and wait for approval.
Handling confidential information calls for stronger privacy protection measures. These systems must be able to:
- Sanitize inputs
- Detect PII
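As a very rough illustration of the two capabilities above, here is a regex-based redaction pass. The patterns and the `redactPII` helper are assumptions that only cover emails and one US-style phone format. Real PII detection needs far broader coverage and usually a dedicated service or model.

```typescript
// Illustrative patterns only: emails and one US-style phone format.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  usPhone: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g,
};

// Replaces each match with a placeholder and reports which kinds of PII were found.
function redactPII(text: string): { text: string; found: string[] } {
  const found: string[] = [];
  let clean = text;
  for (const [label, pattern] of Object.entries(PII_PATTERNS)) {
    clean = clean.replace(pattern, () => {
      found.push(label);
      return `[${label.toUpperCase()}]`;
    });
  }
  return { text: clean, found };
}

const redacted = redactPII("Contact jane@example.com or 555-123-4567 about the outage.");
console.log(redacted.text);
console.log(redacted.found);
```

Running this on inputs before they reach the LLM, and again on anything written to logs or memory, covers both directions.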
Process Controls
These are risk-based permissions policies that determine which actions Agents can undertake and which they NEVER should.
Auditability
When an agent arrives at a decision, can we audit that decision in the future if need be?
Monitoring & Evaluation
There should be oversight at all times, by systems that are always online even when a human isn’t available.
Organizational Structures
Organizations need to be accountable for what their agents undertake, and not hide behind an agent’s supposed autonomy.
Regulations need to be in place for task-specific behaviors.
Guard Rails
Model Level
Checking for bad-faith actors who use the models to submit harmful requests that violate organizational policies and human ethical values.
Orchestration Layer
This layer is concerned with detecting infinite loops that, if left unchecked, consume resources and waste compute and time.
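One simple heuristic for spotting a stuck agent is to count how often it repeats the exact same tool call. The `LoopDetector` below is a hypothetical sketch of that idea. It compares calls by name plus `JSON.stringify` of the arguments, so two calls whose argument keys are in a different order count as different.

```typescript
type AgentAction = { name: string; args: Record<string, unknown> };

class LoopDetector {
  private counts = new Map<string, number>();
  private maxRepeats: number;

  constructor(maxRepeats: number) {
    this.maxRepeats = maxRepeats;
  }

  // Returns true once the same tool call has been seen more than maxRepeats times.
  observe(action: AgentAction): boolean {
    const signature = `${action.name}:${JSON.stringify(action.args)}`;
    const count = (this.counts.get(signature) ?? 0) + 1;
    this.counts.set(signature, count);
    return count > this.maxRepeats;
  }
}

const detector = new LoopDetector(2);
const sameCall: AgentAction = { name: "getLogs", args: { service: "api" } };
const flags = [
  detector.observe(sameCall),
  detector.observe(sameCall),
  detector.observe(sameCall), // third identical call exceeds the limit
  detector.observe({ name: "getLogs", args: { service: "db" } }), // different args, fresh count
];
console.log(flags);
```

This complements a plain step limit: the step limit bounds total cost, while repeat detection can stop an obviously stuck run earlier.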
Tool Layer
Limit the tools available to each agent, allowing only the appropriate permissions and access. This also keeps agents from venturing beyond their specified areas.
Similar to role-based access control groups and policies in IT administration.
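In code, the RBAC analogy might look like a role-to-tools table that is consulted both when building the prompt and when executing a call. The roles are borrowed from the multi-agent section. The tool names, `canCall`, and `visibleTools` are assumptions for illustration.

```typescript
type Role = "research" | "coder" | "reviewer";

// Hypothetical permission table: which tools each agent role may use.
const ROLE_TOOLS: Record<Role, ReadonlySet<string>> = {
  research: new Set(["webSearch", "readDocument"]),
  coder: new Set(["readFile", "writeFile", "runTests"]),
  reviewer: new Set(["readFile"]),
};

// Enforced at execution time, regardless of what the LLM asked for.
function canCall(role: Role, toolName: string): boolean {
  return ROLE_TOOLS[role].has(toolName);
}

// Only permitted tools are put into the agent's prompt, so it never sees the rest.
function visibleTools(role: Role, allTools: string[]): string[] {
  return allTools.filter((name) => canCall(role, name));
}

const allToolNames = ["webSearch", "readDocument", "readFile", "writeFile", "runTests"];
console.log(visibleTools("reviewer", allToolNames));
console.log(canCall("reviewer", "writeFile"));
```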
Testing
Rigorous testing, including red-team testing.
Monitoring
Like any system, automated monitoring systems need to be in place to check for compliance violations and hallucinations.
References
- IBM – Agentic AI overview and motivations, including autonomy and plan–act–observe loops.
- https://www.ibm.com/think/topics/agentic-ai-vs-generative-ai
- https://www.workday.com/en-ca/topics/ai/agentic-ai.html
- https://aws.amazon.com/what-is/agentic-ai/
- https://www.redhat.com/en/topics/ai/what-is-agentic-ai
- Risks of Agentic AI: What You Need to Know About Autonomous AI
- Generative vs Agentic AI: Shaping the Future of AI Collaboration