Intro

We’ve all heard the script and the hype. But I wanted to understand what it actually is and how it’s structured.
Because to me it sounds like a program that delegates its logic flow to an LLM and then interacts with APIs.

What is agentic AI?

Agentic AI is a system that uses an LLM plus tools, memory, and a control loop to pursue goals over multiple steps with some “autonomy”, rather than just answering a single prompt.

// Emphasis on the control loop: without it, the system is passive and just waits for instructions.

The LLM acts as a controller (planner/decider) wrapped in a runtime that can call external tools/APIs, maintain state, and iteratively plan–act–observe–adjust until a goal is achieved.

Core properties

Goal-driven: Operates on tasks like “draft a report,” “resolve this issue,” “file these tickets,” not just “answer this question.”
Multi-step: Decomposes goals into sub-tasks and executes them over several tool calls / iterations.
Tool-using: Calls APIs, databases, scripts, and services to perform actual work.
Stateful: Maintains short-term state and often long-term memory to inform decisions.
Autonomous within a boundary: Makes its own next-step decisions inside policies, constraints, and guardrails.

High-level mental model
Non-agentic LLM app: model answers a prompt, maybe with a tool call, then stops.
Agentic AI system: LLM-powered software worker that uses APIs and tools, running in a loop until the job is done.

Typical architecture

Core loop

The loop is what gives the program its “autonomy”. It goes something like this:

Receive or update the current goal and context.
Ask the LLM: “Given this goal and current state, what should the agent do next?”
Interpret the LLM output:
- If it is a tool call, execute the tool and capture the result.
- If it is a direct answer / finalization, return or store the result and possibly terminate.
Append the tool result to the context (state, memory).
Repeat until a stop condition is met (goal reached, time/step limit, external interruption, failure policy).

Take a look at this pseudocode:

let done = false;

while (!done) {
  // Rebuild the prompt from the latest state on every iteration.
  const state = loadState(taskId);
  const llmInput = buildPrompt(goal, state, toolsSpec, policies);

  const llmOutput = callLLM(llmInput);

  // e.g., {"type": "tool", "name": "searchTickets", "args": {...}}
  const action = parseAction(llmOutput);

  if (action.type === "tool") {
    // Run the tool and record the result so the next iteration can see it.
    const result = callTool(action.name, action.args);
    appendToState(taskId, { action, result });
  } else if (action.type === "finish") {
    saveResult(taskId, action.output);
    done = true;
  } else {
    // Unrecognized or disallowed action.
    handlePolicyViolationOrError(action);
  }
}
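
To convince myself the loop really is that simple, here is a rough runnable sketch in TypeScript. Everything in it is an assumption for illustration: `scriptedLLM` is a stand-in that replays canned decisions instead of calling a real model, `searchTickets` is a made-up tool, and a real LLM call would be asynchronous. It also adds a `maxSteps` cap as one possible stop condition.

```typescript
// All names are hypothetical; the "LLM" is a scripted stub, not a real model call.
type Action =
  | { type: "tool"; name: string; args: Record<string, unknown> }
  | { type: "finish"; output: string };

type Step = { action: Action; result: unknown };
type Tool = (args: Record<string, unknown>) => unknown;

const tools: Record<string, Tool> = {
  // Pretend ticket search that returns a fixed result.
  searchTickets: (args) => [{ id: "T-1", title: `Match for ${String(args["query"])}` }],
};

// Stand-in for callLLM + parseAction: picks the next action from the history so far.
function scriptedLLM(goal: string, history: Step[]): Action {
  if (history.length === 0) {
    return { type: "tool", name: "searchTickets", args: { query: goal } };
  }
  return { type: "finish", output: `Found ${history.length} result set(s) for "${goal}"` };
}

function runAgent(goal: string, maxSteps = 5): { output: string | null; history: Step[] } {
  const history: Step[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = scriptedLLM(goal, history);
    if (action.type === "finish") {
      return { output: action.output, history };
    }
    // Only call tools that were explicitly registered.
    const tool: Tool | undefined = Object.prototype.hasOwnProperty.call(tools, action.name)
      ? tools[action.name]
      : undefined;
    // Unknown tools are recorded as errors so the model can see them and adjust.
    const result = tool ? tool(action.args) : { error: `unknown tool: ${action.name}` };
    history.push({ action, result });
  }
  // Stop condition: step limit reached without a "finish" action.
  return { output: null, history };
}

const run = runAgent("login failures");
```

Here `run.history` holds the single tool step and `run.output` holds the final answer. With `maxSteps` set to 0 the loop never starts and the output is `null`.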

Key components

LLM “brain”

Primary responsibilities:

Interpret goals and user input.
Plan multi-step strategies (high-level and/or low-level).
Select tools to call and synthesize tool arguments.
Interpret tool outputs and decide the next step.
Produce final outputs.

Implementation patterns:

System prompt encoding role, objectives, constraints, and tool usage patterns.
Output formatting enforced by JSON / function-calling / tool-calling schemas.
Sometimes multiple LLMs or multiple roles (planner, executor, critic).

Tooling / APIs

Tools are API capabilities the agent can invoke to act on the world or fetch information, for example:

Data access: SQL/NoSQL DB queries, search APIs, vector stores, logs.
Business systems: CRM, ticketing, HR, ERP, billing APIs.
Infrastructure: Kubernetes, cloud provider SDKs, CI/CD, monitoring.
General utilities: web search, HTTP client, filesystem, code execution, email, chat.

Each tool typically has:

Name (unique identifier).
Input schema (JSON / typed arguments).
Output schema (structured result).
Description (used in tool selection prompt).
Policy / permission metadata.

Example tool spec (TypeScript-ish):

const tools = [
  {
    name: "getUserTickets",
    description: "Fetches open support tickets for a given user.",
    inputSchema: {
      type: "object",
      properties: {
        userId: { type: "string" },
        status: { type: "string", enum: ["open", "pending", "closed"] }
      },
      required: ["userId"]
    }
  },
  {
    name: "postSlackMessage",
    description: "Send a Slack message to a channel or user.",
    inputSchema: {
      type: "object",
      properties: {
        channel: { type: "string" },
        text: { type: "string" }
      },
      required: ["channel", "text"]
    }
  }
];

The LLM is prompted with these specs and asked to select and call tools by emitting structured JSON.
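
Since the model's output is just text, the runtime has to check it before acting on it. Below is a small sketch of that check, under my own assumptions: the `{"name": ..., "args": ...}` call shape, the trimmed-down `ToolSpec` type, and `parseToolCall` are all hypothetical and far simpler than a real JSON Schema validator.

```typescript
// Hypothetical helper: validates a model-emitted tool call against a minimal spec.
type ToolSpec = {
  name: string;
  required: string[];
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
};

type ToolCall = { name: string; args: Record<string, unknown> };
type ParseResult = { ok: true; call: ToolCall } | { ok: false; error: string };

const specs: ToolSpec[] = [
  {
    name: "getUserTickets",
    required: ["userId"],
    properties: { userId: { type: "string" }, status: { type: "string" } },
  },
];

function parseToolCall(raw: string, available: ToolSpec[]): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const { name, args } = parsed as { name?: unknown; args?: unknown };
  const spec: ToolSpec | undefined = available.filter((s) => s.name === name)[0];
  if (typeof name !== "string" || !spec) {
    return { ok: false, error: `unknown tool: ${String(name)}` };
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, error: "args must be an object" };
  }
  const argMap = args as Record<string, unknown>;
  // Every required argument must be present.
  for (const key of spec.required) {
    if (!(key in argMap)) return { ok: false, error: `missing required argument: ${key}` };
  }
  // Every supplied argument must be declared and have the declared primitive type.
  for (const key of Object.keys(argMap)) {
    const prop = spec.properties[key];
    if (!prop) return { ok: false, error: `unexpected argument: ${key}` };
    if (typeof argMap[key] !== prop.type) {
      return { ok: false, error: `argument ${key} must be a ${prop.type}` };
    }
  }
  return { ok: true, call: { name, args: argMap } };
}

const good = parseToolCall('{"name":"getUserTickets","args":{"userId":"u-42"}}', specs);
const bad = parseToolCall('{"name":"getUserTickets","args":{}}', specs);
```

A rejected call does not have to be fatal: the error string can be appended to the state so the model gets a chance to correct itself on the next iteration.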

Memory and state

Common tiers:

Short-term task state:
- The current conversation, actions taken in this run, intermediate results.
- Stored in a task/session record (DB, cache, or in-memory if short-lived).
Long-term knowledge and history:
- User preferences, past tasks, learned patterns, domain documents.
- Often stored in relational DBs, object stores, and/or vector DBs.
Episodic memory patterns:
- After a task completes, the agent writes a summary of what happened (goal, approach, pitfalls, final result) that the agent can later retrieve for similar tasks.

Example memory operations:

retrieveRelevantDocs(goal, currentState) → list<doc> (via embeddings/search).
saveEpisodeSummary(taskId, summaryEmbedding, metadata) for future retrieval.
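
To get a feel for what "retrieval via embeddings" means, here is a toy sketch. It is only an illustration under loud assumptions: the embeddings are hand-written three-number vectors rather than real model output, `EpisodeStore` is a hypothetical in-memory class (its method signatures are mine, not the ones listed above), and ranking uses plain cosine similarity where a real system would likely use a vector DB.

```typescript
// Toy episodic memory: embeddings are hand-written vectors, not real model output.
type Episode = { taskId: string; summary: string; embedding: number[] };

// Cosine similarity of two vectors of equal length (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class EpisodeStore {
  private episodes: Episode[] = [];

  saveEpisodeSummary(episode: Episode): void {
    this.episodes.push(episode);
  }

  // Returns the k stored episodes most similar to the query embedding.
  retrieveRelevant(queryEmbedding: number[], k: number): Episode[] {
    return this.episodes
      .map((episode) => ({ episode, score: cosineSimilarity(queryEmbedding, episode.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((scored) => scored.episode);
  }
}

const store = new EpisodeStore();
store.saveEpisodeSummary({ taskId: "t1", summary: "Restarted stuck billing worker", embedding: [0.9, 0.1, 0.0] });
store.saveEpisodeSummary({ taskId: "t2", summary: "Drafted quarterly report", embedding: [0.0, 0.2, 0.9] });

// A query vector pointing in roughly the same direction as t1.
const similar = store.retrieveRelevant([1, 0, 0], 1);
```

The retrieved summaries would then be pasted into the prompt as extra context for the next task.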

Planner and executor separation

A common architectural refinement is to split:

Planner:
- Produces a high-level plan and sometimes updates it as execution progresses.
- Example output: a numbered list of steps, with which tools to use for each.
Executor:
- Executes one step at a time, calling tools and dealing with low-level detail.

Example pattern:

[Planner LLM]
- Given goal + capabilities, output:
  1. Step-by-step plan.
  2. For each step, suggested tool and success criteria.
 
[Executor LLM]
- Take one step description + tools, perform it, record outcome.
- Ask: is this step complete? if yes, move to next; if not, refine/retry.

This separation can improve reliability and troubleshooting.
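
A minimal sketch of the split, assuming both roles are plain functions rather than real model calls. `plan`, `execute`, and the `runStep` callback are hypothetical names, and the simulated "search" step is rigged to fail once so the retry path is visible.

```typescript
// Hypothetical planner/executor split; both "LLMs" are plain functions here.
type PlanStep = { description: string; tool: string; successCriteria: string };
type StepOutcome = { step: PlanStep; attempts: number; succeeded: boolean };

// Planner stub: a real planner would prompt a model with the goal and tool list.
function plan(goal: string): PlanStep[] {
  return [
    { description: `Collect data for: ${goal}`, tool: "search", successCriteria: "at least one result" },
    { description: "Summarize findings", tool: "summarize", successCriteria: "non-empty summary" },
  ];
}

// Executor: runs one step at a time and retries until the step reports success.
function execute(
  steps: PlanStep[],
  runStep: (step: PlanStep, attempt: number) => boolean,
  maxAttempts = 3
): StepOutcome[] {
  const outcomes: StepOutcome[] = [];
  for (const step of steps) {
    let attempts = 0;
    let succeeded = false;
    while (!succeeded && attempts < maxAttempts) {
      attempts++;
      succeeded = runStep(step, attempts);
    }
    outcomes.push({ step, attempts, succeeded });
    if (!succeeded) break; // Stop so the planner (or a human) can revise the plan.
  }
  return outcomes;
}

// Simulated step runner: the "search" step fails once, then succeeds.
const outcomes = execute(plan("Q3 churn"), (step, attempt) => step.tool !== "search" || attempt > 1);
```

Because each outcome records the step and the number of attempts, it is easy to see which step misbehaved, which is the troubleshooting benefit in practice.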

Policies and guardrails

Policies constrain what the agent can do:

Capability policies:
- Which tools can be used, with which arguments, under what conditions.
- Example: “Never call deleteUser without an approval token.”
  // The sheer number of horror stories doesn't paint a good picture...
Safety/ethics policies:
- Content filtering, privacy rules, compliance constraints.
Operational limits:
- Max steps per task, timeouts, budget limits, rate limits.

Enforcement mechanisms:

Pre-checks on tool calls (schema validation, allowlists).
Post-checks on LLM outputs (e.g., classification, regex/JSON validation).
Dedicated “policy check” passes, using either rule-based logic or another model.
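
As a sketch of what a rule-based pre-check could look like, here is a small gate that runs before every tool call. The `Policy` shape, `checkToolCall`, and the `approvalToken` argument name are assumptions of mine. It only checks that a token is present, whereas a real system would verify the token against whatever issued it.

```typescript
// Hypothetical policy gate run before every tool call.
type ToolCall = { name: string; args: Record<string, unknown> };
type Decision = { allowed: true } | { allowed: false; reason: string };

type Policy = {
  allowedTools: string[]; // Capability allowlist.
  requiresApproval: string[]; // Tools that need an approval token.
  maxSteps: number; // Operational limit per task.
};

function checkToolCall(call: ToolCall, stepsSoFar: number, policy: Policy): Decision {
  if (stepsSoFar >= policy.maxSteps) {
    return { allowed: false, reason: "step limit reached" };
  }
  if (policy.allowedTools.indexOf(call.name) === -1) {
    return { allowed: false, reason: `tool not allowlisted: ${call.name}` };
  }
  const token = call.args["approvalToken"];
  const needsApproval = policy.requiresApproval.indexOf(call.name) !== -1;
  // Presence check only; validating the token itself is out of scope for this sketch.
  if (needsApproval && (typeof token !== "string" || token.length === 0)) {
    return { allowed: false, reason: `${call.name} requires an approval token` };
  }
  return { allowed: true };
}

const policy: Policy = {
  allowedTools: ["getUser", "deleteUser"],
  requiresApproval: ["deleteUser"],
  maxSteps: 10,
};

const blocked = checkToolCall({ name: "deleteUser", args: { userId: "u-1" } }, 0, policy);
const approved = checkToolCall(
  { name: "deleteUser", args: { userId: "u-1", approvalToken: "tok-123" } },
  0,
  policy
);
```

The point of keeping this in ordinary code is that the decision does not depend on the model behaving: even a confused or manipulated LLM cannot talk its way past the gate.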

Orchestrator / runtime

This is “normal” software that:

Manages agent loops (iterations, retries, back-off).
Tracks tasks, steps, and logs for observability.
Handles concurrency and multi-agent orchestration (if you have multiple agents).
Integrates with queues, schedulers, and your infra.

This is typically built in your language of choice (TypeScript, Python, Go, etc.), using:

HTTP APIs to the LLM provider.
Internal SDKs/clients for tools.
Messaging/queueing systems for long-running tasks (e.g., Kafka, SQS, RabbitMQ).
Databases for state and memory.
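
The "retries, back-off" part is plain old engineering. Here is a sketch of a retry helper with exponential back-off, with assumptions called out: `withRetry` and `backoffDelay` are hypothetical names, the helper is synchronous and takes an injected `sleep` function so it can be exercised without actually waiting, and there is no jitter. A production version would be async and would probably add random jitter.

```typescript
// Hypothetical retry helper for flaky tool or LLM calls.
type RetryOptions = { maxAttempts: number; baseDelayMs: number; maxDelayMs: number };

// Delay before the next try after failed attempt number `attempt` (1-based):
// base * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
}

function withRetry<T>(operation: () => T, opts: RetryOptions, sleep: (ms: number) => void): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
      // Wait before retrying, but not after the final attempt.
      if (attempt < opts.maxAttempts) sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}

// Usage: an operation that fails twice, with sleep replaced by a recorder.
const waits: number[] = [];
let calls = 0;
const value = withRetry(
  () => {
    calls++;
    if (calls < 3) throw new Error("temporary failure");
    return "ok";
  },
  { maxAttempts: 5, baseDelayMs: 100, maxDelayMs: 1000 },
  (ms) => {
    waits.push(ms);
  }
);
```

After this runs, `waits` contains the two delays that would have been slept (100 ms, then 200 ms) and `value` is the result of the third call.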

Single-agent vs multi-agent

Single-agent

One agent with a broad role, e.g., “Customer Support Agent.”
Same LLM instance (logically) handles planning, tool selection, and execution.
Simpler to build, easier to reason about, good for many use cases.

Multi-agent

Multiple specialized agents, each with a narrower role and possibly different tools or prompts, for example:
- Planner Agent: decomposes the high-level goal into sub-tasks.
- Research Agent: does retrieval, reading, summarizing.
- Coder Agent: writes or modifies code, runs tests.
- Reviewer Agent: critiques and improves outputs.

Communication patterns:

Shared state in a central store, where each agent reads/writes.
// This needs some kind of locking mechanism; I can only imagine the race-condition-like issues that would arise without one.
Messages via queues (agent A posts a task, agent B consumes it).
Coordinator that assigns work to the right agent based on metadata.
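
On the shared-state concern, one common way to avoid lost updates is optimistic concurrency: every write must name the version it was based on. The sketch below is my own illustration of that idea, not something the sources prescribe. `SharedState` is a hypothetical in-memory class, whereas a real store would lean on a database's conditional writes or transactions.

```typescript
// Hypothetical shared store using optimistic concurrency (version checks).
type Versioned<T> = { version: number; value: T };

class SharedState<T> {
  private current: Versioned<T>;

  constructor(initial: T) {
    this.current = { version: 0, value: initial };
  }

  // Returns a snapshot of the current value together with its version.
  read(): Versioned<T> {
    return { version: this.current.version, value: this.current.value };
  }

  // Writes only if nobody else has written since `expectedVersion` was read.
  compareAndSet(expectedVersion: number, next: T): boolean {
    if (this.current.version !== expectedVersion) return false;
    this.current = { version: expectedVersion + 1, value: next };
    return true;
  }
}

// Two agents read the same snapshot, then both try to write.
const board = new SharedState<string[]>(["triage alert"]);
const seenByResearcher = board.read();
const seenByCoder = board.read();

const researcherWrote = board.compareAndSet(
  seenByResearcher.version,
  seenByResearcher.value.concat(["logs gathered"])
);
const coderWrote = board.compareAndSet(
  seenByCoder.version,
  seenByCoder.value.concat(["patch drafted"])
);
```

The researcher's write lands first, so the coder's write is rejected. The coder then has to re-read and retry instead of silently overwriting the researcher's update.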

“LLM + API” vs Agent

Fundamentally, an agentic system is an LLM with:

A well-defined tool surface: typed APIs the LLM can call.
A persistent control loop: run until goal completion, not just one turn.
State and memory: so it can reason across multiple steps and past context.
Policies: so it doesn’t call everything all the time or in unsafe ways.

However, the main difference is who controls the flow:

A simple integration:
- You decide when to call the LLM, when to call APIs, and how to combine them.
- You mainly use the LLM for text completion / classification / one-off tool calls.
An agentic system:
- The LLM decides which tools to call and in what order to achieve a goal.
- The runtime keeps calling the LLM and tools in a loop until the goal is done or a limit is reached.

Design considerations for developers

Prompt and tool design

Make tools narrow and well-described so the LLM can choose correctly.
Use structured function/tool calling where possible to reduce parsing fragility.
Include:
- Clear tool descriptions.
- Example calls in the prompt (few-shot).
- Constraints and “don’t do X” examples.

Observability and debugging

Log:
1. LLM prompts and outputs (sanitized for sensitive data).
2. All tool calls and results.
3. State transitions (which step, which agent, which decision).
Provide:
1. Traces per task with all intermediate steps.
2. Reproducible runs or replay tools for debugging.
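
To make "traces per task" concrete, here is a small sketch of a trace recorder. It is an assumption-heavy illustration: `TraceLog` is a hypothetical in-memory class, the three event kinds are my own simplification of the list above, and a real system would ship these events to a log store or tracing backend.

```typescript
// Hypothetical in-memory trace recorder.
type TraceEvent = {
  taskId: string;
  step: number;
  kind: "llm_call" | "tool_call" | "state_transition";
  detail: string;
};

class TraceLog {
  private events: TraceEvent[] = [];

  // Appends an event; step numbers count up per task, starting at 1.
  record(taskId: string, kind: TraceEvent["kind"], detail: string): TraceEvent {
    const step = this.forTask(taskId).length + 1;
    const event: TraceEvent = { taskId, step, kind, detail };
    this.events.push(event);
    return event;
  }

  // All events for one task, in the order they were recorded.
  forTask(taskId: string): TraceEvent[] {
    return this.events.filter((event) => event.taskId === taskId);
  }

  // Human-readable replay of a single task.
  render(taskId: string): string {
    return this.forTask(taskId)
      .map((event) => `#${event.step} [${event.kind}] ${event.detail}`)
      .join("\n");
  }
}

const trace = new TraceLog();
trace.record("task-1", "llm_call", "decided to call getLogs");
trace.record("task-2", "llm_call", "decided to finish");
trace.record("task-1", "tool_call", "getLogs returned 120 lines");

const replay = trace.render("task-1");
```

`replay` contains only the two `task-1` events, numbered in order, even though another task's event was recorded in between.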

Reliability strategies

Limit the action surface at first; expose only a small, safe set of tools.
Add step limits and cost/time caps.
Implement validation and guardrails around any destructive or expensive actions.
Use “critique” or “reflection” passes for important tasks (e.g., reviewer agent).

Example architecture

Imagine a “DevOps Agent” that can investigate alerts and suggest mitigations:

Tools defined as APIs and integrations: getAlerts, getLogs, runQuery, scaleService, createTicket.
Flow:
1. Alert comes in → creates agent task with goal “Investigate alert X.”
2. Agent loop:
  - LLM inspects alert context and decides to call getLogs.
  - Runtime calls getLogs, stores result, feeds back to LLM.
  - LLM identifies probable cause, calls createTicket with details.
  - Depending on policy, may suggest but not execute scaleService automatically.
  - LLM emits finish with final summary + ticket link.
3. Human can review summary, logs, and proposed actions.

Risks

Since AI agents rely on a continuous loop of prompts, there are many risks inherent to this cycle.

“Autonomy = increased risk”

Underspecification

No explicit instructions

Long-Term Planning

Goal-Directedness

Directness of Impact

Governance

There’s a serious need to put policies in place to govern agentic AI.
This includes:

Safeguards
Process Controls
Organizational Structures

Safeguards

Can we pause and/or shut down requests and systems when certain tasks violate policy?
Will we be able to even catch these requests in time?

There also needs to be thoughtful consideration of when a human must be in the loop, with the agent able to stop and wait for approval.

Handling confidential information calls for stronger privacy protection measures. These systems must be able to:

Sanitize inputs
Detect PII
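
A deliberately naive sketch of those two capabilities follows. The assumptions are big ones: it only knows about email addresses and one phone number format, both regular expressions are rough heuristics of mine, and `detectPii` / `sanitize` are hypothetical helpers. Real PII detection needs far more than two regexes.

```typescript
// Hypothetical, deliberately simple PII scrubber. Both patterns are rough heuristics.
type PiiFinding = { kind: "email" | "phone"; match: string };

const patterns: { kind: PiiFinding["kind"]; regex: RegExp }[] = [
  { kind: "email", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { kind: "phone", regex: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

// Lists every match of every pattern, grouped by pattern order.
function detectPii(text: string): PiiFinding[] {
  const findings: PiiFinding[] = [];
  for (const { kind, regex } of patterns) {
    for (const match of text.match(regex) ?? []) {
      findings.push({ kind, match });
    }
  }
  return findings;
}

// Replaces every match with a labeled placeholder before the text reaches a prompt or a log.
function sanitize(text: string): string {
  let result = text;
  for (const { kind, regex } of patterns) {
    result = result.replace(regex, `[${kind.toUpperCase()} REDACTED]`);
  }
  return result;
}

const input = "Contact jane.doe@example.com or 555-123-4567 about ticket 8812.";
const findings = detectPii(input);
const clean = sanitize(input);
```

Running the same scrubber over tool results and logs, not just user input, keeps sensitive data out of the stored traces as well.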

Process Controls

These are risk-based permission policies that determine which actions agents can undertake and which they NEVER should.

Auditability

When an agent arrives at a decision, can we audit that decision in the future if need be?

Monitoring & Evaluation

There should be oversight at all times, by systems that are always online even when a human isn’t available.

Organizational Structures

Organizations need to be accountable for what their agents undertake, and not hide behind an agent’s supposed autonomy.

Regulations need to be in place for task-specific behaviors.

Guardrails

Model Level

Check for bad-faith actors who misuse the models by submitting harmful requests that violate organizational policies and human ethical values.

Orchestration Layer

This layer is concerned with detecting infinite loops, which, if left unchecked, waste compute and time.
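
One simple heuristic for spotting a stuck agent is to count how often it repeats the exact same action. The sketch below assumes that approach. `LoopDetector` and the threshold of two repeats are hypothetical, and it only catches exact repeats, so it would miss an agent that loops while varying its arguments slightly.

```typescript
// Hypothetical loop detector: flags a run when the same action repeats too often.
type Action = { name: string; args: Record<string, unknown> };

class LoopDetector {
  private counts: Record<string, number> = {};
  private maxRepeats: number;

  constructor(maxRepeats: number) {
    this.maxRepeats = maxRepeats;
  }

  // Returns true when this exact action has now been seen more than maxRepeats times.
  observe(action: Action): boolean {
    // Note: JSON.stringify is sensitive to key order, which is acceptable for a sketch.
    const key = `${action.name}:${JSON.stringify(action.args)}`;
    this.counts[key] = (this.counts[key] ?? 0) + 1;
    return this.counts[key] > this.maxRepeats;
  }
}

const detector = new LoopDetector(2);
const sameCall: Action = { name: "getLogs", args: { service: "api" } };

// The third identical call crosses the threshold.
const flags = [detector.observe(sameCall), detector.observe(sameCall), detector.observe(sameCall)];
```

When the detector fires, the orchestrator can abort the task, escalate to a human, or feed a "you are repeating yourself" message back to the model.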

Tool Layer

Limit the tools available to each agent, granting only the appropriate permissions and access. This also keeps agents from venturing beyond their specified areas.
Similar to role-based access control groups and policies in IT administration.
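
The RBAC comparison can be sketched in a few lines. The role names, tool names, and the `canUse` / `callAs` helpers below are all made up for illustration, and a real deployment would likely reuse its existing IAM system rather than a hard-coded table.

```typescript
// Hypothetical role-to-tool mapping, in the spirit of RBAC groups.
type Role = "researcher" | "coder" | "reviewer";

const toolsByRole: Record<Role, string[]> = {
  researcher: ["webSearch", "readDocs"],
  coder: ["readRepo", "writeFile", "runTests"],
  reviewer: ["readRepo"],
};

function canUse(role: Role, tool: string): boolean {
  return toolsByRole[role].indexOf(tool) !== -1;
}

// Wraps a tool call so that it fails closed when the role lacks permission.
function callAs<T>(role: Role, tool: string, invoke: () => T): T {
  if (!canUse(role, tool)) {
    throw new Error(`${role} is not permitted to call ${tool}`);
  }
  return invoke();
}

const reviewerCanWrite = canUse("reviewer", "writeFile");
const testOutput = callAs("coder", "runTests", () => "12 passed");
```

Here the reviewer is denied `writeFile` while the coder's `runTests` call goes through. Any tool missing from a role's list is denied by default.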

Testing

Rigorous testing, including red-team testing.

Monitoring

Like any system, automated monitoring systems need to be in place to check for compliance violations and hallucinations.

References

IBM – Agentic AI overview and motivations, including autonomy and plan–act–observe loops.
https://www.ibm.com/think/topics/agentic-ai-vs-generative-ai
https://www.workday.com/en-ca/topics/ai/agentic-ai.html
https://aws.amazon.com/what-is/agentic-ai/
https://www.redhat.com/en/topics/ai/what-is-agentic-ai
Risks of Agentic AI: What You Need to Know About Autonomous AI
Generative vs Agentic AI: Shaping the Future of AI Collaboration
