AgentGuards

Getting started

AgentGuards checks every message before it reaches your AI agent, and every response before it reaches your users. This guide walks you through signup, installation, and what to do if a check blocks something unexpectedly.

Dashboard setup

Everything you need is in the dashboard — no CLI tools required to configure checks.

1. Sign up

Go to agentguards.co and click Start free. No credit card required. You get 5,000 checked requests per month on the free plan.

2. Copy your API key

In the dashboard, open Settings → API Keys. Click New key, give it a name (e.g. "local-dev"), and copy the ag_... token. You will not be able to see it again after closing the dialog.

3. Configure checks

Go to Checks. Every check is on by default. Toggle any check off, or click a check to tune its threshold — useful if a particular check is too sensitive for your use case.

4. Set up an integration

Follow the install steps below for your agent, pasting your API key where prompted. Restart your agent and send a test message: the count of checked requests in your dashboard should go up.

5. Check your usage

The dashboard shows how many requests you have checked and how much of your monthly quota is consumed. Per-request logs are not available yet — they are coming in a future update.

Supported agents

AgentGuards works with any agent that supports MCP servers or hooks, or that can call a REST API.

Claude Code

Hooks (recommended), MCP server, API proxy

Hooks run as a separate process outside the model, so the check happens before Claude processes your message. The MCP server runs inside the session and needs a CLAUDE.md instruction telling Claude to call check_input.


OpenAI Codex

MCP server

Add AgentGuards as an MCP server in ~/.codex/config.toml. Codex then calls check_input on each turn over MCP.


VS Code Copilot

MCP server

Add the MCP server to .vscode/mcp.json in your workspace. Requires VS Code 1.99+ with agent mode enabled.
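To script this step, here is a minimal sketch that merges an AgentGuards entry into .vscode/mcp.json and leaves any other servers untouched. It assumes VS Code's mcp.json layout, which is a top-level `servers` object whose entries take `type`, `command`, `args`, and `env`. It also reuses the `npx -y @agentguards/mcp` command and the `AGENTGUARD_URL` / `AGENTGUARD_API_KEY` variable names from the Codex config. `add_agentguards_server` is a hypothetical helper and is not part of AgentGuards.

```python
import json
from pathlib import Path


def add_agentguards_server(workspace: Path, api_key: str,
                           url: str = "https://prod.agentguards.co") -> Path:
    """Add or replace an 'agentguards' entry in <workspace>/.vscode/mcp.json.

    Servers already in the file are preserved. The env var names mirror the
    Codex config (AGENTGUARD_URL / AGENTGUARD_API_KEY); this is an assumption.
    """
    config_path = workspace / ".vscode" / "mcp.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    # Start from the existing file if there is one (plain JSON, no comments).
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("servers", {})

    # stdio server launched with the same npx command as the Codex example.
    servers["agentguards"] = {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@agentguards/mcp"],
        "env": {"AGENTGUARD_URL": url, "AGENTGUARD_API_KEY": api_key},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config_path
```

From your workspace root, call `add_agentguards_server(Path.cwd(), "ag_YOUR_TOKEN_HERE")`. The key ends up in a workspace file, so keep .vscode/mcp.json out of version control. The helper parses the file with json.loads, so it will reject an mcp.json that contains comments. Edit a file like that by hand.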


Any LLM app (API)

REST API, Gateway proxy

Call /v1/guardrails/evaluate-input before sending the prompt to your model, or route through the Gateway to check and forward in one step.


Install

Pick the method that matches how you use your agent. Hooks are the strongest option for Claude Code — they run before Claude sees the message.

Claude Code — Hooks (recommended)

The hook script runs as a separate process. When the hook blocks a message, Claude never processes it, so an injected instruction has no way to bypass the check. The env vars go in settings.json, not your shell profile, so every Claude Code session sees them.

Download the hook script
# 1. Download the hook script
curl -o ~/.claude/agentguards_hook.py \
  https://prod.agentguards.co/static/agentguards_hook.py

# 2. Add your API key to Claude Code settings
# Open ~/.claude/settings.json and add the block below
~/.claude/settings.json
{
  "env": {
    "AGENTGUARDS_URL": "https://prod.agentguards.co",
    "AGENTGUARDS_API_KEY": "ag_YOUR_TOKEN_HERE"
  },
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [{
          "type": "command",
          "command": "python3 ~/.claude/agentguards_hook.py UserPromptSubmit"
        }]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": "python3 ~/.claude/agentguards_hook.py PreToolUse"
        }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|WebFetch|WebSearch",
        "hooks": [{
          "type": "command",
          "command": "python3 ~/.claude/agentguards_hook.py PostToolUse"
        }]
      }
    ]
  }
}

Restart Claude Code. The hook then checks every prompt you submit, every Bash command before it runs, and the output of Bash, WebFetch, and WebSearch calls after they return.
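The sketch below shows the decision logic a hook of this kind runs. It is not the contents of agentguards_hook.py. It rests on several assumptions:

- Claude Code's hook contract: the event arrives as JSON on stdin with `hook_event_name`, `prompt`, `tool_input`, and `tool_response` fields, and exit code 2 blocks.
- The evaluate-input endpoint from the REST section is reused for every event.
- The response has a `decision` / `reason` shape, which is hypothetical.

```python
import json
import os
import urllib.request

BLOCK_EXIT_CODE = 2  # Claude Code treats exit code 2 from a hook as a blocking error


def text_to_check(event: dict) -> str:
    """Pick the text to evaluate from a Claude Code hook payload."""
    name = event.get("hook_event_name")
    if name == "UserPromptSubmit":
        return event.get("prompt", "")
    if name == "PreToolUse":
        # The settings above only match Bash here, so check the command it will run.
        return event.get("tool_input", {}).get("command", "")
    if name == "PostToolUse":
        # Tool output that is about to be fed back to Claude.
        response = event.get("tool_response", "")
        return response if isinstance(response, str) else json.dumps(response)
    return ""


def evaluate(text: str) -> dict:
    """POST the text to AgentGuards (the response shape is an assumption)."""
    request = urllib.request.Request(
        os.environ["AGENTGUARDS_URL"] + "/v1/guardrails/evaluate-input",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "X-API-Key": os.environ["AGENTGUARDS_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def run_hook(raw_stdin: str, check=evaluate) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one hook invocation."""
    event = json.loads(raw_stdin)
    text = text_to_check(event)
    if not text:
        return 0, ""  # nothing to check for this event
    result = check(text)
    if result.get("decision") == "block":  # hypothetical field name
        # For PostToolUse the tool has already run; the message goes back to Claude.
        return BLOCK_EXIT_CODE, f"Blocked by AgentGuards: {result.get('reason', 'policy match')}"
    return 0, ""
```

A runnable hook would end by calling `code, msg = run_hook(sys.stdin.read())`, printing `msg` to stderr, and calling `sys.exit(code)`. Because the network call sits behind the `check` parameter, you can test the decision logic without an API key.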

Claude Code — MCP server

The MCP approach runs inside the Claude session and works when paired with a CLAUDE.md instruction telling Claude to call check_input on each turn. Claude decides whether to make that call, so this approach is weaker than hooks at blocking attacks. It is still useful for building guardrails into your own agent workflows.

Add MCP server
claude mcp add agentguards \
  --env AGENTGUARD_URL=https://prod.agentguards.co \
  --env AGENTGUARD_API_KEY=ag_YOUR_TOKEN_HERE \
  -- npx -y @agentguards/mcp

OpenAI Codex

Add the MCP server entry to your Codex config. Codex then calls AgentGuards on each turn over MCP.

~/.codex/config.toml
# ~/.codex/config.toml
[mcp_servers.agentguards]
command = "npx"
args = ["-y", "@agentguards/mcp"]
env = { AGENTGUARD_URL = "https://prod.agentguards.co", AGENTGUARD_API_KEY = "ag_YOUR_TOKEN_HERE" }

Any app — REST API

Call the check endpoint directly before passing a prompt to your model. Works from any language or framework.

Check a prompt
curl -X POST https://prod.agentguards.co/v1/guardrails/evaluate-input \
  -H "X-API-Key: ag_YOUR_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{"text": "your prompt here"}'

See the Gateway docs to check and forward to an LLM in a single call.
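Here is the same check from Python, as a minimal standard-library sketch. The endpoint, the X-API-Key header, and the `{"text": ...}` body come from the curl example. The `decision`, `reason`, and `redacted_text` response fields are assumptions, so confirm them against a real response before relying on this. The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint from the curl example above.
EVALUATE_URL = "https://prod.agentguards.co/v1/guardrails/evaluate-input"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def evaluate_input(text: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send one prompt to AgentGuards and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, api_key), timeout=timeout) as resp:
        return json.load(resp)


def prompt_for_model(text: str, api_key: str, evaluate=evaluate_input) -> str:
    """Return the text to send to your model, or raise if AgentGuards blocks it.

    The 'decision', 'reason', and 'redacted_text' fields are hypothetical.
    """
    result = evaluate(text, api_key)
    decision = result.get("decision")
    if decision == "block":
        raise PermissionError(f"AgentGuards blocked the prompt: {result.get('reason', '')}")
    if decision == "redact":
        # Index directly so a missing field fails loudly instead of leaking raw PII.
        return result["redacted_text"]
    return text  # allow and escalate pass the prompt through unchanged
```

Call `prompt_for_model(user_input, api_key)` just before your model call and send it whatever comes back. Catch PermissionError to show your own block message.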

What gets approved, what gets blocked

AgentGuards returns one of four decisions on every request. The worst decision across all checks wins.

allow

Normal traffic — no issues found

The request passes through unchanged. Your agent receives it and responds as normal.

redact

PII detected (email, phone, SSN, credit card)

The matched text is replaced with a placeholder (e.g. [EMAIL]) before the prompt reaches the model. Your agent still runs — it just does not see the raw sensitive value.

block

Prompt injection, jailbreak, exposed secret (e.g. an API key), data exfiltration, restricted topic, or toxicity

The request is stopped. The agent never receives the message. The user sees a formatted block message explaining which check triggered.

escalate

Borderline — matched but below the block threshold

The request is allowed, but counted separately in your usage stats. Useful for tuning thresholds before enabling hard blocks.

To see which checks produce which decisions, read the supported checks reference.
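The sketch below illustrates "the worst decision wins" by folding per-check decisions into one overall decision. The source does not say how redact and escalate rank against each other. The order allow < escalate < redact < block is an assumption, and the check names are only examples.

```python
# Assumed severity order, from least to most restrictive.
SEVERITY = {"allow": 0, "escalate": 1, "redact": 2, "block": 3}


def overall_decision(check_decisions: dict[str, str]) -> str:
    """Return the most restrictive decision across all checks."""
    if not check_decisions:
        return "allow"  # no check produced a decision
    return max(check_decisions.values(), key=SEVERITY.__getitem__)


# One check redacts PII while another is borderline: the redaction still applies.
print(overall_decision({"pii": "redact", "prompt_injection": "escalate", "toxicity": "allow"}))  # redact
```

Ranking redact above escalate means a borderline score on one check never lets raw PII through unredacted.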

What happens to your prompt data

The short version: we check it, then discard it.

Processed in AWS — EU region

All checks run on infrastructure hosted in AWS eu-north-1 (Stockholm). Data does not leave the EU.

Not stored by default

Prompt content is evaluated in memory and then discarded. We store the decision, the check results, and metadata (timestamp, tenant ID, use case), but not the prompt text itself. Storing prompts in logs is opt-in: it can be enabled per tenant for debugging.

Not used for training

We do not use your prompt content to train models, fine-tune classifiers, or improve AgentGuards checks. The ML models we use are pre-trained and run locally in our inference environment.

Not sold or shared

Prompt data is not shared with third parties, sold, or used for any purpose outside of providing the service to you.

API keys are encrypted at rest

Your tenant API key is stored encrypted. The AgentGuards system key used to call upstream model providers is never exposed in logs or API responses.

Something got blocked unexpectedly

False positives happen, especially with domain-specific language or code that superficially resembles an attack pattern. Here is how to fix it.

Check which rule fired

When a block happens, the agent displays a message naming which check triggered (e.g. "prompt_injection") and the reason. Per-request logs in the dashboard are coming soon — for now, use this message to identify the check to tune.

Tune the threshold

In dashboard → Checks, click the check that fired. Raise the threshold (e.g. injection score from 0.7 to 0.85) to reduce sensitivity. Changes take effect immediately — no restart needed.
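Here is a worked example of what raising the threshold does, as a sketch. It assumes scores run from 0 to 1 and that a check blocks when score >= threshold. It also uses a hypothetical `escalate_floor` of 0.5, below which a score counts as no match. The real comparison rules may differ.

```python
def decide(score: float, block_threshold: float, escalate_floor: float = 0.5) -> str:
    """Map one check's score to a decision (comparison rules are assumptions)."""
    if score >= block_threshold:
        return "block"
    if score >= escalate_floor:
        return "escalate"  # matched, but below the block threshold
    return "allow"


# A borderline prompt scoring 0.78 on the injection check:
print(decide(0.78, block_threshold=0.70))  # block
print(decide(0.78, block_threshold=0.85))  # escalate
```

Raising the threshold from 0.7 to 0.85 does not make that prompt disappear from your stats. It moves it from block to escalate, so you can keep watching it while it no longer stops traffic.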

Disable a specific check

If a check is consistently producing false positives for your use case, toggle it off in dashboard → Checks. You can re-enable it at any time.

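Let traffic through when a check errors (debugging only)

Use this if requests fail because a check returns an error, not because a check explicitly blocked them. Add "AGENTGUARDS_FAIL_OPEN": "true" to the env block in settings.json. Traffic is then let through when a check errors, but explicit block decisions still apply, so this does not bypass a false positive. Use it only while diagnosing, and remove it when done.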

Fail-open setting (env block in ~/.claude/settings.json)
{
  "env": {
    "AGENTGUARDS_URL": "https://prod.agentguards.co",
    "AGENTGUARDS_API_KEY": "ag_YOUR_TOKEN_HERE",
    "AGENTGUARDS_FAIL_OPEN": "true"
  }
}
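A small sketch pins down these semantics. It assumes the flag is read as the string "true" (case-insensitive) and that the default, with the flag absent, is to fail closed, which is what the description above implies. `fail_open_enabled` and `gate` are hypothetical names, not functions in agentguards_hook.py.

```python
import os


def fail_open_enabled(env=None) -> bool:
    """True when AGENTGUARDS_FAIL_OPEN is set to "true" (assumed parsing rule)."""
    env = os.environ if env is None else env
    return env.get("AGENTGUARDS_FAIL_OPEN", "").strip().lower() == "true"


def gate(text: str, check, fail_open: bool) -> bool:
    """Return True if the text may proceed.

    check(text) returns a decision string, or raises OSError/ValueError when the
    check itself fails (network error, timeout, unparseable response).
    """
    try:
        decision = check(text)
    except (OSError, ValueError):
        # The check errored: fail open lets traffic through, fail closed stops it.
        return fail_open
    # An explicit block is honoured whatever the fail-open setting says.
    # (Redaction handling is omitted from this sketch.)
    return decision != "block"
```

The design point is that `fail_open` is only consulted inside the error branch. A successful check that returns block stops the request either way.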

Still stuck? Email support with the correlation ID of the blocked request, if you have one, and we will look at that specific request.

Uninstall

Remove AgentGuards from your agent in a few steps.

Claude Code — Hooks

Remove the hook entries from ~/.claude/settings.json. You can also remove the env block if you added it only for AgentGuards.

Remove hook script
# Remove the hook entries from ~/.claude/settings.json
# Then optionally delete the script
rm ~/.claude/agentguards_hook.py

Claude Code — MCP

Remove MCP server
claude mcp remove agentguards

OpenAI Codex

Remove the [mcp_servers.agentguards] section from ~/.codex/config.toml.

REST API / Gateway

Remove the evaluate-input call from your application code. No client software to uninstall.

To revoke your API key, go to dashboard → Settings → API Keys and delete the key. The key will stop working immediately.