Documentation
About AgentGuards
Who we are, why we built it, and who it is designed for.
Getting started
Signup, dashboard setup, installation, data privacy, and how to handle unexpected blocks.
Integrations
Claude Code, VS Code Copilot, OpenAI Codex, and the REST API — full code examples.
Gateway API
Route any LLM call through AgentGuards from Node.js, Python, or any HTTP client.
How hooks work
Which commands are approved, denied, or need your approval — and what each hook event does.
AgentGuards docs
AgentGuards is an LLM security guardrail platform. It screens every prompt before it reaches your model and every response before it reaches your users — blocking prompt injection, jailbreaks, data exfiltration, and more in under 50 ms.
You connect via proxy, MCP server, shell hooks, or REST API. AgentGuards sits transparently in the request path: clean traffic flows through unchanged; threats are blocked, redacted, or escalated before they can do harm.
Supported checks
Checks run in parallel on every request. Each check returns its own decision, and the most severe decision across all checks is the one applied.
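To make the "most severe decision wins" rule concrete, here is a minimal Python sketch. It is not AgentGuards' implementation. The `Decision` names, their severity ordering, and the two toy checks are hypothetical stand-ins chosen for illustration.

```python
import asyncio
from enum import IntEnum
from typing import Awaitable, Callable


class Decision(IntEnum):
    # Hypothetical decisions and ordering: a higher value is more severe.
    ALLOW = 0
    WARN = 1
    REDACT = 2
    ESCALATE = 3
    BLOCK = 4


# A check is any async function that inspects text and returns a Decision.
Check = Callable[[str], Awaitable[Decision]]


async def injection_check(prompt: str) -> Decision:
    # Toy stand-in for a real injection detector.
    if "ignore previous instructions" in prompt.lower():
        return Decision.BLOCK
    return Decision.ALLOW


async def pii_check(prompt: str) -> Decision:
    # Toy stand-in: treat any "@" as a possible email address.
    return Decision.REDACT if "@" in prompt else Decision.ALLOW


async def evaluate(prompt: str, checks: list[Check]) -> Decision:
    """Run all checks concurrently and return the most severe decision."""
    if not checks:
        return Decision.ALLOW
    decisions = await asyncio.gather(*(check(prompt) for check in checks))
    return max(decisions)


if __name__ == "__main__":
    result = asyncio.run(evaluate("Email me at a@b.com", [injection_check, pii_check]))
    print(result.name)  # prints REDACT
```

Because `Decision` is an `IntEnum`, `max()` picks the most severe outcome directly, so adding a new check never requires changing the aggregation step.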
Input guardrails
Instruction override and system message hijacking attempts.
DAN mode, roleplay exploits, and unrestricted persona requests.
SSNs, email addresses, phone numbers, and credit card numbers; matched content is redacted automatically.
AWS keys, GitHub tokens, JWTs, API keys, and private keys.
Training data extraction, system prompt leakage, and environment variable theft.
Insider trading, illegal content, and custom tenant-defined topics.
Violent, hateful, and weapon-related content.
Hidden text, zero-width characters, and base64-encoded payloads in fetched content.
Semantic second pass that catches paraphrased injections the regex rules miss.
ML classifier (Llama-Prompt-Guard-2-86M) for high-confidence detection.
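As an illustration of regex-based PII redaction, the sketch below replaces each match with a labelled placeholder. The patterns, the `redact` function, and the `[REDACTED_<LABEL>]` placeholder format are assumptions for illustration, not AgentGuards' built-in detectors; the patterns are deliberately simple and would miss many real-world formats.

```python
import re

# Deliberately simple, illustrative patterns; not AgentGuards' built-in detectors.
PII_PATTERNS: dict[str, re.Pattern] = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def redact(text: str, patterns: dict[str, re.Pattern] = PII_PATTERNS) -> tuple[str, list[str]]:
    """Replace each match with a [REDACTED_<LABEL>] placeholder.

    Returns the redacted text and the labels that matched, in pattern order.
    """
    found: list[str] = []
    for label, pattern in patterns.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        if count:
            found.append(label)
    return text, found
```

For example, `redact("SSN 123-45-6789, mail jo.doe@example.com")` returns `("SSN [REDACTED_SSN], mail [REDACTED_EMAIL]", ["SSN", "EMAIL"])`. Tenant-specific patterns could be layered on by passing a merged mapping such as `{**PII_PATTERNS, "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b")}` as `patterns`; `EMPLOYEE_ID` is a hypothetical label.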
Output validation
Detects dates, figures, and claims that are not grounded in the provided context.
Flags excessive hedging that signals low-confidence responses.
Blocks forbidden disclosures such as model metadata leakage.
Validates JSON output against your expected schema.
Catches harmful instructions or exploitation guides in responses.
Detects boilerplate AI filler that adds no value.
Ensures grounded responses include required citations.
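Schema validation of model output can be sketched with the standard library alone. This simplified version is an assumption for illustration: the "schema" is just a mapping from required keys to Python types rather than full JSON Schema, and `validate_output` is a hypothetical helper, not part of AgentGuards.

```python
import json
from typing import Any


def validate_output(raw: str, schema: dict[str, type]) -> list[str]:
    """Check a model response against a simple key -> type schema.

    Returns a list of problems; an empty list means the output passed.
    """
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems: list[str] = []
    for key, expected in schema.items():
        if key not in data:
            problems.append(f"missing key: {key}")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (expected is int and isinstance(value, bool))
        if wrong_type:
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller log every violation before deciding whether to block or retry the response.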
Customising checks
Every check is independently configurable per tenant. Most customisations require no code changes; the dashboard exposes the controls directly.
Enable / disable checks
Toggle any check on or off from the dashboard without redeployment.
Tune thresholds
Adjust sensitivity thresholds (injection score, LLM confidence, PromptGuard score) to reduce false positives in your domain.
Custom patterns
Add PII patterns (regexes), secret types, and restricted topic keywords specific to your use case.
Policy rules
Write allow / deny / warn / escalate rules in YAML, scoped by role, channel, or use-case identifier.
Manage your checks in the dashboard → Checks.
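Policy rules can be pictured as data evaluated against the request context. The sketch below assumes a hypothetical rule shape (an action, a keyword condition, and optional role, channel, and use-case scopes, roughly as rules might look once loaded from YAML) and assumes the most severe matching action wins. Neither the field names nor that ordering come from AgentGuards' actual rule schema.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity ordering for the four rule actions.
SEVERITY = {"allow": 0, "warn": 1, "escalate": 2, "deny": 3}


@dataclass(frozen=True)
class Rule:
    action: str                    # "allow" | "warn" | "escalate" | "deny"
    keyword: str                   # hypothetical match condition
    role: Optional[str] = None     # None means "any role"
    channel: Optional[str] = None  # None means "any channel"
    use_case: Optional[str] = None # None means "any use case"


@dataclass(frozen=True)
class Context:
    role: str
    channel: str
    use_case: str


def in_scope(rule: Rule, ctx: Context) -> bool:
    """A rule applies when every scope it sets equals the request's value."""
    pairs = ((rule.role, ctx.role), (rule.channel, ctx.channel), (rule.use_case, ctx.use_case))
    return all(want is None or want == have for want, have in pairs)


def decide(prompt: str, ctx: Context, rules: list[Rule]) -> str:
    """Return the most severe action among in-scope rules whose keyword matches."""
    matched = [
        rule.action
        for rule in rules
        if in_scope(rule, ctx) and rule.keyword.lower() in prompt.lower()
    ]
    return max(matched, key=SEVERITY.__getitem__, default="allow")
```

With `Rule("deny", "salary", role="intern")` and `Rule("warn", "salary")`, a salary question from an intern is denied while the same question from any other role only triggers a warning.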
Integrations
AgentGuards fits into your existing AI toolchain, with no proxy of your own to run and no model to retrain.
Claude Code
Connect through the proxy, an MCP server, or shell hooks; works for both API-key users and Claude Pro/Max subscribers.
Gateway API
Route any LLM call through AgentGuards from Node.js, Python, or any HTTP client.
OpenAI Codex
Add AgentGuards as an MCP server in OpenAI Codex CLI.
REST API
Call the guardrails endpoint directly from any language or framework.
Video guides
Prefer watching to reading? Our YouTube channel has quick-start walkthroughs, how-to videos, and real-world integration examples.
AgentGuards on YouTube
Quick-start videos, how-to guides, and integration examples.