Documentation

About AgentGuards

Who we are, why we built it, and who it is designed for.

Getting started

Signup, dashboard setup, install, data privacy, and how to handle unexpected blocks.

Integrations

Claude Code, VS Code Copilot, OpenAI Codex, and the REST API — full code examples.

Gateway API

Route any LLM call through AgentGuards from Node.js, Python, or any HTTP client.

How hooks work

Which commands are approved, denied, or need your approval — and what each hook event does.

AgentGuards docs

AgentGuards is an LLM security guardrail platform. It screens every prompt before it reaches your model and every response before it reaches your users — blocking prompt injection, jailbreaks, data exfiltration, and more in under 50 ms.

You connect via proxy, MCP server, shell hooks, or REST API. AgentGuards sits transparently in the request path: clean traffic flows through unchanged; threats are blocked, redacted, or escalated before they can do harm.

Your App / Agent→AgentGuards(input guardrails)→LLM→AgentGuards(output validation)→User

Supported checks

Checks run in parallel on every request. Each returns a decision — the worst decision across all checks wins.

Input guardrails

Prompt Injectionblock

Instruction override and system message hijacking attempts.

Jailbreakblock

DAN mode, roleplay exploits, and unrestricted persona requests.

PII Detectionredact

SSN, email, phone, credit card — matched content is redacted automatically.

Secret Detectionblock

AWS keys, GitHub tokens, JWTs, API keys, and private keys.

Data Exfiltrationblock

Training data extraction, system prompt leakage, env var theft.

Restricted Topicsblock

Insider trading, illegal content, and custom tenant-defined topics.

Toxicityblock

Violent, hateful, and weapon-related content.

Web Content Injectionblock

Hidden text, zero-width chars, and base64-encoded payloads in fetched content.

LLM-as-Judgeadvancedblock

Semantic second-pass that catches paraphrased injections missed by regex.

PromptGuard MLadvancedblock

ML classifier (Llama-Prompt-Guard-2-86M) for high-confidence detection.

Output validation

Hallucinationrepair

Detects unsupported dates, figures, and claims not grounded in context.

Confidenceescalate

Flags excessive hedging that signals low-confidence responses.

Policy Compliancerepair

Blocks forbidden disclosures such as model metadata leakage.

Schema Validityrepair

Validates JSON output against your expected schema.

Unsafe Languagereject

Catches harmful instructions or exploitation guides in responses.

Genericityrepair

Detects boilerplate AI filler that adds no value.

Citation Checkrepair

Ensures grounded responses include required citations.

Customising checks

Every check is independently configurable per tenant. No code changes required for most customisations — the dashboard exposes the controls directly.

Enable / disable checks

Toggle any check on or off from the dashboard without redeployment.

Tune thresholds

Adjust sensitivity — injection score, LLM confidence, PromptGuard score — to reduce false positives in your domain.

Custom patterns

Add PII patterns (regexes), secret types, and restricted topic keywords specific to your use case.

Policy rules

Write allow / deny / warn / escalate rules in YAML, scoped by role, channel, or use-case identifier.

Manage your checks in the dashboard → Checks.

Integrations

AgentGuards fits into your existing AI toolchain — no proxy to operate, no model to retrain.

Claude Code

Proxy, MCP server, or shell hooks — covers API key and Claude Pro/Max users.

ProxyMCPHooks

Gateway API

Route any LLM call through AgentGuards from Node.js, Python, or any HTTP client.

Node.jsPythonREST

OpenAI Codex

Add AgentGuards as an MCP server in OpenAI Codex CLI.

MCPOpenAI

REST API

Call the guardrails endpoint directly from any language or framework.

RESTcurl

See full setup guides with code examples →

Video guides

Prefer watching to reading? Our YouTube channel has quick-start walkthroughs, how-to videos, and real-world integration examples.

AgentGuards on YouTube

Quick-start videos, how-to guides, and integration examples.

Watch the channel →

Get help

Email support

Send us a question or bug report and we'll get back to you within one business day.

support@agentguards.co

Join our Slack

Request an invite to our community Slack for faster back-and-forth, early previews, and direct access to the team.

Request an invite →