AgentGuards docs

AgentGuards is an LLM security guardrail platform. It screens every prompt before it reaches your model and every response before it reaches your users — blocking prompt injection, jailbreaks, data exfiltration, and more in under 50 ms.

You connect via a proxy, an MCP server, shell hooks, or the REST API. AgentGuards sits transparently in the request path: clean traffic flows through unchanged, while threats are blocked, redacted, or escalated before they can do harm.

Your App / Agent → AgentGuards (input guardrails) → LLM → AgentGuards (output validation) → User
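The request path described above can be pictured as a wrapper around your model call. The sketch below is a minimal illustration, not the AgentGuards client. `Verdict`, `guarded_completion`, and the decision strings are hypothetical names. The two screening functions are stand-ins for whatever integration (proxy, MCP server, shell hook, or REST call) actually performs the checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical result shape; not the documented AgentGuards response format."""
    decision: str  # e.g. "allow", "redact", "block", "reject"
    text: str      # the text to pass on (redacted if a check rewrote it)


def guarded_completion(
    prompt: str,
    screen_input: Callable[[str], Verdict],     # stand-in for the input guardrails
    call_llm: Callable[[str], str],             # your model client
    validate_output: Callable[[str], Verdict],  # stand-in for output validation
) -> str:
    """Screen the prompt, call the model only if allowed, then validate the response."""
    inbound = screen_input(prompt)
    if inbound.decision == "block":
        # A blocked prompt never reaches the model.
        return "Request blocked by input guardrails."
    # Forward the screened text, which may have been redacted.
    response = call_llm(inbound.text)
    outbound = validate_output(response)
    if outbound.decision in ("block", "reject"):
        return "Response withheld by output validation."
    return outbound.text
```

Clean traffic passes through unchanged because an "allow" verdict simply hands back the original text.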

Supported checks

Checks run in parallel on every request. Each check returns a decision, and the most severe decision across all checks wins.
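The "worst decision wins" rule can be sketched as a concurrent fan-out followed by a max over a severity ranking. This is a minimal sketch under stated assumptions. The ranking in `SEVERITY` is hypothetical, since this page does not give the actual order between decisions. Each check is modelled as a plain function from text to a decision label.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed severity ranking (hypothetical; higher is worse).
SEVERITY = {"allow": 0, "repair": 1, "redact": 2, "escalate": 3, "reject": 4, "block": 5}

Check = Callable[[str], str]  # a check takes text and returns a decision label


def run_checks(text: str, checks: dict[str, Check]) -> tuple[str, dict[str, str]]:
    """Run every check concurrently; return the worst decision and per-check results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(check, text) for name, check in checks.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    # The most severe decision wins; with no checks configured, the text is allowed.
    worst = max(results.values(), key=SEVERITY.__getitem__, default="allow")
    return worst, results
```

Returning the per-check results alongside the final decision keeps the outcome explainable: you can see which check drove a block.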

Input guardrails

Prompt Injection (block)

Instruction override and system message hijacking attempts.

Jailbreak (block)

DAN mode, roleplay exploits, and unrestricted persona requests.

PII Detection (redact)

SSNs, email addresses, phone numbers, and credit card numbers. Matched content is redacted automatically.

Secret Detection (block)

AWS keys, GitHub tokens, JWTs, API keys, and private keys.

Data Exfiltration (block)

Training data extraction, system prompt leakage, and environment variable theft.

Restricted Topics (block)

Insider trading, illegal content, and custom tenant-defined topics.

Toxicity (block)

Violent, hateful, and weapon-related content.

Web Content Injection (block)

Hidden text, zero-width characters, and base64-encoded payloads in fetched content.

LLM-as-Judge (advanced, block)

A semantic second pass that catches paraphrased injections missed by regex rules.

PromptGuard ML (advanced, block)

An ML classifier (Llama-Prompt-Guard-2-86M) for high-confidence detection.
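To illustrate what a redacting check and the web-content check look for, here is a minimal sketch using Python's `re` module. The patterns, the `[LABEL]` placeholder format, and the function names are hypothetical. They are far simpler than a production detector and are not AgentGuards' built-in rules. Custom PII regexes of the kind the dashboard accepts would slot into a table like `PII_PATTERNS`.

```python
import re

# Hypothetical, deliberately simple patterns; real detectors need stricter rules.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Zero-width space, zero-width non-joiner, zero-width joiner, and byte-order mark.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def redact_pii(text: str) -> tuple[str, bool]:
    """Replace each PII match with a [LABEL] placeholder; report whether anything matched."""
    found = False
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        found = found or count > 0
    return text, found


def has_hidden_chars(text: str) -> bool:
    """Flag fetched content that contains zero-width characters."""
    return ZERO_WIDTH.search(text) is not None
```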

Output validation

Hallucination (repair)

Detects dates, figures, and claims that are not grounded in the supplied context.

Confidence (escalate)

Flags excessive hedging that signals low-confidence responses.

Policy Compliance (repair)

Catches forbidden disclosures, such as model metadata leakage.

Schema Validity (repair)

Validates JSON output against your expected schema.

Unsafe Language (reject)

Catches harmful instructions or exploitation guides in responses.

Genericity (repair)

Detects boilerplate AI filler that adds no value.

Citation Check (repair)

Ensures grounded responses include required citations.
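Two of the output checks above lend themselves to a short illustration: schema validity and hedging-based confidence. The sketch below is a simplified stand-in, not AgentGuards' implementation. The "schema" is just a mapping of required keys to Python types rather than a full JSON Schema. The hedge phrase list and the `max_hedges` limit are hypothetical values chosen for the example.

```python
import json
import re

# Hypothetical hedge phrases; a real confidence check would use a richer signal.
HEDGES = re.compile(r"\b(might|maybe|possibly|perhaps|i think|not sure)\b", re.IGNORECASE)


def schema_errors(output: str, required: dict[str, type]) -> list[str]:
    """Return problems found when checking model output against a minimal expected schema."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}: expected {expected.__name__}")
    return errors


def too_hedged(response: str, max_hedges: int = 2) -> bool:
    """Flag responses whose hedge-phrase count exceeds a (hypothetical) limit."""
    return len(HEDGES.findall(response)) > max_hedges
```

Returning a list of errors rather than a boolean gives a repair step something concrete to feed back to the model.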

Customising checks

Every check is independently configurable per tenant. Most customisations need no code changes: the dashboard exposes the controls directly.

Enable / disable checks

Toggle any check on or off from the dashboard without redeployment.

Tune thresholds

Adjust detection thresholds (injection score, LLM confidence, PromptGuard score) to reduce false positives in your domain.

Custom patterns

Add PII patterns (regexes), secret types, and restricted topic keywords specific to your use case.

Policy rules

Write allow / deny / warn / escalate rules in YAML, scoped by role, channel, or use-case identifier.
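Scoped policy rules can be pictured as an ordered list of entries, each with an action, a topic, and optional role, channel, or use-case constraints. The sketch below models such rules as Python objects rather than YAML, because the actual YAML schema is not specified on this page. The field names, the "first matching rule wins" evaluation order, and the "allow" default are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule shape; the real YAML schema may differ."""
    action: str  # "allow", "deny", "warn", or "escalate"
    topic: str   # the category the rule applies to, e.g. "insider_trading"
    scope: dict[str, str] = field(default_factory=dict)  # e.g. {"role": "compliance"}


def evaluate(rules: list[Rule], topic: str, context: dict[str, str], default: str = "allow") -> str:
    """Return the action of the first rule whose topic and scope both match (assumed order)."""
    for rule in rules:
        if rule.topic != topic:
            continue
        # Every scope key the rule sets (role, channel, use_case) must match the request.
        if all(context.get(key) == value for key, value in rule.scope.items()):
            return rule.action
    return default
```

Listing narrowly scoped rules before broad ones lets an exception, such as a compliance role, take precedence over a general deny.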

Manage your checks in the dashboard → Checks.

Integrations

AgentGuards fits into your existing AI toolchain, with no proxy to operate and no model to retrain.

See the full setup guides for code examples.

Video guides

Prefer watching to reading? Our YouTube channel has quick-start walkthroughs, how-to videos, and real-world integration examples.

AgentGuards on YouTube

Get help