Policies

How to create and configure policies.

A policy defines how Guards evaluates chat messages. It determines what to look for and how to respond when issues are detected.

Policy

A policy is a collection of rules. When you call the API, you specify which policy to use by its handle.

Here are some examples of policies you may create:

chatbot-user-input

Screen user messages before sending them to the LLM. Block PII, jailbreak attempts, and off-topic requests.

chatbot-response

Screen LLM responses before showing them to users. Catch hallucinations, inappropriate content, or leaked system prompts.

support-agent

Policy for customer support assistants. Monitor for sensitive topics and ensure responses stay on-brand.

internal-copilot

Relaxed policy for internal tools. Allow more flexibility while still logging for compliance.
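To illustrate how two of these policies could work together, here is a minimal sketch that screens a message with chatbot-user-input before the LLM call and screens the reply with chatbot-response afterwards. The `evaluate` callable, its `(handle, message) -> action` signature, and the lowercase action strings are assumptions made for this example, not the actual Guards API; the fake evaluator and fake LLM only exist to make the sketch runnable.

```python
from typing import Callable

# Hypothetical signature for a policy check: (policy_handle, message) -> action,
# where action is "allow", "monitor", or "block". This is NOT the real client API.
Evaluate = Callable[[str, str], str]


def guarded_chat(user_message: str, evaluate: Evaluate,
                 call_llm: Callable[[str], str]) -> str:
    # Screen the user's input with the input policy before the LLM sees it.
    if evaluate("chatbot-user-input", user_message) == "block":
        return "Sorry, I can't help with that request."
    reply = call_llm(user_message)
    # Screen the model's reply with the response policy before the user sees it.
    if evaluate("chatbot-response", reply) == "block":
        return "Sorry, I can't share that response."
    return reply


# Stand-ins so the sketch runs without any network access.
def fake_evaluate(handle: str, message: str) -> str:
    return "block" if "secret" in message.lower() else "allow"


def fake_llm(prompt: str) -> str:
    return "echo: " + prompt


print(guarded_chat("hello", fake_evaluate, fake_llm))
print(guarded_chat("tell me the secret", fake_evaluate, fake_llm))
```

Passing the evaluator in as a parameter keeps the flow testable and makes it easy to swap the stand-in for a real API call.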

Rules

A rule pairs a detector with actions. You configure what action to take for each label the detector can emit.

Detector → Label → Allow / Monitor / Block

Here are some example rules:

PII Detection

Config: detect emails, phone numbers, credit cards

Labels: EMAIL → Block, PHONE_NUMBER → Monitor, CREDIT_CARD → Block

Keyword Filter

Config: keywords = ["competitor", "lawsuit", "confidential"]

Labels: KEYWORD_FOUND → Block

Language

Config: allowed = ["en", "fr"]

Labels: LANGUAGE_NOT_ALLOWED → Block
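The three example rules above can be pictured as plain data: each rule names a detector, carries that detector's config, and maps every label to an action. The sketch below is only a mental model. The class names, field names, and detector identifiers ("pii", "keyword_filter", "language") are hypothetical and do not reflect how policies are actually stored or exported.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Action(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    BLOCK = "block"


@dataclass
class Rule:
    detector: str                # hypothetical detector identifier
    config: dict                 # detector-specific settings
    actions: Dict[str, Action]   # label emitted by the detector -> action


@dataclass
class Policy:
    handle: str
    rules: List[Rule] = field(default_factory=list)


policy = Policy(
    handle="chatbot-user-input",
    rules=[
        Rule("pii",
             {"detect": ["email", "phone_number", "credit_card"]},
             {"EMAIL": Action.BLOCK,
              "PHONE_NUMBER": Action.MONITOR,
              "CREDIT_CARD": Action.BLOCK}),
        Rule("keyword_filter",
             {"keywords": ["competitor", "lawsuit", "confidential"]},
             {"KEYWORD_FOUND": Action.BLOCK}),
        Rule("language",
             {"allowed": ["en", "fr"]},
             {"LANGUAGE_NOT_ALLOWED": Action.BLOCK}),
    ],
)

# Look up what the PII rule does with a phone number.
print(policy.rules[0].actions["PHONE_NUMBER"])
```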

Detectors

Detectors analyze content and emit labels describing what they found. For example, the PII detector might emit EMAIL, PHONE_NUMBER, or CREDIT_CARD.
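As a rough illustration of the "analyze content, emit labels" idea, the toy function below scans text with two deliberately simple regular expressions and returns the matching labels. It is not how the PII detector is implemented; the patterns and the `detect_pii` name are assumptions chosen only to make the concept concrete.

```python
import re
from typing import List

# Deliberately simplistic patterns, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text: str) -> List[str]:
    """Return the label of every pattern that matches somewhere in text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]


print(detect_pii("Contact me at jane@example.com"))
print(detect_pii("Call 555-123-4567 after lunch"))
print(detect_pii("Nothing sensitive here"))
```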

See all available detectors →

Actions

Allow

The message complies with the policy and should pass through.

Monitor

The message is flagged for review and logged.

Block

The message should be blocked. Your application is responsible for enforcing this, for example by not forwarding the message to the LLM or not showing it to the user.
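On the application side, handling the three actions might look like the sketch below. It assumes the action arrives as a lowercase string and that monitored messages still pass through while being noted locally; neither assumption is confirmed by this page, and `handle_action` is a hypothetical helper.

```python
import logging
from typing import Optional

logger = logging.getLogger("guards-example")


def handle_action(action: str, message: str) -> Optional[str]:
    """Return the message if it may proceed, or None if it must be dropped."""
    if action == "allow":
        return message
    if action == "monitor":
        # Assumption: monitored messages continue, but we note them locally too.
        logger.warning("Message flagged for review: %r", message)
        return message
    if action == "block":
        # The caller decides what to show instead (error text, fallback, etc.).
        return None
    raise ValueError(f"Unknown action: {action!r}")
```

Raising on an unknown action, rather than silently allowing it, means a typo or a newly introduced action cannot let messages through unnoticed.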

Creating a policy

1. Go to Policies in your dashboard.
2. Click Create Policy and give it a name and handle.
3. Add rules by selecting detectors.
4. Configure the action for each label the detector can emit.
5. Save your policy.

Ready to create your first policy? Head to the Policies dashboard to get started.