Detectors

Detectors analyze chat messages and emit labels describing what they found.

Each detector runs against incoming messages and emits labels when it finds something. You configure what action to take for each label in your policy rules.

Instant <10ms Fast ~50ms Deep ~150ms

Security

Protect against attacks and malicious content.

Known Attacks

Fast

Detect jailbreak attempts and prompt injections using similarity matching.

Learn more

Input Safety

Deep

Classify harmful content using a specialized model.

Learn more

Obfuscation Detection

Instant

Detect unicode tricks, homoglyphs, and encoding bypasses.

Learn more

Unknown URLs

Instant

Detect URLs not in your approved domain list.

Learn more

Rego Policy

Fast

Evaluate messages with a custom Rego policy.

Learn more

Content

Filter sensitive data and unwanted content.

PII Detection

Fast

Detect names, emails, phone numbers, credit cards, and IDs.

Learn more

Keyword Filter

Instant

Detect specific keywords or regex patterns.

Learn more

Quality

Ensure responses are relevant, grounded, and well-formed.

Language Filter

Instant

Detect message language and filter by allowed languages.

Learn more

Task Adherence

Deep

Ensure conversations stay on topic.

Learn more

Guidelines

Deep

Check conversations against custom behavior guidelines.

Learn more

Groundedness

Deep

Check if responses are grounded in provided context.

Learn more

Chat Size

Instant

Enforce message length and conversation size limits.

Learn more

Add detectors to your policy

Create rules in your policy to use these detectors. Configure parameters and set actions for each label.

Go to Policies →