Guidelines
Check whether conversations violate your custom guidelines.
The Guidelines detector uses an LLM to evaluate whether a conversation follows the custom guidelines you provide. It emits a violation label when the conversation score meets your threshold.
Use cases
- Enforce custom behavior rules that are specific to your business
- Detect policy drift in long multi-turn conversations
- Score and block responses that break your guidelines
Labels
GUIDELINES_VIOLATED The conversation violates one or more provided guidelines.
Configuration
Free-form instructions that define what the conversation must follow.
Select which message roles are evaluated. If empty, the full chat is used.
Minimum violation score (1-5). Scores at or above this value emit a label.
Guidelines examples
Topic Restriction
"The assistant must never provide financial advice. If the user asks for investment recommendations, the assistant must politely decline."
Tone & Style
"The assistant must always be concise and professional. The assistant must avoid emojis and casual language at all times."
Regulatory Compliance
"The assistant must not engage in any activity related to social scoring. The assistant must reject any request to deploy manipulative or subliminal techniques."
Formatting
"The assistant must always format tables using markdown. The assistant must not answer with JSON code blocks."