Guidelines

Check whether conversations violate your custom guidelines.

The Guidelines detector uses an LLM to evaluate whether a conversation follows the custom guidelines you provide. It emits a violation label when the conversation score meets your threshold.

Recommended for Input & Output

Use cases

  • Enforce custom behavior rules that are specific to your business
  • Detect policy drift in long multi-turn conversations
  • Score and block responses that break your guidelines

Labels

GUIDELINES_VIOLATED

The conversation violates one or more provided guidelines.

Configuration

Guidelines required

Free-form instructions that define what the conversation must follow.

Roles default: all messages

Select which message roles are evaluated. If empty, the full chat is used.

Threshold default: 3

Minimum violation score (1-5). Scores at or above this value emit a label.

Guidelines examples

Topic Restriction

"The assistant must never provide financial advice. If the user asks for investment recommendations, the assistant must politely decline."

Tone & Style

"The assistant must always be concise and professional. The assistant must avoid emojis and casual language at all times."

Regulatory Compliance

"The assistant must not engage in any activity related to social scoring. The assistant must reject any request to deploy manipulative or subliminal techniques."

Formatting

"The assistant must always format tables using markdown. The assistant must not answer with JSON code blocks."