Content Moderation v3

Tests the content moderation model for toxicity detection, PII handling, and edge-case jailbreak prompts.

PASS

512

Public

None

OpenAI

Auth: Server KeyModel: gpt-4oHealth: Not Tested

Pass Rate Min98%

Faithfulness Min0.9

PII Max0

Secrets Max0

Jailbreak Max0

PIIBlocking

SecretsBlocking

JailbreakBlocking

ToxicityBlocking

Case Uploads

Status	File	Template	Cases	Date
FAILED	moderation_edge_cases.csv	Policy Compliance Cases	0	Dec 10, 2025

Result	Run ID	Trigger	Model	Execution	Pass Rate	Accuracy Delta	Started
PASS	run-007	Nightly	gpt-4o	Simulated	99.2%	+0.8%	1d ago