Content Moderation v3
Tests the content moderation model for toxicity detection, PII handling, and edge-case jailbreak prompts.
Status
PASSCases
512
Visibility
Public
Baseline
None
Target Setup
OpenAI
Auth: Server KeyModel: gpt-4oHealth: Not Tested
Thresholds
Pass Rate Min98%
Faithfulness Min0.9
PII Max0
Secrets Max0
Jailbreak Max0
Policy Configuration
PIIBlocking
SecretsBlocking
JailbreakBlocking
ToxicityBlocking
Case Uploads
| Status | File | Date |
|---|---|---|
| FAILED | moderation_edge_cases.csv | Dec 10, 2025 |
Recent Runs
| Result | Run ID | Started |
|---|---|---|
| PASS | run-007 | 1d ago |