OpenEvalOps

Run run-007

Content Moderation v3 — main — gpt-4o

Result

PASS · Nightly

Total Cases

512

Pass Rate

99.2%

Failed

4

Violations

0

Deltas vs Baseline

Accuracy: +0.8%
Faithfulness: +0.0
PII: 0.0
Jailbreak: 0.0

Started: Dec 11, 3:00 AM
Finished: Dec 11, 4:00 AM
Model: gpt-4o
Execution: simulated
Target: OpenAI (Server Key)

Suite Upload Batches

moderation_edge_cases.csv
0 cases · 3 validation errors
FAILED

Case Results (0 total)

No case results available for this filter