Result
FAIL PR
Total Cases
64
Pass Rate
75%
Failed
16
Violations
5
Deltas vs Baseline
Accuracy -8.5%
Faithfulness -0.1
PII 0.0
Jailbreak +3.0
Started Dec 12, 2:00 PM
Finished Dec 12, 2:30 PM
Model gpt-4o-mini
Execution simulated
Target OpenAI (Server Key)
Suite Upload Batches
code_review_new_cases.csv
32 cases