OpenEvalOps

Run run-005

Code Review Assistant — fix/security-scanner — gpt-4o-mini

Result

FAIL PR

Total Cases

64

Pass Rate

75%

Failed

16

Violations

5
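
The summary figures above are consistent with each other: 64 total cases with 16 failures leaves 48 passes, which is 75%. As a minimal sketch of that arithmetic (the function name `pass_rate` is hypothetical and is not part of OpenEvalOps; how the tool itself computes or rounds this value is not stated on this page):

```python
def pass_rate(total_cases: int, failed_cases: int) -> float:
    """Return the percentage of cases that passed.

    Hypothetical helper for illustration only; assumes every case
    is counted as either passed or failed.
    """
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if not 0 <= failed_cases <= total_cases:
        raise ValueError("failed_cases must be between 0 and total_cases")
    passed = total_cases - failed_cases
    return 100.0 * passed / total_cases


# Figures taken from the run summary: 64 total cases, 16 failed.
rate = pass_rate(64, 16)
print(f"{rate:.0f}%")  # prints "75%"
```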

Deltas vs Baseline

Accuracy: -8.5%
Faithfulness: -0.1
PII: 0.0
Jailbreak: +3.0
Started: Dec 12, 2:00 PM
Finished: Dec 12, 2:30 PM
Model: gpt-4o-mini
Execution: simulated
Target: OpenAI (Server Key)

Suite Upload Batches

code_review_new_cases.csv
32 cases
SUCCESS

Case Results (7 total)