OpenEvalOps

Run run-006

Code Review Assistant — main — gpt-4o

Result

WARN
Nightly

Total Cases

64

Pass Rate

84.4%

Failed

10

Violations

2

Deltas vs Baseline

Accuracy: -3.1%
Faithfulness: -0.1
PII: 0.0
Jailbreak: +1.0

Started: Dec 11, 2:00 AM
Finished: Dec 11, 2:25 AM
Model: gpt-4o
Execution: simulated
Target: OpenAI (Server Key)

Suite Upload Batches

code_review_new_cases.csv
32 cases
SUCCESS

Case Results (0 total)

No case results available for this filter