OpenEvalOps

Run run-001

Customer Support Bot v2 — main — gpt-4o

Result

PASS (Nightly)

Total Cases

128

Pass Rate

95.3%

Failed

6

Violations

0

Deltas vs Baseline

Accuracy: +1.2%
Faithfulness: +0.0
PII: 0.0
Jailbreak: 0.0

Started: Dec 10, 2:00 AM
Finished: Dec 10, 2:45 AM
Model: gpt-4o
Execution: simulated
Target: OpenAI (Server Key)

Suite Upload Batches

support_cases_batch_1.csv: 45 cases, SUCCESS

Case Results (3 total)