OpenEvalOps

Run run-009

Customer Support Bot v2 — main — gpt-4o

Result

WARN · Manual

Total Cases

128

Pass Rate

90.6%

Failed

12

Violations

1

Deltas vs Baseline

Accuracy: -1.5%
Faithfulness: -0.0
PII: +1.0
Jailbreak: 0.0
Started: Dec 6, 10:00 AM
Finished: Dec 6, 10:50 AM
Model: gpt-4o
Execution: simulated
Target: OpenAI (Server Key)

Suite Upload Batches

support_cases_batch_1.csv · 45 cases · SUCCESS

Case Results (0 total)

No case results available for this filter