Question

Context files (input data)

Gold answer + script

Output (gold.csv)

Script (gold_scripts/<task_id>.py)


    

Agent runs

Gemini

Prediction
Final answer

        
Tool trace

Qwen

Prediction
Final answer

        
Tool trace

EXP overview

Accuracy breakdown

Qwen agent

Gemini agent

Tool catalog (shared by both agents)