CSE Colloquium – Evaluating AI Agents in the Real World: Lessons from Two Benchmarks
Presenter: Tanya Roosta, AMD
Abstract: Autonomous research and web-navigation agents (OpenAI Deep Research, Gemini Deep Research Max, OpenAI Operator) are now shipping to millions of users. Yet independent […]