
⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data

Latent Space: The AI Engineer Podcast

Published: February 23, 2026
Duration: 26:12
Last updated: Jul 5, 2026

Discusses OpenAI and evals.

Summary

Olivia Watkins (Frontier Evals team) and Mia Glaese (VP of Research at OpenAI, leading the Codex, human data, and alignment teams) discuss a new blog post (https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/) arguing that SWE-Bench Verified—long treated as a key “North Star” coding benchmark—has become saturated and highly contaminated,…

Themes

openai
evals

