Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

Latent Space: The AI Engineer Podcast

Published: June 4, 2026
Duration: 1h 15m
Last updated: June 10, 2026

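Discusses evals that look less like exams and more like operating businesses, and why benchmark scores don't always reflect how a model performs in the real world.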

Show notes

The new AIEWF website is live! Get your tickets booked ASAP, as they will sell out. Take the AI Engineering Survey to get more than $2k in credits and free AIEWF tickets!

Most industry benchmarks, such as SWE-Bench Pro, MMLU, and Humanity’s Last Exam, compress intelligence and reasoning ability into scores. These metrics are useful, but they don't always capture how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses…

Themes

  • evals