Cover art for Latent Space: The AI Engineer Podcast

[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI

Latent Space: The AI Engineer Podcast

Published
December 31, 2025
Duration
27:33
Summary source
description
Last updated
May 31, 2026

Discusses openai, safety-alignment.

Summary

From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI's post-training evolution—from the PPO vs DPO debates of 2023 to today's RLVR era, where the real innovation isn't optimization methods but data quality, signal trust, and token efficiency. We sat…

Intelligent report

Sign in to read teasers, or upgrade to Research Pro to commission a new dossier for this episode. Learn more →

Show notes

From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI's post-training evolution—from the PPO vs DPO debates of 2023 to today's RLVR era, where the real innovation isn't optimization methods but data quality, signal trust, and token efficiency. We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient metho

Themes

  • openai
  • safety-alignment
[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI | Latent Space: The AI Engineer Podcast | Vagelintel