
Maohao Shen
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen
- Published
- April 8, 2025
- Duration
- 51:45
- Summary source
- description
- Last updated
- Jun 7, 2026
Discusses llm.
Summary
Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternat…
Intelligent report
Sign in to read teasers, or upgrade to Research Pro to commission a new dossier for this episode. Learn more →
Show notes
Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the mo
Themes
- llm