Maohao Shen

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen

Published: April 8, 2025
Duration: 51:45
Last updated: Jun 7, 2026

Show notes

Today, we're joined by Maohao Shen, a PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning by enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens (continue, reflect, and explore) to guide the mo

Themes

  • llm