How to train your data

The Vergecast

Published: June 25, 2026
Duration: 26:41
Last updated: July 5, 2026

Discusses OpenAI, Anthropic, and Google AI.

Show notes

Training data is the raw material of the AI industry. Claude, ChatGPT, Gemini, and the rest are built on top of oceans of stuff. What is that stuff? Books. Blog posts. YouTube videos. Reddit comments. All of it and more, in virtually incomprehensible quantities. Alex Reisner, a staff writer at The Atlantic who has been investigating training data, explains how AI companies get all this data, why they'd really prefer you not know what's in it, and whether training data could ever be a fair trade.

Themes

  • openai
  • anthropic
  • google-ai