
Kelly Hong
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Generative Benchmarking with Kelly Hong
- Published: April 23, 2025
- Duration: 54:17
- Summary source: description
- Last updated: Jun 28, 2026
Discusses RAG and embeddings.
Show notes
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on rele
Themes
- rag
- embeddings