Kelly Hong

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Generative Benchmarking with Kelly Hong

Published: April 23, 2025
Duration: 54:17
Summary source: description
Last updated: Jun 28, 2026

Discusses RAG and embeddings.

Show notes

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on rele

Themes

  • rag
  • embeddings