Beyond Benchmarks: Building & Assessing Generative AI Products in High-Stakes Domains

Uploaded: 2025-10-01T18:45:18.370Z

Posted Oct 01, 2025 | Views 37

# AI

# Metrics

# Productivity

Zachary Lipton

Co-founder and CTO @ Abridge

Zachary Lipton is Cofounder & CTO at Abridge, the leading platform for AI-based ambient listening technology in healthcare. Abridge’s industry-leading product listens to doctor-patient conversations and ingests reams of content from the EHR, leveraging its intelligent reasoning engine to generate high-quality drafts of after-visit notes & other artifacts. This automation frees up doctors to focus on their patients. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab. His research focuses on the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to both prediction and decision-making problems in clinical medicine, natural language processing, and the impact of machine learning systems on society.

A key theme in his research is to take advantage of causal structure underlying the observed data while producing algorithms that are compatible with the modern deep learning power tools that dominate practical applications. He is the founder of the Approximately Correct blog and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks that has reached millions of readers. He can be found on X (@zacharylipton), GitHub (@zackchase), or his lab's website (acmilab.org).

Will Reed

General Partner @ Spark Capital

Will is a General Partner at Spark Capital, where he focuses on investing in emerging category leaders at the Series B and Series C. His investments at Spark include Abridge, Baseten, Scale AI, Discord, Benchling, Handshake, and Mercury. He joined Spark in 2015 shortly after the firm raised $375M for its first Growth fund, a strategy that has gone on to raise $5B in the subsequent decade. Before Spark, Will was an investor at Welsh, Carson, Anderson & Stowe, a NY-based private equity firm, and an investment banker at BofA Merrill Lynch focused on high-yield credit.

SUMMARY

Traditional AI lived on simple benchmarks: accuracy, precision, BLEU scores. Generative AI broke that mold. Now outputs are open-ended, there are no unique gold standards, and, as GenAI has been industrialized, neither datasets nor evaluation suites are shared across vendors. In this session, we’ll look at the science and strategy of evaluation in this new era: how to balance human adjudication with automated metrics, how Goodhart’s Law plays out in practice, and how evaluation itself shapes product development. Drawing on examples from healthcare, we’ll show why getting evaluation right isn’t just an academic concern: it’s the foundation for building products that customers can trust.
