ELC

Beyond Benchmarks: Building & Assessing Generative AI Products in High-Stakes Domains

Posted Oct 01, 2025 | Views 9
# AI
# Metrics
# Productivity

speakers

Zachary Lipton
Co-founder and CTO @ Abridge

Zachary Lipton is Cofounder & CTO at Abridge, the leading platform for AI-based ambient listening technology in healthcare. Abridge’s industry-leading product listens to doctor-patient conversations and ingests reams of content from the EHR, leveraging its intelligent reasoning engine to generate high-quality drafts of after-visit notes & other artifacts. This automation frees up doctors to focus on their patients. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab. Research focuses include the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to both prediction and decision-making problems in clinical medicine, natural language processing, and the impact of machine learning systems on society.

A key theme in his research is to take advantage of causal structure underlying the observed data while producing algorithms that are compatible with the modern deep learning power tools that dominate practical applications. He is the founder of the Approximately Correct blog and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks that has reached millions of readers. He can be found on X (@zacharylipton), GitHub (@zackchase), or his lab's website (acmilab.org).

Will Reed
General Partner @ Spark Capital

Will is a General Partner at Spark Capital, where he focuses on investing in emerging category leaders at the Series B and Series C stages. His investments at Spark include Abridge, Baseten, Scale AI, Discord, Benchling, Handshake, and Mercury. He joined Spark in 2015 shortly after the firm raised $375M for its first Growth fund, a strategy that has gone on to raise $5B in the subsequent decade. Before Spark, Will was an investor at Welsh, Carson, Anderson & Stowe, a NY-based private equity firm, and an investment banker at BofA Merrill Lynch focused on high-yield credit.


SUMMARY

Traditional AI lived on simple benchmarks: accuracy, precision, BLEU scores. Generative AI broke that mold. Now, outputs are open-ended, there are no unique gold standards, and as GenAI has been industrialized, neither datasets nor evaluation suites are shared across vendors. In this session, we’ll look at the science and strategy of evaluation in this new era: how to balance human adjudication with automated metrics, how Goodhart’s Law plays out in practice, and how evaluation itself shapes product development. Drawing on examples from healthcare, we’ll show why getting evaluation right isn’t just an academic concern: it’s the foundation for building products that customers can trust.
