Beyond Velocity: 5 Surprising Truths About Measuring AI's Impact in Engineering

Tags: AI, Velocity, Engineering Leadership, Roundtable

Insights from our ELC Annual roundtable on AI observability dashboards for executives

November 18, 2025
Priyanka Halder
At this year’s ELC Annual, I had the privilege of hosting a closed-door roundtable with engineering, platform, AI, and quality leaders who are all wrestling with the same tension:
“We’re spending aggressively on AI. Why can’t our dashboards prove the ROI?”
Most organizations are still using metrics designed for human-only workflows—DORA, SPACE, story points, commit counts. These were never meant to track AI-augmented systems and can be easily inflated by synthetic output, masking risk as progress.
This post captures key insights from that roundtable conversation and offers a practical lens for anyone designing AI observability dashboards for executives. A huge thank you to the ELC community leaders and attendees in the session whose candid stories, failures, and experiments shaped the themes below.

1. We’re Chasing Velocity, But We Need Fluency

The room aligned quickly on one uncomfortable truth: most current AI dashboards are obsessed with activity—how many prompts, how many suggestions accepted, how many lines of AI-generated code.
These are motion metrics, not impact metrics.
What we actually need to measure is AI Fluency: an organization’s ability to reason about model behavior, interpret outputs, challenge AI confidently, and turn insights into reliable outcomes. Signs of AI Fluency include:
  1. Reduction in rework and defect leakage from AI-assisted changes.
  2. Faster delivery of meaningful features, not just tasks closed.
  3. Decrease in manual review time for repetitive decisions.
  4. Teams knowing when not to trust AI—and having patterns for escalation.
Executives don’t just need a usage graph. They need to see:
“Do my teams understand what AI cannot yet explain?”
A fluent organization is one where AI amplifies judgment, not just speed.

2. We’re Stuck in the “Faster Horse” Trap

Multiple leaders admitted their first wave of AI adoption looked impressive on paper—but shallow in reality. AI was layered on top of old workflows: faster tickets, faster code, faster documentation. A classic “faster horse” moment.
The strategic value of AI isn’t doing the same work faster; it’s enabling work that was previously impossible or impractical. Examples discussed in the room included:
  • Reimagining incident response with predictive signals instead of reactive fire drills.
  • Shifting from “write requirements → write tests → write code” to intent-first, AI-orchestrated flows.
  • Building internal copilots that reason across product, ops, legal, and compliance data to shape decisions in real time.
Your AI dashboard should make this distinction obvious. It should help answer:
  • Where are we merely accelerating legacy workflows?
  • Where have we unlocked net-new capabilities, revenue, resilience, or customer outcomes?
If your metrics can’t tell the difference, you’re still riding the horse.

3. Quality Engineering is the Canary in the Coal Mine

The most emotional parts of our roundtable came from quality leaders. AI-generated code and content are flooding pipelines. PR queues are spiking. Expectations are rising. Human review capacity is not.
“Humanly impossible” wasn’t a dramatic line—it was the shared lived reality.
This is where irresponsible velocity shows up first. When AI output scales and your verification model doesn’t, risk compounds silently. AI reviewing AI without rigor or visibility can create a closed loop of confidently wrong decisions.
An executive-grade AI observability dashboard must illuminate:
  1. Where AI-generated changes are concentrated (services, domains, teams).
  2. Defect density and incident correlation for AI-assisted vs non-AI changes.
  3. Gaps in review coverage—where neither humans nor AI are effectively validating.
  4. The impact of AI-led changes on security, compliance, and reliability posture.
Fast isn’t progress if you’ve just moved the blast radius downstream.
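To make the defect-density and incident-correlation comparison above concrete, here is a minimal sketch in Python. It assumes changes are already tagged with an ai_assisted flag (for example via PR labels or commit trailers) and that defects and incidents can be traced back to individual changes; the field names are illustrative, not any specific tool's schema.

```python
from dataclasses import dataclass

@dataclass
class Change:
    ai_assisted: bool      # assumed to be tagged at PR or commit time
    lines_changed: int
    defects_found: int     # defects later traced back to this change
    caused_incident: bool  # linked to a production incident

def cohort_metrics(changes, ai_assisted):
    """Defect density and incident rate for one cohort (AI-assisted or not)."""
    cohort = [c for c in changes if c.ai_assisted == ai_assisted]
    lines = sum(c.lines_changed for c in cohort)
    return {
        "changes": len(cohort),
        # defects per 1,000 changed lines
        "defect_density": 1000 * sum(c.defects_found for c in cohort) / max(lines, 1),
        # share of changes later linked to a production incident
        "incident_rate": sum(c.caused_incident for c in cohort) / max(len(cohort), 1),
    }

def ai_vs_non_ai(changes):
    return {
        "ai_assisted": cohort_metrics(changes, True),
        "non_ai": cohort_metrics(changes, False),
    }
```

Plotted side by side per service, the two cohorts make it obvious where AI output is outpacing verification.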

4. A Counter-Intuitive Fix: Make AIs Argue with Each Other

One of my favorite approaches shared at ELC Annual was adversarial AI review. Instead of trusting a single model, teams:
  • Use multiple LLMs to cross-review code, tests, policies, or architecture.
  • Compare where models disagree or flag uncertainty.
  • Elevate those divergence zones to humans as priority review hotspots.
Because different models fail differently, this creates an AI-native form of peer review. It reduces blind spots, surfaces subtle issues, and gives executives a richer trust signal than “Model said OK.”
On an observability dashboard, this can look like:
  • Agreement scores between models.
  • Heatmaps of high-divergence areas in the codebase or decision flows.
  • Trends where adversarial review reduced incidents or regressions.
This is how we move from blind acceleration to instrumented, explainable speed.
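As a sketch of what the "agreement score" idea could look like in practice, the snippet below compares the findings different models raise against the same artifact and ranks the lowest-agreement artifacts as human-review hotspots. The data shapes and model names are hypothetical; in a real pipeline the finding sets would come from each model's review output.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two models' sets of findings for the same artifact."""
    if not a and not b:
        return 1.0  # both models silent: treat as full agreement
    return len(a & b) / len(a | b)

def agreement_scores(reviews):
    """
    reviews maps artifact (file, PR, decision) -> {model_name: set of finding ids}.
    Returns mean pairwise agreement per artifact; low scores are the divergence
    zones to escalate to human reviewers first.
    """
    scores = {}
    for artifact, by_model in reviews.items():
        pairs = list(combinations(by_model.values(), 2))
        scores[artifact] = (
            sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
        )
    return scores

# Hypothetical example: the models disagree on the payments service, agree on docs.
reviews = {
    "payments/ledger.py": {"model_a": {"sql-injection", "missing-test"},
                           "model_b": {"race-condition"}},
    "docs/runbook.md":    {"model_a": set(), "model_b": set()},
}
hotspots = sorted(agreement_scores(reviews).items(), key=lambda kv: kv[1])
```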

5. Soon, Tracking “AI Usage” Will Be Obsolete

A key consensus from the discussion: “AI usage” is a transitional metric.
As AI becomes ambient—from design to deployment—asking “Did AI touch this?” will sound as odd as asking “Did the internet touch this feature?”
Instead of celebrating the number of prompts or seats, executives should be looking at:
  1. How AI shortens iteration loops from idea → impact.
  2. How AI changes incident patterns, MTTR, and customer experience.
  3. Where AI enables entirely new product capabilities or service models.
  4. Whether AI-driven work holds up under compliance, reliability, and security scrutiny.
The mature question is no longer “Are we using AI?” It’s “Is AI meaningfully upgrading how we build, operate, and decide?”

The Emerging Nerve Center: AI Observability Platforms

In our roundtable, we also talked about platforms that can turn this philosophy into something leaders can actually see. Modern AI observability platforms—such as Arize, D2, and others in this evolving ecosystem—are becoming the nerve center for AI-augmented engineering.
The most useful capabilities we discussed included:
  • Monitoring model behavior, data drift, prompt drift, and anomalies across environments.
  • Tracing AI-assisted actions (like code changes or decisions) to business outcomes, incidents, and user impact.
  • Detecting where AI is amplifying risk—hallucinations, bias, insecure patterns, or unreviewed automation.
  • Providing executive-ready narratives, not just charts: “Here’s what changed, here’s why, here’s the recommended action.”
When integrated into your SDLC, quality gates, and production stack, these platforms help you move from “trust me, AI is helping” to “here is the evidence.”
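The "narratives, not just charts" point can start as a thin layer that turns a metric delta into a short what/why/action record for the dashboard. A minimal sketch follows; the thresholds, wording, and function name are placeholders for illustration, not any vendor's API.

```python
from typing import Optional

def executive_narrative(metric: str, previous: float, current: float,
                        context: str, action: str,
                        threshold_pct: float = 20.0) -> Optional[str]:
    """Emit a what/why/action narrative when a metric moves more than the threshold."""
    if previous == 0:
        return None
    delta_pct = 100 * (current - previous) / previous
    if abs(delta_pct) < threshold_pct:
        return None
    direction = "rose" if delta_pct > 0 else "fell"
    return (
        f"What changed: {metric} {direction} {abs(delta_pct):.0f}% "
        f"({previous:.2f} -> {current:.2f}). "
        f"Why: {context}. "
        f"Recommended action: {action}."
    )

# Illustrative use with the cohort metrics sketched earlier in this post.
print(executive_narrative(
    metric="defect density, AI-assisted changes (per KLOC)",
    previous=1.8, current=2.6,
    context="concentrated in the two services with the highest share of AI-generated code",
    action="route those services through adversarial review before merge",
))
```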

Conclusion: From Monitoring Machines to Elevating Judgment

The ELC Annual roundtable reinforced one core idea I deeply believe in:
An AI dashboard that only tracks machines is table stakes. An AI dashboard that upgrades human judgment is a strategic advantage.
By shifting from velocity to fluency, from faster horses to new highways, from invisible risk to observable signals, we give executives what they actually need: clarity, not theater.
As you evolve your own AI observability strategy, ask yourself:
  • Are we measuring real capability, or just prettier motion?
  • Are our dashboards courageous enough to show where AI is hurting us, not just helping?
  • Are we still decorating old metrics, or finally instrumenting the future?
Because the organizations that win won’t just adopt AI faster. They’ll be the ones who can see it, question it, govern it, and grow with it—in public, on the dashboard, with nothing to hide.