Your Agent Doesn't Know What It Doesn't Know—with Heather Lutz (Datasite)

About the episode

If you plug an AI agent into your data, how do you know it's giving you the right answer—and not just a confident one?

In this episode of Futureproof, Prakash Chandran sits down with Heather Lutz, Director of Engineering at Datasite, the provider of AI-powered solutions that enable private market investment, including virtual data rooms for mergers and acquisitions. Together, they unpack what happens when you point an agent at a massive data set without doing the foundational work first, why data readiness and governance are non-negotiable prerequisites for any AI initiative, and how Datasite is layering semantic views, verified queries, skills, and a "data doorman" to make agents actually useful. 

About Datasite

Datasite provides the infrastructure that enables information flow for private market transactions, with purpose-built tools to optimize outcomes. Datasite’s innovative product portfolio, spanning sell-side virtual data rooms, buy-side intelligence, agentic AI applications, and an open data infrastructure layer, drives execution across the full investment lifecycle while generating unique data insights to empower investors, advisors, and deal professionals worldwide. Trusted by top private equity firms, investment banks, and consultancies, Datasite is built on 26 years of enterprise-grade security, compliance, and reliability. For more information, visit www.datasite.com.


Topics covered include:

  • Agents are confident interns, not seasoned analysts: Why an AI agent querying your data won't know about data quality issues, duplicate revenue tables, or missing filters—and why confidence without context is worse than no answer at all.
  • Data readiness as CI/CD: Why testing data should follow the same discipline as testing software—with checks at every stage of the pipeline—and why continuous data quality monitoring barely exists as a standard practice yet.
  • Data governance makes agents work: How domain ownership, shared metric definitions, and semantic layers turn an unnavigable ocean of tables into a surface an agent can actually be expert on.
  • Don't work with the ocean: Why starting with your top ten metrics, your most important structures, and a bounded consumable layer is the only practical path to making AI-over-data work at scale.

Chapters

00:00

Meet Heather Lutz
Prakash introduces Heather's unconventional path from English teacher to engineering leader, and the core tension of the episode: the foundation most teams are skipping in the rush to plug in AI.

02:03

From English Teacher to Data Infrastructure Lead
Heather traces her journey from teaching in Japan to a coding bootcamp, getting hired as a junior engineer, raising her hand for MongoDB on day one, and building out Datasite's entire data stack over the next decade.

05:35

What Happens When You Point an Agent at Your Data
A discussion of why the old model, in which analysts understood the nuances and worked around data quality issues, provided a human safety net that AI agents don't have, and what goes wrong when a business user queries raw data through an MCP layer.

08:50

The 22,000-Table Experiment
Heather describes what happened when they gave a business user agent access to their analyst layer in Snowflake—the agent gave a confident answer, but it wasn't even close, because filters weren't applied and terminology didn't match column names.

10:50

Segmenting Agent Access by Domain
She explains how to shrink the data surface so agents can actually be useful—curating by domain and role, building a governed consumable layer, adding semantic views, and treating your data product like an API.

14:45

Data Readiness: Test Your Data Like Software
Heather makes the case for CI/CD discipline applied to data—null checks, enum validation, join range expectations—and why continuous monitoring over time catches the drift that landing-zone tests miss.

18:15

Why Data Testing Isn't a Standard Yet
A discussion of why data engineering still lacks the test-driven development culture that software engineering takes for granted, and why AI accessing your data makes testing non-optional.

21:55

Data Governance: Definitions, Ownership, and Lineage
Heather defines governance as making data accessible, understandable, and consistent across the organization—starting with agreeing on what your metrics actually mean and tracking where the data comes from.

24:45

The Layered Agent Architecture
A walkthrough of Datasite's approach: semantic views as the foundation, verified queries to train agent accuracy, skills for business user context and synonyms, and a data doorman that routes users to the right agent, data set, or dashboard.

28:10

The Data Doorman and Cross-Functional Routing
Heather explains how the doorman concept directs users to the right source—Snowflake, Pendo, the operational data store, or an existing dashboard—and why different agents serve different domains with role-based access layered on top.

33:05

Where to Start If You Have Legacy Data
Practical advice for organizations with years of accumulated data: start by knowing what you're actually asking for, define your top metrics, build a bounded consumable layer, add basic monitoring and testing, and don't try to govern the ocean all at once.

38:30

Garbage In, Garbage Out Still Applies
They discuss why data is having a renaissance—everyone now agrees the foundation has to be right—and why Heather is most excited about organizations finally investing in the governance and cleanliness work that makes real clarity possible.

42:35

Communities of Practice and Closing Advice
Heather shares how Datasite uses a centralized data governance department, a data management office, and cross-team communities of practice to share experiments and set standards—and closes with advice to experiment in a sandbox, not in production.

Hosted by
Prakash Chandran
CEO, Xano