
Beyond the Hype: Our Experiment in Building an AI We Can Actually Trust

Content Series: The AI Thesis
By Darcy R. • August 12, 2025 • 3 min read
Highlights/Summary

An experiment in building a trustworthy AI for our internal knowledge base: we tackled "hallucination" by grounding NotebookLM (built on Gemini) in a curated corpus of verifiable sources, resulting in a "grounded AI" that returns accurate, cited answers we can rely on for professional work.

The Genesis: A Crisis of Confidence

Let's be honest. The term "AI" is saturated. It's a buzzword plastered on everything from toasters to trading platforms, often with little substance to back it up. For our team, this wasn't just marketing noise; it was a crisis of confidence. We were building complex systems, but the "black box" nature of many large language models (LLMs) left us uneasy. How could we confidently deploy a tool whose reasoning was opaque? How could we trust its outputs when we couldn't verify its sources?

This wasn't an academic debate. We needed a tool for internal knowledge management, one that could sift through thousands of pages of our own documentation, research papers, and project reports. The stakes were high; a misinterpretation or a "hallucinated" fact could lead to significant engineering setbacks. Standard LLMs were powerful, but their propensity to invent information made them a non-starter for a mission-critical knowledge base. We needed an AI we could argue with, one that would "show its work."

The Experiment: Grounding an LLM in Our Reality

Our hypothesis was simple: Trust in AI is directly proportional to its verifiability.

We decided to build a system around this principle. The goal was not to create a new foundational model, but to architect a new way of interacting with an existing one. We chose to use Google's NotebookLM, not just for its power, but for its core design philosophy: grounding.

Our Methodology:

  1. Curated Knowledge Base: We didn't point the AI to the entire internet. We meticulously uploaded a specific corpus of documents: our internal technical specifications, project post-mortems, and a library of trusted, peer-reviewed research papers relevant to our field. This was our "source material." The AI was not allowed to learn from or cite anything outside this walled garden.
  2. Source-of-Truth as a Mandate: Every query processed by the system had to be answered directly from the uploaded documents. More importantly, every single statement, summary, or data point generated by the AI had to be accompanied by citations. These weren't just links; they were direct quotations and references to the specific page and paragraph in the source material. (A minimal sketch of this grounding-and-citation pattern follows this list.)
  3. "Red Teaming" for Trust: We assembled a dedicated team whose sole purpose was to break the system's trust. They asked ambiguous questions, loaded contradictory documents, and actively looked for instances of hallucination or misinterpretation. Every failure was logged, analyzed, and used to refine the system's prompting and grounding mechanisms.

The Results: What We Learned About Trust

The outcome was a system that felt less like a magical oracle and more like an incredibly diligent, superhuman research assistant.

  • The End of Hallucination: By strictly limiting the AI's world to our source material, we virtually eliminated hallucinated facts. When the AI couldn't find an answer in the documents, it said so. This was a crucial feature: an admission of ignorance is infinitely more trustworthy than a confident falsehood.
  • Speed of Verification: The "show its work" mandate was a game-changer. A junior engineer could ask a complex question about a legacy system and get a summarized answer with five citations. They could then click on each citation, read the original context, and verify the AI's interpretation in seconds. This built confidence and dramatically accelerated the research process. (A small automated version of this spot-check is sketched after this list.)
  • Nuance and Contradiction: One of the most surprising benefits was how the system handled nuance. When we uploaded two documents with conflicting information, the AI didn't just pick one. It would often present both viewpoints, citing each. For example: "Source A states that the system's latency is under 50ms, while Source B, from a later date, notes that post-update latency can spike to 80ms." This allowed us to see the evolution of our own knowledge.
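
None of this requires trusting the model's tone; it follows from making citations mechanically checkable. As a hedged illustration (not the actual red-team tooling we used), the snippet below shows the kind of automated spot-check such a team can run: the model is asked to return an exact quote alongside each claim, and any quote that does not appear verbatim in the corpus fails loudly.

```python
# Illustrative citation spot-check (hypothetical; assumes the model returns
# its answer as JSON with an exact supporting quote per claim).
import json

SOURCES = {
    "spec-v2.pdf": "The ingestion service targets a p99 latency under 50ms.",
    "postmortem-2024-03.pdf": "After the cache update, p99 latency spiked to 80ms.",
}

def verify_citations(answer_json: str, sources: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every quote was found verbatim."""
    problems = []
    for claim in json.loads(answer_json)["claims"]:
        doc, quote = claim["doc"], claim["quote"]
        if doc not in sources:
            problems.append(f"unknown source: {doc}")
        elif quote not in sources[doc]:
            problems.append(f"quote not found in {doc}: {quote!r}")
    return problems

model_answer = json.dumps({
    "claims": [
        {"doc": "spec-v2.pdf", "quote": "p99 latency under 50ms"},
        {"doc": "spec-v2.pdf", "quote": "latency is always under 10ms"},  # fabricated
    ]
})
print(verify_citations(model_answer, SOURCES))
# -> ["quote not found in spec-v2.pdf: 'latency is always under 10ms'"]
```

The exactness requirement is the point: a fabricated quotation cannot pass it, which is precisely the failure mode a confident hallucination would otherwise hide.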

Externalities and Unexpected Benefits

Our experiment yielded insights that went beyond our initial goal of building a trustworthy knowledge base.

A New Way of Onboarding

We discovered our AI was an exceptional onboarding tool. New hires could "converse" with our entire project history. Instead of asking a senior engineer a basic question, they could ask the AI and get a sourced, verified answer. This freed up senior staff and empowered new team members to become self-sufficient faster.

Breaking Down Language Barriers

A significant unexpected benefit emerged from the system's multilingual capabilities. Team members who are not native English speakers found they could query the knowledge base in their own language. The AI, having processed the English source material, could provide summarized, trustworthy answers in Spanish, Japanese, or French, complete with citations pointing back to the original English documents. This effectively created a verifiable bridge across language divides, making our core knowledge accessible and trustworthy for everyone, regardless of their native tongue.

Conclusion: Trust Isn't a Feature, It's an Architecture

Our journey taught us that AI trust isn't something you can sprinkle on at the end. It's not about a more "confident" sounding model. It's an architectural choice. By grounding our AI in a verifiable source of truth and demanding that it cite its work, we didn't just build a better tool; we built a new relationship with AI. One based not on blind faith, but on verifiable, transparent, and ultimately, trustworthy collaboration.


About atQuo

atQuo is a creative partner that operates at the intersection of design, technology, and marketing strategy. Our Insights and Talks exist to demystify this intersection, sharing the expert knowledge required to make smarter decisions about the tools and tactics that drive growth. This same expertise fuels our services, where we execute on that strategy to build powerful digital experiences that help brands scale with clarity and confidence.

About the series:

The AI Thesis

We believe the best way to understand AI is to build with it. The AI Thesis is our collection of real-world experiments, where our team tests a new hypothesis and shares the process, the results, and the practical lessons learned along the way.


