Khazad Dum - Delving for Data with a Text2SQL Agent
date: 2025-11-30
In Tolkien’s Middle-earth, the dwarves of Khazad Dum delved deep into the mountains. They were mining mithril—the most precious metal in the realm. My text2sql agent borrows that name because it does something similar: it digs into your database and surfaces the insights you’re looking for, no pickaxe (or SQL expertise) required.
The Problem
Writing SQL is a powerful skill, but doing it effectively requires deep knowledge of the underlying database structure. Which table holds customer emails—users, customers, or contacts? What’s the exact column name for order dates? These questions slow you down and create a barrier, especially for those who aren’t living in the database day-to-day.
What if you could just ask for the data you need in plain English? That’s where LLMs come in—their flexibility lets us bridge the gap between natural language and structured queries. My experiments run against the Spider 2.0 Snowflake databases.
How Khazad Dum Works
Khazad Dum is built on a simple but flexible stack:
- lisette handles LLM tool-calling
- FastHTML powers a lightweight chat interface
- SQLite stores conversation history
When you ask a question, the agent gets to work: it inspects the database schema, translates your request into SQL, and runs the query. If something goes wrong — a typo in a table name, a syntax hiccup — it doesn’t give up. Like the dwarves who kept mining through hard rock, the agent takes the error, learns from it, and tries again with a corrected query.
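To make that concrete, here's a minimal sketch of that self-correction loop. The names (`run_with_retries`, `ask_llm_for_sql`, `max_retries`) are illustrative, not the actual implementation, which lives in the repo.

```python
# Minimal sketch of the self-correction loop (illustrative names, not the real code).
def run_with_retries(question, db, ask_llm_for_sql, max_retries=3):
    """Ask the LLM for SQL, execute it, and feed errors back until it succeeds."""
    error = None
    for attempt in range(max_retries):
        # The prompt includes the schema and, on retries, the previous error message.
        sql = ask_llm_for_sql(question, schema=db.get_metadata(), previous_error=error)
        try:
            return db.execute(sql)          # success: return rows to the caller
        except Exception as exc:            # syntax error, bad identifier, etc.
            error = str(exc)                # the agent "learns" from this on the next pass
    raise RuntimeError(f"Gave up after {max_retries} attempts: {error}")
```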
The database layer follows a clean interface pattern: connect, check if the connection is alive, extract metadata, and execute queries. Swapping from SQLite to Oracle or Snowflake just means implementing this same interface for the new dialect — the rest of the agent stays untouched.
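In Python, that contract might look like a small abstract base class. This is a sketch of the idea rather than the project's actual code; the class and method names are my own.

```python
from abc import ABC, abstractmethod

class Database(ABC):
    """Illustrative interface: each dialect (SQLite, Oracle, Snowflake, ...) implements these."""

    @abstractmethod
    def connect(self) -> None: ...            # open a connection to the database

    @abstractmethod
    def is_alive(self) -> bool: ...           # check the connection is still usable

    @abstractmethod
    def get_metadata(self) -> dict: ...       # tables, columns, types, relationships

    @abstractmethod
    def execute(self, sql: str) -> list: ...  # run a query and return rows
```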
Metadata Building
Before the agent can write accurate SQL, it needs to understand the lay of the land. When Khazad Dum connects to a database, it automatically extracts metadata — table names, column names, data types, and relationships. Think of it as the agent drawing a map of the mines before it starts digging. This schema knowledge is what allows it to turn your plain English question into a query that actually runs.
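For the SQLite case, most of that map comes straight from the database itself. Here's a rough sketch of the extraction step using only the standard library; the real implementation is in the repo.

```python
import sqlite3

def extract_metadata(path: str) -> dict:
    """Sketch: list every table and its columns/types from a SQLite database."""
    conn = sqlite3.connect(path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    metadata = {}
    for table in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column
        metadata[table] = [
            {"column": col[1], "type": col[2]}
            for col in conn.execute(f"PRAGMA table_info({table})")
        ]
    conn.close()
    return metadata
```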
Complete code is available in the Khazad Dum GitHub repository.
A Quick Example
User: “Which airport has the most departures?”
Khazad Dum gets to work. Its first attempt:
```sql
SELECT GET(PARSE_JSON("airport_name"), 'en')::VARCHAR AS "airport_name",
       COUNT(*) AS "departure_count"
FROM AIRLINES.AIRLINES."FLIGHTS"
GROUP BY "departure_airport"
ORDER BY "departure_count" DESC
LIMIT 1
```

Error: invalid identifier '"airport_name"'
The agent reads the error, realizes airport_name lives in a different table, and retries with a join:
```sql
SELECT GET(PARSE_JSON("airport_name"), 'en')::VARCHAR AS "airport_name",
       COUNT(*) AS "departure_count"
FROM AIRLINES.AIRLINES."FLIGHTS"
JOIN AIRLINES.AIRLINES."AIRPORTS_DATA"
  ON "FLIGHTS"."departure_airport" = "AIRPORTS_DATA"."airport_code"
GROUP BY "airport_name"
ORDER BY "departure_count" DESC
LIMIT 1
```

Result: Domodedovo International Airport — 3,217 departures.
One plain English question, one self-correction, one answer.
What I Learned
Small models can punch above their weight. I was surprised how well a modest LLM like gpt-4o-mini handled the task — running on a Mac, it solved queries effectively. The key was clear tool definitions: when the interface is well-defined, the model knows exactly what to do and can self-correct on failures.
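To give a feel for what "well-defined" means here: the query-execution tool is conceptually just a Python function with an explicit docstring and typed arguments, in the plain-function style that claudette-family libraries like lisette use for tools. The sketch below is illustrative, not the project's actual tool.

```python
def make_execute_query(db):
    """Build the query tool around an open database connection (`db` is an assumption here)."""
    def execute_query(sql: str) -> str:
        """Run a SQL query against the connected database and return the results,
        or the error message if the query fails.

        sql: the SQL statement to execute."""
        try:
            return str(db.execute(sql))
        except Exception as exc:
            # Returning the error text (instead of raising) lets the model read it and retry
            return f"Error: {exc}"
    return execute_query  # this function is what gets registered in the agent's tool list
```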
Metadata is crucial — and worth caching. Extracting column names and data types was straightforward, but inferring foreign key relationships required prompting the LLM. That’s a costly operation, so I added caching to avoid repeating it. Lesson learned: invest in good metadata upfront, then reuse it.
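A simple version of that cache keys the LLM-inferred relationships on the database name and persists them as JSON; `infer_relationships_with_llm` below is a stand-in for the actual LLM call, and the cache path is illustrative.

```python
import json
from pathlib import Path

def get_relationships(db_name: str, metadata: dict, infer_relationships_with_llm) -> dict:
    """Return cached foreign-key relationships, calling the LLM only on a cache miss."""
    cache_file = Path(f".khazad_cache/{db_name}_relationships.json")
    if cache_file.exists():
        return json.loads(cache_file.read_text())        # cheap: reuse earlier inference
    relationships = infer_relationships_with_llm(metadata)  # expensive LLM call
    cache_file.parent.mkdir(exist_ok=True)
    cache_file.write_text(json.dumps(relationships))
    return relationships
```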
Simple tools make fast iteration possible. Both lisette and FastHTML stayed out of my way, which meant I could focus on what mattered: finding the right prompts. And that’s where the real work was — iterating quickly on prompts until the agent behaved the way I wanted.
The model struggles with ambiguous questions. As questions get more complex, it doesn't perform well and doesn't generalize reliably.
What’s Next
Khazad Dum already supports multi-turn conversations — you can ask follow-up questions and refine your queries naturally. But there’s more mining to do:
More databases. The current interface works with SQLite, Oracle, and Snowflake, but I’d like to extend support to Postgres, MySQL, and beyond. The clean interface pattern should make this straightforward.
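As a hypothetical example, a Postgres backend would mostly mean filling in those same four methods with psycopg2 and `information_schema` queries:

```python
import psycopg2

class PostgresDatabase:  # would implement the same interface sketched earlier
    def __init__(self, dsn: str):
        self.dsn, self.conn = dsn, None

    def connect(self) -> None:
        self.conn = psycopg2.connect(self.dsn)

    def is_alive(self) -> bool:
        return self.conn is not None and self.conn.closed == 0

    def get_metadata(self) -> dict:
        cur = self.conn.cursor()
        cur.execute("""SELECT table_name, column_name, data_type
                       FROM information_schema.columns
                       WHERE table_schema = 'public'""")
        metadata = {}
        for table, column, dtype in cur.fetchall():
            metadata.setdefault(table, []).append({"column": column, "type": dtype})
        return metadata

    def execute(self, sql: str) -> list:
        cur = self.conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
```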
Recursive Language Models. I’m exploring recursive approaches where the agent can break complex questions into smaller sub-queries, solve them step by step, and combine the results. Think of it as the dwarves digging multiple tunnels that eventually connect.
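Very roughly, the shape I have in mind is a decompose/solve/combine recursion, where `decompose` and `combine` would themselves be LLM calls and `agent` is the existing text2sql agent. Everything in this sketch is hypothetical.

```python
def answer_recursively(question, agent, decompose, combine, max_depth=2):
    """Hypothetical sketch: split a complex question, answer the parts, merge the results."""
    if max_depth == 0:
        return agent(question)                      # base case: let the agent answer directly
    sub_questions = decompose(question)             # LLM proposes smaller sub-questions
    if len(sub_questions) <= 1:
        return agent(question)                      # nothing to split: answer as-is
    sub_answers = [answer_recursively(q, agent, decompose, combine, max_depth - 1)
                   for q in sub_questions]
    return combine(question, sub_answers)           # LLM merges the partial answers
```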
The mines of Khazad Dum go deep — there’s plenty more mithril to uncover.