About the project
A database is the most-used black box in software. I wanted to open it.
Every backend engineer leans on a database every single day, and most of us — myself included — can wave our hands at B-trees and WALs without ever having built one. That gap bugged me. So QueryForge is the box, opened: a working SQL database written from zero, with the internals on screen.
You type SELECT name FROM users WHERE id = 12, and the parser builds an AST, the planner picks an index scan, the B+tree lights up the exact nodes it visits, pages flip as they're read, and rows stream out. It's a working engine and a teaching instrument in the same browser tab.
How it came together
The itch
Caught myself, for the hundredth time, describing the database as 'and then it stores it somewhere'. Decided to find out where 'somewhere' actually is.
Design & scoping
Drew the layers on paper — pager, B+tree, WAL, parser, planner, executor — and ruthlessly cut the SQL grammar. A small correct subset beats a big buggy one.
P0 — pager + B+tree
Fixed-size pages over an ArrayBuffer, then a B+tree whose nodes are pages. First property test: 10k random inserts, every lookup correct, tree stays balanced.
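To make "fixed-size pages over an ArrayBuffer" concrete, here is a minimal sketch of what such a pager could look like. The class name, the method names, and the fixed capacity are assumptions for illustration, not QueryForge's actual code.

```typescript
// Hypothetical minimal pager: fixed-size pages carved out of one ArrayBuffer.
const PAGE_SIZE = 4096; // 4 KB, the page size mentioned later in the write-up

class Pager {
  private readonly buffer: ArrayBuffer;
  private readonly capacityPages: number;
  private pageCount = 0;

  constructor(capacityPages: number) {
    this.capacityPages = capacityPages;
    this.buffer = new ArrayBuffer(capacityPages * PAGE_SIZE);
  }

  // Hand out the next unused page id.
  allocate(): number {
    if (this.pageCount >= this.capacityPages) {
      throw new Error("pager is full");
    }
    return this.pageCount++;
  }

  // Return a view over one page. No bytes are copied, so writes through
  // the view land directly in the backing buffer.
  page(id: number): Uint8Array {
    if (!Number.isInteger(id) || id < 0 || id >= this.pageCount) {
      throw new RangeError(`no such page: ${id}`);
    }
    return new Uint8Array(this.buffer, id * PAGE_SIZE, PAGE_SIZE);
  }
}

const pager = new Pager(8);
const firstPage = pager.allocate();
const secondPage = pager.allocate();
pager.page(firstPage)[0] = 0xab; // visible to every later view of this page
```

Because each call to `page` returns a view rather than a copy, anything that reads the page later (a tree node decoder, a page grid) sees the same bytes.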
The week the splits broke everything
Internal-node splits promoted the wrong separator key at one specific fan-out, and stale bytes left over from a previously decoded node leaked into the next read. fast-check shrank the failure to a four-key counterexample in seconds. Two evenings, one zero-the-page-body fix, and the invariant finally held.
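For readers who have not hit this class of bug: in a textbook B+tree, splitting an internal node moves the middle key up to the parent, and that key stays in neither half. (A leaf split, by contrast, copies the separator up and keeps it in the right leaf.) The sketch below shows the internal-node case under that textbook rule. The function name and the plain-array representation are assumptions, and the four-key input is just an illustration, not the counterexample fast-check found.

```typescript
interface InternalSplit {
  leftKeys: number[];
  leftChildren: number[];
  separator: number; // the key promoted to the parent
  rightKeys: number[];
  rightChildren: number[];
}

// Split an overfull internal node. An internal node with k keys has k + 1
// child pointers (represented here as page ids).
function splitInternal(keys: number[], children: number[]): InternalSplit {
  if (children.length !== keys.length + 1) {
    throw new Error("malformed internal node");
  }
  if (keys.length < 3) {
    throw new Error("too few keys to split");
  }
  const mid = Math.floor(keys.length / 2);
  return {
    leftKeys: keys.slice(0, mid),
    leftChildren: children.slice(0, mid + 1),
    separator: keys[mid], // moves up; kept in neither half
    rightKeys: keys.slice(mid + 1),
    rightChildren: children.slice(mid + 1),
  };
}

const split = splitInternal([10, 20, 30, 40], [1, 2, 3, 4, 5]);
// split.separator is 30; the left half keeps [10, 20], the right half keeps [40]
```

Picking the wrong index for `separator`, or leaving the promoted key in one of the halves, produces a tree that still looks balanced but routes some lookups to the wrong child.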
SQL front-end → execution
Hand-written tokenizer and recursive-descent parser, a logical plan, a rule-based optimizer, and a Volcano iterator executor. End-to-end SELECTs returning real rows.
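As a rough illustration of the front-end step, here is a tiny tokenizer plus a parser in the recursive-descent style (one function per grammar rule). The grammar covers only `SELECT col FROM table [WHERE col = integer]`; the token shapes, AST shape, and function names are all assumptions, and the real QueryForge grammar is certainly larger.

```typescript
interface SqlToken {
  kind: "keyword" | "ident" | "number" | "symbol";
  text: string;
}

const SQL_KEYWORDS = new Set(["SELECT", "FROM", "WHERE"]);

function tokenize(sql: string): SqlToken[] {
  const tokens: SqlToken[] = [];
  // Sticky regex: skip whitespace, then match an identifier, an integer, or "=".
  const pattern = /\s*(?:([A-Za-z_][A-Za-z0-9_]*)|(\d+)|(=))/y;
  let pos = 0;
  while (sql.slice(pos).trim() !== "") {
    pattern.lastIndex = pos;
    const m = pattern.exec(sql);
    if (m === null) throw new Error(`unexpected character near offset ${pos}`);
    pos = pattern.lastIndex;
    if (m[1] !== undefined) {
      const upper = m[1].toUpperCase();
      tokens.push(
        SQL_KEYWORDS.has(upper)
          ? { kind: "keyword", text: upper }
          : { kind: "ident", text: m[1] }
      );
    } else if (m[2] !== undefined) {
      tokens.push({ kind: "number", text: m[2] });
    } else {
      tokens.push({ kind: "symbol", text: "=" });
    }
  }
  return tokens;
}

interface WhereClause { column: string; value: number }
interface SelectStatement { column: string; table: string; where?: WhereClause }

function parseSql(sql: string): SelectStatement {
  const tokens = tokenize(sql);
  let next = 0;

  // Consume one token of the expected kind (and text, if given) or fail.
  function expect(kind: SqlToken["kind"], text?: string): SqlToken {
    const token = tokens[next];
    if (token === undefined || token.kind !== kind ||
        (text !== undefined && token.text !== text)) {
      throw new Error(`expected ${text ?? kind} at token ${next}`);
    }
    next++;
    return token;
  }

  // where := "WHERE" ident "=" number
  function parseWhere(): WhereClause {
    expect("keyword", "WHERE");
    const column = expect("ident").text;
    expect("symbol", "=");
    return { column, value: Number(expect("number").text) };
  }

  // select := "SELECT" ident "FROM" ident [ where ]
  function parseSelect(): SelectStatement {
    expect("keyword", "SELECT");
    const column = expect("ident").text;
    expect("keyword", "FROM");
    const table = expect("ident").text;
    const statement: SelectStatement = { column, table };
    if (next < tokens.length) statement.where = parseWhere();
    if (next !== tokens.length) throw new Error("unexpected trailing input");
    return statement;
  }

  return parseSelect();
}

const ast = parseSql("SELECT name FROM users WHERE id = 12");
```

A planner could then inspect `ast.where` to decide between a full scan and an index lookup on the filtered column.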
Durability
Added the write-ahead log and the 'pull the plug' button. Watching recovery rebuild every page from the log was the moment the whole thing felt like a real database.
Making it legible — and shipping
Wired every engine event into the playground so the B+tree search path, page writes, and streaming rows animate from the real trace. Polished, deployed, launched.
Decisions & challenges
The interesting forks in the road, and which way I went.
Nodes are pages, not objects
Each B+tree node serializes into exactly one 4 KB page through the pager. That's more work than an in-memory tree, but it means the B+tree view and the page grid are the same bytes shown two ways — and the byte offsets on screen are real.
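A sketch of what "a node is a page" can mean in practice: encode a leaf's keys into a 4 KB byte array with a small header, and decode them back. The header layout (1 byte node type, 2 bytes key count), the 32-bit integer keys, and the function names are assumptions, not QueryForge's real on-page format. Zeroing the page before writing mirrors the zero-the-page-body fix from the timeline above.

```typescript
const PAGE_BYTES = 4096;
const HEADER_BYTES = 3; // assumed layout: 1 byte node type + 2 bytes key count
const NODE_TYPE_LEAF = 1;
const MAX_LEAF_KEYS = Math.floor((PAGE_BYTES - HEADER_BYTES) / 4);

// Write a leaf node's keys into a page, clearing whatever was there before.
function encodeLeaf(page: Uint8Array, keys: number[]): void {
  if (page.byteLength !== PAGE_BYTES) throw new Error("page must be 4096 bytes");
  if (keys.length > MAX_LEAF_KEYS) throw new Error("too many keys for one page");
  page.fill(0); // no stale bytes from an earlier node survive
  const view = new DataView(page.buffer, page.byteOffset, page.byteLength);
  view.setUint8(0, NODE_TYPE_LEAF);
  view.setUint16(1, keys.length);
  keys.forEach((key, i) => view.setInt32(HEADER_BYTES + i * 4, key));
}

// Read the keys back out of a page written by encodeLeaf.
function decodeLeaf(page: Uint8Array): number[] {
  const view = new DataView(page.buffer, page.byteOffset, page.byteLength);
  if (view.getUint8(0) !== NODE_TYPE_LEAF) throw new Error("not a leaf page");
  const count = view.getUint16(1);
  const keys: number[] = [];
  for (let i = 0; i < count; i++) {
    keys.push(view.getInt32(HEADER_BYTES + i * 4));
  }
  return keys;
}

const leafPage = new Uint8Array(PAGE_BYTES);
encodeLeaf(leafPage, [5, 9, 12]);
encodeLeaf(leafPage, [7]); // reuse the same page for a smaller node
const decodedKeys = decodeLeaf(leafPage);
```

With this layout, key `i` always lives at byte offset `3 + 4 * i`, which is the kind of offset a page-grid view could display directly.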
A logical write-ahead log
The WAL stores operation-level redo records (CREATE TABLE, INSERT). Heap and index pages are derived state. Recovery is just 'replay the log', which makes the crash demo honest and the durability story fit in one button.
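The idea of a logical log with derived state can be sketched in a few lines. Here the "derived state" is just an in-memory map of tables rather than heap and index pages, and the record shapes and the `replayLog` name are assumptions for illustration.

```typescript
type RowValues = Record<string, number | string>;

// Operation-level redo records, one per logged statement.
type WalRecord =
  | { op: "CREATE_TABLE"; table: string }
  | { op: "INSERT"; table: string; row: RowValues };

// Rebuild all state from scratch by applying the log in order.
function replayLog(log: readonly WalRecord[]): Map<string, RowValues[]> {
  const tables = new Map<string, RowValues[]>();
  for (const record of log) {
    if (record.op === "CREATE_TABLE") {
      tables.set(record.table, []);
    } else {
      const rows = tables.get(record.table);
      if (rows === undefined) {
        throw new Error(`INSERT into unknown table ${record.table}`);
      }
      rows.push({ ...record.row }); // copy so state never aliases the log
    }
  }
  return tables;
}

const walLog: WalRecord[] = [
  { op: "CREATE_TABLE", table: "users" },
  { op: "INSERT", table: "users", row: { id: 12, name: "Ada" } },
  { op: "INSERT", table: "users", row: { id: 13, name: "Grace" } },
];

// Simulated crash: throw the state away, keep only the log, and replay it.
const recovered = replayLog(walLog);
```

Since replay starts from empty state and is deterministic, running it twice over the same log yields the same tables, which is what makes a "pull the plug" demo straightforward to check.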
Filter pushdown into the scan
The physical plan shows a separate Filter node for clarity, but the predicate is evaluated the moment a row is read. That fusion is a real optimization and it's what powers the live row-visit counter.
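A sketch of that fusion, using generators as a stand-in for Volcano-style pull iterators (the classic formulation uses open/next/close methods; generators give the same one-row-at-a-time behavior with less ceremony). The operator names, the stats object, and the sample rows are assumptions.

```typescript
interface UserRow { id: number; name: string }

interface ScanStats { rowsVisited: number }

// Scan with the predicate fused in: each row is counted and tested the
// moment it is read, and only matching rows flow to the parent operator.
function* scanWithFilter<T>(
  rows: Iterable<T>,
  predicate: (row: T) => boolean,
  stats: ScanStats
): Generator<T> {
  for (const row of rows) {
    stats.rowsVisited++;
    if (predicate(row)) yield row;
  }
}

// Projection operator: pulls rows from its child and emits one column.
function* project<T, K extends keyof T>(
  input: Iterable<T>,
  column: K
): Generator<T[K]> {
  for (const row of input) yield row[column];
}

const userRows: UserRow[] = [
  { id: 11, name: "Linus" },
  { id: 12, name: "Ada" },
  { id: 13, name: "Grace" },
];

const scanStats: ScanStats = { rowsVisited: 0 };
const plan = project(scanWithFilter(userRows, (r) => r.id === 12, scanStats), "name");
// Nothing has been read yet: the pipeline only runs when rows are pulled.
const names = Array.from(plan);
```

After the pipeline is drained, `scanStats.rowsVisited` reflects every row the scan touched, not just the ones that passed the predicate, which is the number a live row-visit counter would want.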
Small B+tree order on purpose
Real engines pack hundreds of keys per page, which keeps even a tree of thousands of rows only about two levels deep and visually dull. QueryForge uses a small node order so inserts split nodes in front of you: same algorithm, far more legible.
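The arithmetic behind that trade-off is easy to check. The helper below computes the best-case number of levels a B+tree needs for a given key count, assuming every node is completely full, every node holds the same maximum number of keys, and an internal node has one more child than it has keys. The specific figures (200 keys per node versus 3) are illustrative assumptions, not QueryForge's actual settings.

```typescript
// Best-case (fully packed) number of levels needed to index keyCount keys.
function minLevels(keyCount: number, keysPerNode: number): number {
  if (!Number.isInteger(keysPerNode) || keysPerNode < 1) {
    throw new Error("keysPerNode must be a positive integer");
  }
  if (keyCount < 0) throw new Error("keyCount must not be negative");
  const fanOut = keysPerNode + 1; // children per internal node
  let levels = 1;
  let capacity = keysPerNode; // a single leaf acting as the root
  while (capacity < keyCount) {
    capacity *= fanOut; // each extra level multiplies the leaf count
    levels++;
  }
  return levels;
}

const wideTree = minLevels(10_000, 200); // hundreds of keys per page
const narrowTree = minLevels(10_000, 3); // small order, for legibility
```

Under these assumptions the wide tree needs 2 levels for 10,000 keys while the narrow one needs 7, so the small order produces far more splits and a much deeper search path to watch.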
The engine has zero DOM dependencies
Pure TypeScript core means the identical code runs in the Vitest property suite and in the browser. The visualization can't drift from reality because there's only one reality.
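One way to get that property is for the engine to publish plain event objects and let the consumers decide what to do with them. The sketch below is a hypothetical event trace with no DOM types anywhere: a test can collect events into an array, and a browser UI could subscribe to the same stream and animate from it. The event names and the `Trace` class are assumptions, not QueryForge's actual API.

```typescript
// Plain-data events; nothing here references the DOM.
type EngineEvent =
  | { type: "page-read"; pageId: number }
  | { type: "node-visit"; pageId: number; depth: number }
  | { type: "row-emit"; row: Record<string, unknown> };

type TraceListener = (event: EngineEvent) => void;

class Trace {
  private listeners: TraceListener[] = [];

  // Register a listener and return a function that removes it again.
  subscribe(listener: TraceListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver one event to every current listener, in subscription order.
  emit(event: EngineEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

// In a test, the "UI" is just an array.
const trace = new Trace();
const seen: EngineEvent[] = [];
const unsubscribe = trace.subscribe((event) => seen.push(event));

trace.emit({ type: "node-visit", pageId: 1, depth: 0 });
trace.emit({ type: "page-read", pageId: 4 });
unsubscribe();
trace.emit({ type: "row-emit", row: { name: "Ada" } }); // no longer recorded
```

Because the listener is the only place rendering would happen, swapping the array for a component that draws the visited node changes nothing inside the engine.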
Tech stack
Want to build something or collaborate?
If this is the kind of thing you wish more engineers shipped — let's talk.