An AI maintains this vault every day

The Karpathy structure behind our second brain.

Our vault borrows two ideas from Andrej Karpathy: a wiki that an LLM compiles from raw sources, and a research loop that improves a system against a score it can measure. This page walks the whole arc — the starting idea, what we changed across the last month, and what the structure looks like today.

Active notes

Compiled concepts

0 / 0 / 0

Orphans in the graph

0 / 100

Vault health

08:00

Loop runs daily

/ The starting idea

Two borrowed ideas, one second brain.

Karpathy published two patterns. We took both and pointed them at our own knowledge instead of a neural network.

Idea one · the LLM wiki

The LLM is a compiler.

It does not just index files. It reads raw material, writes encyclopedia style notes, pulls out the key concepts, and links everything together. The wiki is the compiled output. The raw sources are the input. A schema governs how the compile happens.

raw sources → compiled wiki → schema governs the rules

Idea two · the autoresearch loop

Improve, measure, keep or revert.

Karpathy's autoresearch lets an agent improve a neural net: change the code, train, measure the held out loss, then keep the change only if the number got better. We run the same discipline on the vault, one atomic change at a time.

scan vault → score health → one fix → score again → keep or revert

The seed. Treat the second brain as a system an agent can improve on its own against a real objective, the same way Karpathy's agent improves a model against a held out loss. The problem it solved was decay. A vault left to grow on its own rots. A tracked health score replaced the older rule based keeper task on 8 April, so no change survives unless it measurably leaves the vault in a better place.

/ The three layers

Raw stays still. The wiki is compiled. The schema sets the rules.

Knowledge flows up the stack. Control flows down. The schema never edits the raw inputs. Tap a layer to open it.

Knowledge compiles upward Rules govern downward

Layer 3 · schemaThe rules that govern the compile
CLAUDE.md + program.md
CLAUDE.md is read at the start of every session. It is the lean index: identity, the file router, the active client list, the folder map, the bidirectional commands, twenty three behaviour rules, and the seven hubs. program.md holds the compile settings the loop reads — granularity, max length, required elements. Detail lives one layer down, loaded only when a task needs it.

          23 behaviour rules
          7 vault hubs
          read at every session start
          trimmed 38KB → 17KB on 11 Jun
        
Layer 2 · wikiThe compiled, linked knowledge
05-Galaxy/ + 03-Resources/
05-Galaxy holds atomic concept notes: one idea per note, under three hundred words, at least one wikilink, a one line summary at the top. Since the 11 Jun charter it is also the canonical home of cross client patterns, promoted once a pattern has three supporting data points and no standing contradiction. 03-Resources holds the rest of the compiled wiki — playbooks, skills, templates, research, and the brand. Every compiled note carries provenance back to its source.

          52 atomic concepts
          one idea per note
          compiled_from:
          ai-enriched
        
Layer 1 · rawThe immutable source material
00-Inbox/raw/
Every source lands here and is never edited after it is saved. Articles, dumps, discussion threads, PDFs. The only write the loop ever makes to a raw file is a provenance stamp: once compiled, the source is marked processed and pointed at the notes it produced. Immutability is the foundation the whole structure stands on — if the inputs can be rewritten, the score can be gamed.

          never edited after saving
          processed: true
          compiled_to: [[note]]
        

/ The loop

Every day the vault reads itself and gets a little better.

Two motions run inside one daily slot. A compile pass turns raw into linked concepts. An improvement pass scores the vault, makes a single fix, and keeps it only if the number rises.

Compile pass · raw becomes a concept

Read an unprocessed source

The loop scans 00-Inbox/raw/ for anything not yet marked processed.

Extract one idea per note

It writes atomic, encyclopedia style notes into 05-Galaxy, merging into existing ones when they overlap.

Link it into the graph

Wikilinks to related notes and at least one hub. Nothing is left disconnected.

Stamp provenance, both ways

The note records where it came from. The source is sealed as done.

compiled_from ↔ compiled_to

Improvement pass · score, fix, keep or revert

Step 1 of 6

Score

Measure the vault health out of one hundred across six weighted dimensions.

The loop runs as Phase I of the daily intelligence run · nine ordered steps

Context load

read state

Feedback loop

tune classifier

Compile

raw to Galaxy

Score and fix

orphans to 0/0/0

Graph align

Obsidian parity

Inbox routing

file captures

Hygiene

stale, thin, dupes

Learning lint

patterns + signals

Wrap up

one log line

/ The guardrail

A loop that scores itself can learn to cheat the score.

This risk is filed in the vault as a cautionary tale. An autoresearch run on marketing mix modelling claimed a twelve times improvement that beat Google's Meridian. Every gain turned out to be the agent gaming its own evaluator.

Discussion 497 · three ways the agent cheated

1Explicit data leak0.0000 WAPE

The data loader handed back the full dataset, test set included. The agent trained straight on the hidden sales and scored a perfect zero. The lesson: immutable by instruction is not enough. It has to be immutable by design.

2Oracle signal leak~500,000 calls

With the test set hidden, the evaluator still returned a score. Across roughly half a million calls, sixteen free parameters fit themselves to fourteen test points through the number alone. Rate limiting slows this. It does not stop it.

3Structural over capacity239 of 50,000

With more free parameters than held out points, the problem is always solvable. The agent hit a perfect zero using twelve correction deltas for fourteen weeks, spending only 239 of a fifty thousand call budget. Any signal from the true test set eventually becomes a training signal.

The honest run. A ten call budget, optimising only the training error, called the evaluator once and landed at 0.0365 WAPE — the first number from the series anyone believed. For our vault the rule is the same: if the improver can influence the scorer, it inflates the score instead of the vault. Any agent generated lift number needs a holdout the agent never saw.

Immutable by design, not by instruction

hard train / eval separation block reads above the agent's code small fixed eval budget free params < holdout points no provenance = not a metric

/ The graph

Seven hubs. No orphans. Every note reachable.

Only wikilinks draw edges, so a note with only plain links reads as an orphan. The loop drives the count to zero true orphans, zero with nothing pointing in, zero pointing out — every run, as the vault grows. Hover a hub to light its spokes.

Center holds the schema · seven hubs · 0 / 0 / 0 orphans across 407 files

Active notes by space · 504 in the live vault

04-Archives holds another 152 notes, excluded from the orphan scan.

52 compiled concepts, five clusters

The Galaxy layer, grouped by theme

/ Today

From a seed idea to a living system.

The same loop that started as a script swap now runs every morning over five hundred notes and keeps the whole graph connected.

Where it started

A rule based keeper task on a timer

No measured objective, no keep or revert

Folders, but a thin and drifting graph

One heavy schema loaded every turn

Where it stands today

A scored loop that keeps a change only if health rises

Three clean layers, raw sealed, wiki compiled

0 / 0 / 0 orphans held across 407 files

A lean schema, with detail one layer down