The Ceiling Is Always the Instruction Layer
The model sets the floor. You set the ceiling.
Andrej Karpathy published a research automation system called autoresearch. The concept: a human writes a research objective in a file called program.md — the experiments to run, the hypotheses to test, the evaluation criteria. An agent reads it, runs bounded experiments, and loops. The human reviews the results and revises program.md. Repeat.
The coverage it received focused on the agent. The architecture that enables autonomous research. The loop that runs while you sleep.
That framing is correct. It misses what determines the output.
I’ve been running a similar architecture for three weeks in a different domain. My system extracts structured knowledge from institutional documents — grant agreements, compliance reports, policy records — and maps relationships between entities: organizations, funding sources, obligations, outcomes. An agent processes documents against a schema. A pass system evaluates the extractions. The results go into a knowledge graph. A human reviews and refines.
The loop structure is similar. In autoresearch, the instruction layer is program.md — the objectives, hypotheses, and evaluation criteria the human writes before the agent runs anything. The quality ceiling is determined by how precisely program.md encodes what “good research” means. In my system, the instruction layer is schema.py plus system prompts — entity definitions, extraction rules, edge case judgments built from real document failures. The quality ceiling is determined by how precisely the schema encodes what “relevant knowledge” means.
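To make the comparison concrete, here is a minimal sketch of what an instruction layer of this kind might look like in code. The names and structure are hypothetical simplifications, not the actual schema.py:

```python
from dataclasses import dataclass

# Hypothetical sketch of an instruction-layer schema: the entity and
# relationship vocabulary the operator defines before the agent runs.
# All names here are illustrative, not the real schema.py.

ENTITY_TYPES = {"organization", "funding_source", "obligation", "outcome"}

RELATIONSHIP_TYPES = {
    "funds",        # funding_source -> organization
    "obligates",    # document imposes an obligation on an entity
    "produces",     # obligation or funding -> outcome
    "relates_to",   # fallback: referenced, but no specific type applies
}

@dataclass
class Extraction:
    source: str        # entity name as it appears in the document
    target: str
    relationship: str  # must be one of RELATIONSHIP_TYPES
    evidence: str      # the sentence the extraction came from

    def validate(self) -> None:
        if self.relationship not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship: {self.relationship}")
```

The point of the sketch is where the knowledge lives: the type sets and validation rules are written by the operator before the agent runs anything; the agent only applies them.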
Same architecture. The failure modes point to the same place.
The agent is not the differentiator in either system. The agent is the processor. What differentiates the output is the instruction layer — the artifact the human wrote before the agent ran anything.
Here’s what this looks like when the ceiling fails.
In my extraction system, I processed six months of documents before I identified that the relates_to relationship type — used when a document referenced another entity but no more specific relationship applied — was accumulating at a rate that indicated a problem. Forty-seven instances. Not a model failure. A schema failure.
relates_to was underspecified. The instruction layer said: use this when no other relationship type fits. It didn’t say what “fits” meant. The agent made consistent decisions according to the schema it had. The schema had a gap. Six months of extracted information followed the same gap consistently because the instruction layer contained it.
The fix was not a better model. It was a better instruction layer: explicit enumeration of what relates_to should and shouldn’t capture, with examples drawn from real documents. The extraction quality improved immediately on the next pass. The model hadn’t changed.
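A hedged sketch of the shape that fix might take, assuming a rules-as-code layout; every name below is illustrative, not the actual schema:

```python
# Hypothetical sketch of the relates_to fix: replace "use this when no
# other relationship type fits" with explicit conditions, each earned
# from a reviewed failure case.

RELATES_TO_SPEC = {
    "use_when": [
        "entity is materially referenced and the reference affects a decision",
    ],
    "never_use_when": [
        "a more specific type (funds, obligates, produces) applies",
        "the entity is merely mentioned, with no decision-affecting link",
        "the reference appears only in RFP or eligibility language",
    ],
}

def allow_relates_to(specific_type_applies: bool,
                     decision_affecting: bool,
                     post_award_source: bool) -> bool:
    """Gate the fallback type: relates_to survives only when every
    explicit disqualifier has been ruled out."""
    return (not specific_type_applies
            and decision_affecting
            and post_award_source)
```

The underspecified version of the rule was a single sentence; the fixed version is a list of disqualifiers the agent can check mechanically, which is what removed the ambiguity.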
In my system, improving the model didn’t move the failure rate. Changing the schema did.
A prompt is a surface. The instruction layer is what survives across prompts.
A system prompt tells the model how to behave in a session. An instruction layer encodes what "good" means in this domain — built up through real work, across real failures, until the operator has enough conclusions to write them down explicitly. Most system prompts are not instruction layers. Most schemas aren't either — they describe structure without encoding what good output actually means. The format is not what makes something an instruction layer. The provenance is.
In this system, the model set the floor. The instruction layer set the ceiling.
The Honest Part
The instruction layer requires the operator to have conclusions, not just intent.
Intent: “I want the agent to extract relevant relationships.”
Conclusion: “Relevant means entity-level, decision-affecting, sourced from post-award documents only. RFP language produces zero relationship extractions. Eligibility criteria are not compliance obligations.”
The gap between those two statements is weeks of extraction work and a lot of failures. You cannot write the conclusion without having earned it.
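Notice that the conclusion compiles directly into checks an agent can apply, while the intent does not. A minimal sketch under that assumption, with hypothetical names:

```python
# Hypothetical sketch: the "conclusion" above, written as a gate the
# agent runs before extracting anything from a document chunk.

def should_extract(doc_phase: str, clause_kind: str) -> bool:
    # "Sourced from post-award documents only."
    if doc_phase != "post_award":
        return False
    # "RFP language produces zero relationship extractions."
    if clause_kind == "rfp":
        return False
    # "Eligibility criteria are not compliance obligations": they pass
    # through here, but must never be typed as obligations downstream.
    return True
```

The intent version ("extract relevant relationships") gives the agent nothing to check; the conclusion version gives it three tests, each of which cost real failures to learn.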
Here’s where this argument gets uncomfortable: models can generate instruction layers. Meta-prompting systems exist. Models can evaluate their own outputs, extract patterns, and refine the artifact that governs subsequent runs. The claim “the model cannot supply it” is too strong.
What’s more accurate: a model can compile an instruction layer from outputs. It cannot derive the evaluation standard that determines whether those outputs were any good — not without a practitioner who has already developed that standard through domain work. When I let the model propose refinements without a defined evaluation standard, it optimized for frequency, not consequence — collapsing distinct cases into patterns that looked consistent but weren’t decision-relevant. This is the same boundary “The Reflection Problem” identified. Automated reflection degrades in ambiguous domains because the feedback signal the Reflector needs is exactly what automation cannot generate. A model can refine relates_to if you tell it what makes an extraction correct. It cannot tell you what makes an extraction correct in the first place.
I don’t see this holding in domains where evaluation can be fully formalized. Extraction from institutional documents — where relevance means decision-affecting, not merely mentioned — isn’t one of them. What I can say is that in this system, the quality ceiling moved when the operator’s conclusions improved, not when the model did.
It also has to be maintained. An instruction layer written in month one reflects month-one understanding. The gap between what you know and what the system knows is Compiled Thinking that hasn’t been extracted yet.
My model didn’t change across three weeks. The output did — when the schema did.
Forty-seven edge cases forced to the surface. Each one narrowed what relates_to was allowed to mean, until the schema left no ambiguous case for the model to fill with its best guess. The instruction layer encoded the constraint.
The gap that produced those forty-seven cases doesn’t exist anymore. The model has no ambiguity left to resolve.
Instruction Layer: the accumulated encoding of what “good output” means in a specific domain, built through real work and written explicitly enough that an agent can apply it without interpretation. Distinct from a system prompt (runtime instruction surface) by provenance — an instruction layer can only be written after the operator has earned the conclusions it contains.
Robert Ford builds products, writes stories and essays, and publishes The Intelligence Engine — a Substack about building AI practices that compound. His other writing lives at Brittle Views.


