Coding agents are confident, fast, and architecturally clueless about your codebase until you tell them otherwise. Hand Claude Code or Codex a vague ticket and it will happily write code that compiles, passes a quick test, and quietly violates the boundary between your domain layer and your HTTP layer. The agent didn’t read your design docs. It read the files it could see, pattern-matched, and guessed. A DESIGN.md file fixes the guessing problem by writing your architectural intent down in the one place an agent always looks: the repository itself.
TL;DR
DESIGN.md is a community-convention repo file that records a codebase’s architectural intent, constraints, and design decisions in plain Markdown so coding agents (Claude Code, Codex, Cursor) generate code that fits the system instead of fighting it. It answers “why is the code shaped this way,” where AGENTS.md answers “how do I build and test.”
Introduction
Here’s the failure mode every team adopting coding agents hits within a week. You ask an agent to add a refund endpoint to a payments service. It returns a working handler that calls the database directly from the controller, swallows the gateway error, and invents a new money type because it didn’t notice you already had one. The diff is clean. The tests pass. It’s also wrong in three ways that only a reviewer who knows the architecture can catch. The agent isn’t bad at coding; it’s blind to decisions that live in your head, in a Notion page, or in a Slack thread from eight months ago.
DESIGN.md is the answer a growing number of teams have converged on. It’s a single Markdown file, committed to the repo root, that tells any agent the load-bearing facts about your system: the layering rules, the invariants that must never break, the patterns you chose on purpose, and the ones you rejected. It’s not a vendor spec and there’s no committee that owns it; it’s a convention, the same way ARCHITECTURE.md and CONTRIBUTING.md are conventions. But it pairs naturally with the tool-specific instruction files agents already read, and for API and backend work it’s one of the highest-leverage documents you can write.
What DESIGN.md actually is
DESIGN.md is a plain-text record of why your code looks the way it does. Not what it does (that’s the README), not how to run it (that’s AGENTS.md), but the reasoning a senior engineer would walk a new hire through on day one before letting them touch anything important.
Think about the conversations that aren’t in any file. “We don’t call the payment gateway from the request thread; everything goes through the outbox table because the gateway times out under load.” “Money is always an integer count of minor units; we banned floats after the rounding incident.” “The Account aggregate owns balance mutations; nothing else writes to the ledger.” Those are design decisions. They’re invisible to an agent reading source code, because the source shows the result of the decision, not the decision or its rationale. An agent can see that Account.debit() exists. It cannot see that you deliberately made it the only write path, so it will cheerfully add a second one.
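To make the money rule concrete, here is a minimal sketch of a value object that enforces it. The `Money` class, its field names, and its checks are hypothetical illustrations, not code from any real service; the only thing taken from the text is the rule itself (integer minor units, no floats, no mixed currencies).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical value object: integer minor units plus a currency code."""

    minor_units: int
    currency: str

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly along with floats.
        if isinstance(self.minor_units, bool) or not isinstance(self.minor_units, int):
            raise TypeError("minor_units must be an int, never a float")
        # Assumed format check: three uppercase letters, as in ISO 4217 codes.
        if len(self.currency) != 3 or not self.currency.isalpha() or not self.currency.isupper():
            raise ValueError("currency must be a three-letter uppercase code")

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot mix currencies in one operation")
        return Money(self.minor_units + other.minor_units, self.currency)


total = Money(1050, "USD") + Money(250, "USD")
print(total)  # Money(minor_units=1300, currency='USD')
```

The class shows the result of the decision. The reason floats were banned still has to live in prose, which is the gap DESIGN.md fills.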
The convention has roots in older, well-established practices. The ARCHITECTURE.md pattern (popularized by Alex Kladov’s widely-cited write-up) argues a repo should carry a high-level map of the codebase that explains structure and invariants without trying to stay synced line-by-line with the code. Architecture Decision Records (ADRs) capture individual decisions and their rationale over time. DESIGN.md is what you get when you write that kind of document for an audience that includes a coding agent: terse, declarative, decision-oriented, and parked where the agent will actually load it.
Two properties make it work. It’s in the repo, so the agent reads it with the same tools it reads code; you don’t need a plugin or an API call. And it’s about intent, so it stays useful even as files move around. Rename a package and your README screenshots rot; the rule “domain logic never imports the web framework” is still true.
DESIGN.md vs AGENTS.md vs CLAUDE.md vs README
These files overlap enough to confuse people and differ enough that collapsing them into one is a mistake. The short version: README is for humans onboarding, AGENTS.md is the operational contract for agents, CLAUDE.md is the Claude-specific instruction file, and DESIGN.md is the architectural reasoning all of them benefit from.
AGENTS.md is now a real, broadly-adopted format; the agents.md project describes it as “a simple, open format for guiding coding agents,” used across tens of thousands of projects and stewarded under the Linux Foundation’s Agentic AI Foundation. Its job is operational: build steps, test commands, code style, commit conventions, the stuff you’d tell a new teammate to keep them unblocked. Per Anthropic’s Claude Code memory documentation, CLAUDE.md plays the same instruction role for Claude specifically; the docs even recommend that if you already have an AGENTS.md, you create a CLAUDE.md that imports it with @AGENTS.md so both tools read one source of truth.
Notice what’s missing from those descriptions: deep architectural rationale. AGENTS.md and CLAUDE.md are tuned to be short. The Claude Code docs explicitly recommend keeping CLAUDE.md under 200 lines because longer files consume context and reduce how reliably the model follows them. A real architecture explanation (the boundaries, the invariants, the rejected alternatives, the data model rules) won’t fit there without bloating it. So you reference it instead. DESIGN.md becomes the deep document; AGENTS.md and CLAUDE.md point at it with a single line.
| File | Audience | Answers | Lifespan / change rate | Length |
|---|---|---|---|---|
| README.md | Humans (users, new contributors) | What is this, how do I start it | Changes with features | Medium |
| AGENTS.md | Any coding agent | How do I build, test, lint, commit here | Changes with tooling | Short (operational) |
| CLAUDE.md | Claude Code specifically | Same as AGENTS.md, plus Claude-specific rules | Changes with tooling | Short (under ~200 lines) |
| DESIGN.md | Agents + engineers + reviewers | Why is the system shaped this way; what must never break | Changes with architecture (rarely) | Medium, decision-dense |
The relationship is complementary, not competitive. A clean setup for a Claude + Codex shop looks like this: README.md for humans; one AGENTS.md with build/test/style; a CLAUDE.md that’s just @AGENTS.md plus two Claude-only lines; and DESIGN.md holding the architecture, linked from AGENTS.md so every agent loads it on demand. No duplication; each file has one job. If you want a deeper tour of structuring Claude’s context across these files, Claude Code workflows walks through the memory model in practice.
What to put in DESIGN.md (with a template)
DESIGN.md should answer the questions an agent can’t infer from code: the shape of the system, the rules that don’t show up in any single file, and the decisions you made on purpose. Keep it declarative. Every section should read like a rule a reviewer would enforce, not an essay.
Cover these:
- System shape: the layers or modules and which direction dependencies flow. One sentence per boundary.
- Invariants: things that must always be true. State them as absolutes. “Balances never go negative outside an authorized overdraft.” “Every external call is idempotent by request key.”
- Key decisions and their rationale: the choices that look arbitrary until you know why. Include the why; the rationale is what stops an agent from “fixing” it.
- Rejected alternatives: what you deliberately did not do, so an agent doesn’t propose it as a fresh idea. This single section prevents a huge class of bad suggestions.
- Data and domain rules: money representation, time/timezone handling, identifiers, soft-delete, multi-tenancy.
- The API contract source of truth: where the OpenAPI spec lives and the rule that it’s authoritative over hand-written types.
- Where new code goes: a short “if you’re adding X, it belongs in Y” map so agents stop scattering logic.
- Out of scope / do not touch: generated files, legacy modules under migration, anything an agent should leave alone.
Here’s a full template, written for a realistic payments API service. Copy it, delete what doesn’t apply, fill in the rest.
# DESIGN.md: Payments API Service
This file records architectural intent and the decisions behind it.
Read this before generating or modifying code. If a change conflicts
with a rule here, stop and flag it instead of working around it.
## System shape
Layered, dependencies point inward only:
http (handlers, DTOs) -> app (use cases) -> domain (entities,
invariants) <- infra (db, gateway clients)
- `domain/` has zero imports from `http/`, `app/`, or any framework.
- `infra/` implements interfaces declared in `domain/` or `app/`.
- `http/` never touches the database or the payment gateway directly.
It calls a use case in `app/`.
## Invariants (must always hold)
- A ledger entry is immutable once written. Corrections are new
compensating entries, never updates or deletes.
- Account balance is derived from ledger entries, not stored as a
mutable field that code can set directly.
- Money is an integer count of minor units (cents) plus an ISO-4217
currency code. Never a float. Never mix currencies in one operation.
- Every call to an external payment gateway is idempotent, keyed by
`idempotency_key`. Retries must not double-charge.
- Balances never go negative unless an explicit `OverdraftPolicy`
authorizes it for that account.
## Key decisions and rationale
- **Outbox pattern for gateway calls.** Handlers write an intent row
in the same DB transaction as the business change, then a worker
calls the gateway. Rationale: the gateway times out under load;
doing it inline made request latency and failure handling unowned.
Do not call the gateway from a request handler.
- **Single write path per aggregate.** Only `Account.post_entry()`
writes to the ledger. Rationale: a second write path caused the
Mar-2025 balance drift. Add new behavior as methods on the
aggregate, not new queries.
- **Event sourcing for the ledger only.** The rest of the system is
CRUD. Rationale: we need a perfect audit trail for money and
nothing else, and full event sourcing was too costly elsewhere.
## Rejected alternatives (do not reintroduce)
- ORM lazy-loading across aggregates; caused N+1s and unclear
transaction boundaries. Repositories return fully-loaded aggregates.
- Storing balance as a column updated in place; see balance drift
incident. Balance is always derived.
- A generic `Money` library pulled from the registry; we have our
own `domain/money.py`; use it.
- Synchronous webhooks to merchants from the request thread; they
block and fail silently. Use the notification queue.
## Data and domain rules
- All timestamps are UTC, stored as timestamptz, formatted RFC 3339
at the edge. No naive datetimes cross a function boundary.
- IDs are ULIDs generated in the app layer, never DB autoincrement.
- Soft delete is not used. Records are either active or moved to an
archive table by an explicit use case.
- Multi-tenant: every query is scoped by `tenant_id`. A repository
method without a tenant scope is a bug.
## API contract source of truth
- The OpenAPI 3.1 spec in `api/openapi.yaml` is authoritative.
Request/response types are generated from it; do not hand-edit the
generated types in `http/generated/`.
- New or changed endpoints: update `api/openapi.yaml` first, then
regenerate, then implement. The spec is designed and reviewed in
Apidog before code changes.
- Error responses follow RFC 9457 (problem+json). Use the shared
`problem()` helper; do not invent ad-hoc error shapes.
## Where new code goes
- New endpoint: route in `http/routes/`, DTO in `http/dto/`, use
case in `app/usecases/`, domain logic in `domain/`.
- New external integration: client in `infra/clients/`, interface
in `app/ports/`.
- Cross-cutting concern (auth, logging, idempotency): middleware in
`http/middleware/`, never inline in handlers.
## Out of scope / do not touch
- `http/generated/`: regenerated from OpenAPI, edits are lost.
- `legacy/billing_v1/`: frozen, under migration. Do not extend.
- `migrations/`: never edit an applied migration; add a new one.
## When in doubt
If a requested change requires breaking a rule above, the right move
is to say so and propose the smallest design-consistent alternative,
not to silently work around the rule.
That last section matters more than it looks. Telling an agent what to do when the request conflicts with the design turns the file from passive documentation into an active guardrail. Without it, an agent that hits a constraint tends to route around it and ship the workaround.
How coding agents actually consume DESIGN.md
Agents don’t have a special DESIGN.md parser. They consume it the same way they consume any file: by reading it with their file tools and treating the contents as context. So the mechanics of getting it loaded matter, and they differ slightly per tool.
The reliable pattern is to reference DESIGN.md from the instruction file each agent already loads at startup. For Claude Code, that’s CLAUDE.md; the memory docs describe an @path import syntax where @DESIGN.md expands the file into context at session start. For the AGENTS.md ecosystem, you add a line in AGENTS.md pointing at it (“Architecture and design rules: see DESIGN.md; read it before structural changes”). Agents that walk the directory tree will pick up the nearest AGENTS.md, see the pointer, and pull DESIGN.md when the work touches architecture. Either way you’re not duplicating content; you’re keeping the short operational file short and letting the deep file be deep.
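Because the pointer is a single line, it is easy to lose in an edit. Here is a small sketch of a check you could run in CI to confirm each instruction file still mentions DESIGN.md. The function name and the inlined file contents are hypothetical; the `@DESIGN.md` form is the import syntax described above, and the check treats it the same as a plain-text pointer.

```python
import re

# Matches "DESIGN.md" as a whole token, including the "@DESIGN.md" import form.
DESIGN_REF = re.compile(r"\bDESIGN\.md\b")


def missing_design_pointer(files: dict[str, str]) -> list[str]:
    """Return the names of instruction files that never mention DESIGN.md."""
    return sorted(name for name, text in files.items() if not DESIGN_REF.search(text))


# Hypothetical file contents, inlined so the sketch runs without a repo.
instruction_files = {
    "CLAUDE.md": "@AGENTS.md\n@DESIGN.md\n",
    "AGENTS.md": "Run the test suite before committing.\n",
}
print(missing_design_pointer(instruction_files))  # ['AGENTS.md']
```

In a real repo you would read the files from disk instead of a dict. The check only proves the pointer exists, not that an agent followed it.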
Three practical notes from how these tools behave:
First, the agent treats the file as context, not as enforced rules. The Claude Code docs are blunt that CLAUDE.md content is guidance the model tries to follow, not a hard constraint. Same applies to anything you reference from it. That’s why the template phrases everything as testable absolutes and adds an explicit “when in doubt” instruction; vague prose gets ignored under pressure, sharp rules get followed more often.
Second, length and structure change adherence. Headers and bullets beat paragraphs because the model scans structure the way a reader does. A 3-page wall of architectural philosophy will be skimmed; ten crisp invariants under a clear heading will be used. Write for retrieval, not for prose.
Third, the file changes review economics, not just generation. Even when an agent partly ignores it, a reviewer can point at the violated rule and the agent fixes it in one turn (“this breaks the single-write-path rule in DESIGN.md”). That feedback loop, where each correction is grounded in a written decision, is where a lot of the real value lands. Teams building their own agent harnesses lean on exactly this; see build your own Claude Code for how that loop gets wired into autonomous flows.
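Since the file is guidance rather than enforcement, a rule phrased as a testable absolute can also be backed by an actual test. As one illustration, the layering rule “domain has zero imports from any framework” could be checked roughly like this. It is a sketch under assumptions: the function name and the forbidden package list are hypothetical, and relative imports are deliberately skipped.

```python
import ast


def forbidden_imports(source: str, forbidden: set[str]) -> list[str]:
    """Return imported module names whose top-level package is forbidden."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" only; relative imports are ignored here.
            names = [node.module]
        else:
            continue
        for name in names:
            if name.split(".")[0] in forbidden:
                found.append(name)
    return found


# Hypothetical contents of a module under domain/.
domain_source = "import decimal\nfrom fastapi import APIRouter\n"
print(forbidden_imports(domain_source, {"fastapi", "flask", "django"}))  # ['fastapi']
```

A test like this catches the violation after the fact; the written rule is still what keeps the agent from producing it in the first place.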
Anti-patterns and keeping it from rotting
The fastest way to make DESIGN.md worthless is to write it like a wiki page. A rotted design file is worse than none, because agents and humans both trust it and get misled. Avoid these.
Restating the code. “The UserService class handles users” tells an agent nothing it can’t read from user_service.py. If a sentence is true by reading the file, cut it. Keep only what the code can’t tell you: rationale, invariants, rejected paths.
Tutorial creep. Step-by-step “how to add a feature” walkthroughs belong in CONTRIBUTING.md or a skill, not here. The moment DESIGN.md has shell commands and copy-paste snippets, it’s the wrong document and it’ll go stale at tooling speed.
Aspiration as fact. Writing “the system uses CQRS” when half of it doesn’t trains agents to produce code matching a fiction. Document what’s true now plus where you’re deliberately heading, and label the difference. “Target: all writes go through use cases. Current: legacy/ bypasses this; do not extend it.”
No owner, no review trigger. A design file no one is responsible for drifts in a quarter. Tie it to a trigger: review DESIGN.md in any PR that adds a module, changes a layer boundary, or introduces a new external dependency. Put that rule in the PR template. Some teams add a checklist item, “does this change a decision in DESIGN.md? If so, update it in the same PR.”
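If a checklist item feels too easy to skip, the trigger can be approximated in code. The sketch below flags a PR that looks architectural but does not touch DESIGN.md. Everything in it is an assumption for illustration: the manifest file names, the layer directory names (borrowed from the template), and the heuristic that a new `__init__.py` inside a layer means a new module.

```python
from pathlib import PurePosixPath

# Assumption: changes to these manifests signal a new external dependency.
DEPENDENCY_FILES = {"pyproject.toml", "requirements.txt"}
# Assumption: top-level directories that represent architectural layers.
LAYER_DIRS = {"http", "app", "domain", "infra"}


def needs_design_review(changed: list[str], added: list[str]) -> bool:
    """True when a PR looks architectural but leaves DESIGN.md untouched."""
    if "DESIGN.md" in changed:
        return False  # the file is already part of this review
    touches_deps = any(PurePosixPath(p).name in DEPENDENCY_FILES for p in changed)
    # A new package marker inside a layer directory suggests a new module.
    new_module = any(
        PurePosixPath(p).name == "__init__.py"
        and PurePosixPath(p).parts[0] in LAYER_DIRS
        for p in added
    )
    return touches_deps or new_module


print(needs_design_review(["pyproject.toml"], []))  # True
```

The heuristic will produce false positives. That is fine for a prompt to the reviewer, and a reason not to make it a hard gate.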
Synchronization theater. Don’t try to keep it line-synced with code; that’s a losing game and it’s the reason architecture docs get abandoned. Keep it at the level of decisions that change a few times a year, not function signatures that change weekly. The matklad ARCHITECTURE.md guidance is the right instinct here: only write down what’s unlikely to change often.
Contradicting the other instruction files. If AGENTS.md says one thing about error handling and DESIGN.md says another, agents pick one arbitrarily. Keep operational rules in AGENTS.md / CLAUDE.md and architectural rules in DESIGN.md, and don’t let them overlap. When they must reference each other, one points at the other; they don’t both assert the same fact.
A healthy DESIGN.md is short, dense, declarative, owned, and reviewed on a trigger. If yours is long, narrative, and last touched a year ago, agents are reading fiction.
DESIGN.md for API and backend codebases
This is where the file earns its keep. API and backend services have exactly the kind of invisible, high-cost constraints agents are worst at: contract boundaries, transaction semantics, idempotency, data integrity, layering. None of it is obvious from a single file, and getting it wrong ships bugs that reach production and money.
Put these API-specific things in DESIGN.md and agent output quality on backend tickets jumps:
The contract is the source of truth, and say where it is. State plainly that the OpenAPI spec is authoritative and generated types are not to be hand-edited. Agents love to “helpfully” tweak a generated type to make a build pass; one line in DESIGN.md stops it. Point at the spec file path. If you design the contract first in Apidog and export the OpenAPI document into the repo, your DESIGN.md can name that file as the thing every endpoint must conform to, and the agent has an unambiguous target. The argument for designing the contract before the code is covered in designing APIs for AI agents; a design-first contract is exactly what makes agent-generated handlers safe to accept.
Transaction and consistency boundaries. Where does a transaction begin and end? What’s allowed inside it? “External calls never happen inside a DB transaction; use the outbox.” Agents default to the naive inline call every time unless the file forbids it.
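For readers who have not used the outbox pattern, here is a minimal sketch of the boundary that rule describes, using an in-memory SQLite database. Table names, column names, and function names are hypothetical, and integer keys are used only for brevity.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE ledger_entries (
            id INTEGER PRIMARY KEY, account_id TEXT, amount_minor INTEGER);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0);
        """
    )


def record_refund(conn, account_id: str, amount_minor: int, idempotency_key: str) -> None:
    """Write the business change and the gateway intent in one transaction."""
    payload = json.dumps(
        {"account_id": account_id, "amount_minor": amount_minor,
         "idempotency_key": idempotency_key}
    )
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO ledger_entries (account_id, amount_minor) VALUES (?, ?)",
            (account_id, amount_minor),
        )
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def drain_outbox(conn, send) -> int:
    """Worker step: make the external call outside the business transaction."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        send(json.loads(payload))  # the gateway call happens here, not in the handler
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
record_refund(conn, "acct_1", -500, "refund-001")
delivered = []
print(drain_outbox(conn, delivered.append))  # 1
```

If `send` raises, the row stays unsent and is retried on the next drain, which is exactly why the gateway call also has to be idempotent.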
Idempotency and retries. State the idempotency strategy as an invariant. Payment, order, and provisioning endpoints are where a missing idempotency key becomes a double-charge. The agent will not infer this from reading a handler.
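The shape of that invariant, sketched with an in-memory store: the first call for a key reaches the gateway, and every retry returns the recorded result. The wrapper class and the fake gateway are hypothetical. A production version would persist keys durably (for example behind a unique constraint) and handle concurrent requests, neither of which this sketch attempts.

```python
from typing import Callable


class IdempotentGateway:
    """Hypothetical wrapper: at most one gateway call per idempotency key."""

    def __init__(self, charge_fn: Callable[[int, str], str]) -> None:
        self._charge_fn = charge_fn
        self._results: dict[str, str] = {}  # in-memory only, for illustration

    def charge(self, idempotency_key: str, amount_minor: int, currency: str) -> str:
        if idempotency_key in self._results:
            # Retry of a request we already processed: no second charge.
            return self._results[idempotency_key]
        result = self._charge_fn(amount_minor, currency)
        self._results[idempotency_key] = result
        return result


calls = []


def fake_gateway(amount_minor: int, currency: str) -> str:
    # Stand-in for a real gateway client; records each call it receives.
    calls.append((amount_minor, currency))
    return f"charge-{len(calls)}"


gateway = IdempotentGateway(fake_gateway)
first = gateway.charge("order-42", 1999, "USD")
retry = gateway.charge("order-42", 1999, "USD")
print(first == retry, len(calls))  # True 1
```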
Error model. One sentence: “all errors are problem+json via the problem() helper; never invent error shapes.” Without it you get a different error envelope per endpoint, which breaks every client.
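A shared helper is what makes that one sentence enforceable. Here is a sketch of what such a helper might look like. The name `problem()` comes from the template, but its signature and return shape here are assumptions; the member names (`type`, `title`, `status`, `detail`) and the `application/problem+json` media type follow RFC 9457.

```python
import json
from typing import Any, Optional

PROBLEM_CONTENT_TYPE = "application/problem+json"


def problem(
    status: int,
    title: str,
    detail: Optional[str] = None,
    type_uri: str = "about:blank",
    **extensions: Any,
) -> tuple[int, dict[str, str], str]:
    """Build a (status, headers, body) triple for a problem+json error."""
    body: dict[str, Any] = dict(extensions)  # extension members go in first
    # Standard members are set last so an extension cannot override them.
    body.update({"type": type_uri, "title": title, "status": status})
    if detail is not None:
        body["detail"] = detail
    return status, {"Content-Type": PROBLEM_CONTENT_TYPE}, json.dumps(body)


status, headers, body = problem(
    409, "Duplicate refund", detail="Refund already issued for this payment."
)
print(status, headers["Content-Type"])  # 409 application/problem+json
```

How the triple maps onto your web framework’s response object is left out on purpose; that part depends on the framework.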
Auth and tenancy scoping. “Every query is tenant-scoped; an unscoped repository method is a bug.” This is a security invariant, and it’s invisible in any individual query, so it’s exactly the kind of rule that has to be written down.
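One way to make an unscoped query hard to write is to bind the tenant when the repository is constructed, so no method can run without it. The sketch below assumes a hypothetical `payments` table and `PaymentRepository` class, with SQLite standing in for the real database.

```python
import sqlite3


class PaymentRepository:
    """Hypothetical repository that is tenant-scoped by construction."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_payments(self) -> list[tuple[str, int]]:
        # Every query carries the tenant filter; callers cannot omit it.
        cursor = self._conn.execute(
            "SELECT id, amount_minor FROM payments WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT, tenant_id TEXT, amount_minor INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("p1", "tenant_a", 1000), ("p2", "tenant_b", 2500), ("p3", "tenant_a", 400)],
)
print(PaymentRepository(conn, "tenant_a").list_payments())  # [('p1', 1000), ('p3', 400)]
```

An agent reading only `list_payments` sees a filter. The written rule is what tells it the filter is mandatory everywhere.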
Versioning and breaking-change rules. What counts as breaking, how you version, what’s allowed in a minor. Agents will happily rename a response field; the file tells them that’s a breaking change with a process.
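What counts as breaking is your call, which is why it belongs in the file. As an illustration only, here is a sketch that compares two versions of a response shape under one assumed policy: removing a field or changing its type is breaking, and adding a field is not. The function name and the field maps are hypothetical.

```python
def breaking_changes(old_fields: dict[str, str], new_fields: dict[str, str]) -> list[str]:
    """Compare field-name -> type maps under the assumed policy above."""
    problems = []
    for name, old_type in sorted(old_fields.items()):
        if name not in new_fields:
            # A rename shows up here as a removal (plus an unrelated addition).
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(f"type changed: {name} ({old_type} -> {new_fields[name]})")
    return problems


v1 = {"id": "string", "amount_minor": "integer", "currency": "string"}
v2 = {"id": "string", "amount": "integer", "currency": "string", "note": "string"}
print(breaking_changes(v1, v2))  # ['removed field: amount_minor']
```

The agent’s “harmless” rename of `amount_minor` to `amount` is reported as a removal, which is how an existing client would experience it.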
For backend work the payoff is concrete: fewer layering violations, no surprise inline gateway calls, consistent error and pagination shapes, and contract-conformant handlers because the agent was pointed at the OpenAPI spec instead of guessing the schema. The agent stops inventing and starts conforming. If you want the agent to also exercise the API it just wrote, the contract-plus-design combination is what lets tools and agents test against a known interface; Apidog gives you the design-first workspace, the OpenAPI export your DESIGN.md points at, and an MCP server and AI-agent debugger for checking that generated endpoints actually match the contract.
Conclusion
- DESIGN.md records why your code is shaped the way it is: the invariants, decisions, and rejected alternatives an agent can’t read out of source.
- It complements rather than replaces AGENTS.md and CLAUDE.md: those stay short and operational; DESIGN.md holds the deep architecture and is referenced from them.
- Write it declaratively, as testable absolutes plus a “when in doubt, flag don’t work around” rule, so it acts as a guardrail, not passive prose.
- It pays off most on API and backend codebases, where contract boundaries, transactions, idempotency, and tenancy scoping are invisible and expensive to get wrong.
- Keep it from rotting with an owner and a review trigger tied to your PR template; never line-sync it with code.
- The single biggest backend win is naming the OpenAPI spec as authoritative so agents conform to the contract instead of inventing schemas.
- Design that contract first. Download Apidog to design APIs design-first, export the OpenAPI spec your DESIGN.md points at, and test that agent-generated endpoints actually match it.
FAQ
Is DESIGN.md an official standard like AGENTS.md?
No. AGENTS.md is a defined, broadly-adopted format now stewarded under the Linux Foundation’s Agentic AI Foundation. DESIGN.md is a community convention with no single owner or spec, in the same family as ARCHITECTURE.md and ADRs. Treat it as a useful pattern you adapt, not a standard you conform to.
Do I need DESIGN.md if I already have AGENTS.md or CLAUDE.md?
If your architecture has non-obvious constraints, yes. AGENTS.md and CLAUDE.md are meant to stay short and operational; the Claude Code docs recommend keeping CLAUDE.md under about 200 lines. Deep architectural rationale doesn’t fit there without bloating it and hurting adherence, so you put it in DESIGN.md and reference it. For the operational file itself, see how to write AGENTS.md files.
How is DESIGN.md different from ARCHITECTURE.md?
Mostly intent and audience. ARCHITECTURE.md is the older convention aimed at human contributors mapping the codebase. DESIGN.md is the same idea written for an audience that includes a coding agent: more declarative, decision- and invariant-focused, and explicitly referenced from the agent instruction files so it gets loaded into context. Many teams use one file and one name; the principles are the same.
How long should DESIGN.md be?
Long enough to cover the decisions agents keep getting wrong, short enough that every line earns its place. Decision-dense beats comprehensive. If it reads like a tutorial or restates the code, cut it. A focused two to four pages of invariants and rationale beats a fifteen-page narrative no agent will read closely.
How do I make the agent actually read it?
Reference it from the file the agent already loads at startup. For Claude Code, import it from CLAUDE.md with @DESIGN.md. For the AGENTS.md ecosystem, add a pointer line in AGENTS.md telling agents to read DESIGN.md before structural changes. Don’t paste the whole thing into the short file; reference it so the operational file stays short.
Will the agent always follow DESIGN.md?
No, and you should design around that. Agent instruction files are context the model tries to follow, not enforced configuration. Write rules as sharp absolutes, add an explicit “flag conflicts, don’t work around them” instruction, and lean on the review loop; pointing at a violated rule in DESIGN.md gets a fast, correct fix even when the first pass missed it.
Does DESIGN.md help with API contract problems specifically?
A lot. Its highest-value backend use is stating that the OpenAPI spec is authoritative and naming the file, so agents conform to the contract instead of inventing schemas or hand-editing generated types. Designing that contract first in a tool like Apidog gives the agent an unambiguous target your DESIGN.md can point straight at.
Where should DESIGN.md live in the repo?
The repository root, next to README.md and AGENTS.md, so agents and humans find it with zero search. In a monorepo, a root DESIGN.md for system-wide rules plus a per-package one for local architecture works well, mirroring how agents read the nearest AGENTS.md in the directory tree.



