Tristan Denyer


How We Got Our AI Agent to Remember Everything and Stop Clobbering Each Other's Work

A practical guide (with prompts) to persistent memory and shared context for dev teams using AI coding agents.

If you've shipped production code with an AI coding assistant, you've felt it: the assistant that confidently rebuilt something you already built three sprints ago, the one that missed a critical architectural decision buried in a Slack thread from six months back, the one that clashed with what your teammate was working on in parallel.

The problem is not the model. The problem is memory, and the fact that your docs are scattered to the winds across Confluence, Google Docs, and even someone's desktop.

AI coding agents are stateless by default. Every session starts cold with no memory of what came before it. And when multiple engineers are all prompting the same codebase in different directions, you can get drift, duplication, and debugging sessions that could have been avoided.

This post walks through a practical knowledge system that layers persistent context into your AI agent workflow. It is not a heavy process: in under two minutes of copy-n-pasting you can create a small set of files with a clear lifecycle, and that system compounds in value the longer your team uses it.

The Core Problem: Stateless Agents in a Stateful Codebase

Your codebase has history. Decisions were made. Patterns were chosen for specific reasons. A microservice was refactored because of a painful production incident. A third-party integration was dropped after a security audit. None of that lives anywhere your AI agent can see, unless you put it there.

For solo devs, this is annoying. For teams, it becomes a reliability problem. This is real enough that entire products are being built at the infrastructure level specifically to give AI agents shared persistent memory. But you do not need to adopt a new platform: a few well-structured files and a clear lifecycle can greatly improve your workflow and code quality.

The Five-File Knowledge Stack

The setup is five files (or file types), each with a distinct job:

CLAUDE.md (or your agent's root context file)

The persistent rulebook for your AI agent. Every session loads this. It sets standards, conventions, and tells the agent where to look for everything else. Think of it as the agent's onboarding document that never goes stale if you keep it updated.

If you already have one, we will be adding to it below.

ENGINEERING_LOG.md

A running, structured changelog written by and for your AI agent and your team. Every time a meaningful change is made to the codebase, such as a new service, a schema migration, a breaking API change, or a bug fix with architectural implications, it gets logged here.

In my case, entries older than 90 days are pruned unless they describe a major architectural decision. This keeps the log relevant and stops it from becoming a graveyard of outdated context.

Looks like this:

## [DATE] - [ENGINEER_NAME] - [BRIEF_TITLE]
Changed:
Why:
Impact areas:
Watch out:
and an itemized list of changes, where applicable
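
The 90-day pruning rule is easy to automate. Here is a minimal Python sketch, assuming each entry header starts with an ISO date (`## YYYY-MM-DD - ...`) and that entries worth keeping past 90 days mention the word "architectural" somewhere in their text. Both conventions, and the function name `prune_log`, are assumptions made for illustration, so adjust them to whatever your log actually uses.

```python
import re
from datetime import date, timedelta

# Assumed header shape: "## YYYY-MM-DD - Name - Title" (ISO dates are an assumption).
HEADER = re.compile(r"^## (\d{4}-\d{2}-\d{2}) - ", re.MULTILINE)


def prune_log(log_text: str, today: date, max_age_days: int = 90) -> str:
    """Drop entries older than the cutoff unless they mention 'architectural'."""
    matches = list(HEADER.finditer(log_text))
    if not matches:
        return log_text  # nothing that looks like an entry, leave the file alone

    cutoff = today - timedelta(days=max_age_days)
    kept = [log_text[: matches[0].start()]]  # keep any title or intro text

    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        entry = log_text[match.start():end]
        entry_date = date.fromisoformat(match.group(1))
        # Hypothetical keep-rule: recent, or flagged as architectural.
        if entry_date >= cutoff or "architectural" in entry.lower():
            kept.append(entry)

    return "".join(kept)
```

You could run something like this by hand once a month, or simply ask the agent to apply the same rule when it updates the log.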

_README/PLANS/[feature].md

Before building anything significant, the agent creates a phased plan. Each phase contains embedded prompts that the agent (or a teammate's agent session) can pick up and execute independently. Plans are how you make parallel agent sessions coherent rather than chaotic.

_README/RCA/[incident-name].md

When something breaks in production, the post-incident analysis lives here. Each RCA doc captures the timeline, root cause, the fix applied, and follow-up action items. A short entry in ENGINEERING_LOG.md points to the doc so the agent can find it during planning without having to scan every file in the folder.

This keeps CLAUDE.md from becoming a graveyard of one-liner incident notes and gives the agent rich, structured context about what has gone wrong near a given system before. That context is genuinely useful when the agent is planning new work in the same area.

_README/DOCS/[feature].md

After a feature is shipped, the plan in _README/PLANS/[feature].md gets converted into a doc at _README/DOCS/[feature].md. The doc preserves the reasoning, the key decisions, and enough implementation history that future debugging sessions do not start from zero. This is where "why did we build it this way" lives permanently.

The Lifecycle: What This Looks Like in Practice

Here is the full loop, narrated from the perspective of an AI agent working inside this system.

Phase 1: Planning

A new feature request comes in. Before writing a single line of code, the agent:

  1. Reads ENGINEERING_LOG.md to find any related or adjacent work that may be affected. If someone on the team recently refactored the auth service, the plan needs to account for that.
  2. Scans _README/DOCS/ for any existing documentation on similar systems. If this pattern has been implemented before, the agent inherits those lessons rather than rediscovering them.
  3. Generates a PLANS/[feature].md with phased steps. Critically, each phase includes a baked-in prompt so any agent session (from any team member) can pick up exactly where another left off.
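
If you want to see what the first two steps amount to mechanically, here is a rough Python sketch. It assumes the file layout described in this post (ENGINEERING_LOG.md at the repo root, docs and RCAs under _README/) and does a plain case-insensitive keyword match. The function name `gather_context` is hypothetical, and a real agent would read for meaning rather than grep for a word.

```python
from pathlib import Path


def gather_context(repo_root: Path, keyword: str) -> dict[str, list[str]]:
    """Collect log lines and doc/RCA files that mention `keyword`."""
    needle = keyword.lower()
    found: dict[str, list[str]] = {"log_lines": [], "docs": [], "rcas": []}

    # Step 1: look for related or adjacent work in the engineering log.
    log_path = repo_root / "ENGINEERING_LOG.md"
    if log_path.exists():
        for line in log_path.read_text(encoding="utf-8").splitlines():
            if needle in line.lower():
                found["log_lines"].append(line)

    # Step 2: scan the docs folder (and RCAs) for files on similar systems.
    for key, folder in (("docs", "DOCS"), ("rcas", "RCA")):
        directory = repo_root / "_README" / folder
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.md")):
            if needle in path.read_text(encoding="utf-8").lower():
                found[key].append(path.relative_to(repo_root).as_posix())

    return found
```

Calling `gather_context(Path("."), "auth")` from the repo root would return the matching log lines plus the relative paths of any docs or RCAs worth reading before the plan is written.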

This is where the Claude Superpowers plugin becomes a serious force multiplier. Superpowers is a Claude Code plugin built by Jesse Vincent that enforces structured development workflows rather than letting the agent skip straight to writing code. Its /superpowers:brainstorm and /superpowers:write-plan commands are a natural fit for this planning phase: the brainstorm skill runs a Socratic requirements session before any plan is written, and the write-plan skill generates a granular, phased implementation plan rather than a vague outline. When paired with the knowledge system here, the agent walks into the plan command already loaded with historical context from your ENGINEERING_LOG and DOCS, and Superpowers handles the discipline of making sure that context actually shapes a rigorous plan before execution begins.

💡 Important: Do not rely on the agent remembering to check these files on its own. Make it mandatory. The pre-task checklist in CLAUDE.md should explicitly require the agent to read ENGINEERING_LOG.md, scan DOCS/, and check RCA/ before any plan is generated. The copyable prompts at the end of this post include that checklist.

Phase 2: Execution

The agent works through the plan, phase by phase. Because each phase has its own scoped prompt, a long or complex feature can be distributed across sessions or team members without losing thread. The plan is the shared state.

💡 Important: Each phase should also include a brief context snapshot: a note capturing the relevant state of the system at the time the plan was written, such as the schema version, active API contracts, or key dependencies. Codebases shift between planning and execution, and a plan written against an older schema can silently produce bugs if that baseline is not recorded alongside the work. With the snapshot in place, you can write a plan weeks or months ahead. When work on the plan starts, your AI agent should compare the snapshot against the current state of the codebase and adjust the plan wherever the two have diverged.
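
To make the snapshot comparison concrete, here is a small Python sketch. It assumes the snapshot is stored as simple key/value pairs; the keys shown in the usage example (`schema_version`, `orders_api`, `orm`) are made up for illustration, and how you collect the current values is up to you.

```python
from typing import Optional


def snapshot_drift(
    recorded: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (recorded_value, current_value)} for every key that differs.

    A value of None means the key is missing on that side.
    """
    drift = {}
    for key in sorted(set(recorded) | set(current)):
        before, after = recorded.get(key), current.get(key)
        if before != after:
            drift[key] = (before, after)
    return drift


# Hypothetical snapshot written into the plan vs. the state at execution time.
recorded = {"schema_version": "41", "orders_api": "v2", "orm": "5.1"}
current = {"schema_version": "43", "orders_api": "v2"}
print(snapshot_drift(recorded, current))
```

An empty result means the plan can run as written. Anything else is a list of items the agent should reconcile before it starts the phase.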

Phase 3: Documentation

When the feature ships, the agent converts PLANS/[feature].md into DOCS/[feature].md. The doc is not just a description of what was built but a record of how it was built and why certain decisions were made. Future agents (and future teammates, and your future self, of course) can read this when triaging issues months later.

The doc should also include an open questions section. Most features ship with a handful of known tradeoffs or deferred decisions. If those are not captured explicitly, they become invisible technical debt that the next engineer (or the next agent session) has to reverse-engineer from the code.

Phase 4: Logging

The ENGINEERING_LOG.md is updated with a summary of what changed: services affected, schemas modified, any public interfaces that shifted. Each entry should include a blast radius tag: a quick label of whether the change was isolated, cross-service, or architectural. This lets the agent triage relevance during future planning phases at a glance rather than reading every entry in full. It is a small addition that pays off quickly as the log grows.
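
Here is a sketch of how the blast radius tag speeds up triage, in Python. It assumes each entry carries a line of the form `Blast radius: <label>`, which is one possible way to write the tag and not something the template above requires. The function name is hypothetical.

```python
import re

# Assumed tag line inside each entry, e.g. "Blast radius: cross-service".
BLAST = re.compile(
    r"^Blast radius:\s*(isolated|cross-service|architectural)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def entries_by_blast_radius(log_text: str, wanted: set[str]) -> list[str]:
    """Return the header text of entries whose blast radius is in `wanted`."""
    titles = []
    # Split the log at every line that starts a new "## " entry.
    for chunk in re.split(r"(?m)^(?=## )", log_text):
        if not chunk.startswith("## "):
            continue  # skip any intro text before the first entry
        tag = BLAST.search(chunk)
        if tag and tag.group(1).lower() in wanted:
            titles.append(chunk.splitlines()[0][3:])
    return titles
```

During planning, asking for `{"cross-service", "architectural"}` skips the isolated changes and leaves only the entries that could plausibly affect new work elsewhere.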

Back to Phase 1, and every loop through gets stronger.

Handling Production Issues: RCAs for Days

When a postmortem is done on a major issue or outage, the full RCA goes into _README/RCA/[incident-name].md. Stop burying them in a ticket or adding them to a Confluence folder that no one reads or checks. This is a proper structured doc with the timeline, root cause, fix, and follow-up action items, living right in your codebase. Good luck losing that!

"But wait! This is now buried in the repo!" Be honest with yourself and ask: how many people genuinely search for your RCA, read it, and learn from it? In most cases, you send a link to your boss or QA team or Security. Send them the link to the .md file in the repo. Or email it to them. It's often a read-once doc anyway.

When was the last time you searched for an RCA before you started work, to see whether what you were building or changing would just reintroduce the original issue? Never. The answer is you never have. Keeping RCAs in the repo ensures they are always in context.

From there, two things happen:

  1. A short log entry is added to ENGINEERING_LOG.md summarizing the incident and pointing to the RCA doc. This makes it discoverable during future planning sessions without requiring the agent to scan every RCA file.
  2. CLAUDE.md holds a standing instruction to check _README/RCA/ when planning work near systems that have had prior incidents.
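
The log-to-RCA pointer is simple enough to follow mechanically. A minimal Python sketch, assuming log entries reference RCA docs by their repo-relative path (`_README/RCA/<name>.md`); the function name and the file names in the usage example are hypothetical.

```python
import re

# Matches repo-relative RCA paths such as "_README/RCA/some-incident.md".
RCA_PATH = re.compile(r"_README/RCA/[\w.-]+\.md")


def referenced_rcas(log_text: str) -> list[str]:
    """Return each RCA path mentioned in the log once, in first-seen order."""
    seen: list[str] = []
    for path in RCA_PATH.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# Hypothetical log excerpt with two distinct RCA references.
log = (
    "Watch out: see _README/RCA/greenhouse-rate-limit.md.\n"
    "RCA: _README/RCA/db-failover.md\n"
    "Related to _README/RCA/greenhouse-rate-limit.md\n"
)
print(referenced_rcas(log))
```

The agent only has to open the files this returns, not every doc in the RCA folder.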

This matters because a one-liner in CLAUDE.md like "Greenhouse API hit rate-limit" gives the agent almost nothing to work with. A full RCA doc gives it the timeline, the conditions that triggered the issue, the tradeoffs in the fix, and the follow-ups that were deferred. That is the difference between an agent that designs and develops with the context of a known failure mode, and one that walks straight into it.

The loop closes like this: incidents produce RCA docs, RCA docs inform future planning, and the quality of what gets built improves incrementally every cycle. Your RCA learnings are now part of the future planning process, every time.

Copy-n-paste Prompts to Get Started

Drop these into your CLAUDE.md to implement this system. Adjust paths and conventions to match your repo structure. Make it your own and make it work for you!

Prompt 1: Root Context Instructions

## AI Agent Knowledge System

Before starting any significant task, follow this pre-task checklist:

1. Read ENGINEERING_LOG.md and identify any recent changes that may affect or overlap with this task.
2. Check _README/DOCS/ for any existing documentation on this feature, service, or pattern.
3. Check _README/RCA/ for any past incidents related to the systems you are about to touch. Use ENGINEERING_LOG.md entries to find relevant RCA docs quickly.
4. If this task involves building something new or modifying something significant, create a plan at _README/PLANS/[feature-name].md before writing code.

Plans must be phased. Each phase must include a self-contained prompt that any agent session can use to execute that phase independently.

After a feature ships:
- Convert the plan to a doc at _README/DOCS/[feature-name].md.
- Include implementation decisions and reasoning, not just a description of what was built.
- Update ENGINEERING_LOG.md with a summary of what changed, what services were affected, and the blast radius (isolated / cross-service / architectural).

When a production issue is triaged and resolved:
- Create an RCA doc at _README/RCA/[incident-name].md.
- Add a short entry to ENGINEERING_LOG.md summarizing the incident and linking to the RCA doc.

Prompt 2: Generating a Phased Plan

Before writing any code, generate a phased plan for this feature and save it to _README/PLANS/[feature-name].md.

The plan must include:
- A summary of the feature and its goals
- A check against ENGINEERING_LOG.md: list any related or adjacent work that may be affected
- A check against _README/DOCS/: note any existing documentation relevant to this work
- Phased steps, where each phase includes:
  - A clear description of the work
  - A self-contained prompt that a fresh agent session can use to execute this phase
  - A context snapshot: the current relevant state of the system (schema version, API contracts, dependencies)
- Known tradeoffs or deferred decisions noted explicitly

Do not begin implementation until the plan is written and confirmed.

Prompt 3: Converting a Plan to a Doc

The feature described in _README/PLANS/[feature-name].md has shipped. Convert it to a documentation file at _README/DOCS/[feature-name].md.

The doc must include:
- What was built and why
- Key architectural and implementation decisions, with the reasoning behind them
- How it was built (relevant implementation notes that would help a future engineer debug or extend this)
- Open questions or deferred decisions that were not resolved
- Any known limitations or edge cases

Preserve enough of the build history that this doc is useful during future triage, not just as a reference.

Prompt 4: Creating an RCA Doc

A production incident has been resolved. Create an RCA doc at _README/RCA/[incident-name].md.

The doc must include:
- Incident summary and severity
- Timeline: when it was detected, when it was mitigated, when it was resolved
- Root cause
- Contributing factors
- The fix that was applied, including any tradeoffs made under pressure
- Follow-up action items and whether they were deferred
- Systems and services affected

After creating the doc, add a short entry to ENGINEERING_LOG.md with the incident name, affected systems, blast radius, and a reference to the RCA file path.

Prompt 5: Logging a Change

You can adjust this prompt to only log changes that are impactful, or large, or whatever is meaningful to you. If your codebase gets updated 10 times a day, you probably do not want to log a typo or text change. There is not a lot of value there.

Update ENGINEERING_LOG.md with a summary of the changes made in this session.

Each entry must include:
- Date
- Feature or change name
- Summary of what changed
- Services, schemas, or interfaces affected
- Blast radius / impact: isolated / cross-service / architectural

Looks like this:
## [DATE] - [BY_WHOM] - [BRIEF_TITLE]
Changed:
Why:
Impact areas:
Watch out:
and an itemized list of changes, where applicable
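
If you would rather generate entries from a script (a commit hook, say) than from a prompt, here is a hedged Python sketch that produces the same shape. The `Blast radius:` line and the function name `format_log_entry` are assumptions layered on top of the template above, not part of it.

```python
from datetime import date

ALLOWED_BLAST_RADIUS = {"isolated", "cross-service", "architectural"}


def format_log_entry(
    when: date,
    author: str,
    title: str,
    changed: str,
    why: str,
    impact_areas: str,
    watch_out: str,
    blast_radius: str,
    items: tuple[str, ...] = (),
) -> str:
    """Render one ENGINEERING_LOG.md entry in the template's field order."""
    if blast_radius not in ALLOWED_BLAST_RADIUS:
        raise ValueError(f"unknown blast radius: {blast_radius!r}")
    lines = [
        f"## {when.isoformat()} - {author} - {title}",
        f"Changed: {changed}",
        f"Why: {why}",
        f"Impact areas: {impact_areas}",
        f"Watch out: {watch_out}",
        f"Blast radius: {blast_radius}",  # assumed extra field, see note above
    ]
    lines += [f"- {item}" for item in items]  # optional itemized changes
    return "\n".join(lines) + "\n"
```

Rejecting unknown labels up front keeps the tag vocabulary small, which matters if you later filter the log by blast radius.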

Start Small, Stay Consistent

You do not have to implement all five layers on day one, but consider it 'recommended'. The fastest path to value is to start with CLAUDE.md (prompt 1) and ENGINEERING_LOG.md (prompt 5), run one or two planning cycles through the system, and let the gaps reveal themselves. The PLANS/, DOCS/, and RCA/ folders tend to earn their place quickly once the log starts accumulating real history.

The system compounds. A week in it feels lightweight. A month in it starts catching things. Six months in it becomes the institutional knowledge layer your team did not know it was missing.

The files are simple. The outcome is far less rework, and markedly improved code quality.
