2026-03 - W4

How we solved the agent memory problem

Interesting concept. Instead of summarizing, it uses distillation, so the context can scale as far as possible. Summarizing is lossy and can have side effects in the long run (e.g. the agent forgets the instruction to never delete your system).

Distillation is the core mechanism that keeps the context window manageable while preserving what matters. Here’s how it actually works. A background process monitors the token count after each turn. When the context window exceeds ~60% capacity (roughly 120k tokens on a 200k-token model), the distillation agent wakes up. It doesn’t compress everything at once: it starts at the oldest un-distilled messages and works forward, creating distillations until the context drops below the target threshold. The distillation agent is explicitly instructed to preserve:

  • File paths and locations
  • Specific values, thresholds, configuration
  • Decisions and their rationale (the “why”)
  • User preferences and patterns
  • Error messages and their solutions
  • Anything that would be hard to rediscover
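The trigger logic described above (check after each turn, wake at ~60% capacity, compress oldest-first until below a target) can be sketched roughly like this. All names here (`Message`, `count_tokens`, `distill`, `maybe_distill`) are hypothetical, not from any specific framework, and the token counter is a crude stand-in:

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    distilled: bool = False  # True once this entry is itself a distillation

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: ~1 token per 4 characters.
    return sum(len(m.text) // 4 for m in messages)

def distill(batch):
    # Stand-in for the distillation agent; a real one would produce a few
    # sentences of context plus bullet-point facts.
    return Message(text=f"[distilled {len(batch)} messages]", distilled=True)

def maybe_distill(messages, capacity=200_000, trigger=0.60, target=0.50,
                  batch_size=10):
    """Run after each turn: if the context exceeds trigger * capacity,
    compress the oldest un-distilled messages forward until the total
    drops below target * capacity."""
    if count_tokens(messages) <= trigger * capacity:
        return messages
    while count_tokens(messages) > target * capacity:
        # Find the oldest un-distilled message and take a batch from there.
        start = next((i for i, m in enumerate(messages) if not m.distilled), None)
        if start is None:
            break  # everything is already distilled; nothing left to compress
        batch = messages[start:start + batch_size]
        messages[start:start + batch_size] = [distill(batch)]
    return messages
```

The `target` below the `trigger` gives the loop hysteresis, so it doesn’t wake up again on the very next turn.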

It drops:

  • Exploratory back-and-forth that led nowhere
  • Verbose tool outputs (the agent saw them; the summary is enough)
  • Social pleasantries and acknowledgments
  • Redundant restatements of the same information
  • The “texture” of debugging (the false starts, the confusion)
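The preserve/drop criteria above would presumably be baked into the distillation agent’s instructions. A sketch of what that prompt might look like (the criteria come from the lists above; the wording and the `DISTILL_PROMPT` name are illustrative):

```python
# Hypothetical system prompt for the distillation agent.
DISTILL_PROMPT = """\
You are compressing a span of old conversation messages into a short
distillation. Preserve exactly:
- file paths and locations
- specific values, thresholds, and configuration
- decisions and their rationale (the "why")
- user preferences and patterns
- error messages and their solutions
- anything that would be hard to rediscover

Drop:
- exploratory back-and-forth that led nowhere
- verbose tool outputs (a one-line summary is enough)
- social pleasantries and acknowledgments
- redundant restatements of the same information
- the texture of debugging (false starts, confusion)

Output a few sentences of context followed by bullet-point facts.
"""
```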

A 50-message debugging session might compress to 3 sentences of context and 5 bullet points of facts. That’s 10-20x compression while keeping everything operationally useful.