Self Improving Agent vs Mulch

Baseline (Rank #2 on ClawHub, legacy .learnings) vs Mulch Self Improver — what’s different

Self Improving Agent (Rank #2 on ClawHub)
Store: .learnings/ (LEARNINGS.md, ERRORS.md, mixed PREFERENCES-style file)
Session start: Long reminder (632 chars) + load full .learnings files into context
Recording: Append to markdown files (no types, no domains)
Retrieval: Read full file(s); grep/cat to find “package manager” or errors
Troubleshooting: Load full ERRORS.md + LEARNINGS.md (1215 chars) to find a fix
Style / memory: One mixed file (e.g. PREFERENCES.md); load full file for any question

Mulch Self Improver
Store: .mulch/ (typed records, domains)
Session start: Short reminder (452 chars) + mulch prime; only prime output in context
Recording: mulch record <domain> --type failure|convention|…
Retrieval: mulch search "…" / mulch query (targeted; 330 chars vs 932 for 2 queries)
Troubleshooting: One mulch search "<error>" per scenario (559 chars, ~54% less)
Style / memory: Domains writing_style, addressing, preferences, habits, admin; targeted search (757 vs 1136 chars, ~33% less)

Net: Mulch uses a shorter reminder, targeted retrieval (search/query) instead of full-file reads, and separate domains for errors, conventions, and style/preferences. The same tasks therefore put fewer characters (a proxy for tokens) and less noise into context. Combined gain over legacy when session + troubleshooting + style/memory are all used: ~27.5% fewer chars (3792 vs 5233 chars; ~352 tokens saved).
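
The percentages quoted above can be reproduced from the character counts. The following is a minimal Python sketch of that arithmetic, not part of either tool: the `MEASUREMENTS` table and the `percent_saved` helper are illustrative names introduced here, and the numbers are copied from the comparison as stated.

```python
# Character counts quoted in the comparison: (legacy, mulch).
# The dictionary and helper names are hypothetical, for illustration only.
MEASUREMENTS = {
    "session reminder": (632, 452),
    "retrieval (2 queries)": (932, 330),
    "troubleshooting": (1215, 559),
    "style / memory": (1136, 757),
    "combined": (5233, 3792),
}


def percent_saved(legacy_chars: int, mulch_chars: int) -> float:
    """Return the percentage reduction of mulch_chars relative to legacy_chars."""
    if legacy_chars <= 0:
        raise ValueError("legacy_chars must be positive")
    return 100.0 * (legacy_chars - mulch_chars) / legacy_chars


if __name__ == "__main__":
    for label, (legacy, mulch) in MEASUREMENTS.items():
        saved = percent_saved(legacy, mulch)
        print(f"{label}: {legacy} -> {mulch} chars ({saved:.1f}% fewer)")
```

The combined row is 1441 characters smaller (5233 minus 3792). Converting that to tokens depends on a chars-per-token ratio that the comparison does not state, so the sketch stops at characters.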

Full benchmark: benchmark-comparison.html · Elevator pitch: benchmark-elevator-pitch.html