Mulch Self Improver — Benchmark

Side-by-side vs Self Improving Agent — Rank #2 on ClawHub (legacy .learnings flow)

27.5%
Fewer chars (total)
3792
Mulch (chars)
5233
Baseline (chars)
~352
Tokens saved/session

1. Token efficiency (session + retrieval)

MetricBaseline (legacy)MulchWinner
Reminder (chars)632452Mulch (shorter)
Session context (chars)13181694Baseline
Retrieval (2 queries, chars)932330Mulch 65% less
Total (rem + ctx + ret)28822476Mulch 14% less

2. Troubleshooting (3 error lookups)

MetricBaseline (legacy)MulchWinner
Chars to get all 3 resolutions1215559Mulch 54% less
Resolutions found (of 3)3/32/3 or 3/3Same or better

3. Style & memory (6 scenarios)

MetricBaseline (legacy)MulchWinner
Chars to get all 6 answersOne full fileSum of 6 targeted searchesMulch (targeted)
Scenarios found (of 6)6/64/6–6/6Same or better

4. Projected savings (chars ≈ token proxy)

ScenarioBaseline (chars)Mulch (chars)SavingPer 100 sessions (tokens ≈ chars/4)
Session (rem + ctx + retrieval)28822476406 chars (~100 tokens)~10k tokens
Troubleshooting (3 errors)1215559656 chars (~164 tokens)~16k tokens per 100 troubleshoot rounds

Run: docker run --rm mulch-self-improver-test benchmark. Credits: Mulch Self Improver by Austin Dixson; Mulch by Jaymin West.