Side-by-side vs Self Improving Agent, Rank #2 on ClawHub (legacy .learnings flow)

Session (rem + ctx + retrieval):

| Metric | Baseline (legacy) | Mulch | Winner |
|---|---|---|---|
| Reminder (chars) | 632 | 452 | Mulch (shorter) |
| Session context (chars) | 1318 | 1694 | Baseline |
| Retrieval (2 queries, chars) | 932 | 330 | Mulch 65% less |
| Total (rem + ctx + ret) | 2882 | 2476 | Mulch 14% less |

Troubleshooting (3 errors):

| Metric | Baseline (legacy) | Mulch | Winner |
|---|---|---|---|
| Chars to get all 3 resolutions | 1215 | 559 | Mulch 54% less |
| Resolutions found (of 3) | 3/3 | 2/3 or 3/3 | Tie at best (Mulch may miss one) |

Scenario lookup (6 scenarios):

| Metric | Baseline (legacy) | Mulch | Winner |
|---|---|---|---|
| Chars to get all 6 answers | One full file | Sum of 6 targeted searches | Mulch (targeted) |
| Scenarios found (of 6) | 6/6 | 4/6–6/6 | Tie at best (Mulch may miss up to two) |

Estimated savings:

| Scenario | Baseline (chars) | Mulch (chars) | Saving | Per 100 sessions (tokens ≈ chars/4) |
|---|---|---|---|---|
| Session (rem + ctx + retrieval) | 2882 | 2476 | 406 chars (~100 tokens) | ~10k tokens |
| Troubleshooting (3 errors) | 1215 | 559 | 656 chars (~164 tokens) | ~16k tokens per 100 troubleshoot rounds |
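
The savings figures above follow from simple arithmetic on the character counts. A minimal sketch of that calculation, assuming the table's own rule of thumb of about 4 characters per token; the function name `estimate_saving` and its parameters are hypothetical and not part of the benchmark:

```python
def estimate_saving(baseline_chars, mulch_chars, chars_per_token=4, runs=100):
    """Estimate the saving of Mulch over the baseline for one scenario.

    chars_per_token=4 is the approximation used in the table (tokens ≈ chars/4),
    not a measured tokenizer ratio.
    """
    chars_saved = baseline_chars - mulch_chars
    tokens_saved = chars_saved / chars_per_token
    return {
        "chars_saved": chars_saved,
        "percent_less": round(100 * chars_saved / baseline_chars),
        "tokens_saved": tokens_saved,
        "tokens_per_runs": tokens_saved * runs,
    }


# Character counts taken from the tables above.
session = estimate_saving(2882, 2476)
troubleshooting = estimate_saving(1215, 559)

print(session)
print(troubleshooting)
```

With the counts from the tables, this gives 406 chars (14% less, 10150 tokens per 100 sessions) and 656 chars (54% less, 16400 tokens per 100 troubleshooting rounds), which the table rounds to ~10k and ~16k.
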

Run: `docker run --rm mulch-self-improver-test benchmark`

Credits: Mulch Self Improver by Austin Dixson; Mulch by Jaymin West.