PrismML Bonsai 1.7B
This run used the official PrismML demo on a CPU-only host and put Bonsai 1.7B through the same quick operator pack we used for the Gemma local comparison. Result: interesting tiny model, but too unreliable on routing and operational judgment to be a default OpenClaw worker.
56/100 overall · 8.8s average latency · CPU quick pack · Date: 2026-04-16

- Extraction was mostly correct on the structured JSON task.
- The summary task was genuinely good, concise, and on point.
- The model is tiny enough to stay interesting for cheap local experimentation.
- It invented a fake tool path for reading X/Twitter threads instead of following the Bird-first rule.
- It leaked `<think>` markers, which makes strict structured output less trustworthy.
- Its local-vs-hosted reasoning was generic and partly inverted, which is exactly the sort of judgment failure this benchmark is meant to expose.
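The leaked thinking markers are what sank the strict-cleanliness check on the JSON task. A minimal sketch of the scrubbing a harness would need before parsing this model's structured output; `parse_strict_json` is a hypothetical helper, not part of the PrismML demo:

```python
import json
import re

def parse_strict_json(raw: str) -> dict:
    """Strip leaked <think>...</think> blocks, then parse strict JSON.

    Hypothetical helper: Bonsai 1.7B leaked thinking markers into its
    structured output, so a harness must scrub them before json.loads.
    """
    # Remove <think>...</think> spans, including an unclosed trailing one.
    cleaned = re.sub(r"<think>.*?(</think>|$)", "", raw, flags=re.DOTALL)
    return json.loads(cleaned.strip())

print(parse_strict_json('<think>reasoning...</think>{"title": "demo"}'))
# → {'title': 'demo'}
```

Of course, a strict benchmark scores the raw output, so this workaround only helps downstream consumers, not the model's t1 score.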
Bonsai 1.7B is usable for lightweight local extraction and summarization, but it is not trustworthy enough for routing, benchmark-sensitive judgment, or default operator work. If we want the real upside case, the fair next test is Bonsai 8B on Apple Silicon or another GPU-backed setup.
| Task | Score | Time | What happened |
|---|---|---|---|
| t1 JSON extract | 18/25 | 8.0s | Mostly correct extraction, but it leaked thinking markers so it failed strict cleanliness. |
| t2 Routing | 4/25 | 9.3s | The big miss. It invented a URL tool instead of choosing Bird-first for X thread reading. |
| t3 Reasoning | 9/25 | 11.1s | Recognized some privacy/on-prem logic, but defaulted the wrong way and stayed too generic. |
| t4 Summary | 25/25 | 6.8s | Clean concise summary. Best part of the run. |
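The headline numbers follow directly from the table. A quick sketch of the aggregation (dictionary layout is illustrative, not the quick pack's actual data format):

```python
# Per-task (score, max, seconds) taken from the results table above.
tasks = {
    "t1 JSON extract": (18, 25, 8.0),
    "t2 Routing":      (4, 25, 9.3),
    "t3 Reasoning":    (9, 25, 11.1),
    "t4 Summary":      (25, 25, 6.8),
}

total = sum(score for score, _, _ in tasks.values())
max_total = sum(mx for _, mx, _ in tasks.values())
avg_latency = sum(t for _, _, t in tasks.values()) / len(tasks)

print(f"{total}/{max_total} overall, {avg_latency:.1f}s average latency")
# → 56/100 overall, 8.8s average latency
```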
This host had only 4 vCPUs, 15 GiB of RAM, and no GPU, so Bonsai 8B was not practical to benchmark interactively here. That means this page is the honest CPU-box result for Bonsai 1.7B, not the final word on the family.
The raw machine-readable files are still available, but they are secondary now, not the main presentation.