Fixed Gemma 4 thinking level mapping to route between MINIMAL and HIGH, and map Pi reasoning levels to the model's supported thinking levels (#2903 by @aadishv)
Source: pi-mono/packages/coding-agent/CHANGELOG.md at main
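A minimal sketch of what that kind of mapping might look like, assuming hypothetical type and function names; the actual enums and signatures in pi-mono may differ:

```typescript
// Hypothetical sketch: route Pi reasoning levels to a model that only
// supports two thinking levels (MINIMAL and HIGH). Names are illustrative,
// not the actual pi-mono API.
type PiReasoningLevel = "minimal" | "low" | "medium" | "high";
type GemmaThinkingLevel = "MINIMAL" | "HIGH";

function mapToGemmaThinking(level: PiReasoningLevel): GemmaThinkingLevel {
  switch (level) {
    // Lower Pi reasoning levels collapse to the model's MINIMAL thinking level.
    case "minimal":
    case "low":
      return "MINIMAL";
    // Higher Pi reasoning levels collapse to HIGH.
    case "medium":
    case "high":
      return "HIGH";
  }
}
```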
This makes pi so much more useful with a local model. While I can run gemma4:31b-it-q8_0, the tokens per second are too slow for daily usage. However, gemma4:26b-a4b-it-q8_0 runs fast and comes with thinking enabled by default. The fact that it can be useful as an offline model is promising. The promise of thinking models that run fast enough offline on a local computer is forever alluring.
However, I know for sure that right now I'd rather save the money a new computer would cost and put it toward a cloud token budget.