Google has somehow managed to extend Gemini’s visual acuity into these open-weights models. My application has to do with handwriting recognition, plus the calculation of bounding boxes for blobs of text, and the 31B version performs as well as Gemini 3 Flash … and nearly as well as Gemini 3.1 Pro?! (This isn’t just vibes, but quantitative scoring.) Yet Gemma 4 31B is a model I can run however and wherever I want … it runs (quantized) on my old 2017-era deep learning rig with its three 12GB GPUs. It runs in the secure enclaves on Tinfoil.

Source: The milestone of Gemma 4
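For context on what “quantitative scoring” of bounding boxes might look like: a common approach is intersection-over-union (IoU) between predicted and ground-truth boxes, counting a ground-truth box as detected when some prediction exceeds a threshold. A minimal sketch, with the `(x0, y0, x1, y1)` box format and the 0.5 threshold being my assumptions, not details from the post:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall_at_iou(preds, truths, threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction
    at IoU >= threshold (a simple recall-style score)."""
    if not truths:
        return 1.0
    matched = sum(1 for t in truths
                  if any(iou(p, t) >= threshold for p in preds))
    return matched / len(truths)
```

A score like this makes “performs as well as Gemini 3 Flash” a number you can compare across models rather than an impression.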

I wish to believe this. With gemini-cli still unavailable on my account for a “violation of ToS,” I’ve shifted entirely to Gemma 4 for running and learning pi. I am still working through all my workflows to test how well they run across models, and it has gotten me into the mindset of building an eval tool for my workflows. What I can add to the visual-acuity capabilities quoted in the post is the ability to run tool calls via skills. It has reached a level where I feel comfortable using it for some mundane, repetitive local tasks: tasks for which I have a robust tool that performs all the necessary checks, with enough safeguards that a bad tool call won’t cause data loss.
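A minimal sketch of the kind of guard I mean, for a file-writing tool exposed to the model: restrict paths to a workspace directory and refuse overwrites, so a hallucinated tool call fails loudly instead of destroying data. The `ALLOWED_ROOT` name and the specific checks are illustrative assumptions, not my actual tool:

```python
import os

# Only allow the tool to touch files under this directory (assumption for the sketch).
ALLOWED_ROOT = os.path.abspath("workspace")

def safe_write(path: str, content: str) -> str:
    """Write a new file, but only inside ALLOWED_ROOT and never over an
    existing file. Returns the absolute path written."""
    full = os.path.abspath(os.path.join(ALLOWED_ROOT, path))
    if not full.startswith(ALLOWED_ROOT + os.sep):
        # Catches "../" escapes and absolute paths outside the workspace.
        raise ValueError(f"refusing path outside workspace: {path}")
    if os.path.exists(full):
        # Destructive overwrites require a human, not a model.
        raise FileExistsError(f"refusing to overwrite: {path}")
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return full
```

The point is that the safety lives in the tool, not in the model: even a badly looping model can only create new files inside one directory.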

However, the capability level still feels at least one generation behind the current frontier models. It overthinks and gets itself into unnecessary loops, which most current-generation models avoid really well.