Google has somehow managed to extend Gemini’s visual acuity into these open-weights models. My application has to do with handwriting recognition, plus the calculation of bounding boxes for blobs of text, and the 31B version performs as well as Gemini 3 Flash … and nearly as well as Gemini 3.1 Pro?! (This isn’t just vibes, but quantitative scoring.) Yet Gemma 4 31B is a model I can run however and wherever I want … it runs (quantized) on my old 2017-era deep learning rig with its three 12GB GPUs. It runs in the secure enclaves on Tinfoil.

Source: The milestone of Gemma 4
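For context on what “quantitative scoring” of bounding boxes might look like: a common approach is intersection-over-union (IoU) between predicted and ground-truth boxes, counting a ground-truth box as detected when some prediction exceeds a threshold. A minimal sketch, with the `(x0, y0, x1, y1)` box format and the 0.5 threshold being my assumptions, not details from the post:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall_at_iou(preds, truths, threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction
    at IoU >= threshold (a simple recall-style score)."""
    if not truths:
        return 1.0
    matched = sum(1 for t in truths
                  if any(iou(p, t) >= threshold for p in preds))
    return matched / len(truths)
```

A score like this makes “performs as well as Gemini 3 Flash” a number you can compare across models rather than an impression.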

I wish to believe this. With gemini-cli still unavailable on my account for a “violation of ToS,” I’ve shifted entirely to Gemma 4 for running and learning pi. I am still working through all my workflows to test how well they run across models, and it has gotten me into the mindset of building an eval tool for my workflows. What I can add to the visual-acuity capabilities quoted in the post is the ability to run tool calls via skills. It has reached a level where I feel comfortable using it for some mundane, repetitive local tasks: tasks for which I have a robust tool that performs all the necessary checks, with enough safeguards that a bad tool call won’t cause data loss.
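A minimal sketch of the kind of guard I mean, for a file-writing tool exposed to the model: restrict paths to a workspace directory and refuse overwrites, so a hallucinated tool call fails loudly instead of destroying data. The `ALLOWED_ROOT` name and the specific checks are illustrative assumptions, not my actual tool:

```python
import os

# Only allow the tool to touch files under this directory (assumption for the sketch).
ALLOWED_ROOT = os.path.abspath("workspace")

def safe_write(path: str, content: str) -> str:
    """Write a new file, but only inside ALLOWED_ROOT and never over an
    existing file. Returns the absolute path written."""
    full = os.path.abspath(os.path.join(ALLOWED_ROOT, path))
    if not full.startswith(ALLOWED_ROOT + os.sep):
        # Catches "../" escapes and absolute paths outside the workspace.
        raise ValueError(f"refusing path outside workspace: {path}")
    if os.path.exists(full):
        # Destructive overwrites require a human, not a model.
        raise FileExistsError(f"refusing to overwrite: {path}")
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return full
```

The point is that the safety lives in the tool, not in the model: even a badly looping model can only create new files inside one directory.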

However, the capability level still feels at least one generation behind the current frontier models. It overthinks and gets itself into unnecessary loops, which most current-generation models avoid really well.