The milestone of Gemma 4
Google has somehow managed to extend Gemini's visual acuity into these open-weights models. My application has to do with handwriting recognition, plus the calculation of bounding boxes for blobs of text, and the 31B version performs as well as Gemini 3 Flash… and nearly as well as Gemini 3.1 Pro?! (This isn't just vibes, but quantitative scoring.) Yet Gemma 4 31B is a model I can run however and wherever I want… it runs (quantized) on my old 2017-era deep learning rig with its three 12GB GPUs. It runs in the secure enclaves on Tinfoil.
Source: The milestone of Gemma 4
I wish to believe this. With gemini-cli still unavailable on my account for "violation of ToS," I've shifted entirely to Gemma 4 for running and learning pi. I'm still working through all my workflows to test how well they run across models, and it's gotten me into the mindset of building an eval tool for my workflows. What I can add to the visual-acuity capabilities quoted in the post is the ability to run tool calls via skills. It has reached a level where I feel comfortable using it for some mundane, repetitive local tasks: tasks for which I have a robust tool that performs all the necessary checks, with enough safeguards to ensure a bad tool call won't cause data loss.
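As a rough illustration of the kind of safeguards I mean, here is a minimal sketch of a guarded tool call: the model's requested action is validated before it touches the filesystem. Everything here (the `ALLOWED_ROOT` sandbox path, the `guarded_delete` helper) is hypothetical and illustrative, not from any real agent framework.

```python
from pathlib import Path

# Hypothetical sandbox root: the only directory the model's tools may modify.
ALLOWED_ROOT = Path("/tmp/model-workspace")


class ToolCallRejected(Exception):
    """Raised when a model-issued tool call fails a safety check."""


def guarded_delete(target: str) -> str:
    """Delete a file only if every safety check passes."""
    path = Path(target).resolve()
    # Check 1: the resolved path must live under the allowed root,
    # so "../" tricks or symlinks can't escape the sandbox.
    if not path.is_relative_to(ALLOWED_ROOT.resolve()):
        raise ToolCallRejected(f"refusing to touch {path}: outside workspace")
    # Check 2: never delete directories wholesale; files only.
    if path.is_dir():
        raise ToolCallRejected(f"refusing to delete directory {path}")
    if path.exists():
        path.unlink()
    return f"deleted {path}"
```

The point is that a bad tool call from the model (say, `guarded_delete("/etc/passwd")`) raises an exception instead of destroying data, which is what makes me comfortable letting a one-generation-behind model run these tasks unattended.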
However, the capability level still feels at least one generation behind the current frontier models. It overthinks and gets itself into unnecessary loops, which most current-gen models avoid really well.