The key feature, then, is speed. I made it through the waitlist and tried it out just now and wow, they are not kidding about it being fast. In this video I prompt it with “Build a simulated chat app” and it responds at 857 tokens/second, returning an interactive HTML+JavaScript page (embedded in the chat tool, Claude Artifacts style) within single-digit seconds.
Source: Gemini Diffusion
This is the worst it will ever be. That alone makes it incredibly impressive: diffusion is a genuinely novel way to generate token output. I can only imagine how this approach could be folded into parts of existing LLM pipelines to make the whole process that much faster.
Fascinating.
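To make the speed intuition concrete, here's a toy sketch of why a diffusion loop can win on wall-clock time: autoregressive decoding needs one sequential model call per output token, while a diffusion-style loop refines the whole output block in parallel over a small, fixed number of denoising passes. This is my own illustration, not Google's actual implementation; the `predict_next` and `denoise` callables are hypothetical stand-ins.

```python
# Toy contrast between autoregressive decoding and diffusion-style decoding.
# NOT Gemini Diffusion's real algorithm -- just the shape of the two loops.

MASK = "<mask>"  # placeholder for a fully "noised" token position


def autoregressive_generate(predict_next, prompt, n_new):
    """One sequential model call per generated token: n_new serial steps."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(predict_next(tokens))
    return tokens


def diffusion_generate(denoise, prompt, n_new, n_steps=4):
    """A fixed number of refinement passes, each updating every position at once."""
    block = [MASK] * n_new          # start from pure "noise"
    for _ in range(n_steps):        # n_steps serial steps, typically n_steps << n_new
        block = denoise(prompt, block)
    return list(prompt) + block


if __name__ == "__main__":
    # Dummy "models" just to show the call pattern and step counts.
    predict_next = lambda toks: f"tok{len(toks)}"
    denoise = lambda prompt, block: [f"tok{i}" for i in range(len(block))]
    print(autoregressive_generate(predict_next, ["hello"], 8))  # 8 sequential calls
    print(diffusion_generate(denoise, ["hello"], 8))            # 4 sequential passes
```

Even if each parallel denoising pass costs about as much as a single next-token call, making far fewer sequential passes is what would push the tokens-per-second number up.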