China is back at the forefront of open-weight models
The world’s best open weight model is Made in China (again): … Chinese startup Moonshot has built and released via open weights Kimi K2, a large-scale mixture-of-experts model. K2 is the most powerful open weight model available today and comfortably beats other widely used open weight models like DeepSeek and Qwen, and approaches the performance of Western frontier models from companies like Anthropic. The model has 32 billion activated parameters and 1 trillion total parameters (by comparison, DeepSeek V3 has ~700B parameters, and Llama 4 Maverick has ~400B parameters).
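To make the activated-vs-total distinction concrete, here's a back-of-the-envelope sketch. In a mixture-of-experts model, only the experts the router picks for a given token actually run, so per-token compute tracks activated parameters rather than total. The expert counts below are illustrative assumptions chosen to land near K2's headline numbers, not Moonshot's published config:

```python
# Back-of-the-envelope MoE parameter math (illustrative numbers,
# not Moonshot's published K2 configuration).
shared_params = 8e9        # assumption: params that run for every token
                           # (attention, embeddings, shared layers)
num_experts = 320          # assumption: total experts across MoE layers
params_per_expert = 3.1e9  # assumption: parameters in a single expert
experts_per_token = 8      # assumption: experts routed per token

total = shared_params + num_experts * params_per_expert
activated = shared_params + experts_per_token * params_per_expert

print(f"total:     {total / 1e12:.2f}T parameters")   # ~1.00T
print(f"activated: {activated / 1e9:.0f}B parameters") # ~33B
```

The point of the arithmetic is just that a trillion-parameter MoE can cost roughly as much per token as a ~30B dense model, which is why these totals are climbing so fast.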
This seems like a strong contender for the best open-weights model. I seem to have missed that we are now in the era of trillion-plus-parameter models. I learned today that GPT-4.1 reportedly has ~1.76T parameters, while Gemini 2.5 is said to have about 1.5T.
It also seems like we are still establishing how these base models improve or impact reasoning, given that Gemini's reasoning prowess appears stronger than OpenAI's. Or, for that matter, how DeepSeek R1 turned out far better than what DeepSeek-V3 originally promised.
Developer markets vibe with consumer markets?
Vibes: More importantly, the ‘vibes are good.’ “Kimi K2 is so good at tool calling and agentic loops, can call multiple tools in parallel and reliably, and knows ‘when to stop’, which is another important property,” says Pietro Schirano on Twitter. “It’s the first model I feel comfortable using in production since Claude 3.5 Sonnet.”
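What Schirano is describing is, in sketch form, roughly the loop below. This is a minimal illustration, not Moonshot's actual API: the `model.chat` interface, the tool-call shape, and the `tools` registry are all hypothetical stand-ins.

```python
import concurrent.futures

def agent_loop(model, messages, tools, max_turns=10):
    """Minimal agentic loop sketch: the model either answers or requests
    tool calls; requested calls run in parallel and their results are fed
    back until the model decides it is done."""
    for _ in range(max_turns):
        reply = model.chat(messages)       # hypothetical client call
        if not reply.tool_calls:           # the model "knows when to stop"
            return reply.text              # final answer, exit the loop
        # Run all requested tool calls in parallel, as the quote describes.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda call: tools[call.name](**call.arguments),
                reply.tool_calls,
            ))
        # Feed each tool result back so the model can plan its next step.
        for call, result in zip(reply.tool_calls, results):
            messages.append({"role": "tool", "name": call.name,
                             "content": str(result)})
    raise RuntimeError("agent did not converge within max_turns")
```

The two properties in the quote map directly onto this loop: reliable parallel tool calls are the `pool.map` step, and “knows when to stop” is the model returning a plain answer instead of yet another tool request.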
It's fascinating to me that most consumers are aware of ChatGPT and primarily use it over anything else. The battle for developer mindshare is fierce, though, and weirdly it comes down to vibe-evals from developers. This market, as a result, isn't decided by an objective test but by a consumer-style mindshare battle all over again.
As a result, you can see consumer marketing tactics at work here too.
The power of constraints
We should appreciate the irony here: China is leading the open-weight model market while American labs remain mostly closed-weight, and their open-weight equivalents are nowhere near as competitive. At the same time, the paradox that China continues to deliver these improvements while facing a GPU shortage (at least in theory) is one that American labs must take into account.
Sometimes, constraints help. American labs are so much a function of scale-driven growth that I worry they might be missing the impetus to be sustainable, and that the current growth trajectory isn't organic. But that's just one worry, and there's a tremendous amount of built-in knowledge about scaling a business ahead of profitability and locking in those gains once it reaches majority share: Amazon, Uber, much of the cloud build-out, some SaaS verticals. That institutional know-how of scaling ahead of unit economics and cost structure is a source of fuel for American venture capital. It also comes with victory bias and recency bias, since people shouted from the rooftops (especially in the case of Uber) that unit economics would hamper success at scale.