
Beyond DeepSeek: What China's Model Builders Are Seeing
Google is back on top. Anthropic is charging ahead. And OpenAI is facing the toughest narrative moment it’s had in years.
Some skeptics argue that OpenAI’s moat is disappearing. Models are becoming commoditized. ChatGPT lacks true network effects. Google has the edge in traffic and compute. And in high-value enterprise tasks, Anthropic appears to be pulling ahead.
To be fair, these concerns aren’t unfounded. We’re only one month into 2026, and instead of stabilizing, the model landscape has grown even more competitive. For the first time since launching ChatGPT, OpenAI finds itself playing from behind.
Still, we remain optimistic. 2026 could be a turning point for OpenAI, but nine critical questions will determine how the story unfolds.
OpenAI is feeling the impact of Gemini across three fronts: narrative, model performance, and traffic.
Narrative is where the impact is most visible.
Google’s resurgence has knocked OpenAI off the SOTA pedestal. More importantly, it’s shifted public perception. After 4o, OpenAI didn’t release a model with a dramatic leap in performance. The takeaway for many wasn’t that scaling had hit a wall; it was that OpenAI’s scaling had.
The market reaction was immediate. Since the release of Gemini 3, Google is up 20%, while SoftBank, often seen as a proxy for OpenAI exposure in public markets, is down 17%.
On the model side, OpenAI’s own missteps matter more than Gemini’s gains.
Rather than launching a new pre-training generation after 4o, OpenAI iterated on top of it, leaning heavily on RL and post-training improvements. Gemini 3 appears to have executed pre-training better than OpenAI this cycle, but it hasn’t delivered a true step-change. Meanwhile, OpenAI still leads in post-training and RL, and is actively addressing its pre-training gaps.
The more likely scenario from here: accelerated releases—Gemini 4, GPT-5.3, Opus 5—pushing Q1 into an intense benchmark race, with leadership alternating from model to model.
Until the next paradigm shift arrives, these back-and-forth wins may not mean much strategically. But the pressure is heavier on OpenAI. It lacks Google’s compute and infrastructure advantages, yet it’s simultaneously funding next-paradigm research, building the next generation of models, and serving one billion users.
On revenue, however, Gemini 3 has had little impact so far: it has barely moved OpenAI’s API revenue or ChatGPT subscription revenue.
On the traffic front, OpenAI has already rebounded from its recent lows.
According to third-party tracking data, ChatGPT’s web traffic in January returned to pre-holiday levels, while mobile traffic surpassed them. On a longer time horizon, the post-holiday acceleration is even clearer.
2026 is shaping up to be a year of intensified competition on an upgraded battlefield. The test is no longer just technological strength; it’s also about deciding strategically where to allocate resources.
OpenAI and Google will compete head-to-head in consumer and advertising markets, while Anthropic, leveraging its consistent strategic focus, is gaining a first-mover advantage in high-value tasks like coding, agentic AI, and Excel automation.
That doesn’t mean OpenAI or Google will ignore high-value tasks. In coding, for example, they are certain to strike at some point. It’s just that the window for consumer competition is shorter, while high-value tasks allow for long-term positioning.
Anthropic has already demonstrated the ability to continuously innovate in high-value tasks. Whether this innovation will become a durable moat or merely pave the way for OpenAI and Google will become clearer over the course of this year.
In the short term, Google can leverage a fully free strategy and its super-platforms, like its browser and search, to drive traffic to Gemini, which could slow ChatGPT’s growth. This is a luxury only a giant can afford. After all, Google has nine products with over one billion users each, while the second-place Meta has only three.

In the long term, we should be optimistic about ChatGPT’s growth. Chat and search are inevitably heading toward deep integration, and chat can handle far longer and more complex queries than search. The total volume of chat queries and usage frequency will eventually surpass search engines, meaning the user base could reach at least the same scale as search, around 5 billion monthly active users.
Currently, ChatGPT has roughly 1.2 billion MAU, and Gemini about 400 million, still far from the 5 billion mark. Even if their market share shifts from 4:1 to 1:1, ChatGPT still has room to double.
Assuming a 4:1 ratio and ChatGPT reaches 4 billion MAU, the implications are:
Optimistically, visible ARR for ChatGPT could reach $200 billion, with even larger upside beyond that. Conservatively, if ChatGPT and Gemini reach a 1:1 ratio and ChatGPT holds 2.5 billion MAU, applying a 60% factor to the visible revenue estimate still leaves substantial upside.
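The scenario math above can be sketched in a few lines. The linear MAU scaling and the implied per-user revenue are illustrative assumptions, not figures reported by OpenAI:

```python
# Back-of-envelope sketch of the ChatGPT revenue scenarios above.
# Scaling the $200B "visible ARR at 4B MAU" estimate linearly with MAU
# is an assumption for illustration; the per-user figure is implied,
# not reported.

def visible_arr(mau, arr_at_4b_mau=200e9, factor=1.0):
    """Scale the optimistic $200B-at-4B-MAU estimate linearly with MAU,
    with an optional haircut factor for conservative cases."""
    return arr_at_4b_mau * (mau / 4e9) * factor

optimistic = visible_arr(4.0e9)                # 4:1 share scenario
conservative = visible_arr(2.5e9, factor=0.6)  # 1:1 share, 60% factor

implied_arpu = optimistic / 4.0e9              # revenue per MAU per year

print(f"optimistic:   ${optimistic / 1e9:.0f}B")
print(f"conservative: ${conservative / 1e9:.0f}B")
print(f"implied ARPU: ${implied_arpu:.0f}/user/year")
```

Under these assumptions the conservative case still works out to roughly $75 billion of visible ARR, which is the "huge upside" framing in concrete terms.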
Every decade brings a wave of fundamental shifts in user behavior, often bigger than technological improvements themselves.
The move from search to chat mirrors the transition from image-and-text browsing to short-form video. The old format doesn’t disappear, but the new one delivers a tenfold stronger experience, hitting users in a completely different dimension.
There are several parallels between AI chat and short video:
Key differences include:
Currently, Google Search handles around 14 billion queries per day, while ChatGPT processes roughly 2.5 billion prompts per day (according to OpenAI data shared with Axios as of July 2025), already about 18% of Google’s volume. In terms of consumer intent, chat shows a clear advantage: many brands were already lining up even before OpenAI officially launched ads.
With shifts in both user behavior and ad models, Google faces a significant threat. The 2C battle between Gemini and ChatGPT in 2026 will be intense.
Although OpenAI emphasizes consumer products in its messaging and Anthropic leans toward Enterprise Business, OpenAI’s Enterprise Business has consistently been underestimated.
In 2025, OpenAI’s ARR was $20 billion (revenue $13 billion), with API revenue accounting for roughly 30%, about $6 billion. In the same year, Anthropic’s ARR was around $9 billion (revenue ~$4.5 billion), with 85% coming from coding and other Enterprise Business offerings; Claude Chat subscriptions made up only 15%.
At first glance, Anthropic’s Enterprise Business revenue seems larger. In reality, OpenAI’s Enterprise Business scale is at least comparable, if not bigger:
Combined, OpenAI’s API and ChatGPT Enterprise revenues account for about 40% of total revenue, roughly $5.2 billion, still larger than Anthropic’s total revenue of $4.5 billion, according to The Information.

Recently, Sam Altman promoted the API on X, noting that it added $1 billion in ARR in the past month. OpenAI is also increasing its focus on the enterprise side. With 2C competition intensifying, emphasizing the enterprise business is a natural strategic choice.

One more point: APIs are closely tied to the cloud and may even be reshaping the cloud landscape. Anthropic was previously the only company offering a SOTA model across Azure, AWS, and GCP, giving it an advantage with enterprises and developers. With OpenAI’s new funding round, Amazon is highly likely to participate, which could open up new enterprise opportunities for OpenAI.
The three keywords for ChatGPT in 2026 are likely memory, proactivity, and personalization; all three are both product and research challenges.
With model pre-training and RL already in the industrialized era, Google has advantages in engineering infrastructure and TPU compute. For OpenAI to break through, it must excel in memory and proactive agents.
Memory and proactive features were pioneered by OpenAI, but they’re not yet fully realized:
Personalization is closely tied to memory and continual learning. Language models can’t yet personalize or learn user preferences in real time like recommendation algorithms, but they have the potential to go orders of magnitude deeper. Neolabs and Thinking Machines Lab in the Bay Area, as well as the recently heavily funded startup Humans&, are exploring different technical paths to achieve personalization and continual learning—creating AI that improves through interaction with users.
This isn’t just OpenAI’s potential game-changer; it’s high ground that any AI product must capture. Only by realizing memory and personalization can AI achieve a true “data flywheel.”
In the past two paradigm shifts, model scaling and reasoning models, OpenAI led the way. It still has a strong chance of pioneering continual learning, widely recognized by top researchers in China and the U.S. as the next major paradigm.
If OpenAI hadn’t undergone organizational changes, its probability of leading the next paradigm would be the highest.
Historically, OpenAI struck an effective organizational balance, combining top-down coordination with bottom-up innovation. This allowed large-scale deployment of personnel for model training while encouraging grassroots innovation. Innovations like reasoning models and o3-mini emerged from the bottom up, and OpenAI consistently allocated ample compute to frontier research.
By comparison, Anthropic has limited resources and remains highly focused on coding and agentic AI. xAI started later and has been chasing SOTA models, leaving little bandwidth for exploratory research; Meta is similar.
Now, OpenAI has experienced multiple researcher departures and its focus has been split by commercialization and product demands. As a result, we estimate that OpenAI, Google, and Bay Area neolabs each have roughly a one-third chance of pioneering the next paradigm.
Google’s advantage lies in its dense talent pool and abundant resources: there’s always someone internally experimenting with something different.
Neolabs, including SSI, TML, Isara, Humans&, and Core Automation, have sprung up in the Bay Area, founded specifically to create the next paradigm. They are highly focused and include exceptional talents like Ilya Sutskever.
Even if OpenAI isn’t the first to create the next paradigm, once it emerges, the company has the ability to catch up quickly, and holds the strongest advantage in product integration.
OpenAI’s paid subscription rate is currently around 5%, so advertising remains the most effective monetization method for consumer scenarios. Even Netflix focused heavily on ad monetization last year.
Current ads are priced on a CPM basis at roughly $60 per thousand impressions, comparable to top-tier video ads like those during the NFL. This likely reflects OpenAI’s confidence in ad targeting. Users can also interact directly with brands after seeing an ad, representing a new form of advertising innovation in the AI era.

Advertisers’ demand for ChatGPT is bound to be huge. In long conversations, users reveal far more intent, and LLMs excel at recognizing it. Combined, this creates a “gold mine” far richer and easier to tap than anything accumulated before.
The challenge is that mining this gold requires building a full advertising system, infrastructure, and ecosystem, an extremely complex task, likely different from traditional ad models. Within the next year, ad revenue may not scale significantly; early results will likely come from case studies and marketing.
Beyond advertising, ChatGPT’s bigger potential lies in e-commerce.
Douyin, TikTok’s Chinese counterpart, has already proven the power of an “ad platform + e-commerce loop.” In 2024, Douyin’s e-commerce GMV exceeded ¥3 trillion, creating a loop that makes its per-user value far higher than a pure ad platform’s.
By contrast, Google and Meta have both struggled to close the e-commerce loop. ChatGPT is pursuing a different path, and its progress is faster than commonly perceived.
Instant Checkout has already integrated with Shopify, with a 4% take rate, connecting over 1 million Shopify merchants. Etsy is live, and major retailers like Walmart are following. More importantly, OpenAI partnered with Stripe to launch the Agentic Commerce Protocol and chose to open-source it, signaling an attempt to set platform standards.
OpenAI’s goal by the end of 2027 is to generate $11 billion in annual revenue from non-paying users, primarily through ads and e-commerce.
Over a 3–5 year horizon, ChatGPT could become the first non-Amazon player to establish a fully internalized e-commerce ecosystem in the U.S. market. This potential far exceeds ad revenue alone—the ceiling for advertising is Google’s ~$300 billion, whereas global e-commerce GMV exceeds $6 trillion. A 4% take rate means every $100 billion GMV generates $4 billion in revenue.
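The take-rate arithmetic above, and the ceiling it implies, can be laid out explicitly. Treating all global GMV as addressable is a limit case for comparison, not a forecast:

```python
# Take-rate arithmetic from the paragraph above.
TAKE_RATE = 0.04      # Instant Checkout take rate, per the article
ADS_CEILING = 300e9   # Google's ~$300B annual ad revenue
GLOBAL_GMV = 6e12     # global e-commerce GMV

def commerce_revenue(gmv, take_rate=TAKE_RATE):
    """Platform revenue from a given GMV at a given take rate."""
    return gmv * take_rate

per_100b_gmv = commerce_revenue(100e9)    # $4B per $100B of GMV
full_tam = commerce_revenue(GLOBAL_GMV)   # limit case: all global GMV

# GMV needed for commerce revenue alone to match the ad ceiling:
gmv_to_match_ads = ADS_CEILING / TAKE_RATE
```

At a 4% take rate, matching Google's ad revenue would require roughly $7.5 trillion of GMV to flow through the platform, which is why the e-commerce ceiling reads as higher than the advertising one only if the take rate or the GMV share grows substantially.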
One concern about OpenAI is that it has pioneered a new entry point for the LLM era, but does that mean it’s the final one? AI is still in its very early stages. If chatbots are disrupted by a completely new interaction mode, or if the next major entry point isn’t chat/information but agents, tasks, or entirely new hardware, could OpenAI fade like Yahoo?
The possibility exists, but it’s very low. Yahoo made two mistakes OpenAI is unlikely to repeat:
Even so, Yahoo remained a top-tier internet company for a decade.
Today, information and talent flow is extremely transparent. No lab would underestimate a key technology or be foolish enough to feed a competitor. Perhaps a single product, like ChatGPT, could fade like Yahoo, but OpenAI itself will not. Nor will Google.
In fact, this may be the first time in Silicon Valley history that a startup challenges a giant, and the giant isn’t an elderly, rusty competitor scrambling to catch up; it’s a master swordsman at the peak of his skill, who has spent the last decade forging a legendary blade. OpenAI is engaged in a hard-fought battle worthy of respect.
We’ve been tracking China’s model ecosystem closely. After the Lunar New Year releases landed (Seedance 2.0 and Seed 2.0, GLM-5, and MiniMax M2.5), we wanted to go beyond the benchmarks and the press releases. So we organized a closed-door discussion inside our Best Ideas community.
The session brought together a mix of voices: researchers from Zhipu, MiniMax, and ByteDance who are building these models, alongside developers and practitioners who are deploying them in production. What follows reflects the full conversation — not any single participant’s view.
Some of what we heard confirmed what we expected. A lot of it didn’t.
Seedance 2.0: For the first time, a Chinese model isn’t catching up — it’s leading
The framing that matters most about Seedance 2.0 isn’t technical — it’s historic. For the first time, a Chinese foundation model isn’t just competitive globally, it’s decisively ahead.
In video generation, the consensus in the room was that Seedance 2.0 stands at least a generation in front of Sora 2 and Veo 3. This isn’t a claim about catching up. It’s a claim about leading.
What makes that lead meaningful is where it lands: efficiency and usability, not just quality scores.
Seedance collapses the gap between idea and output in ways that restructure the economics of content creation entirely.
For non-professional creators, a full day of iteration compresses into thirty minutes. At that ratio, the traditional grammar of film and TV production, its workflows, budgets, and gatekeepers, doesn’t just become less relevant. It becomes a relic.
The room’s projection: within 6 to 12 months, for under $14, an ordinary person could plausibly produce a 30-40 minute film at Netflix-level image quality. The supply of content won’t just grow — it will compound.
The disruption runs in two directions:
Apparently, the latter looks more exciting to the community.
The shift from lottery to reliability is what makes this real. Older video models operated on what participants called a “roulette dynamic”: generate an eight-second clip, get three or four usable seconds, reroll. Seedance 2.0 broke that pattern. A single 15-second generation now largely meets commercial standards for semantic consistency and shot continuity — cinematic pacing, micro-expressions, the physical logic of how one shot leads into the next.
Video creation has moved from a roulette dynamic to something industrial-grade.
The discussion was direct about why ByteDance’s lead here is likely to persist. It’s not just compute or data. It’s organizational depth.
ByteDance appears to operate on what might be called a “three-generation” model cadence: one generation in live optimization, one in core development, and one in forward-looking research. Most peers, domestic and international, can barely sustain two of these tracks in parallel.
The downstream consequence that surprised us most: the teams under the most pressure right now aren’t incumbent video studios. They’re the workflow tool builders — the products that stitched together Midjourney for images, ElevenLabs for voice, and editing software into a chain. End-to-end models are quietly swallowing that stack.
MiniMax: The architecture is a commercial thesis, not a technical one
MiniMax M2’s parameter count — 200B total, 10B activated — is unusual enough that it raised eyebrows across the industry. The MiniMax team was direct about the reasoning: the sizing decision wasn’t a capability trade-off. It was a bet on how agents will actually be deployed.
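The economics of that sizing follow from a standard rule of thumb: per-token inference compute scales with activated parameters (roughly 2 FLOPs per active parameter per token), while total parameter count mostly sets the memory footprint. A sketch using the reported M2 sizes, with a hypothetical dense 200B model as the comparison point:

```python
# Why total vs. activated parameters matter economically for an MoE
# model like M2 (200B total, 10B activated). Rule of thumb: a forward
# pass costs ~2 FLOPs per *active* parameter per token; the dense
# 200B baseline here is hypothetical, for comparison only.

TOTAL_PARAMS = 200e9
ACTIVE_PARAMS = 10e9

flops_per_token_moe = 2 * ACTIVE_PARAMS    # ~2e10 FLOPs per token
flops_per_token_dense = 2 * TOTAL_PARAMS   # what a dense 200B would cost

compute_ratio = flops_per_token_dense / flops_per_token_moe  # 20x
activation_fraction = ACTIVE_PARAMS / TOTAL_PARAMS           # 5% of weights per token
```

Under this rule of thumb, each generated token costs about a twentieth of what a dense model of the same total size would, which is the margin that makes always-on, multi-agent serving plausible at consumer prices.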

In real agentic workflows, users run five to ten agents in parallel. Latency shapes productivity as much as raw capability. And as agents migrate from developers to mainstream users, they become 24/7 personal assistants — always-on, high-frequency, generating enormous token volumes.
At those usage densities, frontier pricing becomes structurally unsustainable. They pointed to Talkie — MiniMax’s companion AI product — as the proof of concept: wide audience, long conversation threads, extreme usage density. The M2-her model powering Talkie is a post-trained variant built on the M2 base.
The implicit critique of the broader industry is sharp: a model that tops every benchmark but prices itself out of agentic deployment isn’t winning the market. It’s winning a leaderboard.
MiniMax also shared the internal benchmark they built to evaluate coding capability — VIBE Bench — which spans the full stack of real-world software development, including cloud configuration, multi-language environments, and API integrations.
Their view on standard benchmarks was pointed: SWE-style evaluations are too narrow (focused mostly on bug fixes) and too language-homogeneous (skewing Python and TypeScript, while production systems run Go, C++, Rust).
According to their internal testing, MiniMax M2 shows significant improvements over Claude Opus on the VIBE leaderboard.
The lesson the MiniMax team kept coming back to was Anthropic’s approach to dogfooding.
They’ve moved the entire company — including legal and HR — into an agent-native operating model. If agents fail at a task internally, that failure gets formalized as the next benchmark. It’s a feedback loop they believe is the actual reason Anthropic has been able to maintain both model depth and product quality — and where it genuinely leads most domestic players.
GLM-5: Benchmarking against Anthropic, not the leaderboard
There was an internal strategic debate at Zhipu AI that came up in the conversation. One path: chase top scores in math and coding competitions, optimize for reasoning benchmarks, build toward long-horizon scientific research. The other path: prioritize economically valuable productivity tasks — real-world programming workflows, agentic problem-solving.
Zhipu’s researchers chose the second, and chose Anthropic as the explicit benchmark.
The reasoning they laid out is worth unpacking. Coding and agent tasks aren’t just harder versions of reasoning tasks — they’re structurally different. Reasoning operates within a bounded problem space, where heavy thinking and reinforcement learning alone can produce high scores. Complex coding requires something closer to engineering intuition: identifying root causes quickly, resolving issues with minimal token usage rather than trial and error. That kind of intuition, they found experimentally, scales with model size in ways that reasoning benchmarks don’t fully capture.
GLM-5 runs at 744B total parameters with roughly 40B activated per inference — deliberately constrained to fit within a standard 8-GPU cluster, keeping it viable for real-world production deployment. (Notably, GLM-5 was trained and runs inference entirely on Huawei Ascend chips, with no dependency on NVIDIA hardware — a geopolitical statement as much as a technical one.)
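A rough memory check makes the 8-GPU constraint concrete. The 1-byte-per-weight precision and the 96GB-class accelerator are assumptions for illustration, and KV cache and activations are excluded, so this is a lower bound on the real footprint:

```python
# Rough check on the "fits a standard 8-GPU node" claim for GLM-5.
# BYTES_PER_PARAM and GPU_MEMORY_GB are assumptions, not reported
# figures; KV cache and activation memory are excluded.

TOTAL_PARAMS = 744e9
BYTES_PER_PARAM = 1   # assuming FP8/INT8 weight quantization
NUM_GPUS = 8
GPU_MEMORY_GB = 96    # hypothetical 96GB-class accelerator

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9  # 744 GB of weights
per_gpu_gb = weights_gb / NUM_GPUS                 # 93 GB per GPU

fits = per_gpu_gb < GPU_MEMORY_GB  # tight, and only at 1 byte/param
```

The arithmetic shows why the sizing reads as deliberate: at 93 GB of weights per GPU, the model only fits an 8-GPU node with aggressive quantization and careful KV-cache management, and anything larger would force a second node.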
The model also introduces DSA (DeepSeek Sparse Attention) to handle long-context coding workloads, where input context regularly dwarfs generated output.
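DSA itself uses a learned lightweight indexer to decide which tokens each query attends to. As a simplified illustration of the general idea only (top-k selection by raw attention score here is a stand-in, not DeepSeek's mechanism), a minimal sparse-attention sketch:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=64):
    """Single-query sparse attention: score all keys, keep only the
    top-k, and run softmax attention over that subset. Per-query
    attention cost drops from O(seq_len) to O(k), which is what makes
    long-context inputs (context >> output) cheap to serve.

    Simplified illustration of sparse attention in general, not
    DeepSeek's DSA, which selects tokens with a separate indexer."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (seq_len,) relevance scores
    k = min(k, scores.shape[0])
    idx = np.argpartition(scores, -k)[-k:]  # indices of the top-k keys
    sub = scores[idx]
    w = np.exp(sub - sub.max())             # stable softmax over subset
    w /= w.sum()
    return w @ V[idx]                       # weighted sum of selected values

rng = np.random.default_rng(0)
seq_len, d = 4096, 64
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
out = topk_sparse_attention(q, K, V, k=128)  # attends to 128 of 4096 tokens
```

With k equal to the full sequence length this reduces to ordinary softmax attention, so the sparsity knob trades recall of low-scoring context for compute.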

The competitive positioning Zhipu’s team laid out is explicit: Anthropic currently dominates the high end of the coding market at $15-25 per million tokens. GLM-5 is targeting the upper-middle segment at roughly $3 — a price gap significant enough that even developers deeply embedded in Claude’s tooling ecosystem are paying attention. Domestic open-source models have also maintained strong compatibility with Claude skills and related tooling, making switching relatively seamless.
What this means for the broader picture
A few things from the conversation that didn’t fit neatly into any single company’s story:
Data is becoming the real moat, not compute.
Multiple participants pointed to this independently. The paradigm is shifting from compute-bound to data-bound — not because compute isn’t scarce, but because it’s becoming more accessible. The edge increasingly lives in data acquisition, cleaning, long-tail mining, and evaluation loops. Seedance 2.0’s lead in video generation, for instance, is inseparable from fine-grained work on tail-end video data. And China has structural advantages here that are underappreciated: a deep ecosystem of small, agile video production teams capable of capturing proprietary footage at scale.
Hardware scarcity is driving architecture-level innovation.
Chinese companies are working under tighter GPU constraints than their Silicon Valley counterparts. Several researchers noted that this is producing more architectural creativity at the inference layer — necessity forcing efficiency work that well-resourced labs have little incentive to do.
The market won’t consolidate to one winner.
The consensus in the room was that the AI market will look less like search (9:1 winner-take-all) and more like e-commerce — multiple players finding durable niches through different product sensibilities and user relationships. When model capabilities converge, the differentiation shifts to taste: do you build a maximally efficient tool, or a product with genuine personality?
The question we’re sitting with
We framed 2026 as a structural inflection point, not an incremental one. The Lunar New Year releases were the opening move. By mid-year, new long-horizon reasoning models and async agent frameworks are expected to enter. Every industry is showing up with real deployment use cases.
The harder question — one the room didn’t fully resolve — is what happens to the application layer. Agent capabilities are increasingly being driven by base model improvements, which means orchestration logic and system prompt design that once belonged to application developers is getting absorbed into the model itself. Kimi 2.5 reportedly baked an orchestrator-plus-subagent architecture directly into the model via RL.
For investors watching this space: the infrastructure play is maturing, the model race is compressing, and the application layer is getting thinner. The next durable positions will be carved out by products that own a specific user relationship — not by whoever has the best benchmark score this quarter.
We’ll keep hosting these sessions. The signal-to-noise ratio inside these conversations is meaningfully higher than what surfaces publicly.

Beyond DeepSeek: What China's Model Builders Are Seeing
<<< Back to All
Google is back on top. Anthropic is charging ahead. And OpenAI is facing the toughest narrative moment it’s had in years.
Some skeptics argue that OpenAI’s moat is disappearing. Models are becoming commoditized. ChatGPT lacks true network effects. Google has the edge in traffic and compute. And in high-value enterprise tasks, Anthropic appears to be pulling ahead.
To be fair, these concerns aren’t unfounded. We’re only one month into 2026, and instead of stabilizing, the model landscape has grown even more competitive. For the first time since launching ChatGPT, OpenAI finds itself playing from behind.
Still, we remain optimistic. 2026 could be a turning point for OpenAI, but nine critical questions will determine how the story unfolds.
OpenAI is feeling the impact of Gemini across three fronts: narrative, model performance, and traffic.
Narrative is where the impact is most visible.
Google’s resurgence has knocked OpenAI off the SOTA pedestal. More importantly, it’s shifted public perception. After 4o, OpenAI didn’t release a model with a dramatic leap in performance. The takeaway for many wasn’t that scaling had hit a wall, it was that OpenAI’s scaling had.
The market reaction was immediate. Since the release of Gemini 3, Google is up 20%, while SoftBank, often seen as a proxy for OpenAI exposure in public markets, is down 17%.
On the model side, OpenAI’s own missteps matter more than Gemini’s gains.
Rather than launching a new pre-training generation after 4o, OpenAI iterated on top of it, leaning heavily on RL and post-training improvements. Gemini 3 appears to have executed pre-training better than OpenAI this cycle, but it hasn’t delivered a true step-change. Meanwhile, OpenAI still leads in post-training and RL, and is actively addressing its pre-training gaps.
The more likely scenario from here: accelerated releases—Gemini 4, GPT-5.3, Opus 5—pushing Q1 into an intense benchmark race, with leadership alternating from model to model.
Until the next paradigm shift arrives, these back-and-forth wins may not mean much strategically. But the pressure is heavier on OpenAI. It lacks Google’s compute and infrastructure advantages, yet it’s simultaneously funding next-paradigm research, building the next generation of models, and serving one billion users.
However, Gemini 3 has had little impact so far. It has barely moved OpenAI’s API revenue or ChatGPT subscription revenue.
On the traffic front, OpenAI has already rebounded from its recent lows.
According to third-party tracking data, ChatGPT’s web traffic in January returned to pre-holiday levels, while mobile traffic surpassed them. On a longer time horizon, the post-holiday acceleration is even clearer.



2026 is shaping up to be a year of intensified competition and an upgraded battlefield. The test is no longer just technological strength, it’s about strategically deciding where to allocate resources.
OpenAI and Google will compete head-to-head in consumer and advertising markets, while Anthropic, leveraging its consistent strategic focus, is gaining a first-mover advantage in high-value tasks like coding, agentic AI, and Excel automation.
That doesn’t mean OpenAI or Google will ignore high-value tasks. For example, in coding, they are certain to strike at some point. It’s just that the window for consumer competition is shorter, while high-value tasks allow for long-term positioning.
Anthropic has already demonstrated the ability to continuously innovate in high-value tasks. Whether this innovation will become a durable moat or merely pave the way for OpenAI and Google will become clearer over the course of this year.
In the short term, Google can leverage a fully free strategy and its super-platforms, like its browser and search, to drive traffic to Gemini, which could slow ChatGPT’s growth. This is a luxury only a giant can afford. After all, Google has nine products with over one billion users each, while the second-place Meta has only three.

In the long term, we should be optimistic about ChatGPT’s growth. Chat and search are inevitably heading toward deep integration, and chat can handle far longer and more complex queries than search. The total volume of chat queries and usage frequency will eventually surpass search engines, meaning the user base could reach at least the same scale as search, around 5 billion monthly active users.
Currently, ChatGPT has roughly 1.2 billion MAU, and Gemini about 400 million, still far from the 5 billion mark. Even if their market share shifts from 4:1 to 1:1, ChatGPT still has room to double.
Assuming a 4:1 ratio and ChatGPT reaches 4 billion MAU, the implications are:
Optimistically, visible ARR for ChatGPT could reach $200 billion, with even larger upside potential beyond that. Conservatively, if ChatGPT and Gemini achieve a 1:1 ratio and reach 2.5 billion MAU, applying a 60% factor to the visible revenue estimate still leaves a huge upside potential.
Every decade brings a wave of fundamental shifts in user behavior, often bigger than technological improvements themselves.
The move from search to chat mirrors the transition from image-and-text browsing to short-form video. The old format doesn’t disappear, but the new one delivers a tenfold stronger experience, hitting users in a completely different dimension.
There are several parallels between AI chat and short video:
Key differences include:
Currently, Google Search handles around 14 billion queries per day, while ChatGPT processes roughly 2.5 billion prompts per day (according to OpenAI data shared with Axios as of July 2025), already about 18% of Google’s volume. In terms of consumer intent, chat shows a clear advantage, many brands were already lining up even before OpenAI officially launched ads.
With shifts in both user behavior and ad models, Google faces a significant threat. The 2C battle between Gemini and ChatGPT in 2026 will be intense.
Although OpenAI emphasizes consumer products in its messaging and Anthropic leans toward Enterprise Business, OpenAI’s Enterprise Business has consistently been underestimated.
In 2025, OpenAI’s ARR was $20 billion (revenue $13 billion), with API revenue accounting for roughly 30%, about $6 billion. In the same year, Anthropic’s ARR was around $9 billion (revenue ~$4.5 billion), with 85% coming from coding and other Enterprise Business offerings; Claude Chat subscriptions made up only 15%.
At first glance, Anthropic’s Enterprise Business revenue seems larger. In reality, OpenAI’s Enterprise Business scale is at least comparable, if not bigger:
Combined, OpenAI’s API and ChatGPT Enterprise revenues account for about 40% of total revenue, roughly $5.2 billion, still larger than Anthropic’s total revenue of $4.5 billion, according to Information.

Recently, Sam promoted the API on X, noting that it added $1 billion in ARR in the past month. OpenAI is also increasing its focus on the enterprise side. With 2C facing intense competition, emphasizing the enterprise business is a natural strategic choice.

One more point: APIs are closely tied to the cloud and may even be reshaping the cloud landscape. Anthropic was previously the only company offering a SOTA model across Azure, AWS, and GCP, capturing advantages on the enterprise business and developer side. With OpenAI’s new funding round, Amazon is highly likely to participate, which could open up new opportunities for its enterprise business.
The three keywords for ChatGPT in 2026 are likely memory, proactive, and personalization, they’re both product and research challenges.
With model pre-training and RL already in the industrialized era, Google has advantages in engineering infrastructure and TPU compute. For OpenAI to break through, it must excel in memory and proactive agents.
Memory and proactive features were pioneered by OpenAI, but they’re not yet fully realized.
Personalization is closely tied to memory and continual learning. Language models can’t yet personalize or learn user preferences in real time like recommendation algorithms, but they have the potential to go orders of magnitude deeper. Neolabs and Thinking Machines Lab in the Bay Area, as well as the recently heavily funded startup Humans&, are exploring different technical paths to achieve personalization and continual learning—creating AI that improves through interaction with users.
This isn’t just OpenAI’s potential game-changer; it’s high ground that any AI product must capture. Only by realizing memory and personalization can AI achieve a true “data flywheel.”
In the past two paradigm shifts, model scaling and reasoning models, OpenAI led the way. It still has a strong chance of pioneering continual learning, widely recognized by top researchers in China and the U.S. as the next major paradigm.
If OpenAI hadn’t undergone organizational changes, its probability of leading the next paradigm would be the highest.
Historically, OpenAI struck an effective organizational balance, combining top-down coordination with bottom-up innovation. This allowed large-scale deployment of personnel for model training while encouraging grassroots innovation. Innovations like reasoning models and o3-mini emerged from the bottom up, and OpenAI consistently allocated ample compute to frontier research.
By comparison, Anthropic has limited resources and remains highly focused on coding and agentic AI. xAI started later and has been chasing SOTA models, leaving little bandwidth for exploratory research; Meta is similar.
Now, OpenAI has experienced multiple researcher departures and its focus has been split by commercialization and product demands. As a result, I estimate that OpenAI, Google, and Bay Area neolabs each have roughly a one-third chance of pioneering the next paradigm.
Google’s advantage lies in its dense talent pool and abundant resources; there’s always someone internally experimenting with something different.
Neolabs, including SSI, TML, Isara, Humans&, and Core Automation, have sprung up in the Bay Area, founded specifically to create the next paradigm. They are highly focused and include exceptional talents like Ilya Sutskever.
Even if OpenAI isn’t the first to create the next paradigm, once it emerges, the company has the ability to catch up quickly, and holds the strongest advantage in product integration.
OpenAI’s paid subscription rate is currently around 5%, so advertising remains the most effective monetization method for consumer scenarios. Even Netflix focused heavily on ad monetization last year.
Current ads are priced on a CPM basis at roughly $60 per thousand impressions, comparable to top-tier video ads like those during the NFL. This likely reflects OpenAI’s confidence in ad targeting. Users can also interact directly with brands after seeing an ad, representing a new form of advertising innovation in the AI era.

Advertisers’ demand for ChatGPT is bound to be huge. In long conversations, users reveal far more intent, and LLMs excel at recognizing it. Combined, this creates a “gold mine” far richer and easier to tap than anything accumulated before.
The challenge is that mining this gold requires building a full advertising system, infrastructure, and ecosystem: an extremely complex task, and one likely different from traditional ad models. Within the next year, ad revenue may not scale significantly; early results will likely come from case studies and marketing.
Beyond advertising, ChatGPT’s bigger potential lies in e-commerce.
TikTok has already proven the power of an “ad platform + e-commerce loop” in China. In 2024, TikTok’s e-commerce GMV exceeded ¥3 trillion, creating a loop that makes its per-user value far higher than a pure ad platform.
By contrast, Google and Meta have both struggled to close the e-commerce loop. ChatGPT is pursuing a different path, and its progress is faster than commonly perceived.
Instant Checkout has already integrated with Shopify, with a 4% take rate, connecting over 1 million Shopify merchants. Etsy is live, and major retailers like Walmart are following. More importantly, OpenAI partnered with Stripe to launch the Agentic Commerce Protocol and chose to open-source it, signaling an attempt to set platform standards.
OpenAI’s goal by the end of 2027 is to generate $11 billion in annual revenue from non-paying users, primarily through ads and e-commerce.
Over a 3–5 year horizon, ChatGPT could become the first non-Amazon player to establish a fully internalized e-commerce ecosystem in the U.S. market. This potential far exceeds ad revenue alone—the ceiling for advertising is Google’s ~$300 billion, whereas global e-commerce GMV exceeds $6 trillion. A 4% take rate means every $100 billion GMV generates $4 billion in revenue.
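That take-rate arithmetic can be sketched directly (the 4% rate is the Shopify figure cited above; extending it platform-wide is an assumption for illustration):

```python
# Commerce revenue at various GMV levels, assuming the 4% take rate
# from the Shopify integration holds platform-wide (an assumption,
# not a disclosed OpenAI plan).
TAKE_RATE = 0.04

def commerce_revenue(gmv_billions: float, take_rate: float = TAKE_RATE) -> float:
    """Platform revenue (in $B) generated from a given GMV (in $B)."""
    return gmv_billions * take_rate

for gmv in (100, 500, 1000):
    print(f"GMV ${gmv}B -> revenue ${commerce_revenue(gmv):.0f}B")
```

Even capturing a few percent of a $6 trillion global GMV pool at this rate would rival a large ad business, which is why the e-commerce ceiling dwarfs the advertising one.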
One concern about OpenAI is that it has pioneered a new entry point for the LLM era, but does that mean it’s the final one? AI is still in its very early stages. If chatbots are disrupted by a completely new interaction mode, or if the next major entry point isn’t chat/information but agents, tasks, or entirely new hardware, could OpenAI fade like Yahoo?
The possibility exists, but it’s very low. Yahoo made two mistakes OpenAI is unlikely to repeat.
Even so, Yahoo remained a top-tier internet company for a decade.
Today, information and talent flow is extremely transparent. No lab would underestimate a key technology or be foolish enough to feed a competitor. Perhaps a single product, like ChatGPT, could fade like Yahoo, but OpenAI itself will not. Nor will Google.
In fact, this may be the first time in Silicon Valley history that a startup challenges a giant, and the giant isn’t an elderly, rusty competitor scrambling to catch up; it’s a master swordsman at the peak of his skill who has spent the last decade forging a legendary blade. OpenAI is engaged in a hard-fought battle worthy of respect.
We’ve been tracking China’s model ecosystem closely. After the Lunar New Year releases landed (Seedance 2.0 and Seed 2.0, GLM-5, and MiniMax M2.5), we wanted to go beyond the benchmarks and the press releases. So we organized a closed-door discussion inside our Best Ideas community.
The session brought together a mix of voices: researchers from Zhipu, MiniMax, and ByteDance who are building these models, alongside developers and practitioners who are deploying them in production. What follows reflects the full conversation — not any single participant’s view.
Some of it confirmed what we expected. A lot of it didn’t.
This analysis and these judgments were compiled from a collective Best Ideas community discussion.
Seedance 2.0: For the first time, a Chinese model isn’t catching up — it’s leading
The framing that matters most about Seedance 2.0 isn’t technical — it’s historic. For the first time, a Chinese foundation model isn’t just competitive globally, it’s decisively ahead.
In video generation, the consensus in the room was that Seedance 2.0 stands at least a generation ahead of Sora 2 and Veo 3. This isn’t a claim about catching up. It’s a claim about leading.
What makes that lead meaningful is where it lands: efficiency and usability, not just quality scores.
Seedance collapses the gap between idea and output in ways that restructure the economics of content creation entirely.
For non-professional creators, a full day of iteration compresses into thirty minutes. At that ratio, the traditional grammar of film and TV production, its workflows, budgets, and gatekeepers, doesn’t just become less relevant. It becomes a relic.
The room’s projection: within 6 to 12 months, for under $14, an ordinary person could plausibly produce a 30-40 minute film at Netflix-level image quality. The supply of content won’t just grow — it will compound.
The disruption runs in two directions:
Apparently, the latter looks more exciting to the community.
The shift from lottery to reliability is what makes this real. Older video models operated on what participants called a “roulette dynamic”: generate an eight-second clip, get three or four usable seconds, reroll. Seedance 2.0 broke that pattern. A single 15-second generation now largely meets commercial standards for semantic consistency and shot continuity — cinematic pacing, micro-expressions, the physical logic of how one shot leads into the next.
Video creation has moved from a roulette dynamic to something industrial-grade.
The discussion was direct about why ByteDance’s lead here is likely to persist. It’s not just compute or data. It’s organizational depth.
ByteDance appears to operate on what might be called a “three-generation” model cadence: one generation in live optimization, one in core development, and one in forward-looking research — a depth of parallel iteration of which most peers, domestic and international, can barely sustain two.
The downstream consequence that surprised us most: the teams under the most pressure right now aren’t incumbent video studios. They’re the workflow tool builders — the products that stitched together Midjourney for images, ElevenLabs for voice, and editing software into a chain. End-to-end models are quietly swallowing that stack.
MiniMax: The architecture is a commercial thesis, not a technical one
MiniMax M2’s parameter count — 200B total, 10B activated — is unusual enough that it raised eyebrows across the industry. The MiniMax team was direct about the reasoning: the sizing decision wasn’t a capability trade-off. It was a bet on how agents will actually be deployed.

In real agentic workflows, users run five to ten agents in parallel. Latency shapes productivity as much as raw capability. And as agents migrate from developers to mainstream users, they become 24/7 personal assistants — always-on, high-frequency, generating enormous token volumes.
At those usage densities, frontier pricing becomes structurally unsustainable. They pointed to Talkie — MiniMax’s companion AI product — as the proof of concept: wide audience, long conversation threads, extreme usage density. The M2-her model powering Talkie is a post-trained variant built on the M2 base.
The implicit critique of the broader industry is sharp: a model that tops every benchmark but prices itself out of agentic deployment isn’t winning the market. It’s winning a leaderboard.
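The cost logic behind the 10B-activated design can be made concrete: per-token inference compute scales with activated parameters, not total parameters. The sketch below uses the common rule of thumb of ~2 FLOPs per activated parameter per generated token and the article’s stated model sizes, so the ratio is illustrative rather than a measured serving cost:

```python
# Why activated parameters, not total parameters, drive serving cost.
# Rule of thumb (an approximation): ~2 FLOPs per activated parameter
# per generated token. Model sizes are the article's figures.

def flops_per_token(activated_params_billions: float) -> float:
    """Approximate forward-pass compute per token, in TFLOPs."""
    return 2 * activated_params_billions * 1e9 / 1e12

dense_frontier = flops_per_token(200)  # hypothetical dense 200B model
minimax_m2 = flops_per_token(10)       # M2: 200B total, only 10B activated

print(f"Dense 200B:  {dense_frontier:.1f} TFLOPs/token")
print(f"M2 (MoE):    {minimax_m2:.2f} TFLOPs/token")
print(f"Compute ratio: {dense_frontier / minimax_m2:.0f}x")
```

A ~20x per-token compute gap is what makes always-on, multi-agent, high-token-volume deployment economically plausible, which is the commercial thesis behind the architecture.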
MiniMax also shared the internal benchmark they built to evaluate coding capability — VIBE Bench — which spans the full stack of real-world software development, including cloud configuration, multi-language environments, and API integrations.
Their view on standard benchmarks was pointed: SWE-style evaluations are too narrow (focused mostly on bug fixes) and too language-homogeneous (skewing toward Python and TypeScript, while production systems run Go, C++, and Rust).
According to their internal testing, MiniMax M2 shows significant improvements over Claude Opus on the VIBE leaderboard.
The lesson the MiniMax team kept coming back to was Anthropic’s approach to dogfooding.
They’ve moved the entire company — including legal and HR — into an agent-native operating model. If agents fail at a task internally, that failure gets formalized as the next benchmark. It’s a feedback loop they believe is the actual reason Anthropic has been able to maintain both model depth and product quality — and where it genuinely leads most domestic players.
GLM-5: Benchmarking against Anthropic, not the leaderboard
There was an internal strategic debate at Zhipu AI that came up in the conversation. One path: chase top scores in math and coding competitions, optimize for reasoning benchmarks, build toward long-horizon scientific research. The other path: prioritize economically valuable productivity tasks — real-world programming workflows, agentic problem-solving.
Zhipu’s researchers chose the second, and chose Anthropic as the explicit benchmark.
The reasoning they laid out is worth unpacking. Coding and agent tasks aren’t just harder versions of reasoning tasks — they’re structurally different. Reasoning operates within a bounded problem space where heavy thinking and reinforcement learning alone can produce high scores. Complex coding requires something closer to engineering intuition: identifying root causes quickly and resolving issues with minimal token usage rather than trial and error. That kind of intuition, they found experimentally, scales with model size in ways that reasoning benchmarks don’t fully capture.
GLM-5 runs at 744B total parameters with roughly 40B activated per inference — deliberately constrained to fit within a standard 8-GPU cluster, keeping it viable for real-world production deployment. (Notably, GLM-5 was trained and runs inference entirely on Huawei Ascend chips, with no dependency on NVIDIA hardware — a geopolitical statement as much as a technical one.)
The model also introduces DSA (DeepSeek Sparse Attention) to handle long-context coding workloads, where input context regularly dwarfs generated output.
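To make the sparse-attention idea concrete, here is a minimal single-head, top-k sketch in NumPy: each query attends only to its highest-scoring keys, which is the general principle behind sparse attention for long contexts. This is an illustration only, not DeepSeek’s actual DSA algorithm (which uses a learned indexer), and a naive version like this still computes full scores before masking, whereas a real implementation skips them:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Single-head attention where each query attends only to its
    top_k highest-scoring keys. Illustrative sketch of the sparse
    attention principle, not DeepSeek's DSA implementation."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (Tq, Tk) raw scores
    # indices of the (Tk - top_k) lowest-scoring keys per query row
    low = np.argpartition(scores, -top_k, axis=-1)[:, :-top_k]
    np.put_along_axis(scores, low, -np.inf, axis=-1)  # mask them out
    # softmax over the surviving top_k scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)
```

The payoff shows up in exactly the regime the text describes: long-context coding workloads where input context dwarfs generated output, so attention over the context dominates cost.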

The competitive positioning Zhipu’s team laid out is explicit: Anthropic currently dominates the high end of the coding market at $15-25 per million tokens. GLM-5 is targeting the upper-middle segment at roughly $3 — a price gap significant enough that even developers deeply embedded in Claude’s tooling ecosystem are paying attention. Domestic open-source models have also maintained strong compatibility with Claude skills and related tooling, making switching relatively seamless.
What this means for the broader picture
A few things from the conversation that didn’t fit neatly into any single company’s story:
Data is becoming the real moat, not compute.
Multiple participants pointed to this independently. The paradigm is shifting from compute-bound to data-bound — not because compute isn’t scarce, but because it’s becoming more accessible. The edge increasingly lives in data acquisition, cleaning, long-tail mining, and evaluation loops. Seedance 2.0’s lead in video generation, for instance, is inseparable from fine-grained work on tail-end video data. And China has structural advantages here that are underappreciated: a deep ecosystem of small, agile video production teams capable of capturing proprietary footage at scale.
Hardware scarcity is driving architecture-level innovation.
Chinese companies are working under tighter GPU constraints than their Silicon Valley counterparts. Several researchers noted that this is producing more architectural creativity at the inference layer — necessity forcing efficiency work that well-resourced labs have little incentive to do.
The market won’t consolidate to one winner.
The consensus in the room was that the AI market will look less like search (9:1 winner-take-all) and more like e-commerce — multiple players finding durable niches through different product sensibilities and user relationships. When model capabilities converge, the differentiation shifts to taste: do you build a maximally efficient tool, or a product with genuine personality?
The question we’re sitting with
We framed 2026 as a structural inflection point, not an incremental one. The Lunar New Year releases were the opening move. By mid-year, new long-horizon reasoning models and async agent frameworks are expected to enter. Every industry is showing up with real deployment use cases.
The harder question — one the room didn’t fully resolve — is what happens to the application layer. Agent capabilities are increasingly being driven by base model improvements, which means orchestration logic and system prompt design that once belonged to application developers is getting absorbed into the model itself. Kimi 2.5 reportedly baked an orchestrator-plus-subagent architecture directly into the model via RL.
For investors watching this space: the infrastructure play is maturing, the model race is compressing, and the application layer is getting thinner. The next durable positions will be carved out by products that own a specific user relationship — not by whoever has the best benchmark score this quarter.
We’ll keep hosting these sessions. The signal-to-noise ratio inside these conversations is meaningfully higher than what surfaces publicly.

How OpenAI Could Turn the Tables: 9 Questions to Answer
Until the next paradigm shift arrives, these alternating benchmark wins may not mean much strategically. But the pressure is heavier on OpenAI. It lacks Google’s compute and infrastructure advantages, yet it’s simultaneously funding next-paradigm research, building the next generation of models, and serving one billion users.
However, Gemini 3 has had little impact so far. It has barely moved OpenAI’s API revenue or ChatGPT subscription revenue.
On the traffic front, OpenAI has already rebounded from its recent lows.
According to third-party tracking data, ChatGPT’s web traffic in January returned to pre-holiday levels, while mobile traffic surpassed them. On a longer time horizon, the post-holiday acceleration is even clearer.

2026 is shaping up to be a year of intensified competition and an upgraded battlefield. The test is no longer just technological strength; it’s about strategically deciding where to allocate resources.
OpenAI and Google will compete head-to-head in consumer and advertising markets, while Anthropic, leveraging its consistent strategic focus, is gaining a first-mover advantage in high-value tasks like coding, agentic AI, and Excel automation.
That doesn’t mean OpenAI or Google will ignore high-value tasks. For example, in coding, they are certain to strike at some point. It’s just that the window for consumer competition is shorter, while high-value tasks allow for long-term positioning.
Anthropic has already demonstrated the ability to continuously innovate in high-value tasks. Whether this innovation will become a durable moat or merely pave the way for OpenAI and Google will become clearer over the course of this year.
In the short term, Google can leverage a fully free strategy and its super-platforms, like its browser and search, to drive traffic to Gemini, which could slow ChatGPT’s growth. This is a luxury only a giant can afford. After all, Google has nine products with over one billion users each, while the second-place Meta has only three.

In the long term, we should be optimistic about ChatGPT’s growth. Chat and search are inevitably heading toward deep integration, and chat can handle far longer and more complex queries than search. The total volume of chat queries and usage frequency will eventually surpass search engines, meaning the user base could reach at least the same scale as search, around 5 billion monthly active users.
Currently, ChatGPT has roughly 1.2 billion MAU, and Gemini about 400 million, still far from the 5 billion mark. Even if their market share shifts from 4:1 to 1:1, ChatGPT still has room to double.
Assuming a 4:1 ratio and ChatGPT reaches 4 billion MAU, the implications are:
Optimistically, visible ARR for ChatGPT could reach $200 billion, with even larger upside potential beyond that. Conservatively, if ChatGPT and Gemini achieve a 1:1 ratio and reach 2.5 billion MAU, applying a 60% factor to the visible revenue estimate still leaves a huge upside potential.
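The scenario math above can be restated explicitly (every input is the article’s assumption, including the $200 billion visible-ARR figure and the 60% conservative factor):

```python
# The article's MAU scenarios made explicit. All inputs are the
# article's assumptions, not OpenAI or Google figures.
TOTAL_CHAT_MAU = 5.0e9  # long-run chat market, ~search engine scale

def chatgpt_mau(share_ratio: float) -> float:
    """ChatGPT MAU given a ChatGPT:Gemini share ratio (e.g. 4 for 4:1)."""
    return TOTAL_CHAT_MAU * share_ratio / (share_ratio + 1)

optimistic = chatgpt_mau(4)     # 4:1 split -> 4.0B MAU
conservative = chatgpt_mau(1)   # 1:1 split -> 2.5B MAU

VISIBLE_ARR_OPTIMISTIC = 200.0  # $B, the article's optimistic estimate
CONSERVATIVE_FACTOR = 0.60      # the article's haircut for the 1:1 case

print(f"4:1 split: {optimistic/1e9:.1f}B MAU, ~${VISIBLE_ARR_OPTIMISTIC:.0f}B visible ARR")
print(f"1:1 split: {conservative/1e9:.1f}B MAU, "
      f"~${VISIBLE_ARR_OPTIMISTIC * CONSERVATIVE_FACTOR:.0f}B visible ARR")
```

Even the conservative branch implies roughly doubling today’s ~1.2 billion MAU, which is the room-to-grow argument in the preceding paragraphs.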
Every decade brings a wave of fundamental shifts in user behavior, often bigger than technological improvements themselves.
The move from search to chat mirrors the transition from image-and-text browsing to short-form video. The old format doesn’t disappear, but the new one delivers a tenfold stronger experience, hitting users in a completely different dimension.
There are several parallels between AI chat and short video, along with some key differences.
Currently, Google Search handles around 14 billion queries per day, while ChatGPT processes roughly 2.5 billion prompts per day (according to OpenAI data shared with Axios as of July 2025), already about 18% of Google’s volume. In terms of consumer intent, chat shows a clear advantage, many brands were already lining up even before OpenAI officially launched ads.
With shifts in both user behavior and ad models, Google faces a significant threat. The 2C battle between Gemini and ChatGPT in 2026 will be intense.
Although OpenAI emphasizes consumer products in its messaging and Anthropic leans toward Enterprise Business, OpenAI’s Enterprise Business has consistently been underestimated.
In 2025, OpenAI’s ARR was $20 billion (revenue $13 billion), with API revenue accounting for roughly 30%, about $6 billion. In the same year, Anthropic’s ARR was around $9 billion (revenue ~$4.5 billion), with 85% coming from coding and other Enterprise Business offerings; Claude Chat subscriptions made up only 15%.
At first glance, Anthropic’s Enterprise Business revenue seems larger. In reality, OpenAI’s Enterprise Business scale is at least comparable, if not bigger:
Combined, OpenAI’s API and ChatGPT Enterprise revenues account for about 40% of total revenue, roughly $5.2 billion, still larger than Anthropic’s total revenue of $4.5 billion, according to Information.

Recently, Sam promoted the API on X, noting that it added $1 billion in ARR in the past month. OpenAI is also increasing its focus on the enterprise side. With 2C facing intense competition, emphasizing the enterprise business is a natural strategic choice.

One more point: APIs are closely tied to the cloud and may even be reshaping the cloud landscape. Anthropic was previously the only company offering a SOTA model across Azure, AWS, and GCP, capturing advantages on the enterprise business and developer side. With OpenAI’s new funding round, Amazon is highly likely to participate, which could open up new opportunities for its enterprise business.
The three keywords for ChatGPT in 2026 are likely memory, proactive, and personalization, they’re both product and research challenges.
With model pre-training and RL already in the industrialized era, Google has advantages in engineering infrastructure and TPU compute. For OpenAI to break through, it must excel in memory and proactive agents.
Memory and proactive features were pioneered by OpenAI, but they’re not yet fully realized:
Personalization is closely tied to memory and continual learning. Language models can’t yet personalize or learn user preferences in real time like recommendation algorithms, but they have the potential to go orders of magnitude deeper. Neolabs and Thinking Machines Lab in the Bay Area, as well as the recently heavily funded startup Humans&, are exploring different technical paths to achieve personalization and continual learning—creating AI that improves through interaction with users.
This isn’t just OpenAI’s potential game-changer, it’s a high ground that any AI product must capture. Only by realizing memory and personalization can AI achieve a true “data flywheel.”
In the past two paradigm shifts, model scaling and reasoning models, OpenAI led the way. It still has a strong chance of pioneering continual learning, widely recognized by top researchers in China and the U.S. as the next major paradigm.
If OpenAI hadn’t undergone organizational changes, its probability of leading the next paradigm would be the highest.
Historically, OpenAI struck an effective organizational balance, combining top-down coordination with bottom-up innovation. This allowed large-scale deployment of personnel for model training while encouraging grassroots innovation. Innovations like reasoning models and mini O3 emerged from the bottom up, and OpenAI consistently allocated ample compute to frontier research.
By comparison, Anthropic has limited resources and remains highly focused on coding and agentic AI. xAI started later and has been chasing SOTA models, leaving little bandwidth for exploratory research; Meta is similar.
Now, OpenAI has experienced multiple researcher departures and its focus has been split by commercialization and product demands. As a result, I estimate that OpenAI, Google, and Bay Area neolabs each have roughly a one-third chance of pioneering the next paradigm.
Google’s advantage lies in its dense talent pool and abundant resources, there’s always someone internally experimenting with something different.
Neolabs, including SSI, TML, Isara, Humans&, and Core Automation, have sprung up in the Bay Area, founded specifically to create the next paradigm. They are highly focused and include exceptional talents like Ilya Sutskever.
Even if OpenAI isn’t the first to create the next paradigm, once it emerges, the company has the ability to catch up quickly, and holds the strongest advantage in product integration.
OpenAI’s paid subscription rate is currently around 5%, so advertising remains the most effective monetization method for consumer scenarios. Even Netflix focused heavily on ad monetization last year.
Current ads are priced on a CPM basis at roughly $60 per thousand impressions, comparable to top-tier video ads like those during the NFL. This likely reflects OpenAI’s confidence in ad targeting. Users can also interact directly with brands after seeing an ad, representing a new form of advertising innovation in the AI era.

Advertisers’ demand for ChatGPT is bound to be huge. In long conversations, users reveal far more intent, and LLMs excel at recognizing it. Combined, this creates a “gold mine” far richer and easier to tap than anything accumulated before.
The challenge is that mining this gold requires building a full advertising system, infrastructure, and ecosystem, an extremely complex task, likely different from traditional ad models. Within the next year, ad revenue may not scale significantly; early results will likely come from case studies and marketing.
Beyond advertising, ChatGPT’s bigger potential lies in e-commerce.
TikTok has already proven the power of an “ad platform + e-commerce loop” in China. In 2024, TikTok’s e-commerce GMV exceeded ¥3 trillion, creating a loop that makes its per-user value far higher than a pure ad platform.
By contrast, Google and Meta have both struggled to close the e-commerce loop. ChatGPT is pursuing a different path, and its progress is faster than commonly perceived.
Instant Checkout has already integrated with Shopify, with a 4% take rate, connecting over 1 million Shopify merchants. Etsy is live, and major retailers like Walmart are following. More importantly, OpenAI partnered with Stripe to launch the Agentic Commerce Protocol and chose to open-source it, signaling an attempt to set platform standards.
OpenAI’s goal by the end of 2027 is to generate $11 billion in annual revenue from non-paying users, primarily through ads and e-commerce.
Over a 3–5 year horizon, ChatGPT could become the first non-Amazon player to establish a fully internalized e-commerce ecosystem in the U.S. market. This potential far exceeds ad revenue alone—the ceiling for advertising is Google’s ~$300 billion, whereas global e-commerce GMV exceeds $6 trillion. A 4% take rate means every $100 billion GMV generates $4 billion in revenue.
One concern about OpenAI is that it has pioneered a new entry point for the LLM era, but does that mean it’s the final one? AI is still in its very early stages. If chatbots are disrupted by a completely new interaction mode, or if the next major entry point isn’t chat/information but agents, tasks, or entirely new hardware, could OpenAI fade like Yahoo?
The possibility exists, but it’s very low. Yahoo made two mistakes OpenAI is unlikely to repeat:
Even so, Yahoo remained a top-tier internet company for a decade.
Today, information and talent flow is extremely transparent. No lab would underestimate a key technology or be foolish enough to feed a competitor. Perhaps a single product, like ChatGPT, could fade like Yahoo, but OpenAI itself will not. Nor will Google.
In fact, this may be the first time in Silicon Valley history that a startup challenges a giant, and the giant isn’t an elderly, rusty competitor scrambling to catch up, it’s a master swordsman, at the peak of his skill, who has spent the last decade forging a legendary blade. OpenAI is engaged in a hard-fought battle worthy of respect.
We’ve been tracking China’s model ecosystem closely. After the Lunar New Year releases landed , Seedance 2.0(&Seed2.0), GLM-5, and MiniMax M2.5, we wanted to go beyond the benchmarks and the press releases. So we organized a closed-door discussion inside our Best Ideas community.
The session brought together a mix of voices: researchers from Zhipu, MiniMax, and ByteDance who are building these models, alongside developers and practitioners who are deploying them in production. What follows reflects the full conversation — not any single participant’s view.
Some of them confirmed what we expected. A lot of it didn’t.
This analysis and these judgments were compiled from a collective Best Ideas community discussion.
Seedance 2.0: For the first time, a Chinese model isn’t catching up — it’s leading
The framing that matters most about Seedance 2.0 isn’t technical — it’s historic. For the first time, a Chinese foundation model isn’t just competitive globally, it’s decisively ahead.
In video generation, the consensus in the room was that Seedance 2.0 stands at least a generation in front of Sora 2 and Veo 3. This isn’t a claim about catching up. It’s a claim about leading.
What makes that lead meaningful is where it lands: efficiency and usability, not just quality scores.
Seedance collapses the gap between idea and output in ways that restructure the economics of content creation entirely.
For non-professional creators, a full day of iteration compresses into thirty minutes. At that ratio, the traditional grammar of film and TV production, its workflows, budgets, and gatekeepers, doesn’t just become less relevant. It becomes a relic.
The room’s projection: within 6 to 12 months, for under $14, an ordinary person could plausibly produce a 30-40 minute film at Netflix-level image quality. The supply of content won’t just grow — it will compound.
The disruption runs in two directions:
Apparently, the latter looks more exciting to the community.
The shift from lottery to reliability is what makes this real. Older video models operated on what participants called a “roulette dynamic”: generate an eight-second clip, get three or four usable seconds, reroll. Seedance 2.0 broke that pattern. A single 15-second generation now largely meets commercial standards for semantic consistency and shot continuity — cinematic pacing, micro-expressions, the physical logic of how one shot leads into the next.
Video creation has moved from a roulette dynamic to something industrial-grade.
The discussion was direct about why ByteDance’s lead here is likely to persist. It’s not just compute or data. It’s organizational depth.
ByteDance appears to operate on what might be called a “three-generation” model cadence: one generation in live optimization, one in core development, and one in forward-looking research. Most peers, domestic and international, can barely sustain two of those tracks in parallel.
The downstream consequence that surprised us most: the teams under the most pressure right now aren’t incumbent video studios. They’re the workflow tool builders — the products that stitched together Midjourney for images, ElevenLabs for voice, and editing software into a chain. End-to-end models are quietly swallowing that stack.
MiniMax: The architecture is a commercial thesis, not a technical one
MiniMax M2’s parameter count — 200B total, 10B activated — is unusual enough that it raised eyebrows across the industry. The MiniMax team was direct about the reasoning: the sizing decision wasn’t a capability trade-off. It was a bet on how agents will actually be deployed.

In real agentic workflows, users run five to ten agents in parallel. Latency shapes productivity as much as raw capability. And as agents migrate from developers to mainstream users, they become 24/7 personal assistants — always-on, high-frequency, generating enormous token volumes.
At those usage densities, frontier pricing becomes structurally unsustainable. They pointed to Talkie — MiniMax’s companion AI product — as the proof of concept: wide audience, long conversation threads, extreme usage density. The M2-her model powering Talkie is a post-trained variant built on the M2 base.
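The cost argument above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is illustrative only: the agent counts, token rates, and per-million-token prices are assumptions chosen for scale, not published figures from MiniMax or any other lab.

```python
# Back-of-the-envelope cost model for always-on parallel agents.
# All workload and pricing numbers are illustrative assumptions.

def monthly_cost(agents, tokens_per_agent_per_hour, hours_per_day,
                 price_per_million_tokens):
    """Estimated monthly spend in dollars for a fleet of parallel agents."""
    tokens_per_month = agents * tokens_per_agent_per_hour * hours_per_day * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Assumed workload: 8 parallel agents, 200k tokens/agent/hour, 12 active hours/day.
workload = dict(agents=8, tokens_per_agent_per_hour=200_000, hours_per_day=12)

frontier = monthly_cost(**workload, price_per_million_tokens=20.0)  # frontier-tier price
efficient = monthly_cost(**workload, price_per_million_tokens=1.0)  # efficiency-tier price

print(f"frontier-tier:   ${frontier:,.0f}/month")   # $11,520/month
print(f"efficiency-tier: ${efficient:,.0f}/month")  # $576/month
```

At these assumed densities the same workload differs by a factor of 20 in monthly spend, which is the structural gap the MiniMax team is pointing at: per-token price, not benchmark rank, determines whether an always-on agent product has viable unit economics.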
The implicit critique of the broader industry is sharp: a model that tops every benchmark but prices itself out of agentic deployment isn’t winning the market. It’s winning a leaderboard.
MiniMax also shared the internal benchmark they built to evaluate coding capability — VIBE Bench — which spans the full stack of real-world software development, including cloud configuration, multi-language environments, and API integrations.
Their view on standard benchmarks was pointed: SWE-style evaluations are too narrow (focused mostly on bug fixes) and too language-homogeneous (skewing toward Python and TypeScript, while production systems also run Go, C++, and Rust).
According to their internal testing, MiniMax M2 shows significant improvements over Claude Opus on the VIBE leaderboard.
The lesson the MiniMax team kept coming back to was Anthropic’s approach to dogfooding.
Anthropic has moved the entire company — including legal and HR — into an agent-native operating model. If agents fail at a task internally, that failure gets formalized as the next benchmark. It’s a feedback loop the MiniMax team believes is the actual reason Anthropic has been able to maintain both model depth and product quality — and where it genuinely leads most domestic players.
GLM-5: Benchmarking against Anthropic, not the leaderboard
There was an internal strategic debate at Zhipu AI that came up in the conversation. One path: chase top scores in math and coding competitions, optimize for reasoning benchmarks, build toward long-horizon scientific research. The other path: prioritize economically valuable productivity tasks — real-world programming workflows, agentic problem-solving.
Zhipu’s researchers chose the second, and chose Anthropic as the explicit benchmark.
The reasoning they laid out is worth unpacking. Coding and agent tasks aren’t just harder versions of reasoning tasks — they’re structurally different. Reasoning operates within a bounded problem space where heavy thinking and reinforcement learning alone can produce high scores. Complex coding requires something closer to engineering intuition: identifying root causes quickly, resolving issues with minimal token usage rather than trial and error. That kind of intuition, they found experimentally, scales with model size in ways that reasoning benchmarks don’t fully capture.
GLM-5 runs at 744B total parameters with roughly 40B activated per inference — deliberately constrained to fit within a standard 8-GPU cluster, keeping it viable for real-world production deployment. (Notably, GLM-5 was trained and runs inference entirely on Huawei Ascend chips, with no dependency on NVIDIA hardware — a geopolitical statement as much as a technical one.)
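The “fits within a standard 8-GPU cluster” constraint is worth making concrete. In a mixture-of-experts model, all 744B parameters must reside in accelerator memory even though only ~40B are active per token, so the binding constraint is weight memory, not activated compute. The sketch below is a rough check under stated assumptions — 96 GB per accelerator is a hypothetical figure, and real deployments also need headroom for KV cache and runtime overhead.

```python
# Rough memory-fit check for serving a 744B-parameter MoE model on one
# 8-accelerator node. Figures are illustrative assumptions, not a spec.

def weight_memory_gb(total_params_b, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1B params * 1 byte = 1 GB)."""
    return total_params_b * bytes_per_param

TOTAL_PARAMS_B = 744     # all experts stay resident, not just the ~40B active
NODE_MEMORY_GB = 8 * 96  # assumed: 8 accelerators with 96 GB each

for label, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    need = weight_memory_gb(TOTAL_PARAMS_B, bytes_per_param)
    verdict = "fits" if need < NODE_MEMORY_GB else "does not fit"
    print(f"{label}: {need:.0f} GB of weights -> {verdict} in {NODE_MEMORY_GB} GB")
```

Under these assumptions, FP16 weights (1,488 GB) overflow the node, while FP8 (744 GB) squeezes in with almost no headroom and 4-bit quantization (372 GB) leaves room for KV cache — which is why a total parameter count in this range is roughly the ceiling for single-node deployment, and why the sizing reads as deliberate.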
The model also introduces DSA (DeepSeek Sparse Attention) to handle long-context coding workloads, where input context regularly dwarfs generated output.

The competitive positioning Zhipu’s team laid out is explicit: Anthropic currently dominates the high end of the coding market at $15-25 per million tokens. GLM-5 is targeting the upper-middle segment at roughly $3 — a price gap significant enough that even developers deeply embedded in Claude’s tooling ecosystem are paying attention. Domestic open-source models have also maintained strong compatibility with Claude skills and related tooling, making switching relatively seamless.
What this means for the broader picture
A few things from the conversation that didn’t fit neatly into any single company’s story:
Data is becoming the real moat, not compute.
Multiple participants pointed to this independently. The paradigm is shifting from compute-bound to data-bound — not because compute isn’t scarce, but because it’s becoming more accessible. The edge increasingly lives in data acquisition, cleaning, long-tail mining, and evaluation loops. Seedance 2.0’s lead in video generation, for instance, is inseparable from fine-grained work on tail-end video data. And China has structural advantages here that are underappreciated: a deep ecosystem of small, agile video production teams capable of capturing proprietary footage at scale.
Hardware scarcity is driving architecture-level innovation.
Chinese companies are working under tighter GPU constraints than their Silicon Valley counterparts. Several researchers noted that this is producing more architectural creativity at the inference layer — necessity forcing efficiency work that well-resourced labs have little incentive to do.
The market won’t consolidate to one winner.
The consensus in the room was that the AI market will look less like search (9:1 winner-take-all) and more like e-commerce — multiple players finding durable niches through different product sensibilities and user relationships. When model capabilities converge, the differentiation shifts to taste: do you build a maximally efficient tool, or a product with genuine personality?
The question we’re sitting with
We framed 2026 as a structural inflection point, not an incremental one. The Lunar New Year releases were the opening move. By mid-year, new long-horizon reasoning models and async agent frameworks are expected to enter. Every industry is showing up with real deployment use cases.
The harder question — one the room didn’t fully resolve — is what happens to the application layer. Agent capabilities are increasingly being driven by base model improvements, which means orchestration logic and system prompt design that once belonged to application developers is getting absorbed into the model itself. Kimi 2.5 reportedly baked an orchestrator-plus-subagent architecture directly into the model via RL.
For investors watching this space: the infrastructure play is maturing, the model race is compressing, and the application layer is getting thinner. The next durable positions will be carved out by products that own a specific user relationship — not by whoever has the best benchmark score this quarter.
We’ll keep hosting these sessions. The signal-to-noise ratio inside these conversations is meaningfully higher than what surfaces publicly.