Domestic Large Models: This Time, the Script Is Different

Economic Observer Follow 2026-04-04 13:09

Economic Observer reporter Zheng Chenye

At the end of 2025, the annual usage report released by OpenRouter, the world's largest AI model aggregation platform, showed that 47% of its users were in the United States, while Chinese developers accounted for 6%. English made up 83% of the platform's call content; Chinese, less than 5%.

As of the week of April 3, 2026, six of the top ten models by platform call volume were from China. Ranked by call volume from highest to lowest, they were: Xiaomi MiMo-V2-Pro, Step 3.5 Flash, MiniMax M2.7, DeepSeek V3.2, Zhipu GLM-5-Turbo, and MiniMax M2.5. Among them, Xiaomi MiMo-V2-Pro ranked first on the entire platform with 4.82 trillion tokens.

In fact, since the week of February 9-15, 2026, when weekly call volume for Chinese models surpassed that of American models for the first time, Chinese models have led for nearly two months.

The OpenRouter platform aggregates over 400 AI models from more than 60 suppliers, and its call volume data is regarded as one window into global developers' model preferences. Developers can switch between models at any time using the same API key (a credential used to verify identity and call services).
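This switching mechanism can be sketched as follows. OpenRouter exposes an OpenAI-compatible chat-completions interface, so in practice changing models is a one-field change in the request body; the model IDs and prompt below are illustrative, not taken from the article.

```python
# Sketch: on an aggregation platform with an OpenAI-compatible API,
# switching models means changing only the "model" field in the request
# body; the API key and endpoint stay the same. Model IDs are illustrative.

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a chat-completions request body; only `model` differs per vendor."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The same payload, pointed at an American or a Chinese model:
req_us = build_chat_request("anthropic/claude-opus", "Fix this bug...")
req_cn = build_chat_request("deepseek/deepseek-chat", "Fix this bug...")

assert req_us["messages"] == req_cn["messages"]  # identical apart from "model"
```

This one-field switch is what makes cost-driven migration between vendors nearly frictionless for developers.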

Chris Clark, co-founder and COO of OpenRouter, said publicly in February 2026 that the share of Chinese open-source models in agent workflows run by American enterprises is "disproportionately high". Meanwhile, discussion in the developer community about allocating tasks across models to optimize cost is growing.

Some compare this phenomenon to Chinese manufacturing 30 years ago: back then, China entered the assembly stage of the global electronics supply chain on cost advantage, giving rise to contract manufacturers such as Foxconn and Luxshare Precision; today, Chinese large models are entering the execution stage of the global AI industry chain on price advantage. Some even call domestic large models the "Foxconn of the AI era".

What role do domestic large models play in the AI industry chain, and how much is that role worth?

Price Advantage

Economic Observer reporters reviewed the official API prices of major vendors as of the end of March 2026 and found a huge price gap between mainstream Chinese and American models.

Taking input prices as an example: among Chinese models, DeepSeek V3.2 costs $0.28 per million tokens, MiniMax M2.5 $0.30, and Moonshot AI's Kimi K2.5 $0.42. Among American models, Anthropic's Claude Opus 4.6 costs $5, and OpenAI's GPT-5.4 costs $2.50. The input prices of mainstream American models are roughly 10 to 20 times those of mainstream Chinese models.

The gap in output prices is wider still. Among Chinese models, DeepSeek V3.2 is priced at $0.42 per million tokens, MiniMax M2.5 at $1.10, and Moonshot AI's Kimi K2.5 at $2.20. Among American models, OpenAI's GPT-5.4 is priced at $15 and Claude Opus 4.6 at $25. The gap between mainstream Chinese and American models is roughly 7 to 60 times.

This price gap has existed all along without triggering large-scale user migration, for a simple reason: most people's main AI scenario is chat, where token consumption is low, so the price difference barely matters.

But in early 2026, the arrival of a "lobster" changed everything.

The open-source tool OpenClaw (nicknamed "Lobster" in the developer community) took off around February 2026, quickly topping OpenRouter's app ranking after launch with weekly consumption of over 600 billion tokens. "Lobster" is an intelligent agent application, different from the "you ask, I answer" chat mode of the past: it lets AI autonomously perform tasks such as programming, testing, and file management on a computer without step-by-step human intervention.

In this working mode, token consumption is on an entirely different scale from chat.

For example, a programming task may go through dozens of rounds of a "write code, run, hit an error, fix, run again" loop, with each round being a complete model call. And for the agent to remember its previous operations, each call must also carry the full conversation history.
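A rough sketch of why such loops are so token-hungry: because each call re-sends the accumulated history, total consumption grows roughly quadratically with the number of rounds. The figures below are illustrative assumptions, not measurements.

```python
# Illustrative: total tokens consumed by an agent loop that re-sends
# the full conversation history on every model call.

def total_tokens(rounds: int, tokens_per_round: int) -> int:
    """Each call carries all previous rounds' history plus the new turn."""
    total = 0
    history = 0
    for _ in range(rounds):
        history += tokens_per_round  # new code/output appended to history
        total += history             # the whole history is sent each call
    return total

# 30 rounds adding 2,000 tokens each: the final call alone carries 60,000
# tokens, and the loop totals 2,000 * (1 + 2 + ... + 30) tokens.
print(total_tokens(30, 2000))  # 930000
```

A 30-round session thus consumes over 15 times what naive per-round accounting (30 × 2,000 = 60,000) would suggest.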

Developers have said on social media that an active OpenClaw session context can easily inflate to over 230,000 tokens. Using the Claude API throughout, the monthly bill can run between $800 and $1,500. Some users also claimed that a misconfigured automated task burned $200 in a single day.

Intelligent agent applications represented by OpenClaw have driven up token consumption across the platform. During the week of March 3-9, 2025, OpenRouter's top ten models had a combined weekly call volume of 1.24 trillion tokens; by the week of February 16-22, 2026, the top ten models alone exceeded 8.7 trillion tokens per week, an increase of nearly 7 times. Programming tasks' share of platform token consumption also rose from 11% at the beginning of 2025 to over 50% by the end of 2025.

When a single task's token consumption grows from a few thousand to several hundred thousand, the price gap between Chinese and American models shifts from a negligible cost to a difference of hundreds or even thousands of dollars per month.
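As back-of-envelope arithmetic, using the input list prices quoted above and an assumed (not reported) agent workload of 30 calls per day averaging 200,000 input tokens:

```python
# Back-of-envelope monthly input cost from per-million-token list prices.
# Prices are those quoted in the article; the workload is an assumption.

def monthly_cost(price_per_m: float, calls_per_day: int,
                 tokens_per_call: int, days: int = 30) -> float:
    """Monthly cost in dollars for input tokens at a given list price."""
    return price_per_m * calls_per_day * tokens_per_call * days / 1_000_000

workload = dict(calls_per_day=30, tokens_per_call=200_000)
print(round(monthly_cost(0.28, **workload), 2))  # DeepSeek V3.2 input: 50.4
print(round(monthly_cost(5.00, **workload), 2))  # Claude Opus 4.6 input: 900.0
```

Under these assumptions, the same agent workload costs about $50 a month on the cheapest Chinese model and about $900 on the most expensive American one, consistent with the order of magnitude developers report.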

Around February 19, 2026, the American model company Anthropic updated its terms of service to prohibit using Claude subscription credentials in third-party tools such as OpenClaw, requiring billing by API usage instead. Google soon introduced similar restrictions. For agent applications that make heavy API calls every day, price became an unavoidable factor in model selection, and developers were pushed onto the pay-as-you-go track.

In programming, the core scenario for intelligent agents, the capabilities of Chinese and American models are relatively close.

SWE-Bench Verified is a public programming benchmark maintained by a research team at Princeton University; it has AI models fix real code issues from GitHub, the world's largest open-source code hosting platform. On its public leaderboard, the Chinese model MiniMax M2.5, released on February 13, 2026, scored 80.2%, while the American model Claude Opus 4.6, released on February 5, scored 80.8%, a gap of only 0.6 percentage points.

With abilities this close and prices this far apart, developers' choices quickly showed up in the data.

During the week of February 9-15, 2026, Chinese models' token call volume reached 4.12 trillion, surpassing American models' 2.94 trillion for the first time. The following week, Chinese models' call volume rose to 5.16 trillion, a 127% increase over three weeks. Over the same period, American models' call volume fell to 2.7 trillion.

Why can Chinese large models be so much cheaper than American ones?

Pan Helin, a member of the Information and Communication Economy Expert Committee of the Ministry of Industry and Information Technology, told the Economic Observer there are two main reasons: first, China's computing infrastructure is large in scale with a high reuse rate, which lowers quotes; second, China's computing clusters include a large amount of self-built capacity, acquired at lower cost than overseas.

In addition, the technical roadmap also affects cost. An industry insider told reporters that mainstream Chinese models now generally adopt the MoE (mixture-of-experts) architecture. Simply put, although a MoE model has a large number of parameters, each run activates only a small portion of them to process the task rather than all of them, which greatly reduces the computation needed per inference.
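The MoE idea described above can be illustrated with a toy example: a gate scores all experts, but only the top-k actually run per input. Everything here is schematic, not any vendor's actual architecture.

```python
import math

# Toy mixture-of-experts: a gate scores every expert, but only the
# top-k highest-scoring experts execute per input, so most of the
# model's parameters stay idle on any given call.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    scores = softmax([w * x for w in gate_weights])
    top_k = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top_k)
    # Only k of len(experts) experts execute; the rest cost nothing this call.
    return sum(scores[i] / norm * experts[i](x) for i in top_k), top_k

experts = [lambda x, c=c: c * x for c in range(1, 9)]  # 8 toy "experts"
gate = [0.1, 0.9, 0.3, 0.7, 0.2, 0.5, 0.4, 0.6]
y, active = moe_forward(2.0, experts, gate, k=2)
print(len(active))  # 2 -- only a quarter of the experts ran
```

In a production MoE model the experts are large neural sub-networks, so running 2 of, say, 8 per token cuts per-inference compute by roughly the same fraction, which is the cost lever the insider describes.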

Different Paths

Martin Casado, a partner at the Silicon Valley venture capital firm a16z, said at the end of 2025 that approximately 80% of AI startups using open-source technology stacks are using Chinese models. He later clarified on social media that this does not mean 80% of American AI startups use Chinese models; rather, among the companies that choose the open-source route (about 20% to 30% of all American AI startups), about 80% use Chinese models.

The reporter noticed multiple open-source tools on GitHub that help developers optimize costs across models. The common idea is to classify tasks by difficulty: simple tasks go to free or low-cost Chinese models, while complex tasks go to expensive American models.

One project, ClawRouter, offers comparative data in its documentation showing that with this combination, average cost fell from $25 per million tokens to about $2. Anthropic's Claude Code also adopts a similar tiered design in its official documentation, defaulting to the cheapest model for routine tasks.
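The tiered-calling idea behind these tools reduces to a sketch like the following. The model names, prices, threshold, and keyword heuristic are illustrative assumptions, not ClawRouter's or Claude Code's actual logic.

```python
# Sketch of difficulty-based model routing: a cheap model handles
# routine tasks, and an expensive model is called only when an
# estimated difficulty crosses a threshold. All names, prices, and
# the heuristic itself are illustrative.

CHEAP = ("deepseek-v3.2", 0.28)      # (model, $ per million input tokens)
PREMIUM = ("claude-opus-4.6", 5.00)

def estimate_difficulty(task: str) -> int:
    """Crude heuristic: count keywords that usually signal hard work."""
    hard_words = ("refactor", "architecture", "concurrency", "race condition")
    return sum(w in task.lower() for w in hard_words)

def route(task: str, threshold: int = 1) -> str:
    """Pick the cheap model unless the task looks difficult."""
    model, _ = PREMIUM if estimate_difficulty(task) >= threshold else CHEAP
    return model

print(route("rename this variable"))                   # deepseek-v3.2
print(route("refactor the concurrency architecture"))  # claude-opus-4.6
```

Real routers use far better difficulty signals (model self-assessment, task type, context length), but the economics are the same: if most tasks land on the cheap tier, the blended price approaches the cheap model's.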

The premise of this model is that Chinese models are capable enough for comparable tasks. In programming, the SWE-Bench data above already shows this. Beyond programming, what is the overall capability gap between Chinese and American models?

LMSYS Chatbot Arena is currently one of the most widely recognized AI model evaluation platforms in the world. Its approach is to let real users try two models side by side without knowing their names, then vote for the better one, essentially a blind taste test between AI models.

In its overall ranking as of March 25, 2026, the top five are all models from American companies; the highest-ranked Chinese model, DeepSeek V3.2 Speciale, sits sixth. In the Hard Prompts category (difficult prompts designed to test complex reasoning and multi-step logic), the gap between Chinese and American models is more pronounced, with the first tier still mainly American models.

The near-parity in programming and the remaining gap in complex reasoning reflect the two sides' differentiated capabilities, and they are also the basis on which "tiered calling" works.

However, unlike the contract manufacturers locked into thin margins 30 years ago, Chinese large model vendors have not kept lowering their prices.

In fact, China's large model industry went through a round of price war starting in 2024: in May 2024, ByteDance's Volcano Engine triggered a "price war" by pricing its Doubao model at 0.0008 yuan per 1,000 tokens, and Alibaba Cloud and Baidu AI Cloud followed. Over the following year, token prices fell by more than 90%, and some vendors' gross margins on inference computing briefly turned negative.

Vendors' strategy at the time was to trade losses for scale and cultivate calling habits. But after OpenClaw took off in February 2026, token consumption grew far faster than expected and the supply of computing power tightened.

Zhipu was the first to react, raising API prices when it released the new GLM-5 model on February 12, 2026, and again when GLM-5-Turbo was released on March 16, a cumulative increase of 83% over the two rounds.

Zhipu CEO Zhang Peng said at the 2025 earnings briefing that API call prices would rise 83% while call volume would grow 400% in the first quarter of 2026. According to its annual report, Zhipu's 2025 revenue was 724.3 million yuan, up 132% year on year, and its MaaS (Model as a Service) platform's annual recurring revenue was about 1.7 billion yuan, a 60-fold increase in 12 months.

Zhipu is not alone in raising prices. On March 13, 2026, Tencent Cloud adjusted pricing for its Hunyuan series of large models, with some models rising more than 460%. On March 18, Alibaba Cloud and Baidu AI Cloud issued price adjustment announcements the same day: AI computing-related products rose between 5% and 34%, with the new prices taking effect on April 18.

Li Bin, Senior Vice President of Sugon (Zhongke Shuguang), told the Economic Observer that the metrics for evaluating computing systems are changing: the measure used to be how much computing power a system had; now it is how economically the system can produce tokens.

The transition from collective price reduction to collective price increase took less than two years.

In March 2026, Liu Liehong, Director of the National Data Administration, announced a set of figures at the China Development Forum: average daily token call volume in China has exceeded 140 trillion, more than 1,000 times the level of two years ago.

At the GTC conference that same month, Nvidia founder Jensen Huang said that tokens will be the most core commodity of the future digital world.

In Pan Helin's view, the competitiveness of Chinese large models lies not in catching up but in leading, especially in AI applications. But he also said there is still room for improvement in original innovation: the core architectures of today's AI systems, from artificial neural networks to attention mechanisms, were first proposed overseas and then followed and iterated on domestically. The next step for Chinese large models is to keep focusing on applications while pursuing original innovation in fundamental algorithms.

Thirty years ago, the consumer electronics OEM industry had a defining trait: margins in the assembly stage were firmly squeezed by upstream brand owners, and the gross margins of many top OEM factories still have not exceeded 10% today. Cost advantage brought orders, but not pricing power.

At present, China's large models look somewhat like the consumer electronics OEM industry of the past, but quite different on pricing power. After Zhipu raised prices by 83%, call volume grew 400%; Alibaba Cloud, Baidu AI Cloud, and Tencent Cloud collectively raised prices for AI computing and model services in March 2026, yet demand has not shrunk and call volumes keep growing.

In SWE-Bench programming evaluation, the gap between the top Chinese and American models has narrowed to less than 1 percentage point. A gap in complex reasoning remains, but it too is closing quickly.

This time, the development path of Chinese large model manufacturers seems to be different.

Disclaimer: The views expressed in this article are for reference and communication only and do not constitute any advice.
Senior journalist covering new industries such as new energy, semiconductors, and intelligent vehicles. For inquiries, contact: zhengchenye@eeo.cn; WeChat: zcy096x.