On January 27, DeepSeek topped the free app download chart on Apple's US App Store, overtaking ChatGPT. On the same day, it also reached number one on the free chart of Apple's China App Store.
Feng Ji, founder and CEO of Game Science and producer of "Black Myth: Wukong", commented on DeepSeek: it may be a scientific and technological achievement of national significance.
Less than a month after releasing DeepSeek-V3, on January 20 this year, DeepSeek officially open-sourced its R1 reasoning model. According to DeepSeek, DeepSeek-R1 applies reinforcement learning at scale in the post-training stage, greatly improving the model's reasoning ability with only very little labeled data. On tasks such as mathematics, code, and natural language reasoning, its performance is comparable to the official version of OpenAI o1. The release sparked discussion among technology leaders in the overseas AI community. For example, Jim Fan, a senior research scientist at Nvidia, posted publicly on his personal social media account: "We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely."
Why did DeepSeek suddenly become popular?
On the 26th, DeepSeek suffered a brief outage: many netizens reported seeing a "server busy" prompt while using the service. DeepSeek responded that there had indeed been a local service fluctuation that afternoon, but that the problem was resolved within minutes. The incident was likely caused by a surge in visits after the release of the new model, with servers temporarily unable to handle the volume of concurrent users. The official status page, however, did not record it as an incident.
DeepSeek's ranking on the US chart was not especially prominent before; it had been climbing steadily but had not broken into the top ten. The sudden surge is directly related to its recent series of strong results. According to Guangzhou Daily, "The reasons for DeepSeek's explosion in popularity come down to two points: performance and cost," Zheng Lei, chief economist of Samoyed Cloud Technology Group, told reporters.

DeepSeek explained that R1 applied reinforcement learning at scale in the post-training stage, greatly improving the model's reasoning ability with only very little labeled data. This performance has drawn wide attention from the technology community and shown the investment community the model's commercial potential. Even more notable is what truly sets DeepSeek-R1 apart: its cost, or rather its low cost. The reported training cost of DeepSeek-V3, the base model underlying R1, was only US$5.576 million, less than one-tenth of the training cost of OpenAI's GPT-4o. At the same time, DeepSeek announced its API pricing: 1 yuan (cache hit) or 4 yuan (cache miss) per million input tokens, and 16 yuan per million output tokens, roughly one-thirtieth of the cost of running OpenAI o1 (a worked pricing example follows at the end of this passage). For this reason, DeepSeek has been called the "Pinduoduo of the AI world". Zheng Lei said bluntly that DeepSeek will have a significant impact on the hardware market, because it may lower the hardware cost of AI models and thereby accelerate the development of AI technology.

DeepSeek's innovations were not achieved overnight; they are the result of several years of "incubation" and long-term planning. Liang Wenfeng, the founder of DeepSeek, is also the founder of Huanfang Quantitative (known in English as High-Flyer), a leading quantitative private equity firm, and DeepSeek has made full use of the funds, data, and computing cards (GPUs) that High-Flyer accumulated. Liang Wenfeng earned bachelor's and master's degrees in information and electronic engineering from Zhejiang University, and since 2008 he has led teams exploring fully automated quantitative trading using technologies such as machine learning. DeepSeek was formally established in July 2023 to pursue artificial general intelligence, and to date it has never raised external funding. Jack Clark, former policy director of OpenAI and co-founder of Anthropic, once remarked that DeepSeek had hired "a group of unfathomable geniuses". Liang Wenfeng responded in a media interview that there were no unfathomable geniuses: the team consists of graduates of top universities, fourth- and fifth-year PhD students interning before graduation, and young people only a few years out of school.
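To make the API pricing quoted above concrete, here is a minimal sketch of how a usage bill could be estimated from the listed per-token prices. The prices are those reported in this article; the token volumes in the example are invented purely for illustration.

```python
# Cost sketch for the DeepSeek-R1 API pricing quoted in this article:
# 1 yuan per million input tokens on a cache hit, 4 yuan on a cache miss,
# and 16 yuan per million output tokens. Volumes below are illustrative.

PRICE_INPUT_CACHE_HIT = 1.0    # yuan per million input tokens (cache hit)
PRICE_INPUT_CACHE_MISS = 4.0   # yuan per million input tokens (cache miss)
PRICE_OUTPUT = 16.0            # yuan per million output tokens

def estimate_cost_yuan(input_hit_tokens: int,
                       input_miss_tokens: int,
                       output_tokens: int) -> float:
    """Estimate a bill in yuan from raw token counts."""
    m = 1_000_000
    return (input_hit_tokens / m * PRICE_INPUT_CACHE_HIT
            + input_miss_tokens / m * PRICE_INPUT_CACHE_MISS
            + output_tokens / m * PRICE_OUTPUT)

# Example: 50M cached input tokens, 10M uncached, 20M output tokens.
print(f"{estimate_cost_yuan(50_000_000, 10_000_000, 20_000_000):.2f} yuan")
# -> 410.00 yuan
```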
From public media reports, the most distinctive features of the DeepSeek team are elite universities and youth: even at the team-lead level, most members are under 35. The team numbers fewer than 140 people, and nearly all of its engineers and researchers come from top domestic universities such as Tsinghua University, Peking University, Sun Yat-sen University, and Beijing University of Posts and Telecommunications, with only a few years of work experience.
Expert Interpretation: Why China's New Large AI Model Is So Popular Online
Recently, a Chinese AI startup called DeepSeek has become the talk of the artificial intelligence (AI) large-model field at home and abroad. In less than 30 days, DeepSeek released two large models, DeepSeek-V3 and DeepSeek-R1, at a fraction of the cost of foreign large-model projects that run to hundreds of millions or even tens of billions of dollars, with performance comparable to top foreign models. At the same time, unlike the closed-source path taken by the foreign large-model giants, DeepSeek has adopted an open-source approach. The company's development model and results have drawn intense attention in Silicon Valley: many Western mainstream media outlets published articles lamenting that "Chinese AI models have shocked Silicon Valley", and well-known companies and institutions at home and abroad even scrambled overnight to reproduce DeepSeek's results. What characterizes DeepSeek's development? What inspiration does it offer for the development path and innovative thinking of domestic large models? On the 26th, the Global Times interviewed several experts in the field of artificial intelligence.
"OpenAI o1 economical and open competitor"
DeepSeek released the R1 large model on the 20th of this month, stating that "in tasks such as mathematics, code, and natural language reasoning, its performance is comparable to the official version of OpenAI o1", which drew intense attention from foreign media, especially American media, to the Chinese company and its latest results.
"China's cheap and open artificial intelligence model DeepSeek makes scientists excited." "Nature" magazine said on the 24th that the large language model DeepSeek-R1 developed by China has excited scientists. It is considered to be an economical and open competitor to "reasoning" models such as OpenAI o1.
The New York Times reported on the 24th, under the headline "How Chinese AI Startup DeepSeek Competes with Silicon Valley Giants", that while the results above already constitute a milestone, the team behind the DeepSeek-V3 large model described an even greater advance: they trained the system using only a small fraction of the highly specialized computer chips that leading AI companies rely on. The Chinese engineers said they spent only about US$6 million and roughly 2,000 Nvidia specialized chips to train the new model, far less than the world's leading AI companies in both funding and chip count.
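The roughly US$6 million figure is consistent with the US$5.576 million training cost cited earlier. A back-of-envelope check, using the GPU-hour count and the assumed rental rate stated in the DeepSeek-V3 technical report (neither figure appears in this article):

```python
# Back-of-envelope check of the reported training cost, using numbers
# from the DeepSeek-V3 technical report (not from this article): about
# 2.788 million H800 GPU-hours, at an assumed rental rate of $2/GPU-hour.
gpu_hours = 2.788e6          # total H800 GPU-hours for V3 training
rate_usd_per_gpu_hour = 2.0  # assumed rental price per GPU-hour
cost = gpu_hours * rate_usd_per_gpu_hour
print(f"${cost:,.0f}")  # -> $5,576,000, matching the $5.576M cited above
```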
"This is not a question of China catching up with the United States, but a question of open source catching up with closed source"
DeepSeek has attracted attention not only for its cost-effectiveness but for another reason: open source. In recent days, a wave of DeepSeek reproductions has appeared online. Teams at the University of California, Berkeley, the Hong Kong University of Science and Technology, and the well-known AI company Hugging Face have all successfully reproduced the approach, using reinforcement learning alone, without supervised fine-tuning, in some cases for a cost of only tens of dollars.
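To show in miniature what "reinforcement learning alone, without supervised fine-tuning" means, here is a toy REINFORCE-style sketch: a softmax policy learns to pick correct answers purely from a right/wrong reward signal, with no labeled demonstrations. Everything here (problem sizes, learning rate, the hidden "answer key") is invented for illustration; this is a didactic toy, not DeepSeek's training code.

```python
import numpy as np

# Toy REINFORCE loop: a softmax policy over candidate answers improves
# using only a right/wrong reward, with no supervised labels at any point.
rng = np.random.default_rng(0)
n_questions, n_answers = 5, 4
answer_key = rng.integers(0, n_answers, size=n_questions)  # used only to compute reward
theta = np.zeros((n_questions, n_answers))                 # per-question policy logits
lr, baseline = 0.5, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    q = rng.integers(0, n_questions)             # pick a question
    probs = softmax(theta[q])
    a = rng.choice(n_answers, p=probs)           # sample an answer from the policy
    reward = 1.0 if a == answer_key[q] else 0.0  # verifiable reward: right or wrong
    advantage = reward - baseline                # variance-reducing baseline
    baseline += 0.05 * (reward - baseline)
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                        # gradient of log pi(a | q)
    theta[q] += lr * advantage * grad_log_pi     # REINFORCE update

accuracy = np.mean(theta.argmax(axis=1) == answer_key)
print(f"greedy accuracy after training: {accuracy:.0%}")  # typically 100%
```

The point of the toy is the absence of any supervised target: the policy never sees the answer key directly, only a scalar reward, yet its greedy choices converge to the correct answers.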
A post on the US forum Reddit argued on the 25th that the fact that China's DeepSeek model is open source is the real reason for the excitement: essentially, the company is giving the knowledge of how to build these systems to the world for free, ensuring that no one can truly monopolize it. Chinese companies are doing practically the opposite of their American counterparts. Can you imagine OpenAI, Anthropic, or Google open-sourcing any powerful model? So far, all we have gotten from them are scraps. Meta is the only major Western company that has contributed significantly to open-source large models, and it may not open-source its best models in the future. Yann LeCun, known as one of the "three giants of deep learning", said on the social platform X that this is not a question of China catching up with the United States, but of open source catching up with closed source.
Liu Wei, director of the Human-Computer Interaction and Cognitive Engineering Laboratory at Beijing University of Posts and Telecommunications, told the Global Times that the three core elements of a large model are data, algorithms, and computing power. DeepSeek uses less data and less computing power, yet through algorithmic optimization achieves results equivalent to or even better than those of well-known foreign large models, which deserves real recognition. At the same time, he noted, the model is open source and can be used and reproduced by anyone in the world who wants to work with it.
Shen Yang, a professor at both the School of Journalism and the School of Artificial Intelligence at Tsinghua University, told the Global Times on the 26th that DeepSeek's model is among the best of the world's open-source large models, an innovative breakthrough that goes beyond traditional pre-training technology by combining multiple advanced techniques. Drawing on his own experience, he listed several of the model's strengths. First, it combines current methods for improving the capabilities of large AI models with engineering micro-innovations. Second, DeepSeek has published the relevant papers, so the whole process can be reproduced by anyone; this is the power of open source. Third, DeepSeek's reasoning process contains innovations of its own. As a researcher in the AI field, Shen Yang has used AI more than 30,000 times. He believes that, compared with American AI, DeepSeek retains many Chinese elements, such as popular phrases from the Chinese internet.
Improving reasoning ability
As for the lessons DeepSeek's development model offers for large-model development and innovation in China, Liu Wei believes that "innovation cannot be planned; it requires the market and professional institutions to find new paths through long-term research. In particular, commercial companies that have long focused on vertical fields can find better points of innovation by rethinking technical paths and staying attuned to the market. OpenAI's early development was the same: it was not planned by the US government or by technology giants making heavy investments."
Recently, OpenAI, SoftBank, and other companies announced the "Stargate" plan to spend US$500 billion over four years to accelerate AI development in the United States. Liu Wei stressed that this path, which concentrates human, financial, and material resources and then adds policy support, carries real uncertainty about future research directions and outcomes. "We should encourage more domestic commercial companies and research institutes to focus on their own fields and find the innovation and development paths that suit them."

Shen Yang said that in the history of AI, new breakthroughs have often been driven by unglamorous engineering innovation and scientific exploration, a pattern deeply reflected in DeepSeek's results, which not only break through traditional training methods but also bring a new perspective to improving reasoning ability. "Although its achievements are still at an interim stage, its engineering contributions and theoretical innovations have laid an important foundation for the future development of AI." In his view, the DeepSeek team's contribution to pre-trained base models is a breakthrough not only at the technical level but also in the refinement and efficiency of its engineering methods; this engineering innovation marks a new stage in AI model training, lowering development costs and offering a reference path for other companies. At the same time, DeepSeek's core innovation also lies in improved reasoning ability, particularly in using algorithmic innovation to foster the model's natural reasoning, demonstrating a real possibility for the field: without large quantities of expensive chain-of-thought annotation, reasoning ability can still emerge in a model.
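The published R1 report describes rule-based rewards rather than learned reward models: an accuracy reward that checks the final answer against a known result, and a format reward that checks the response structure (reasoning wrapped in think-tags). The sketch below is a minimal illustration of that idea; the scoring weights and the exact-match check are assumptions, not DeepSeek's actual values.

```python
import re

# Minimal illustration of rule-based rewards in the spirit of the R1 report:
# an accuracy reward compares the final answer to a reference, and a format
# reward checks the think/answer structure. Weights here are assumptions.

RESPONSE_PATTERN = re.compile(
    r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Score a model response with verifiable rules only - no reward model."""
    reward = 0.0
    match = RESPONSE_PATTERN.search(response)
    if match:
        reward += 0.2                       # format reward: correct structure
        answer = match.group(1).strip()
        if answer == reference_answer:      # accuracy reward: exact-match check
            reward += 1.0
    return reward

# Example usage with an invented math question:
resp = "<think>17 + 25 = 42</think> <answer>42</answer>"
print(rule_based_reward(resp, "42"))  # -> 1.2 (format + accuracy)
```

Because both rules are mechanically checkable, no expensive human chain-of-thought annotation is needed: the reasoning inside the tags is never graded directly, only the structure and the final answer.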
Shen Yang believes DeepSeek's success also points to the AI industry's future direction: more open-source innovation, deep collaboration between hardware and software, and continuous optimization of model development costs and reasoning capabilities. At the same time, although DeepSeek has achieved significant interim results, it still faces many deep challenges on the path to further breakthroughs, such as the need for more original training data and further algorithmic innovation.