On Wednesday, Anthropic CEO Dario Amodei published a lengthy article analyzing the debate over whether DeepSeek's success threatens the United States and means that U.S. export controls on AI chips don't work.
The following is a translation of the original article, and the "I" in the article refers to Dario Amodei.
A few weeks ago, I advocated for the United States to strengthen chip export controls on China. Since then, Chinese AI company DeepSeek has, in at least some respects, approached the performance of cutting-edge American AI models at a lower cost.
Here I will not focus on whether DeepSeek poses a threat to US AI companies like Anthropic (although I do think many of the claims that it threatens US AI leadership are greatly exaggerated)[1]. Instead, I will focus on whether DeepSeek's releases undermine the case for chip export control policies. I don't think they do. In fact, I think they make export control policies even more important than they were a week ago[2].
Export controls serve a vital purpose: keeping the United States at the forefront of AI development. To be clear, they are not a way of avoiding competition between the United States and China. Ultimately, if we want to prevail, American AI companies must have better models than China's. But we should not hand China a technological advantage when we don't have to.
Three Dynamics of AI
Before I make my policy arguments, I’ll describe three fundamental dynamics that are critical to understanding AI systems:
Scaling Laws. One property of AI (which my co-founders and I were the first to document while working at OpenAI) is that, all else being equal, scaling up the training of an AI system can improve results across the board on a range of cognitive tasks. For example, a $1 million model might solve 20% of important coding tasks, a $10 million model might solve 40%, a $100 million model might solve 60%, and so on. These differences tend to have huge consequences in practice—another factor of 10 might correspond to the difference between the skill level of an undergraduate and a PhD student—and so companies are investing heavily in training these models.
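To make the shape of this relationship concrete, here is a tiny Python sketch of a hypothetical scaling curve. The log-linear form and the constants are purely illustrative, calibrated only to reproduce the 20%/40%/60% figures used as examples above; they are not measured scaling-law parameters.

```python
import math

def toy_task_solve_rate(training_cost_usd: float) -> float:
    """Hypothetical fraction of important coding tasks solved at a given training budget."""
    # Calibrated so that $1M -> 20%, $10M -> 40%, $100M -> 60%, as in the example above.
    return min(1.0, 0.2 * math.log10(training_cost_usd / 1e5))

for cost in (1e6, 1e7, 1e8):
    print(f"${cost:>13,.0f} -> {toy_task_solve_rate(cost):.0%} of tasks")
```

The point of the toy curve is only that each 10x increase in spend buys a roughly constant increment of capability, which is why another factor of 10 can matter so much in practice.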
Changing the curve. The field is constantly flooded with ideas, big and small, that make things more effective or efficient: this could be an improvement in model architecture (a tweak to the basic Transformer architecture used by all models today), or just a way to run a model more efficiently on the underlying hardware. New generations of hardware have the same effect. This often changes the curve: if the innovation is a 2x "compute multiplier" (CM), then it can allow you to get 40% of the coding work for $5M instead of $10M; or 60% of the coding work for $50M instead of $100M, etc.
Every leading AI company regularly discovers many of these CMs: usually small (~1.2x), sometimes medium (~2x), and occasionally very large (~10x). Because the value of having smarter systems is so high, this shift in the curve generally causes companies to spend more on training models, not less: the cost efficiency gains end up being entirely devoted to training smarter models, limited only by the company’s financial resources. People are naturally drawn to the idea that “it’s expensive at first, then it gets cheaper” — as if AI is a single thing of constant quality, and as it gets cheaper we’ll use fewer chips to train it.
But what’s important is the scaling curve: when it moves, we just traverse it faster because the value at the end of the curve is so high. In 2020, my team published a paper that suggested the curve change due to algorithmic advances was about 1.68x per year. That rate has probably accelerated significantly since then; it also doesn’t take into account efficiency and hardware.
My guess is that the number today is probably about 4x per year. Shifts in the training curve also shift the inference curve, so over the years prices have dropped dramatically while model quality has stayed constant. For example, Claude 3.5 Sonnet, released 15 months after the original GPT-4, outperforms GPT-4 on almost every benchmark while offering an API price that is about 10x lower.
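As a back-of-the-envelope illustration of how a shifting curve plays out over time, the sketch below compounds an assumed ~4x-per-year efficiency gain to show how quickly the cost of reaching a fixed capability level falls. All numbers are illustrative assumptions based on the rough estimate above, not measurements.

```python
annual_efficiency_gain = 4.0   # assumed curve shift per year (rough estimate from the text)
cost_today = 100e6             # hypothetical cost of reaching a fixed capability level today

for years in (0.5, 1.0, 1.5, 2.0):
    cost = cost_today / annual_efficiency_gain ** years
    print(f"after {years:.1f} years: ~${cost / 1e6:.1f}M for the same capability")
```

As argued above, in practice this falling cost does not translate into lower total spending: budgets chase the top of the curve, so the gains are reinvested in training smarter models.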
Shifting Paradigms. Every once in a while, the underlying thing being scaled changes, or a new type of scaling is added to the training process. From 2020 to 2023, the main thing being scaled was pre-trained models: models trained on growing amounts of internet text, with a small amount of additional training on top. In 2024, the idea of using reinforcement learning (RL) to train models to generate chains of thought has become a new focus of scaling.
Anthropic, DeepSeek, and many others (perhaps most notably OpenAI, which released its o1 preview model in September) have found that this kind of training greatly improves performance on some selected, objectively measurable tasks (like math, coding competitions), as well as reasoning similar to those tasks.
This new paradigm involves starting with a plain old pre-trained model and then using RL to add reasoning skills in a second phase. Importantly, because this type of reinforcement learning is new, we are still early in the scaling curve: the amount of money spent on the second reinforcement learning phase is small for all involved. Spending $1 million instead of $100,000 is enough to achieve huge gains.
Companies are now rapidly scaling up to hundreds of millions or even billions of dollars in the second phase, but it is important to understand that we are at a unique “intersection point” where there is a powerful new paradigm that is early in the scaling curve and can therefore achieve huge gains quickly.
DeepSeek’s Model
The above three dynamics help us understand DeepSeek's recent releases. About a month ago, DeepSeek released a model called "DeepSeek-V3", which is a pure pre-trained model[3] (that is, the first stage described in point 3 above). Then last week, they released "R1", which added the second stage. It's impossible to determine everything about these models from the outside, but here is my best understanding of the two releases.
DeepSeek-V3 is actually a real innovation that should have gotten attention a month ago (and it got ours). As a pre-trained model, it seems to come close to the performance of state-of-the-art US models on some important tasks[4], while being significantly cheaper to train (although we found that Claude 3.5 Sonnet performed particularly well on some other key tasks). The DeepSeek team achieved this through some genuinely impressive innovations, mostly focused on engineering efficiency. There are particularly innovative improvements in managing something called "key-value caching" and in making an approach called "mixture of experts" go further than before.
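As a rough illustration of what "key-value caching" refers to, here is a minimal NumPy sketch of the generic mechanism: during autoregressive decoding, the keys and values of past tokens are stored and reused rather than recomputed at every step. This shows only the baseline idea, not DeepSeek's specific improvement to it, and all the dimensions and values are stand-ins.

```python
import numpy as np

d = 64                                     # hypothetical attention head dimension
rng = np.random.default_rng(0)

def attend(q, k_cache, v_cache):
    """Attend one new query over all cached keys and values."""
    scores = k_cache @ q / np.sqrt(d)      # similarity of the new token to every cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over cached positions
    return weights @ v_cache               # weighted mix of cached values

k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
for step in range(8):                      # decode 8 tokens, one at a time
    q, k, v = rng.standard_normal((3, d))  # stand-ins for the new token's Q/K/V projections
    k_cache = np.vstack([k_cache, k])      # append: past keys/values are reused, not recomputed
    v_cache = np.vstack([v_cache, v])
    out = attend(q, k_cache, v_cache)      # attention output for the new token
```

The engineering challenge in practice is that this cache grows with sequence length and the number of heads and layers, which is why managing (and shrinking) it efficiently matters so much for serving cost.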
However, it is important to look closely:
DeepSeek does not "do for $6 million[5] what US AI companies spend billions on". I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few tens of millions of dollars to train (I won't give an exact number). Furthermore, 3.5 Sonnet was not trained in a way that involved a larger or more expensive model (contrary to some rumors). Sonnet was trained 9-12 months ago, and DeepSeek's model was trained in November/December, yet Sonnet still leads in many internal and external evaluations. So I think a fair statement is: "DeepSeek produces models approaching the performance of US models from 7-10 months ago, at a much lower cost (but nowhere near the ratio people have suggested)."

If the historical trend is that the cost curve declines by ~4x per year, that means today's models should, in the ordinary course of things, be 3-4x cheaper than 3.5 Sonnet/GPT-4o. Since DeepSeek-V3 is worse than the leading US models (assume ~2x worse on the scaling curve, which I think is quite generous to DeepSeek-V3), it would be completely normal and completely "on trend" if DeepSeek-V3's training cost were ~8x lower than that of current US models developed a year ago.

I won't put a number on it, but it is clear from the previous points that even if you take DeepSeek's training cost at face value, it is at best in line with the trend, and probably not even that. For example, this is a smaller gap than the original GPT-4 to Claude 3.5 Sonnet inference price difference (10x), and 3.5 Sonnet is a much better model than GPT-4. All of this suggests that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs; it is an expected point on an ongoing cost-reduction curve.
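For concreteness, here is the back-of-the-envelope arithmetic behind the "~8x" figure above, with the text's assumptions made explicit: a ~4x-per-year decline in the cost of a fixed capability level, a roughly one-year gap relative to the US models being compared against, and a ~2x gap below the US frontier on the scaling curve. These inputs are rough estimates from the discussion above, not measured values.

```python
annual_cost_decline = 4.0   # assumed yearly drop in cost for fixed capability (trend estimate)
years_elapsed = 1.0         # "current US models developed a year ago"
capability_gap = 2.0        # assumed ~2x lower on the scaling curve than the US frontier

time_factor = annual_cost_decline ** years_elapsed   # ~4x expected just from elapsed time
expected_ratio = time_factor * capability_gap        # ~8x expected, purely "on trend"
print(f"expected on-trend cost reduction: ~{expected_ratio:.0f}x")
```

In other words, a training cost several times lower than that of year-old frontier models is roughly what the existing trend would predict, even before any DeepSeek-specific innovation.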
What's different this time is that the first company to demonstrate the expected cost reduction is a Chinese company. This has never happened before and has geopolitical implications. However, US companies will soon follow suit, not by copying DeepSeek, but because they too are achieving the usual trend of cost reduction.
Both DeepSeek and US AI companies have more money and chips than they had when they trained their main models. The extra chips are used for the R&D needed to develop the ideas behind a model, and sometimes to train larger models that are not yet ready (or that require multiple attempts to get right). It has been reported (though we can't be sure it's true) that DeepSeek actually has 50,000 Hopper-generation chips[6], which I'd guess is within a factor of ~2-3x of what the major US AI companies have (for example, it's 2-3x fewer chips than the xAI "Colossus" cluster)[7]. Those 50,000 Hopper chips cost about $1 billion. Thus, DeepSeek's total spending as a company (as distinct from the spending on training individual models) isn't that different from US AI labs.
It’s worth noting that the “scaling curve” analysis is a bit of an oversimplification, as there is some variation among the models, with each having its own strengths and weaknesses; the scaling curve number is a rough average that ignores a lot of detail. I can only speak for Anthropic’s models, but as I hinted above, Claude is very good at coding and has a well-crafted style of interacting with people (which many use to get personal advice or support). On these tasks and a few others, DeepSeek simply can’t compare. These factors don’t show up in the scaling numbers.
R1 is the model released last week that generated a lot of public attention (including a ~17% drop in Nvidia's stock price), but it is nowhere near as interesting from an innovation or engineering perspective as V3. It adds a second stage of training (reinforcement learning, as described in point 3 of the previous section) and essentially replicates what OpenAI did with o1 (they appear to be at a similar scale, with similar results)[8].
However, because we are so early on the scaling curve for this paradigm, multiple companies can produce models of this type, as long as they start from a strong pre-trained model. Given V3, producing R1 was probably very cheap. So we are at an interesting "crossover point" where, for the time being, a few companies can produce good reasoning models. This will quickly cease to be true as everyone moves further up the scaling curve for these models.
Export Controls
The above is just a preface to the main topic I’m interested in: export controls on Chinese chips. Given the facts above, I think the situation is as follows:
There is a trend that companies are investing more and more money in training powerful AI models, even though the curve periodically changes and the cost of training a particular level of model intelligence falls rapidly. It’s just that the economic value of training increasingly smarter models is so large that any cost gains are almost immediately eaten up - they are reinvested into making smarter models for the same amount of money we originally planned to spend.
To the extent that US labs have not already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion-dollar models. These models will outperform the multi-billion-dollar models they had previously planned to train, but they will still cost billions of dollars. That number will continue to rise until we have AI that is smarter than almost all humans at almost everything.
Making AI that is smarter than humans at almost everything will require millions of chips and tens of billions of dollars (at least), and is most likely to happen in 2026-2027. DeepSeek's releases don't change this, because they sit roughly on the expected cost-reduction curve that has always been factored into these calculations.
This means that in 2026-2027 we could end up in one of two very different worlds. In the United States, multiple companies will certainly have the millions of chips needed (at a cost of tens of billions of dollars). The question is whether China will also be able to get millions of chips[9].
If it can, we will live in a bipolar world, in which both the United States and China have powerful AI models that drive extremely rapid progress in science and technology, what I have called "countries of geniuses in a datacenter". A bipolar world will not necessarily remain balanced indefinitely. Even if the US and China are at parity in AI systems, China's massive industrial base could help it dominate the global stage, not just in AI, but in all fields.
If China cannot get millions of chips, we will (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It is not clear whether the unipolar world will last, but there is at least the possibility that a temporary lead can be transformed into a lasting advantage because AI systems can eventually help make smarter AI systems. Therefore, in this world, the US and its allies may dominate the global stage and maintain their lead for a long time.
Only strict enforcement of export controls can prevent China from getting millions of chips, and it is therefore the most important factor in determining whether we end up in a unipolar or a bipolar world.
DeepSeek’s performance does not mean that export controls have failed. As I said above, DeepSeek has a moderate to large number of chips, so it is not surprising that they were able to develop and train a powerful model. Their resource constraints are no more severe than those of US AI companies, and export controls are not the primary factor driving their "innovation." They are just very talented engineers, and have shown why China is a serious competitor to the US.
DeepSeek also doesn't show that controls always have loopholes. $1 billion of economic activity can be hidden, but $100 billion or even $10 billion is hard to hide. A million chips are also physically hard to smuggle.
It's also instructive to look at the chips DeepSeek is reported to have. According to SemiAnalysis, it's a mix of H100s, H800s, and H20s, for a total of 50,000. The H100 has been banned by export controls since its release, so if DeepSeek has any H100s, they must not have come through official channels (note that Nvidia has stated that DeepSeek's progress is "fully compliant with export controls"). The H800 was allowed in the first round of export controls in 2022, but was banned when the controls were updated in October 2023, so these chips were probably shipped before the ban. The H20 is less efficient for training and more efficient for sampling; it is still allowed, although I think it should be banned.
All of this suggests that a significant portion of DeepSeek's AI chip fleet appears to consist of chips that have not yet been banned (but should be), chips that were shipped before they were banned, and some that were quite likely smuggled. This suggests that export controls are indeed working and adapting: loopholes are being plugged. If we can plug them fast enough, we may be able to increase the likelihood of the United States leading a unipolar world.
Given my focus on export controls and US national security, I want to be clear. I do not consider DeepSeek to be an adversary per se, and the focus is not specifically on them. In the interviews they gave, they seemed like smart, curious researchers who just wanted to develop useful technology.
But export controls are one of the most powerful tools we have to prevent China from catching up with the United States. The idea that the increasing power and cost-effectiveness of technology is a reason to lift export controls is completely unreasonable.
Footnotes
[1] In this article, I am not taking any position on reports that DeepSeek's models were distilled from Western models. Here, I simply take at face value DeepSeek's statement that they trained their models in the way described in their paper.
[2] As an aside, I think the release of the DeepSeek model is clearly not a bad thing for Nvidia, and their double-digit (~17%) drop in stock price as a result is puzzling. The reasons why this release is not a bad thing for Nvidia are even more obvious than the reasons why it is not a bad thing for AI companies. But my main goal in this article is to defend export control policy.
[3] To be precise, it is a pre-trained model with the small amount of RL training that was typical of models before the reasoning paradigm shift.
[4] It performs better on some very narrow tasks.
[5] This is the number quoted in DeepSeek's paper. I take it at face value here and am not questioning the number itself, only the comparison with the cost of training models at US companies, and the distinction between the cost of training a specific model ($6 million) and the total cost of R&D (which is much higher). That said, we can't be completely sure about the $6 million either: the model size is verifiable, but other aspects (such as the number of tokens) are not.
[6] In some interviews, I said they had "50,000 H100s," which is a subtly incorrect summary of reporting that I want to correct here. By far the most well-known "Hopper chip" is the H100 (which I assumed was what was being referred to), but Hopper also includes the H800 and H20, and DeepSeek reportedly has a mix of all three, totaling 50,000. This doesn't change the situation much, but it's worth correcting. I'll talk more about the H800 and H20 when I talk about export controls.
[7] Note: I expect this gap to widen significantly in next-generation clusters due to export controls.
[8] I suspect one of the main reasons R1 has received so much attention is that it was the first model to show the user the reasoning the model was performing (OpenAI's o1 only showed the final answer). DeepSeek showed that users are interested in this. To be clear, this is a user-interface choice and has nothing to do with the model itself.
[9] China's own chips will not be able to compete with US-made chips anytime soon.
References:
[1] https://techcrunch.com/2025/01/29/anthropics-ceo-says-deepseek-shows-that-u-s-export-rules-are-working-as-intended/
[2] https://darioamodei.com/on-deepseek-and-export-control