Source: Quantum
It took about a month for the financial community to start panicking about DeepSeek, but when the panic really hit, Nvidia’s market value had shrunk by more than $500 billion (about 3.6 trillion yuan), the equivalent of an entire Stargate. And it wasn’t just Nvidia: the market values of Tesla, Google, Amazon, and Microsoft all fell as well.
According to Scale AI CEO Alexandr Wang, the two AI models DeepSeek released in quick succession are comparable to the best models from American laboratories. And DeepSeek appears to have built them under heavily constrained conditions, which means its training costs were far lower than those of its American counterparts. The final training run of one of its most recent models reportedly cost just $5.6 million (about 40.6 million yuan), roughly the annual salary of a senior American AI expert. Last year, Anthropic CEO Dario Amodei said training a model cost anywhere from $100 million to $1 billion, and OpenAI’s GPT-4 cost more than $100 million, according to CEO Sam Altman. DeepSeek appears to have upended our perception of what AI costs, and that could have a huge impact on the industry as a whole.
This all happened in just a few weeks. On Christmas Day, DeepSeek released its V3 model, which attracted widespread attention. Its second model, the reasoning-focused R1, was released last week and was called “one of the most amazing and impressive breakthroughs I’ve ever seen” by venture capitalist and Trump adviser Marc Andreessen. David Sacks, Trump’s AI and crypto czar, said the progress of DeepSeek’s models shows that “the AI race is going to be very intense.” Both models are partially open source: their weights are public, but their training data is not.
DeepSeek’s success calls into question whether winning the AI race really requires billions of dollars’ worth of computing power. Conventional wisdom has long held that big tech companies would dominate AI simply because they have the spare cash to chase advances. Now it looks like big tech may simply have been burning cash. Calculating the actual cost of these models is tricky, though, because, as Scale AI’s Wang points out, sanctions mean DeepSeek may not be able to speak truthfully about what kind of GPUs it has, or how many.
Hugging Face’s head of research Leandro von Werra says that even if the critics are right and DeepSeek isn’t being truthful about its GPU count (back-of-the-envelope math suggests the optimization techniques it describes are plausible, which would mean it is telling the truth), the open source community will figure it out soon enough. His team began replicating and open-sourcing the R1 recipe last weekend, and once researchers can create their own versions of the model, “we’ll soon find out if the numbers are right.”
What is DeepSeek?
Two-year-old DeepSeek, led by CEO Liang Wenfeng, is China’s premier AI startup. The company, which was spun out of a hedge fund founded by engineers from Zhejiang University, is focused on “potentially game-changing architectural and algorithmic innovations” in pursuit of artificial general intelligence (AGI), at least according to Liang. Unlike OpenAI, the company also claims to be profitable.
In 2021, Liang began buying thousands of Nvidia GPUs (just before the US imposed export restrictions on the chips) and launched DeepSeek in 2023 with the goal of “exploring the essence of artificial general intelligence,” that is, AI as smart as humans. Like OpenAI CEO Altman and other industry leaders, Liang talks a big game. “Our goal is artificial general intelligence,” Liang said in an interview, “which means we need to study new model structures to achieve stronger model capabilities with limited resources.”
DeepSeek does just that. The team uses a number of innovative techniques to make its models run more efficiently, and claims the final training run of V3 cost just $5.6 million (about 40.6 million yuan), roughly 95% less than reported figures for OpenAI’s o1. Rather than starting entirely from scratch, DeepSeek also builds on existing open source work; the smaller distilled variants of R1, for example, use Meta’s Llama models as a base. While the company hasn’t disclosed its full training data mix, DeepSeek does say it used synthetic data, that is, artificially generated information (which may matter more and more as AI labs appear to be hitting a data bottleneck).
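To make the synthetic data idea concrete, here is a toy generate-then-filter loop in Python: an existing model proposes answers, and only the programmatically verifiable ones are kept as training examples. The `teacher` function and every detail of this pipeline are hypothetical stand-ins for illustration, not DeepSeek’s actual process.

```python
import random

def teacher(question: str) -> str:
    """Toy stand-in for a strong existing model: usually right, sometimes wrong."""
    answer = eval(question)  # questions here are simple arithmetic strings
    return str(answer if random.random() < 0.8 else answer + 1)

def verified(question: str, answer: str) -> bool:
    """Keep only answers that pass a programmatic check."""
    try:
        return str(eval(question)) == answer
    except Exception:
        return False

# Generate candidates, then filter: the survivors are synthetic training data.
questions = ["2+2", "3*7", "10-4", "5**2"]
synthetic_dataset = [
    {"prompt": q, "completion": a}
    for q in questions
    if verified(q, a := teacher(q))
]
print(synthetic_dataset)
```

Filtering is the important half: generation is cheap, so a lab can afford to throw away everything it cannot verify.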
Without access to the training data, it’s unclear to what extent R1 is a “copy” of o1: did DeepSeek use o1’s outputs to train R1? When DeepSeek’s first paper was released in December, Altman wrote that “copying something you know works (is) relatively easy,” while “doing something new, risky, and difficult when you don’t know if it works is extremely difficult.” The implication: DeepSeek isn’t creating new frontier models, just copying old ones. OpenAI investor Joshua Kushner also seemed to suggest that DeepSeek was “trained on leading-edge models from Silicon Valley.”
According to former OpenAI policy researcher Miles Brundage, R1 relies on two key optimizations: more efficient pre-training, and reinforcement learning applied to chain-of-thought reasoning. DeepSeek found smarter ways to train AI on cheaper GPUs, helped in part by a newer technique that has the model “think” through problems step by step via trial and error (reinforcement learning) rather than by imitating humans. The combination let the model reach o1-level capabilities while using far less computing power and money.
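To make that reinforcement learning idea concrete, here is a minimal, hypothetical sketch of outcome-based RL for chain-of-thought reasoning. It follows the group-relative scheme (GRPO) described in DeepSeek’s papers, but every name and detail below is illustrative rather than DeepSeek’s actual training code.

```python
def reward(completion: str, reference_answer: str) -> float:
    """Score a sampled chain of thought purely by its final answer.

    The model may 'think' however it likes inside the <think> tags; only the
    text after </think> is checked, so useful reasoning strategies emerge by
    trial and error instead of by imitating human demonstrations."""
    final_answer = completion.split("</think>")[-1].strip()
    return 1.0 if final_answer == reference_answer else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sample's reward against its own group.

    Comparing completions for the same prompt to the group mean removes the
    need for a separate learned value model, saving memory and compute."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

# Toy usage: sample several completions per prompt, reward correct answers,
# and up-weight the samples that beat their group's average.
completions = [
    "<think>2 + 2 makes 4</think>4",
    "<think>just guessing</think>5",
    "<think>two plus two</think>4",
]
rewards = [reward(c, "4") for c in completions]
print(group_relative_advantages(rewards))  # positive, negative, positive
```

Because the reward checks only the final answer, no human needs to label the intermediate reasoning, which is part of why this style of training is comparatively cheap.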
"DeepSeek v3, and DeepSeek v2 before it, are basically the same models as GPT-4, but with smarter engineering tricks to get more bang for the GPU buck," Brundage said.
It should be noted that other labs use these techniques too. DeepSeek relies on a “mixture of experts” approach that activates only some of the model’s parameters for any given query, and GPT-4 is widely reported to do the same. DeepSeek’s version refines the concept by splitting experts into finer-grained categories and developing more efficient ways for them to share information, which makes the training process itself more efficient. The team also developed a technique it calls DeepSeekMLA (multi-head latent attention), which drastically reduces the memory needed to run its models by compressing the way they store and retrieve information.
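As a rough illustration of both ideas, here is a hypothetical Python sketch: a top-k mixture-of-experts layer, and an MLA-style cache step that stores one small latent per token instead of full keys and values. All dimensions, weight names, and the choice of k are made up for the example; this shows the shape of the techniques, not DeepSeek’s implementation.

```python
import numpy as np

def moe_layer(x, experts, router, k=2):
    """Top-k mixture-of-experts routing: only k experts run per token.

    x:       (d,) token representation
    experts: list of (d, d) weight matrices, one per expert
    router:  (num_experts, d) gating matrix
    """
    logits = router @ x                            # score every expert
    top = np.argsort(logits)[-k:]                  # keep only the k best
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                           # softmax over the winners
    # Most parameters sit idle: only k experts do any work for this token.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

def mla_cache_step(x, w_down, w_up_k, w_up_v, cache):
    """MLA-style memory saving: cache one small latent per token and
    re-expand it into keys and values only when attention needs them.

    w_down: (d_latent, d) down-projection; only its output is cached
    w_up_k, w_up_v: (d, d_latent) reconstruct keys and values
    """
    cache.append(w_down @ x)                       # d_latent floats, not 2*d
    latents = np.stack(cache)                      # (seq_len, d_latent)
    return latents @ w_up_k.T, latents @ w_up_v.T  # (seq_len, d) each

# Toy usage with made-up sizes: 2 of 16 experts fire, and the cache holds
# d_latent = 2 numbers per token instead of 2*d = 16.
rng = np.random.default_rng(0)
d, num_experts, d_latent = 8, 16, 2
x = rng.normal(size=d)
y = moe_layer(x,
              [rng.normal(size=(d, d)) for _ in range(num_experts)],
              rng.normal(size=(num_experts, d)))
keys, values = mla_cache_step(x, rng.normal(size=(d_latent, d)),
                              rng.normal(size=(d, d_latent)),
                              rng.normal(size=(d, d_latent)), cache=[])
print(y.shape, keys.shape)  # (8,) (1, 8)
```

The memory savings scale with the ratio of the full key/value size to the latent size, which is why this kind of compression makes serving large models so much cheaper.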
What astounded the world wasn’t just the architecture of the models, but how quickly OpenAI’s achievements could be replicated in a matter of months, rather than the year or more that typically passes between major AI advances, Brundage added.
OpenAI positions itself as uniquely capable of building advanced AI, and that public profile just happened to win over investors to build the world’s largest AI data center infrastructure. But DeepSeek’s rapid replication suggests that technological advantage won’t last—even if the company tries to keep its methods secret.
“To some extent, these closed companies are clearly surviving on people thinking they’re doing the greatest thing, and that’s how they maintain their valuations. Maybe they exaggerate a little bit in order to raise more money or build more projects,” von Werra said. “Whether they’ve overstated their internal prowess, no one knows, but it’s clearly working in their favor.”
Talking Money
The investment community has been delusional about AI since OpenAI released ChatGPT in 2022. The question isn’t whether we’re in an AI bubble, but “are bubbles actually a good thing?” (“Bubbles have been given an unfairly negative connotation,” Deepwater Asset Management wrote in 2023.)
It’s not clear that investors understand how AI works, but they’re hopeful that it will at least lead to widespread cost savings. A December 2024 report from PwC found that two-thirds of investors surveyed expected AI to increase productivity, and a similar number expected increased profits.
The public company that has benefited most from the hype cycle is Nvidia, which makes the sophisticated chips AI companies rely on. Buying Nvidia stock, the thinking went, was investing in the company selling shovels in the AI gold rush: whoever dominates the race will need lots of Nvidia chips to run their models. On Dec. 27, Nvidia’s stock closed at $137.01, nearly 10 times what it was worth in early January 2023.
DeepSeek’s success upends the investment thesis that has driven Nvidia’s stock soaring. If the company is indeed using its chips more efficiently (rather than simply buying more of them), then other companies will start to do the same. That could mean a smaller market for Nvidia’s most advanced chips as companies try to cut spending.
“Nvidia’s growth expectations are a bit ‘optimistic,’ so I think this is a necessary reaction,” said Naveen Rao, vice president of AI at Databricks. “Nvidia’s current revenue is unlikely to be threatened; but the big growth of the past few years could be impacted.”
Nvidia isn’t the only company propelled by this investment thesis. In 2023, the “Magnificent Seven” (Nvidia, Meta, Amazon, Tesla, Apple, Microsoft, and Alphabet) outperformed the rest of the market, gaining 75% in value. They continued that remarkable bull run in 2024, with all but Microsoft outperforming the S&P 500. Of the seven, only Apple and Meta emerged unscathed from the DeepSeek selloff.
The craze isn’t limited to public markets. Startups like OpenAI and Anthropic have also hit dizzying valuations — $157 billion and $60 billion, respectively — as venture capital firms have poured money into the space. Profitability isn’t a big issue. OpenAI expects to lose $5 billion in 2024, despite projected revenue of $3.7 billion.
DeepSeek’s success suggests that simply throwing lots of money at the problem isn’t as protective as many companies and investors thought. It suggests that smaller startups can be more competitive with the giants — and even disrupt known leaders through technological innovation. So while this is bad news for the giants, it could be good news for smaller AI startups, especially since their models are open source.
Hugging Face’s von Werra argues that cheaper training models won’t actually reduce GPU demand. “If you can build a super powerful model at a smaller scale, why not scale it up again?” he asks. “The natural thing is that if you figure out how to do something cheaper, you scale it up and build a better version that costs more.”
Optimization is necessary
But DeepSeek isn’t just disrupting the investment landscape; it’s also a clear policy signal. The progress of DeepSeek’s models shows how easily a rival nation can catch up to the most advanced U.S. technology even with export controls in place.
Export controls on the most advanced chips, which officially began in October 2023, are relatively new and their full impact has yet to be felt, according to Lennart Heim, an expert at the RAND Corporation, and Sihao Huang, a doctoral student at Oxford University who specializes in industrial policy.
DeepSeek shows that despite limited computing power, you can still innovate through optimization, while the United States is betting big on raw power, as evidenced by Altman’s $500 billion (about 3.6 trillion yuan) Stargate project with Trump.
“Reasoning models like DeepSeek’s R1 require a lot of GPUs to serve, as shown by DeepSeek quickly running into trouble once more users hit its app,” Brundage said. “Given this, and the fact that scaling up reinforcement learning will make DeepSeek’s model even more powerful than it is today, it is more important than ever that the U.S. has effective export controls on GPUs.”
Some are skeptical that DeepSeek’s achievement is as described. “We question whether DeepSeek’s results were achieved without access to advanced GPUs for fine-tuning and/or building the underlying large language models the final model is based on,” Citi analyst Atif Malik said in a research note. “The claim that ‘DeepSeek replicated OpenAI for $5 million’ appears to be completely false, and we do not think it is really worth discussing further,” Bernstein analyst Stacey Rasgon said in her own note.
For others, the export controls appear to be backfiring: rather than slowing development in rival countries, they are forcing them to innovate. While the U.S. has restricted access to advanced chips, companies like DeepSeek and Alibaba (maker of Tongyi Qianwen) have found creative workarounds, optimizing their training techniques and leveraging open source technology while developing their own chips.
No doubt some will wonder what this means for artificial general intelligence, which the savviest AI experts consider pie in the sky, a pitch for attracting capital. (Last December, OpenAI’s Altman notably lowered the bar for AGI from something that could “enhance humanity” to something that is “much less important than people think.”) Since AI superintelligence is still largely a fantasy, it’s hard to know whether it’s even possible, let alone whether DeepSeek has taken a reasonable step in that direction. In that sense, the company’s whale logo is apt; this is an industry full of Ahabs. The endgame of AI is anyone’s guess.
What it takes for future AI leaders
AI has long been a story of excess: data centers that consume as much energy as a small country, training runs that cost billions of dollars, and a game only the tech giants can play. To many, DeepSeek’s emergence seems to have turned that view on its head.
While models like DeepSeek’s might look like a fix for AI’s excesses by making training cheaper, it’s unfortunately not that simple. Both Brundage and von Werra agree that greater efficiency will likely push companies to use even more computing power in pursuit of better models. Von Werra adds that cheaper training also gives smaller startups and researchers easier access to the best models, so demand for compute will only grow.
DeepSeek’s use of synthetic data isn’t revolutionary either, though it does show that it’s possible for AI labs to create something useful without scraping the entire internet. But that damage has already been done; there is only one internet, and it has already trained the models that will be foundational to the next generation. Synthetic data doesn’t completely solve the problem of finding more training data, but it’s a promising approach.
The most important thing DeepSeek has done is make powerful AI cheaper. You don’t have to be deeply technical to understand that powerful AI tools may soon be far more affordable. AI leaders have long promised that such progress was coming. One real change is that someone might now be able to build a cutting-edge model in their garage.
The race to artificial general intelligence is largely a fantasy. The money, however, is real. DeepSeek has powerfully demonstrated that money alone cannot put a company at the forefront of the field. The long-term impact could reshape the AI industry as we know it.