Author: Yu Lili; Source: Undercurrent Waves
Among China's seven large-model startups, DeepSeek is the quietest, yet it keeps being remembered in unexpected ways.
A year ago, the surprise came from Huanfang, the quantitative fund giant behind it, which was then the only company outside the tech majors to have stockpiled 10,000 A100 GPUs. A year later, it came from DeepSeek being the trigger of China's large-model price war.
In May, a month saturated with AI news, DeepSeek became famous overnight. The reason was its release of an open-source model called DeepSeek-V2, which offered unprecedented cost-effectiveness: inference cost was cut to just 1 yuan per million tokens, roughly one-seventh that of Llama 3 70B and one-seventieth that of GPT-4 Turbo.
As DeepSeek was quickly dubbed "the Pinduoduo of the AI industry", major companies such as ByteDance, Tencent, Baidu, and Alibaba could not hold back and cut prices one after another. A price war over China's large models was breaking out.
Behind the smoke lies a fact: unlike many large companies that burn money on subsidies, DeepSeek is profitable.
Behind this is DeepSeek's across-the-board innovation in model architecture. It proposed MLA, a new multi-head latent attention mechanism, which cuts memory usage to 5%-13% of that of the widely used MHA architecture. At the same time, its original DeepSeekMoE sparse structure pushes the required computation to a minimum. Together, these drive the reduction in cost.
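The memory saving described above comes from caching a small latent vector instead of full keys and values. The following is a minimal structural sketch of that idea, not DeepSeek's actual MLA implementation (which also handles positional encoding and more); all sizes are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical dimensions for illustration only (not DeepSeek's configuration).
d_model, n_heads, d_head, d_latent, seq = 512, 8, 64, 64, 16

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02          # compress hidden states
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to values

h = rng.standard_normal((seq, d_model))  # hidden states for a 16-token sequence

# Standard MHA caches full keys AND values per token.
mha_cache_floats = 2 * seq * n_heads * d_head

# MLA-style: cache only the low-rank latent; reconstruct K/V on the fly.
latent = h @ W_down                 # (seq, d_latent) -- the only thing stored
mla_cache_floats = seq * d_latent
k = latent @ W_up_k                 # keys recovered from the latent when needed
v = latent @ W_up_v                 # values recovered likewise

print(mla_cache_floats / mha_cache_floats)  # 0.0625
```

With these toy sizes the latent cache is 6.25% of the full key-value cache, which lands in the ballpark of the 5%-13% figure quoted in the article; the real ratio depends on the chosen latent dimension.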
In Silicon Valley, DeepSeek is called "the mysterious force from the East". SemiAnalysis's chief analyst called the DeepSeek-V2 paper "perhaps the best one this year". Andrew Carr, a former OpenAI employee, found the paper "full of amazing wisdom" and applied its training settings to his own model. Jack Clark, former policy director at OpenAI and co-founder of Anthropic, said DeepSeek has "hired a group of unfathomable geniuses" and that Chinese-made large models "will become a force that cannot be ignored, just like drones and electric vehicles."
In the AI wave, it is rare for a story to be driven mainly from Silicon Valley. Many industry insiders told us that this strong response stems from innovation at the architecture level, a rare attempt among domestic large-model companies, and even among global open-source foundation models. An AI researcher noted that in the many years since the Attention architecture was proposed, it has hardly ever been successfully modified, let alone validated at scale. "This is the kind of idea that gets cut in decision-making, because most people lack the confidence."
Domestic large models, for their part, had rarely ventured into architectural innovation before, partly because few dared to challenge the stereotype that the United States is better at 0-to-1 technical innovation while China is better at 1-to-10 application innovation. Moreover, the behavior looks uneconomical: a new generation of models will inevitably be built by someone within a few months, so Chinese companies need only follow and build good applications. Innovating on model structure means there is no path to follow, many failures along the way, and steep costs in time and money.
DeepSeek is clearly swimming against the tide. Amid the clamor that large-model technology is bound to converge and that following is the smarter shortcut, DeepSeek values the knowledge accumulated in "detours" and believes that Chinese large-model entrepreneurs can join global technological innovation, not just application innovation.
Many of DeepSeek's choices are different. So far, among the seven Chinese large-model startups, it is the only one that has given up the "do both" route, focusing on research and technology without building consumer applications. It is also the only one that has not made commercialization a priority, has firmly chosen the open-source route, and has not even raised outside funding. This keeps it largely forgotten away from the table, yet it is often spread word-of-mouth by users in the community.
How did DeepSeek come about? For this, we interviewed Liang Wenfeng, the rarely seen founder of DeepSeek.
This founder, born in the 1980s, has devoted himself to research behind the scenes since the Huanfang era and continues his low-key style at DeepSeek. Like all of his researchers, he "reads papers, writes code, and takes part in group discussions" every day.
Unlike many quantitative fund founders, who hold overseas hedge-fund résumés and come from physics or mathematics, Liang Wenfeng's background is entirely local: in his early years he studied artificial intelligence in the Department of Electronic Engineering at Zhejiang University.
Many industry insiders and DeepSeek researchers told us that Liang Wenfeng is a very rare person in the current Chinese AI industry who "has both strong infra engineering capabilities and model research capabilities, and can mobilize resources", "can make accurate judgments from a high level, and can be better than first-line researchers in details". He has "terrifying learning ability" and is "not like a boss at all, but more like a geek".
This is a particularly rare interview. In it, this technological idealist offers a voice that is especially scarce in China's technology community today: he is one of the few who puts "right and wrong" before "interests", and he reminds us to see the inertia of the times and to put original innovation on the agenda.
A year ago, when DeepSeek first emerged, we interviewed Liang Wenfeng for the first time: "Crazy Huanfang: The Large-Model Road of a Hidden AI Giant". If the line "you must be crazily ambitious and crazily sincere" was still just a fine slogan then, a year later it has become action.
The following is part of the conversation
"Undercurrent": After the release of the DeepSeek-V2 model, a cut-throat price war among large models quickly broke out. Some say you are the catfish of the industry.
Liang Wenfeng: We did not intend to become a catfish; we just became one by accident.
"Undercurrent": Did this result surprise you?
Liang Wenfeng: Very much. I didn't expect price to make everyone so sensitive. We just do things at our own pace and then price based on cost. Our principle is neither to sell at a loss nor to take excessive profits; this price sits slightly above cost.
"Undercurrent": Five days later, Zhipu AI followed suit, then ByteDance, Alibaba, Baidu, Tencent and other big companies.
Liang Wenfeng: Zhipu AI cut the price of an entry-level product; its models of the same tier as ours are still very expensive. ByteDance was the first real follower, cutting its flagship model to the same price as ours, which then triggered the other large firms to cut prices. Because the large firms' model costs are much higher than ours, we never expected anyone to do this at a loss, and in the end it turned into the money-burning subsidy logic of the internet era.
"Undercurrent": From the outside, the price cut looks a lot like grabbing users, which is how price wars in the internet era usually work.
Liang Wenfeng: Grabbing users is not our main purpose. We cut prices partly because, in exploring the structure of the next generation of models, our costs came down first, and partly because we believe both APIs and AI should be inclusive and affordable for everyone.
"Undercurrent": Before this, most Chinese companies would directly copy this generation's Llama structure and build applications. Why did you start from the model structure?
Liang Wenfeng: If the goal is to build applications, then adopting the Llama structure and shipping products quickly is a reasonable choice. But our destination is AGI, which means we need to study new model structures to achieve stronger capability under limited resources. This is one of the pieces of basic research needed to scale up to larger models. Beyond the structure, we have done a lot of other research, including how to construct data and how to make models more human-like, all of which is reflected in the models we released. Besides, the Llama structure is probably two generations behind the foreign state of the art in training efficiency and inference cost.
"Undercurrent": Where does this generation gap mainly come from?
Liang Wenfeng: First, there is a gap in training efficiency. We estimate that in model structure and training dynamics, the best domestic level may need twice the compute of the best foreign level to achieve the same result. On top of that, there may be a factor-of-two gap in data efficiency: we may need twice the training data and compute to reach the same effect. Combined, that means four times the compute. What we need to do is keep narrowing these gaps.
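The factor-of-four claim is simply the product of the two estimated gaps, since each independently doubles the compute needed for the same result. A trivial sketch of the arithmetic, using the interview's own rough estimates as inputs:

```python
# Rough compute-gap arithmetic using the estimates from the interview above.
# The 2x figures are Liang Wenfeng's stated estimates, not measured values.
structure_and_training_gap = 2.0  # ~2x compute for the same result
data_efficiency_gap = 2.0         # ~2x data, hence ~2x compute, for the same result

total_compute_gap = structure_and_training_gap * data_efficiency_gap
print(total_compute_gap)  # 4.0
```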
"Undercurrent": Most Chinese companies choose to do both models and applications. Why does DeepSeek currently choose to do only research and exploration?
Liang Wenfeng: Because we think the most important thing now is to take part in the global wave of innovation. For many years, Chinese companies got used to others doing the technological innovation while we took it to build applications and monetize. But that should not be taken for granted. In this wave, our starting point is not to seize the chance to make a quick profit, but to move to the technological frontier and push the whole ecosystem forward.
"Undercurrent": The inertial belief most people carry over from the internet and mobile-internet era is that the United States is good at technological innovation while China is better at applications.
Liang Wenfeng: We believe that as the economy develops, China should gradually become a contributor rather than always free-riding. For the past thirty-plus years of the IT wave, we basically did not participate in real technological innovation. We got used to Moore's Law falling from the sky: wait at home for 18 months and better hardware and software appear. Scaling laws are being treated the same way.
But in fact, this was created, generation after generation, by the tireless efforts of the Western-led technology community. It is only because we were absent from that process that we have ignored its existence.
"Undercurrent": Why did DeepSeek V2 surprise so many people in Silicon Valley?
Liang Wenfeng: Among the countless innovations happening in the United States every day, this one is quite ordinary. What surprised them is that a Chinese company joined their game as an innovative contributor. After all, most Chinese companies are used to following rather than innovating.
"Undercurrent": But in the Chinese context this choice is a luxury. Large models are a capital-heavy game; not every company has the means to research innovation alone without considering commercialization first.
Liang Wenfeng: The cost of innovation is certainly not low, and the inertia of the past was tied to past conditions. But today, whether measured by China's economic scale or by the profits of giants like ByteDance and Tencent, capital is not what we lack by global standards. What we lack for innovation is not capital but confidence, and the knowledge of how to organize high-density talent for effective innovation.
"Undercurrent": Why do Chinese companies, including large ones that are not short of money, so readily treat rapid commercialization as the first priority?
Liang Wenfeng: For the past thirty years we emphasized only making money and neglected innovation. Innovation is not driven entirely by business; it also requires curiosity and creativity. We are simply bound by the inertia of the past, but that too is a phase.
"Undercurrent": But you are a commercial organization, not a nonprofit research institute. You choose to innovate and then share it through open source. Where does your moat come from? Won't innovations like the MLA architecture released in May be copied quickly by other companies?
Liang Wenfeng: In the face of disruptive technology, a moat built on closed source is short-lived. Even OpenAI's closed-source approach cannot prevent others from catching up. So we deposit the value in the team: our colleagues grow through this process, accumulate a great deal of know-how, and form an organization and culture capable of innovating. That is our moat.
Open-sourcing and publishing papers actually cost us nothing. For technical people, being followed is deeply satisfying. In fact, open source is more a cultural act than a commercial one. Giving is an extra honor, and a company that does this also has cultural appeal.
"Undercurrent": What do you think of market believers like Zhu Xiaohu?
Liang Wenfeng: Zhu Xiaohu is self-consistent, but his playbook is better suited to companies that make money quickly. Yet the most profitable companies in the United States are all high-tech companies that built up their strength over a long time before breaking out.
"Undercurrent": But with large models, pure technical leadership can hardly form an absolute advantage. What is the bigger thing you are betting on?
Liang Wenfeng: What we see is that China's AI cannot stay in the follower position forever. We often say there is a one-to-two-year gap between China's AI and the United States', but the real gap is the difference between originality and imitation. If that does not change, China will only ever be a follower, so some exploration cannot be avoided.
Nvidia's leading position is not just the result of the efforts of one company, but the joint efforts of the entire Western technology community and industry. They can see the technology trends of the next generation and have a roadmap in hand. The development of China's AI also requires such an ecosystem. Many domestic chips cannot develop because they lack supporting technology communities and only have second-hand information, so China must have someone standing at the forefront of technology.
"Undercurrent": DeepSeek now has something of the idealistic air of OpenAI's early days, and it is open source. Will you move to closed source later? Both OpenAI and Mistral have gone from open to closed.
Liang Wenfeng: We will not go closed source. We think building a strong technology ecosystem first matters more.
"Undercurrent": Do you have a financing plan? Media reports say Huanfang has plans to spin off DeepSeek for an independent listing. Silicon Valley AI startups inevitably end up tied to large companies.
Liang Wenfeng: There is no financing plan in the short term. The problem we face has never been money, but the embargo on high-end chips.
"Undercurrent": Many people think doing AGI and doing quant are two entirely different things. Quant can be done quietly, but AGI may need to be more high-profile and to form alliances, so that your investment can grow.
Liang Wenfeng: More investment does not necessarily produce more innovation. Otherwise, large companies would have taken over all innovation already.
"Undercurrent": You are not building applications now. Is it because you lack the genes for operations?
Liang Wenfeng: We believe that the current stage is an explosive period of technological innovation, not an explosive period of application. In the long run, we hope to form an ecosystem, that is, the industry directly uses our technology and output, we are only responsible for basic models and cutting-edge innovation, and then other companies build toB and toC businesses on the basis of DeepSeek. If we can form a complete upstream and downstream of the industry, we don’t have to make applications ourselves. Of course, if necessary, there is no obstacle for us to make applications, but research and technological innovation will always be our first priority.
"Undercurrent": But for someone choosing an API, why pick DeepSeek over a large company?
Liang Wenfeng: The future world is likely to be one of specialized division of labor. Foundation models need continuous innovation; large companies have the boundaries of their capabilities and are not necessarily suited to this.
"Undercurrent": But can technology really create a gap? You yourself said there are no absolute technical secrets.
Liang Wenfeng: There are no secrets in technology, but re-creating it takes time and cost. Nvidia's GPUs hold no theoretical secrets and are easy to copy, yet reorganizing a team and catching up to the next generation of technology takes time, so the actual moat remains very wide.
"Undercurrent": After you cut prices, ByteDance was the first to follow, which suggests they felt some threat. What is your view on new ways for startups to compete with large companies?
Liang Wenfeng: Honestly, we don't care much about it; it was a by-product. Providing cloud services is not our main goal; our goal is still to achieve AGI.
I haven't seen any new playbook, but large companies do not have a clear advantage either. They have existing users, but their cash-flow businesses are also their burden, which makes them ripe for disruption at any time.
"Undercurrent": What do you think will be the final outcome for the six large-model startups other than DeepSeek?
Liang Wenfeng: Perhaps two or three will survive. All of them are still in the money-burning stage, so those with clear self-positioning and more disciplined operations have a better chance. Other companies may be remade. Valuable things will not disappear, but will come back in a different form.
"Undercurrent": In the Huanfang era, your attitude toward competition was described as "going your own way", rarely bothering with side-by-side comparison. Where does your thinking on competition come from?
Liang Wenfeng: What I often think about is whether something makes society more efficient, and whether you can find a position for yourself in its chain of division of labor. As long as the end result makes society more efficient, it is valid. Much of what happens in between is just a stage; paying excessive attention to it is bound to be dizzying.
"Undercurrent": Jack Clark, former policy director of OpenAI and co-founder of Anthropic, believes DeepSeek has hired "a group of unfathomable geniuses". What kind of people built DeepSeek V2?
Liang Wenfeng: There are no unfathomable geniuses. They are all fresh graduates from top universities, PhD interns in their fourth or fifth year who have not yet graduated, and some young people who graduated only a few years ago.
"Undercurrent": Many large-model companies are obsessed with recruiting from overseas, and many believe the top 50 talents in this field may not be at Chinese companies. Where do your people come from?
Liang Wenfeng: No one on the V2 team came back from overseas; they are all local. The top 50 talents may not be in China, but perhaps we can cultivate such people ourselves.
"Undercurrent": How did the MLA innovation happen? We heard the idea first came from a young researcher's personal interest?
Liang Wenfeng: After summarizing some mainstream patterns of change in the Attention architecture, he suddenly had the idea of designing an alternative. But going from idea to implementation is a long process. We formed a team for it, and it took several months to get it working.
"Undercurrent": The birth of this kind of divergent inspiration seems closely tied to your fully innovative organizational structure. In the Huanfang era you rarely assigned goals or tasks top-down. But for AGI, a frontier exploration full of uncertainty, are there more management interventions?
Liang Wenfeng: DeepSeek is also entirely bottom-up. We generally don't divide up the work in advance; it divides naturally. Everyone comes with their own unique growth experience and their own ideas, so there is no need to push them. When someone hits a problem during exploration, they pull in others to discuss it. But when an idea shows potential, we do allocate resources top-down.
"Undercurrent": We heard DeepSeek is very flexible in allocating GPUs and people.
Liang Wenfeng: There is no cap on anyone's access to GPUs or people. If someone has an idea, they can tap the training cluster's cards at any time without approval. And because there are no hierarchies or departmental walls, anyone can be flexibly drawn in, as long as they are also interested.
"Undercurrent": A loose management style also depends on having screened for people driven by strong passion. We heard you are good at recruiting from the details, spotting excellent people by non-traditional criteria.
Liang Wenfeng: Our selection criteria have always been passion and curiosity, so many of our people have unusual backgrounds, which is very interesting. Many of them care far more about doing research than about money.
"Undercurrent": Transformer was born in Google's AI Lab, and ChatGPT at OpenAI. How do you compare the innovation value of a big company's AI lab with that of a startup?
Liang Wenfeng: Whether Google's labs, OpenAI, or even the AI labs of large Chinese companies, all are valuable. That OpenAI was the one to pull it off also involved historical contingency.
"Undercurrent": Is innovation then largely a matter of contingency? I noticed that the row of meeting rooms in the middle of your office has doors on both sides that can be pushed open at will. Your colleagues said this is to leave room for serendipity. In the birth of the Transformer there is the story of someone passing by, overhearing the discussion, joining in, and eventually helping turn it into a general-purpose framework.
Liang Wenfeng: I think innovation is first of all a matter of belief. Why is Silicon Valley so innovative? First, because they dare. When ChatGPT came out, the whole country lacked confidence in frontier innovation. From investors to big companies, everyone felt the gap was too large and that we should settle for applications. But innovation starts with confidence, and that confidence is usually more visible in young people.
"Undercurrent": But you don't raise money and rarely speak publicly; your share of voice is surely lower than that of companies actively fundraising. How do you ensure DeepSeek is the first choice for people who want to build large models?
Liang Wenfeng: Because we are doing the hardest things. The biggest draw for top talent is certainly the chance to solve the world's hardest problems. Top talent is actually underestimated in China: because there is so little hard-core innovation at the societal level, they have had no chance to be recognized. We are doing the hardest things, and that attracts them.
"Undercurrent": OpenAI's recent release did not bring GPT-5. Many see this as a clear flattening of the technology curve, and many have begun to question the scaling laws. What do you think?
Liang Wenfeng: We are relatively optimistic; the whole industry seems in line with expectations. OpenAI is not a god, and it cannot always be out front.
"Undercurrent": How long do you think AGI will take to achieve? Before releasing DeepSeek V2 you released code-generation and math models, and you also switched from dense models to MoE. What are the coordinates on your AGI roadmap?
Liang Wenfeng: It may take two years, five, or ten; in any case it will happen within our lifetime. As for a roadmap, there is no consensus even inside our company. But we are betting on three directions. The first is mathematics and code, the second is multimodality, the third is natural language itself. Mathematics and code are natural proving grounds for AGI, a bit like Go: closed, verifiable systems in which high intelligence might be achieved through self-learning. Multimodality, and participation in learning in the real human world, is also necessary for AGI. We stay open to all possibilities.
"Undercurrent": What do you think the endgame of large models will look like?
Liang Wenfeng: There will be specialized companies providing foundation models and foundational services, with a long chain of professional division of labor. More people will build on top of that to meet society's diverse needs.
"Undercurrent": Over the past year there have been many shifts in China's large-model entrepreneurship. Wang Huiwen, for instance, who was very active at the start of last year, withdrew midway, and companies that joined later have begun to diverge.
Liang Wenfeng: Wang Huiwen absorbed all the losses himself and let everyone else walk away intact. He made a choice that was worst for himself but good for everyone, so he is a very decent person, which I admire greatly.
"Undercurrent": Where does most of your energy go now?
Liang Wenfeng: My main energy goes into researching the next generation of large models. There are still many unsolved problems.
"Undercurrent": Other large-model startups insist on doing both, since technology will not confer permanent leadership and seizing the time window to turn a technical edge into products also matters. Does DeepSeek dare to focus on model research because its model capability is still not enough?
Liang Wenfeng: All playbooks are products of the previous generation and may not hold in the future. Using the business logic of the internet to discuss the future profit model of AI is like discussing General Electric and Coca-Cola when Ma Huateng was starting out - like carving a mark on the boat to find the sword dropped in the river.
"Undercurrent": In the past, Huanfang had a strong gene for technology and innovation, and its growth was relatively smooth. Is that why you are optimistic?
Liang Wenfeng: Huanfang did strengthen our confidence in technology-driven innovation to some extent, but it was not all a smooth road. We went through a long process of accumulation. What the outside world sees is Huanfang after 2015, but in fact we have been at it for 16 years.
"Undercurrent": Back to the subject of original innovation. Now that the economy is slowing and capital has entered a cold cycle, will that further suppress original innovation?
Liang Wenfeng: I don't think so. The adjustment of China's industrial structure will rely more on hard-core technological innovation. Once many people realize that the fast money of the past likely came from the luck of the era, they will be more willing to get down to real innovation.
"Undercurrent": So you are optimistic about this too?
Liang Wenfeng: I grew up in the 1980s in a fifth-tier city in Guangdong. My father was a primary school teacher. In the 1990s there were many opportunities to make money in Guangdong, and many parents would come to our home, basically arguing that studying was useless. Looking back now, the mindset has changed, because money is no longer easy to make; even the chance to drive a taxi may be gone. In one generation, it changed.
Hard-core innovation will only increase from here. It may not be easy to understand now, because society as a whole still needs to be educated by facts. When this society lets hard-core innovators succeed, collective thinking will change. We just need a pile of facts, and a process.