Author: Haotian
When I woke up, many friends were urging me to look at Manus, billed as a truly general-purpose AI Agent that can think independently, plan and execute complex tasks, and deliver complete results. It sounds very cool, but setting aside the many friends who are anxious about losing their jobs, what will it bring to the explosion of web3 DeFai scenarios? Here are my thoughts:
1) About a month ago, OpenAI launched a similar product, Operator. It lets AI independently complete tasks in the browser, including restaurant reservations, shopping, ticket booking, and takeout ordering, while users can watch what it does and take over control at any time.
This Agent did not generate much discussion because it is a single-model-driven, tool-calling framework. Once users realize that they still have to step in for key decisions, they lose interest in relying on it to perform tasks.
2) On the surface, Manus does not seem much different, apart from covering many more application scenarios, including screening resumes, researching stocks, and buying real estate. The real difference lies in the framework and execution system behind it: Manus is driven by a multimodal large model and innovatively adopts a multi-signature system.
In short, the AI imitates the PDCA cycle of human execution (plan-do-check-act), with the cycle completed by multiple large models working together. Each model focuses on a specific step, which both reduces the decision-making risk of a single model executing the whole task and improves execution efficiency. The so-called "multi-signature system" is in effect a decision verification mechanism for multi-model collaboration: it ensures the reliability of decisions and execution by requiring joint confirmation from multiple specialized models.
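The joint-confirmation idea can be illustrated with a minimal sketch. It assumes each specialized model can be reduced to a function that approves or rejects a proposed plan; the names `multi_model_confirm`, `risk_model`, `compliance_model` and `cost_model` are hypothetical stand-ins, not Manus's actual interface, which this post does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: each verifier stands in for one specialized model.
Verifier = Callable[[str], bool]


@dataclass
class Decision:
    plan: str
    approvals: int
    required: int

    @property
    def approved(self) -> bool:
        # The plan passes only if enough verifiers signed off (M-of-N).
        return self.approvals >= self.required


def multi_model_confirm(plan: str, verifiers: List[Verifier], required: int) -> Decision:
    """Ask every verifier about the plan and count the approvals."""
    approvals = sum(1 for verify in verifiers if verify(plan))
    return Decision(plan=plan, approvals=approvals, required=required)


# Toy verifiers (assumptions for illustration only).
def risk_model(plan: str) -> bool:
    return "leverage" not in plan


def compliance_model(plan: str) -> bool:
    return len(plan) < 200


def cost_model(plan: str) -> bool:
    return "swap" in plan


verifiers = [risk_model, compliance_model, cost_model]
ok = multi_model_confirm("swap 1 ETH for USDC", verifiers, required=3)
rejected = multi_model_confirm("swap with 10x leverage", verifiers, required=3)
print(ok.approved, rejected.approved)
```

Requiring all three approvals mirrors a 3-of-3 multi-signature; lowering `required` trades safety for availability.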
3) In this comparison, the advantages of Manus are obvious, and the operations shown in the video demo are genuinely impressive. But objectively speaking, Manus's iterative innovation on Operator is just the beginning; it has not yet reached disruptive, revolutionary significance.
The key lies in the complexity of the tasks being executed, and in how the large model's fault tolerance and delivery success rate are defined once non-standardized user prompts come in. Otherwise, could web3's DeFai scenarios adopt this set of innovations and mature immediately? Obviously not yet:
For example, in a DeFai scenario where the Agent must execute trading decisions, there needs to be an Oracle-layer Agent responsible for collecting and verifying on-chain data, integrating and analyzing it, and monitoring on-chain prices in real time to capture trading opportunities. This places heavy demands on real-time analysis: a trading opportunity (an arbitrage window) may exist at one moment, but by the time the Oracle model has passed it on to the transaction execution Agent, it is gone.
This exposes the biggest weakness of this type of multimodal large model when making execution decisions: how to connect to the network, reach the chain to retrieve and analyze real-time data, identify trading opportunities from it, and then capture the trade. The ordinary web environment is comparatively forgiving. Order prices on many e-commerce websites do not change in real time, so they are unlikely to cause serious dynamic balancing problems for the multi-model collaboration as a whole. On-chain, such challenges exist almost all the time.
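To make the arbitrage-window problem concrete, here is a small sketch of a freshness check an execution step might apply before acting on an observation handed over from the Oracle layer. The `Opportunity` type, the `max_age_s` threshold and the function names are assumptions for illustration only; timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    pair: str
    expected_profit: float
    observed_at: float  # seconds, e.g. taken from a monotonic clock


def is_still_actionable(opp: Opportunity, now: float, max_age_s: float) -> bool:
    """An observation is usable only if it is recent and still profitable."""
    age = now - opp.observed_at
    return age <= max_age_s and opp.expected_profit > 0


def execute_if_fresh(opp: Opportunity, now: float, max_age_s: float = 0.5) -> str:
    # Hypothetical execution step: refuse to act on stale data
    # rather than trade into a window that has already closed.
    if not is_still_actionable(opp, now, max_age_s):
        return "skipped: stale"
    return f"submitted: {opp.pair}"


opp = Opportunity(pair="ETH/USDC", expected_profit=12.0, observed_at=100.0)
print(execute_if_fresh(opp, now=100.2))  # handed over within the window
print(execute_if_fresh(opp, now=101.0))  # one second later, window closed
```

The check does not remove the latency between agents; it only keeps the execution side from acting on data that is already out of date.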
4) So the arrival of Manus will indeed set off a wave of anxiety among friends in web2 circles. After all, many repetitive clerical and information-processing jobs may face the risk of being replaced by AI. But let them worry about that.
We have to look objectively at what Manus contributes to web3 DeFai application scenarios:
We must admit that it is of great significance. After all, the "LLM OS" and "Less Structure, More Intelligence" concepts it proposed, and especially the multi-signature system, will give web3 a lot of inspiration for expanding the combination of DeFi and AI.
This corrects a major misunderstanding in most DeFai projects: do not rely on a single large model to achieve complex goals such as autonomous thinking and decision-making by AI Agents. That is not practical in financial scenarios.
Realizing the true DeFai vision requires solving complex problems such as the capability ceiling of any single AI model, atomicity guarantees for multi-model interaction and collaboration, unified resource scheduling and control across multi-model systems, and system fault tolerance and failure handling.
For example: an Oracle-layer Agent is responsible for collecting and analyzing on-chain data, monitoring prices, and forming a reliable data source;
a Decision-layer Agent analyzes and assesses risk based on the data fed by the Oracle, and formulates a set of decisions and action plans;
an Execution-layer Agent executes based on the options given by the decision layer while taking actual conditions into account, including gas fee optimization, cross-chain state, and transaction ordering conflicts.
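The three layers above can be sketched as a simple pipeline. This is only an illustration under stated assumptions: the class names, the price feed given as a dictionary of venue prices, the `min_spread` threshold and the flat `gas_cost` are all hypothetical, and a real system would also have to handle cross-chain state and transaction ordering.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MarketSnapshot:
    prices: Dict[str, float]  # venue name -> quoted price


@dataclass
class Plan:
    buy_venue: str
    sell_venue: str
    spread: float


class OracleAgent:
    """Collects price data and hands on a snapshot."""

    def __init__(self, feed: Dict[str, float]):
        self.feed = feed  # stand-in for on-chain data collection

    def collect(self) -> MarketSnapshot:
        return MarketSnapshot(prices=dict(self.feed))


class DecisionAgent:
    """Turns a snapshot into a plan, or declines if the spread is too small."""

    def __init__(self, min_spread: float):
        self.min_spread = min_spread

    def decide(self, snap: MarketSnapshot) -> Optional[Plan]:
        if len(snap.prices) < 2:
            return None
        buy = min(snap.prices, key=snap.prices.get)
        sell = max(snap.prices, key=snap.prices.get)
        spread = snap.prices[sell] - snap.prices[buy]
        return Plan(buy, sell, spread) if spread >= self.min_spread else None


class ExecutionAgent:
    """Executes a plan only if it is still worthwhile after gas."""

    def __init__(self, gas_cost: float):
        self.gas_cost = gas_cost

    def execute(self, plan: Optional[Plan]) -> str:
        if plan is None:
            return "no-op"
        net = plan.spread - self.gas_cost
        if net <= 0:
            return "aborted: gas exceeds spread"
        return f"buy on {plan.buy_venue}, sell on {plan.sell_venue}, net {net:.2f}"


def run_pipeline(feed: Dict[str, float], min_spread: float, gas_cost: float) -> str:
    snap = OracleAgent(feed).collect()
    plan = DecisionAgent(min_spread).decide(snap)
    return ExecutionAgent(gas_cost).execute(plan)


result = run_pipeline({"dex_a": 100.0, "dex_b": 103.0}, min_spread=1.0, gas_cost=0.5)
print(result)
```

Keeping each layer behind its own small interface makes it possible to strengthen or replace one agent without touching the others.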
Only when all of these agents are strong at the same time and a large system framework has taken shape will a true DeFai revolution be set off.