Everyone is talking about AI Agents, but not everyone is talking about the same thing. That gap separates the AI Agents we in Crypto care about from the ones the public and AI practitioners have in mind.
Long ago, I wrote that Crypto is an illusion of AI. From then until now, the combination of Crypto and AI has been a one-sided love affair: AI practitioners rarely mention terms like Web3 or blockchain, while Crypto practitioners are deeply in love with AI. After seeing that AI Agent frameworks can be tokenized, I still do not know whether AI practitioners can be genuinely drawn into our world.
AI is Crypto's agent. That is the best commentary on this round of the AI craze from the Crypto perspective. Crypto's enthusiasm for AI differs from other industries': we especially hope to fuse it with the issuance and operation of financial assets.
Agent evolution: the origin of a marketing buzzword
The AI Agent has at least three origins. OpenAI's AGI (artificial general intelligence) roadmap lists it as an important stage, turning the term into a buzzword beyond its technical meaning. In essence, though, the Agent is not a new concept, and even with AI's empowerment it is hard to call it a revolutionary technological trend.
The first is the AI Agent in OpenAI's eyes, analogous to L3 in the autonomous-driving classification: an AI Agent can be regarded as having certain advanced assisted-driving capabilities, but it cannot yet fully replace a human.
Image: the AGI stages planned by OpenAI. Source: https://www.bloomberg.com/
Second, as the name suggests, an AI Agent is an Agent empowered by AI. Agent mechanisms and patterns are not uncommon in computing. Under OpenAI's plan, the Agent becomes the L3 stage after the conversational form (ChatGPT) and the reasoning form (various Bots). Its defining trait is that it "autonomously performs certain behaviors", or, to use the definition of LangChain founder Harrison Chase: an AI agent is a system that uses an LLM to decide the control flow of an application. This is where the mystery lies. Before LLMs emerged, Agents mainly executed automated processes set by humans. For example, when programmers designed crawlers, they would set the User-Agent header to imitate the browser version, operating system, and other details of a real user. Of course, if an AI Agent is used to imitate human behavior more finely, AI Agent crawler frameworks will appear, making the crawler "more human-like".
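To make the crawler example above concrete, here is a minimal sketch using only Python's standard library; the URL and the header value are placeholders, not from the original text:

```python
import urllib.request

# A minimal sketch (hypothetical URL and User-Agent string): a crawler
# disguising itself as a real browser by setting the User-Agent header.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

req = urllib.request.Request(
    "https://example.com",
    headers={"User-Agent": BROWSER_UA},
)

# The request now identifies itself as desktop Chrome instead of the
# default "Python-urllib" identity; urlopen(req) would send this header.
print(req.get_header("User-agent"))
```

An "AI Agent crawler" would go further, varying such fingerprints and interaction timing dynamically rather than hard-coding one disguise.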
In such a shift, the AI Agent must be combined with existing scenarios; there is almost no completely original field. Even the code-completion and generation capabilities of Cursor, GitHub Copilot, and the like are functional enhancements built on the thinking of LSP (Language Server Protocol). There are many such examples:
Human-computer interaction: Web 1.0 (CLI, TCP/IP, Netscape Navigator) → Web 2.0 (GUI, REST API, search engines/Google, Super Apps) → Web 3.0 (AI Agent + dApp?)
To explain briefly: in the evolution of human-computer interaction, it was the combination of the GUI and the browser in the Web 1.0 era that truly let the public use computers without barriers, with Windows + IE as the representative pairing, while the API became the data abstraction and transmission standard behind the Internet. The Web 2.0 browser era belonged to Chrome, and the shift to mobile changed people's Internet habits; apps on super platforms such as WeChat and Meta now cover every aspect of daily life.
Third, the concept of intent in the Crypto field is the forerunner of the AI Agent boom in this circle, though note that this holds only within Crypto. From incomplete Bitcoin Script to Ethereum smart contracts, the chain itself is a general application of the Agent concept, and what followed (cross-chain bridges, chain abstraction, EOA-to-AA wallets) are natural extensions of the same thinking. So when AI Agents "invade" Crypto, it is no surprise that they lead to DeFi scenarios.
Here lies the confusion of the AI Agent concept. In the Crypto context, what we actually want is an Agent that "automatically manages finances and automatically buys into new Memes". Under OpenAI's definition, however, such a risky scenario would truly require L4/L5 to be realized, while the public is playing with automatic code generation, one-click AI summaries, ghostwriting, and similar functions. The two sides are not communicating on the same dimension.
Once we understand what we really want, we can focus on the organizational logic of the AI Agent, leaving the technical details in the background; after all, the whole point of the Agent concept is to remove the obstacles to the large-scale popularization of technology, just as the browser turned stones into gold for the personal-PC industry. Our focus will therefore be on two points: looking at the AI Agent from the perspective of human-computer interaction, and the difference and connection between AI Agents and LLMs, which leads into the third part: what will remain after Crypto and AI Agents are combined.
let AI_Agent = LLM+API;
Before chat-style human-computer interaction such as ChatGPT, interaction between humans and computers mainly took the form of the GUI (graphical user interface) and the CLI (command-line interface). GUI thinking kept deriving concrete forms such as browsers and apps, while the CLI-plus-Shell combination rarely changed.
But this is only the human-computer interaction on the "front end". With the development of the Internet, the increase in the amount and type of data has led to an increase in the "back end" interaction between data and data, and between Apps and Apps. The two rely on each other. Even simple web browsing behavior actually requires the coordination and cooperation of the two.
If the interaction between people and browsers and Apps is the user entrance, then the links and jumps between APIs support the actual operation of the Internet. In fact, this is also part of the Agent. Ordinary users do not need to understand terms such as command lines and APIs to achieve their goals.
LLMs are the same. Users can now go a step further and do not even need to search. The whole process can be described in the following steps:

The user opens a chat window;

The user describes his or her need in natural language, i.e. text or voice;

The LLM parses it into process-based operation steps;

The LLM returns the results to the user.
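The steps above can be sketched in a few lines; the "LLM" here is a hard-coded stub, purely hypothetical, since the point is the shape of the flow, not the intelligence:

```python
# A minimal sketch of the four steps above. The "LLM" is a stub
# (hypothetical), not a real model.

def llm_parse(request: str) -> list[str]:
    # Step 3: parse the natural-language need into operation steps.
    return [f"understand: {request}", "plan the task", "produce an answer"]

def llm_answer(steps: list[str]) -> str:
    # Step 4: execute the steps and return the result to the user.
    return f"Done after {len(steps)} steps."

# Steps 1-2: the user opens a chat window and states a need in text.
user_request = "summarize today's crypto news"
result = llm_answer(llm_parse(user_request))
print(result)  # -> "Done after 3 steps."
```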
It is easy to see that Google faces the greatest challenge in this process, because users no longer need to open a search engine, only various GPT-style dialogue windows. The traffic entrance is quietly shifting, which is why some believe this round of LLMs spells the death of the search engine.
So what role does AI Agent play in this?
In a nutshell, AI Agent is a specialization of LLM.
Today's LLMs are not AGI; that is, they are not the ideal L5 stage OpenAI envisions. Their capabilities are sharply limited: for example, they readily hallucinate when users feed in too much information. One important reason lies in the training mechanism. For example, if you repeatedly tell GPT that 1+1=3, then there is a certain probability that when you next ask 1+1+1=?, the answer given will be 4.
That is because GPT's feedback at this point comes entirely from the individual user. If the model is offline, your input may well alter its behavior, leaving you in future with a crippled GPT that only knows 1+1=3. If the model is allowed to connect to the Internet, however, its feedback sources become far more diverse; after all, the vast majority of people online think 1+1=2.
Let's raise the difficulty. If we must use an LLM locally, how can we avoid such problems?
A simple, crude method is to use two LLMs at the same time and require that every answer be cross-verified by both, reducing the probability of error. If that is not enough, another approach is to have two users handle each exchange, one responsible for asking and the other for refining the question, to make the language more standardized and rational.

Of course, being online cannot completely avoid problems either. If the LLM retrieves low-quality answers, the result may be even worse; yet excluding such material shrinks the pool of usable data. In that case, existing data can be split and recombined, or new data even generated from the old, to make answers more reliable. This, in effect, is RAG (Retrieval-Augmented Generation) in natural-language terms. Humans and machines need to understand each other, and if we let multiple LLMs understand and collaborate with each other, we are essentially touching the operating mode of AI Agents: agents acting for humans call other resources, which can even include large models and other Agents.
Thus we have grasped the connection between LLMs and AI Agents: the LLM is a collection of knowledge with which humans communicate through a dialogue window. In practice, however, we find that some specific task flows can be condensed into small applets, bots, and instruction sets, and these are what we define as Agents.
The AI Agent remains part of the LLM; the two cannot be equated. The AI Agent's calling method, built on an LLM, places special emphasis on collaborating with external programs, LLMs, and other Agents, hence the feeling that AI Agent = LLM + API.
Then, AI Agent instructions can be added to the LLM workflow. Take calling the X API as an example:
The human user opens the chat window;
The user uses natural language, that is, text or voice to describe his or her needs;
The LLM parses it into an API-calling AI Agent task and transfers conversational authority to the Agent;

The AI Agent asks the user for his or her X account and API key, then communicates with X online according to the user's description;
The AI Agent returns the final result to the user.
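The five steps above can be sketched as follows; the router, the agent, and the X API response are all hypothetical stand-ins, and nothing touches the network:

```python
# A minimal sketch of the five-step flow above. All names here are
# hypothetical; a real system would call actual LLM and X API services.

def llm_route(request: str) -> str:
    # Step 3: the LLM decides whether to hand the turn to an agent.
    return "x_api_agent" if "timeline" in request else "chat"

class XApiAgent:
    """Steps 4-5: hold user credentials, call the API, return results."""

    def __init__(self, api_key: str):
        # The key comes from the user (step 4), not from the LLM.
        self.api_key = api_key

    def fetch_timeline(self) -> str:
        # A real agent would call the X API here; we fake the reply.
        return "latest 10 posts"

request = "show my X timeline"
if llm_route(request) == "x_api_agent":
    # Conversation authority now rests with the agent, not the LLM.
    result = XApiAgent(api_key="user-supplied-key").fetch_timeline()
else:
    result = "plain chat reply"
print(result)  # -> "latest 10 posts"
```

Note that the credentials stay with the agent object, never inside the LLM's conversation, which is the separation of duties the step list implies.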
Remember the evolution of human-computer interaction? The browsers and APIs of Web 1.0 and Web 2.0 will still exist, but users can ignore them entirely and interact only with the AI Agent. API calls and other processes can happen conversationally, and these API services can be of any type (local data, network information, data from external apps) as long as the other side opens an interface and the user has the right to use it.
A complete AI Agent usage flow is shown in the figure above. The LLM can be regarded as separate from the AI Agent, or the two can be seen as sub-links of one process; however it is divided, it serves the user's needs.
From the perspective of the human-computer interaction process, it is almost the user talking to himself. You only need to express your thoughts, and the AI/LLM/AI Agent will guess at your needs again and again. Adding a feedback mechanism and requiring the LLM to remember the current context ensures the AI Agent will not suddenly forget what it is doing.
In short, the AI Agent is a more personalized product, and this is its essential difference from traditional scripts and automation tools: like a private butler, it considers the user's real needs. It must be pointed out, however, that this "personality" is still the result of probabilistic inference. An L3-level AI Agent has neither human-level understanding nor expression, so connecting it to external APIs carries real danger.
After the AI framework is monetized
The fact that the AI framework can be monetized is an important reason why I am interested in Crypto. In the traditional AI stack, the framework is not particularly important, at least not compared with data and computing power, and it is difficult to monetize an AI product through its framework. After all, most AI algorithms and model frameworks are open source; what is truly closed is sensitive material such as data.
In essence, an AI framework or model is a container and combination of a series of algorithms, the iron pot in which the goose is stewed; but the breed of the goose and the control of the heat are what distinguish the flavor. The product for sale should have been the goose, yet Web3 customers now want to buy the casket and return the pearl: buy the pot and discard the goose.
The reason is not complicated. Web3 AI products are basically copycats, customized on top of existing AI frameworks, algorithms, and products; even the technical principles behind different Crypto AI frameworks differ little. Since they cannot be distinguished technically, they must differentiate by name, application scenario, and so on. Thus minor adjustments to the AI framework itself became the support for different tokens, inflating a framework bubble for the Crypto AI Agent.
Since there is no need to invest heavily in training data and algorithms, differentiation by name becomes especially important. However cheap DeepSeek V3 is, it still burns through plenty of PhD hair, GPUs, and electricity.
In a sense, this is also Web3's recent consistent style: the token-issuance platform is worth more than the tokens, and Pump.Fun and Hyperliquid are both examples. Agents should be applications and assets, but the Agent-issuance framework has become the hottest product.
In fact, this is also a value-anchoring idea. Since the various Agents are indistinguishable, the Agent framework is more stable and produces a value-siphoning effect on asset issuance. This is the current 1.0 version of the Crypto-plus-AI-Agent combination.
Version 2.0 is emerging: the typical combination of DeFi and AI Agents. The DeFAI concept is of course market behavior stimulated by hype, but if we consider the following situations, it looks different:
Morpho is challenging old lending products such as Aave;
Hyperliquid is replacing dYdX's on-chain derivatives and even challenging Binance's CEX listing effect;
Stablecoins are becoming a payment tool for off-chain scenarios.
It is against the backdrop of DeFi's transformation that AI is improving DeFi's basic logic. If DeFi's biggest contribution before was verifying the feasibility of smart contracts, then AI Agents change the manufacturing logic of DeFi: you do not need to understand DeFi to make DeFi products. This is bottom-level empowerment that goes a step further than chain abstraction.
The era when everyone is a programmer is coming. Complex computation can be outsourced to the LLM and APIs behind the AI Agent; individuals need only focus on their own ideas, and natural language can be efficiently converted into programming logic.
Conclusion
This article names no Crypto AI Agent tokens or frameworks, because Cookie.Fun already does that well enough: first the AI Agent information-aggregation and token-discovery platform, then the AI Agent frameworks, and finally the Agent tokens that are born and die. Listing them again here adds no value.
Still, throughout this period of observation, the market has lacked a real discussion of what the Crypto AI Agent actually points to. We cannot keep discussing the pointer; what changes in memory is the essence.
The ability to keep turning all kinds of targets into assets is the charm of Crypto.