Author: accelxr, 1KX; Translation: 0xjs@Golden Finance
The main use of today's generative models is content creation and information filtering. However, recent research and discussion around AI agents (autonomous actors that use external tools to accomplish user-defined goals) suggest that giving AI economic access could produce an unlock similar to what the Internet experienced in the 1990s.
To get there, agents need agency over assets they control, because the traditional financial system is not built for them.
This is where crypto comes into play: crypto provides a digital payment and ownership layer with fast settlement that is particularly suitable for building AI agents.
In this article, I will introduce the concepts of agents and agent architectures, show how examples from the research literature demonstrate that agents have emergent properties beyond traditional LLMs, and survey projects building solutions or products around crypto-based agents.
What is an agent?
AI agents are LLM-driven entities that are able to plan and take actions to achieve goals over multiple iterations.
Agent architectures consist of a single agent or multiple agents working together to solve a problem.
Typically, each agent is given a personality and has access to a variety of tools that will help them get the job done, either independently or as part of a team.
Agent architectures differ from how we typically interact with LLMs today:
Zero-shot prompting is how most people interact with these models: you input a prompt, and the LLM generates a response based on its pre-existing knowledge.
In an agent architecture, you initialize a goal, the LLM breaks it down into subtasks, and then it recursively prompts itself (or other models) to complete each subtask autonomously until the goal is reached.
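The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: `fake_llm` is a stub standing in for a real model call (e.g. an API request to a hosted LLM), and the stopping condition is simply exhausting the subtask list.

```python
# Minimal sketch of the agent loop: decompose a goal into subtasks,
# then prompt the model once per subtask until all are done.

def fake_llm(prompt: str) -> str:
    # Stub: a real agent would call a hosted or local LLM here.
    if "break the goal" in prompt:
        return "research venues; draft invitations; send invitations"
    return f"done: {prompt}"

def run_agent(goal: str) -> list[str]:
    """Decompose a goal into subtasks, then prompt the model per subtask."""
    plan = fake_llm(f"break the goal into subtasks: {goal}")
    subtasks = [s.strip() for s in plan.split(";")]
    results = []
    for task in subtasks:
        # Each subtask becomes its own prompt; real agents loop until a
        # stopping condition (goal reached, budget exhausted) is met.
        results.append(fake_llm(task))
    return results

print(run_agent("plan a party"))
```

A production loop would also feed each subtask's result back into the planner so it can revise the remaining plan.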
Single-Agent Architectures vs. Multi-Agent Architectures
Single-Agent Architectures: One language model performs all reasoning, planning, and tool execution on its own. There is no feedback mechanism from other agents, but humans can choose to provide feedback to the agent.
Multi-Agent Architectures: These architectures involve two or more agents, where each agent can use the same language model or a set of different language models. Agents can use the same tools or different tools. Each agent usually has its own role.
Vertical structure: One agent acts as a leader and other agents report to it. This helps organize the group's output.
Horizontal structure: Agents participate in a large group discussion about a task, where each agent can see the others' messages and volunteer to complete a task or call a tool.
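The vertical structure can be sketched as a leader agent that splits a goal and delegates to workers. All "agents" here are stubs; in a real system each function would wrap an LLM call, and workers would report results back to the leader for aggregation.

```python
# Sketch of a vertical multi-agent structure: a leader delegates
# subtasks to worker agents and collects their reports.

def worker(name: str, task: str) -> str:
    # Stub worker: a real worker agent would run its own LLM + tools.
    return f"{name}: completed '{task}'"

def leader(goal: str, workers: list[str]) -> list[str]:
    # The leader splits the goal and assigns one piece per worker.
    subtasks = [f"{goal} (part {i + 1})" for i in range(len(workers))]
    return [worker(w, t) for w, t in zip(workers, subtasks)]

print(leader("write report", ["agent-A", "agent-B"]))
```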
Agent Architecture: Profiles
Agents have profiles or personas: role definitions, written as prompts, that shape the LLM's behavior and skills. What makes a good profile depends heavily on the specific application.
Many people already use this today as a prompting technique: "You are a nutrition expert. Provide me with a meal plan...". Interestingly, assigning a role to the LLM can improve its output relative to a baseline without one.
Profiles can be crafted by the following methods:
Handcrafted: Profiles manually specified by a human creator; most flexible, but also time consuming.
LLM-generated: Profiles generated by an LLM given a set of rules about composition and attributes, plus (optionally) a few seed examples.
Dataset-aligned: Profiles are generated from a real-world dataset of people.
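In the handcrafted case, a profile is just a system-style prompt prepended to every model call. The role text and field names below are illustrative, not from any specific framework:

```python
# A handcrafted profile assembled into a persona prompt that
# conditions the LLM's behavior on every call.

def build_profile(role: str, traits: list[str], goal: str) -> str:
    """Assemble a persona prompt from a role, traits, and a goal."""
    return (
        f"You are a {role}. "
        f"Traits: {', '.join(traits)}. "
        f"Your goal: {goal}"
    )

prompt = build_profile(
    "nutrition expert",
    ["concise", "evidence-based"],
    "design a weekly meal plan",
)
print(prompt)
```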
Agent Architecture: Memory
An agent’s memory stores information it senses from the environment and uses that information to make new plans or actions. Memory enables the agent to self-evolve and act based on its experience.
Unified Memory: Analogous to short-term memory, implemented via in-context learning: all relevant memories are passed to the agent in every prompt. Limited mainly by the size of the context window.
Hybrid: Short-term + long-term memory. Short-term memory is a temporary buffer for the current state. Reflections or useful long-term information are stored permanently in a database. There are several ways to do this, but a common approach is a vector database (memories are encoded as embeddings and stored; recall is a similarity search).
Formats: Natural language, databases (e.g. an LLM fine-tuned to generate SQL queries), structured lists, embeddings
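The embed-store-recall cycle can be sketched with a toy in-memory store. A real system would use learned embeddings and a vector database such as Chroma or Weaviate; here a bag-of-words vector stands in for the embedding model so the example stays self-contained:

```python
# Toy vector memory: memories are embedded, stored, and recalled
# by cosine-similarity search against an embedded query.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    def __init__(self):
        self.store = []  # list of (embedding, original text)

    def add(self, memory: str):
        self.store.append((embed(memory), memory))

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.store, key=lambda m: cosine(q, m[0]), reverse=True)
        return [text for _, text in ranked[:k]]

mem = VectorMemory()
mem.add("the user prefers low-gas transactions")
mem.add("the party is on Valentine's Day")
print(mem.recall("when is the party"))
```

The most similar memory is returned, even though the query shares no exact phrasing with it beyond a few overlapping words.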
Agent Architecture: Planning
Complex tasks are deconstructed into simpler subtasks to be solved individually.
Planning without Feedback:
In this approach, the agent does not receive feedback after taking an action that affects future behavior. An example is Chain of Thought (CoT), where the LLM is encouraged to express its thought process when providing an answer.
Single-path reasoning (e.g. zero-shot CoT)
Multi-path reasoning (e.g. self-consistent CoT, where multiple CoT threads are spawned and the highest frequency answer is used)
External planner (e.g. Planning Domain Definition Language)
Planning with feedback:
Iteratively refine subtasks based on external feedback
Environmental feedback (e.g. game task completion signals)
Human feedback (e.g. soliciting feedback from users)
Model feedback (e.g. soliciting feedback from another LLM, akin to crowdsourcing)
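Of the planning strategies above, self-consistent CoT is easy to sketch: sample several chain-of-thought completions and take the most frequent final answer. `sample_cot` below is a stub standing in for temperature-sampled LLM calls, with a hard-coded answer distribution for illustration:

```python
# Sketch of self-consistency: spawn several CoT samples and
# return the majority-vote answer.
from collections import Counter

def sample_cot(question: str, seed: int) -> str:
    # Stub: a real implementation samples the LLM at nonzero temperature,
    # so different runs can end in different final answers.
    answers = ["42", "42", "41", "42", "40"]
    return answers[seed % len(answers)]

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    votes = Counter(sample_cot(question, i) for i in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))
```

Majority voting filters out occasional reasoning errors, at the cost of running the model several times per question.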
Agent Architecture: Action
Actions are responsible for translating the agent's decisions into concrete results.
Action goals can take many forms, for example:
Task completion (e.g. crafting an iron pickaxe in Minecraft)
Communication (e.g. sharing information with another agent or a human)
Environment exploration (e.g. searching its action space and learning its own capabilities)
Actions are typically generated by recalling memories or following a plan, and the action space consists of the agent's internal knowledge, APIs, databases/knowledge bases, and external models.
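One common way to expose an action space is a tool registry: the model's decision (a tool name plus arguments) is dispatched to concrete code. The tool names and the decision format below are illustrative assumptions, not a standard:

```python
# Sketch of an action space as a tool registry. The agent's decision,
# a dict naming a tool and its arguments, is translated into a result.

TOOLS = {
    "search_db": lambda query: f"rows matching '{query}'",
    "send_message": lambda to, text: f"sent '{text}' to {to}",
}

def act(decision: dict) -> str:
    """Execute an agent decision of the form {'tool': name, 'args': {...}}."""
    tool = TOOLS[decision["tool"]]
    return tool(**decision["args"])

print(act({"tool": "send_message",
           "args": {"to": "agent-2", "text": "task done"}}))
```

In practice the decision dict would be parsed from the LLM's structured output, and each tool would wrap a real API, database, or contract call.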
Agent Architecture: Capability Acquisition
For an agent to correctly execute actions within its action space, it must have task-specific abilities. There are two main ways to achieve this:
With fine-tuning: train the agent on a dataset of human-annotated, LLM-generated, or real-world example behaviors.
Without fine-tuning: the innate capabilities of the LLM can be exploited through more sophisticated prompt engineering and/or mechanism engineering (i.e., incorporating external feedback or accumulating experience through trial and error).
Agent Examples from the Literature
Generative Agents: Interactive Simulacra of Human Behavior: Generative agents were instantiated in a virtual sandbox environment, showing that multi-agent systems exhibit emergent social behaviors. Starting from a single user-specified prompt about an upcoming Valentine's Day party, the agents autonomously spread invitations over the next two days, make new acquaintances, ask each other out on dates, and coordinate to show up at the party together at the right time. You can try this yourself using the a16z AI Town implementation.
Description Explained Plan Selection (DEPS): The first zero-shot multi-task agent that can complete over 70 Minecraft tasks.
Voyager: The first LLM-powered lifelong learning agent in Minecraft that continuously explores the world, acquires skills, and makes new discoveries without human intervention. Its skill execution code is continuously improved based on feedback from trial and error.
CALYPSO: An agent designed for the game Dungeons & Dragons that assists Dungeon Masters in creating and telling stories. Its short-term memory is built on scene descriptions, monster information, and previous summaries.
Ghost in the Minecraft (GITM): A generally capable Minecraft agent with a 67.5% success rate at obtaining diamonds and a 100% completion rate across all items in the game.
SayPlan: LLM-based large-scale task planning for robots using 3D scene graph representations, demonstrating long-horizon robot task planning from abstract, natural-language instructions.
HuggingGPT: Task planning using ChatGPT based on user prompts, selecting models based on descriptions on Hugging Face, and performing all subtasks, achieving impressive results in language, vision, speech, and other challenging tasks.
MetaGPT: Takes input and outputs user stories/competitive analysis/requirements/data structures/APIs/documentation, etc. Internally, there are multiple agents that make up various functions of the software company.
ChemCrow: An LLM chemistry agent designed to complete tasks such as organic synthesis, drug discovery, and materials design using 18 expert-designed tools. It autonomously planned and executed the synthesis of an insect repellent and three organocatalysts, and guided the discovery of a new chromophore.
BabyAGI: Generic infrastructure for creating, prioritizing, and executing tasks using OpenAI and a vector database such as Chroma or Weaviate.
AutoGPT: Another example of generic infrastructure for launching LLM agents.
Agent Examples in Crypto
(Note: not all examples are LLM based + some may be more loosely based on the agent concept)
FrenRug from Ritualnet: Based on the GPT-4 Turkish carpet salesman game (https://aiadventure.spiel.com/carpet). FrenRug is an agent anyone can try to convince to buy their Friend.tech key. Each user message is passed to multiple LLMs run by different Infernet nodes. The nodes respond on-chain, and the LLMs vote on whether the agent should buy the proposed key. Once enough nodes have responded, the votes are aggregated, a supervised classifier determines the action, and a validity proof is posted on-chain, allowing the off-chain execution of the classifier to be verified.
Prediction market agents on Gnosis using Autonolas: An AI bot is essentially a smart-contract wrapper around an AI service that anyone can call by paying a fee and asking a question. The service monitors requests, performs the task, and returns the answer on-chain. This infrastructure has been extended to prediction markets via Omen: the idea is that agents actively monitor the news and bet on predictions derived from news analysis, ultimately producing aggregated predictions closer to the true odds. Agents search for markets on Omen, autonomously pay the "bot" for predictions on the topic, and trade in those markets.
ianDAOs GPT<>Safe Demo: GPT autonomously manages USDC in its own Safe multi-signature wallet on the Base chain using the syndicateio Trading Cloud API. You can talk to it and make suggestions on how to best use its capital, and it may allocate it based on your suggestions.
Game Agents: There are many ideas here, but in short, AI agents in virtual environments can act as both companions (like AI NPCs in Skyrim) and competitors (like a group of chubby penguins). Agents can autonomously execute revenue strategies, provide goods and services (e.g. shopkeepers, traveling merchants, sophisticated generative quest-givers), or be semi-playable characters, as in Parallel Colony and AI Arena.
Safe Guardian Angels: Use a team of AI agents to monitor wallets and defend against potential threats to protect user funds and improve wallet security. Features include automatic revocation of contract permissions and withdrawal of funds in the event of anomalies or hacks.
Botto: While Botto is a loosely defined example of an on-chain agent, it demonstrates the concept of autonomous on-chain artists, creating works that are voted on by token holders and auctioned on SuperRare. One can imagine various extensions that adopt multimodal agent architectures.
Some notable agent projects
(Note: not all are LLM-based + some may be more loosely based on the agent concept)
Wayfinder - A decentralized knowledge graph of protocols, contracts, contract standards, assets, functions, APIs, routines, and paths (i.e. a virtual roadmap of the blockchain ecosystem that pathfinder agents can navigate). Users are rewarded for identifying viable paths for agents to use. Additionally, you can mint shells (i.e. agents) containing persona settings and skill activations, which can then be plugged into the pathfinder knowledge graph.
Ritualnet - As shown in the frenrug example above, Ritual infernet nodes can be used to set up multi-agent architectures. Nodes listen for on-chain or off-chain requests and provide outputs with optional proofs.
Morpheus - A peer-to-peer network of personal general AIs that can execute smart contracts on behalf of users. This can be used for web3 wallets and tx intent management, data parsing via chatbot interfaces, recommendation models for dapps and contracts, and scaling agent operations through long-term memory that connects application and user data.
Dain Protocol - Exploring multiple use cases for deploying agents on Solana. Recently demonstrated the deployment of a crypto trading bot that can extract on-chain and off-chain information to act on behalf of a user (e.g. sell BODEN if Biden loses)
Naptha - Agent orchestration protocol with an on-chain task marketplace for contracting agents, operator nodes to orchestrate tasks, an LLM workflow orchestration engine that supports asynchronous messaging across different nodes, and a workflow proof system to verify execution.
Myshell - An AI character platform similar to character.ai, where creators can monetize agent profiles and tools. Multimodal infrastructure with some interesting example agents spanning translation, education, companionship, coding, and more. Offers both simple no-code agent creation and a more advanced developer mode for assembling AI widgets.
AI Arena - A competitive PvP fighting game where players can buy, train, and battle AI-powered NFTs. Players train their agent NFTs through imitation learning, where the AI learns to play the game across different maps and scenarios by learning the probabilities associated with player actions. Once trained, players can send their agents into ranked battles for token rewards. Not LLM-based, but still an interesting example of the possibilities of agent-based gaming.
Virtuals Protocol - A protocol for building and deploying multimodal agents to games and other online spaces. The three main archetypes of virtuals today include IP character mirrors, specific function agents, and personal avatars. Contributors contribute data and models to virtuals, and validators act as gatekeepers. There is an economic incentive mechanism to promote development and monetization.
Brianknows - Provides a user interface for users to interact with agents that can execute transactions, research cryptocurrency-specific information, and deploy smart contracts in real time. Currently supports 10+ actions across 100+ integrations. A recent example is enabling agents to stake ETH in Lido on behalf of users via natural language.
Autonolas - Provides lightweight local and cloud-based agents, consensus-operated decentralized agents, and specialized agent economies. Prominent examples include DeFi and prediction-based agents, AI-driven governance representatives, and agent-to-agent tool markets. Provides a protocol for coordinating and incentivizing agent operations + the OLAS stack, an open source framework for developers to build co-owned agents.
Creator.Bid - Provides users with social media persona agents that connect to X and Farcaster real-time APIs. Brands can launch knowledge-based agents to execute brand-aligned content on social platforms.
Polywrap - Provides a variety of agent-based products, such as Indexer (a social media agent on Farcaster), AutoTx (a planning and trade-execution agent built with Morpheus and flock.io), predictionprophet.ai (a prediction agent with Gnosis and Autonolas), and fundpublicgoods.ai (an agent for grant resource allocation).
Verification - Since economic flows will be directed by agents, output verification will be very important (more on this in a future post). Verification methods include those from Ora Protocol, zkML from teams like Modulus Labs, Giza, and EZKL, game-theoretic solutions, and hardware-based solutions such as TEEs.
Some thoughts on on-chain agents
Ownable, tradable, token-gated agents that can perform various types of functions, from companionship to financial applications
Agents that can identify, learn, and participate in game economies on your behalf; or autonomous agents that can act as players in collaborative, competitive, or fully simulated environments.
Agents that can simulate real human behavior for yield opportunities
Multi-agent managed smart wallets that can act as autonomous asset managers
AI-managed DAO governance (e.g. token delegation, proposal creation or management, process improvement, etc.)
Use web3 storage or database as a composable vector embedding system for shared and persistent memory state
Agents running locally that participate in the global consensus network to perform user-defined tasks
Knowledge graph of existing and new protocol interactions and APIs
Autonomous guardian networks, multi-signature security, smart contract security and functionality enhancements
True autonomous investment DAOs (e.g., a collector DAO using art historian, investment analyst, data analyst, and degen agent roles)
Token economics and contract security simulation and testing
General intent management, especially in the context of crypto UX like bridging or DeFi
Artistic or experimental projects
Attracting the next billion users
As Jesse Walden, co-founder of Variant Fund, recently said, autonomous agents are an evolution, not a revolution, in how blockchains are used: we already have protocol task bots, sniper bots, MEV searchers, robotics toolkits, etc. Agents are simply an extension of all that.
Many areas of crypto are built in ways conducive to agent execution, such as fully on-chain gaming and DeFi. Assuming LLM costs keep falling relative to task performance, and that creating and deploying agents keeps getting more accessible, it is hard to imagine a world where AI agents don't come to dominate on-chain interactions and become the next billion users of crypto.
Further reading:
AI Agents That Can Bank Themselves Using Blockchains
The new AI agent economy will run on Smart Accounts
A Survey on Large Language Model based Autonomous Agents (I used this for identifying the taxonomy of agentic architectures above, highly recommend)
ReAct: Synergizing Reasoning and Acting in Language Models
Generative agents: Interactive simulacra of human behavior
Reflexion: Language Agents with Verbal Reinforcement Learning
Toolformer: Language Models Can Teach Themselves to Use Tools
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Voyager: An Open-Ended Embodied Agent with Large Language Models
LLM Agents Papers GitHub Repo