A survey of AI autonomous agents, with full coverage of construction, application, and evaluation: a 32-page overview from Wen Jirong's team at the Gaoling School of Artificial Intelligence, Renmin University of China

Editor: Du Wei, Chen Ping

This paper provides a comprehensive introduction to the construction, potential applications, and evaluation of agents based on large language models (LLMs), offering a full picture of the field's development and inspiration for future research.

Image source: Generated by Unbounded AI

In today's AI era, autonomous agents are widely regarded as a promising path toward artificial general intelligence (AGI). An autonomous agent is a system that can complete tasks through self-directed planning and action. In early development paradigms, the policy function that determines an agent's actions was dominated by hand-crafted heuristics and gradually refined through interaction with the environment.

However, in unconstrained open-domain environments, such autonomous agents often struggle to act with human-level proficiency.

With their great success in recent years, large language models (LLMs) have shown the potential to achieve human-like intelligence. Thanks to these powerful capabilities, LLMs are increasingly used as the core coordinators for building autonomous agents, and a variety of AI agents have emerged in quick succession. By mimicking human-like decision-making processes, these agents offer a viable path to more complex and adaptable AI systems.

*A list of LLM-based autonomous agents, including tool agents, simulated agents, general agents, and domain agents.*

At this stage, a holistic analysis of these emerging LLM-based autonomous agents is valuable both for understanding the current state of the field and for inspiring future research.

In this paper, researchers from the Gaoling School of Artificial Intelligence (Hillhouse School of AI) at Renmin University of China conduct a comprehensive survey of LLM-based autonomous agents, focusing on three aspects: construction, application, and evaluation.

Paper address:

For agent construction, they propose a unified framework consisting of four parts: a profiling module that represents the agent's attributes, a memory module that stores historical information, a planning module that formulates future action strategies, and an action module that executes planning decisions. After introducing these typical agent modules, the researchers also summarize commonly used fine-tuning strategies for enhancing an agent's adaptability to different application scenarios.
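To make the four-module framework concrete, here is a minimal Python sketch of how the pieces might fit together. It is an illustration under our own assumptions, not the paper's implementation: `call_llm` is a stand-in for a real model API, and the class and method names are invented for readability.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; a real system would wrap a model API here.
    return f"[LLM response to: {prompt[:40]}...]"


@dataclass
class Profile:
    """Profiling module: static attributes written into every prompt."""
    role: str
    traits: str = ""

    def render(self) -> str:
        return f"You are a {self.role}. {self.traits}".strip()


@dataclass
class Memory:
    """Memory module: stores observations and recalls recent ones."""
    records: List[str] = field(default_factory=list)

    def write(self, record: str) -> None:
        self.records.append(record)

    def read(self, k: int = 3) -> str:
        return "\n".join(self.records[-k:])


@dataclass
class Agent:
    profile: Profile
    memory: Memory
    llm: Callable[[str], str] = call_llm

    def plan(self, task: str) -> str:
        """Planning module: ask the LLM for the next step given profile and memory."""
        prompt = (
            f"{self.profile.render()}\n"
            f"Relevant memory:\n{self.memory.read()}\n"
            f"Task: {task}\nPropose the next action."
        )
        return self.llm(prompt)

    def act(self, task: str) -> str:
        """Action module: turn the planning decision into an output and record it."""
        decision = self.plan(task)
        self.memory.write(f"task={task} -> {decision}")
        return decision


agent = Agent(Profile(role="software engineer"), Memory())
print(agent.act("Write a unit test for the parser."))
```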

The researchers then outline potential applications of autonomous agents, exploring how they could benefit the fields of social sciences, natural sciences, and engineering. Finally, evaluation methods for autonomous agents are discussed, including subjective and objective evaluation strategies. The figure below shows the overall structure of the article.


Construction of LLM-based autonomous agents

To make LLM-based autonomous agents more effective, two aspects need to be considered: first, what architecture should be designed so that the agent can make better use of the LLM; second, how to effectively learn the agent's parameters.

Agent architecture design: The paper proposes a unified framework that summarizes the architectures proposed in previous studies. The overall structure is shown in Figure 2 and consists of a profiling module, a memory module, a planning module, and an action module.

In summary, the profiling module identifies what role the agent plays; the memory and planning modules place the agent in a dynamic environment, enabling it to recall past behaviors and plan future actions; and the action module translates the agent's decisions into concrete outputs. Among these modules, the profiling module affects the memory and planning modules, and these three together affect the action module.

Profiling Module

Autonomous agents typically perform tasks by assuming specific roles, such as programmer, teacher, or domain expert. The profiling module specifies what the agent's role is, and this information is usually written into the input prompt to influence the LLM's behavior. In existing work, there are three commonly used strategies for generating agent profiles: hand-crafted methods, LLM-generation methods, and dataset-alignment methods.
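As a simple illustration of the hand-crafted profiling strategy, the role description can be written by hand and prepended to every prompt. This is a minimal sketch; the wording and the function name are assumptions, not taken from the paper.

```python
# Hand-crafted profiling strategy: the agent's role and traits are written by hand
# and prepended to every request so the LLM answers "in character".
def build_prompt(role: str, traits: str, user_request: str) -> str:
    profile = f"You are a {role}. {traits}"
    return f"{profile}\n\nUser request: {user_request}\nAnswer in character."


print(build_prompt(
    role="senior Python teacher",
    traits="You explain concepts patiently with short examples.",
    user_request="What does a list comprehension do?",
))
```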

Memory module

The memory module plays a very important role in the construction of AI agents. It stores information perceived from the environment and uses these recorded memories to facilitate the agent's future actions. The memory module helps the agent accumulate experience, evolve itself, and complete tasks in a more consistent, reasonable, and effective manner.
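A minimal sketch of such a memory module is shown below, using word overlap as a crude stand-in for the embedding-based retrieval that most agent systems use; the class and scoring scheme are assumptions for illustration only.

```python
from typing import List, Tuple


class SimpleMemory:
    """Toy memory module: stores text records and retrieves the most relevant
    ones by word overlap (a stand-in for embedding similarity search)."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, record: str) -> None:
        self.records.append(record)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        query_words = set(query.lower().split())
        # Score each record by how many query words it shares, then keep the top k.
        scored: List[Tuple[int, str]] = [
            (len(query_words & set(r.lower().split())), r) for r in self.records
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for score, r in scored[:k] if score > 0]


memory = SimpleMemory()
memory.write("The user prefers concise answers.")
memory.write("Yesterday the agent fixed a failing login test.")
print(memory.retrieve("How should the agent answer the user?"))
```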

Planning Module

When humans face a complex task, they first break it down into simple subtasks and then solve each subtask one by one. The planning module equips the LLM-based agent with the thinking and planning capabilities needed to solve complex tasks, making the agent more capable and reliable. The paper discusses two types of planning: planning without feedback and planning with feedback.
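The sketch below contrasts the two modes under simple assumptions: `call_llm` and `run_subtask` are hypothetical placeholders, single-shot decomposition stands in for planning without feedback, and re-planning after each observation stands in for planning with feedback.

```python
from typing import List


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; returns a newline-separated plan or a next step."""
    return "1. inspect input\n2. transform data\n3. report result"


def run_subtask(step: str) -> str:
    """Hypothetical environment step; returns an observation."""
    return f"done: {step}"


def plan_without_feedback(task: str) -> List[str]:
    # Single-shot decomposition: the plan is fixed before execution starts.
    plan = call_llm(f"Break this task into numbered subtasks: {task}")
    return [line for line in plan.splitlines() if line.strip()]


def plan_with_feedback(task: str, max_steps: int = 3) -> List[str]:
    # Iterative planning: each observation is fed back before choosing the next step.
    history: List[str] = []
    for _ in range(max_steps):
        next_step = call_llm(
            f"Task: {task}\nProgress so far: {history}\nWhat is the next subtask?"
        )
        history.append(run_subtask(next_step))
    return history


print(plan_without_feedback("Summarize the survey"))
print(plan_with_feedback("Summarize the survey"))
```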

Action Module

The action module transforms the agent's decisions into concrete outputs. It interacts directly with the environment and determines how effectively the agent completes its tasks. This section discusses the action module from the perspectives of action goals, action generation strategies, action space, and action impact.
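Below is a sketch of how an action module might map a decision onto a small action space, choosing between tool calls and a plain text reply. The tool names and the "tool: argument" decision format are illustrative assumptions, not the paper's design.

```python
from typing import Callable, Dict


# Illustrative action space: two toy tools plus a default "reply with text" action.
def search_tool(query: str) -> str:
    return f"[search results for '{query}']"


def calculator_tool(expression: str) -> str:
    # Toy arithmetic only; never eval untrusted input in a real system.
    return str(eval(expression, {"__builtins__": {}}, {}))


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_tool,
    "calculate": calculator_tool,
}


def execute(decision: str) -> str:
    """Action module: parse a 'tool: argument' decision and run it in the environment."""
    if ":" in decision:
        name, arg = decision.split(":", 1)
        tool = TOOLS.get(name.strip())
        if tool is not None:
            return tool(arg.strip())
    return decision  # fall back to returning the text as the agent's reply


print(execute("calculate: 3 * (2 + 5)"))
print(execute("search: LLM-based autonomous agents"))
print(execute("I could not find a suitable tool."))
```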

In addition to the four modules above, this chapter also covers the agent's learning strategies, including learning from examples, learning from environmental feedback, and learning from interactive human feedback.
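As a concrete instance of learning from examples, few-shot demonstrations can be placed directly in the prompt (in-context learning), shaping the agent's behavior without any weight updates. The demonstrations below are made up purely for illustration.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], new_input: str) -> str:
    """Learning from examples via in-context demonstrations: the agent's behavior
    is shaped by worked examples rather than by updating model weights."""
    lines = []
    for task, answer in examples:
        lines.append(f"Task: {task}\nAnswer: {answer}\n")
    lines.append(f"Task: {new_input}\nAnswer:")
    return "\n".join(lines)


demos = [
    ("Classify the sentiment of 'great tool'", "positive"),
    ("Classify the sentiment of 'keeps crashing'", "negative"),
]
print(few_shot_prompt(demos, "Classify the sentiment of 'works as expected'"))
```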

Table 1 lists the correspondence between previous work and the proposed taxonomy:

Applications of LLM-based autonomous agents

This chapter explores the transformative impact of LLM-based autonomous agents in three distinct fields: social sciences, natural sciences, and engineering.

In engineering, for example, LLM-based agents can be used to design and optimize complex structures such as buildings, bridges, dams, and roads. Some researchers have proposed an interactive framework in which human architects and AI agents work together to build structures in a 3D simulated environment. The interactive agent can understand natural language instructions, place components, ask for clarification, and incorporate human feedback, showing the potential of human-machine collaboration in engineering design.

In computer science and software engineering, LLM-based agents offer the potential to automate coding, testing, debugging, and documentation generation. Some researchers have proposed ChatDev, an end-to-end framework in which multiple agents communicate and collaborate through natural language dialogue to complete the software development life cycle. ToolBench can be used for tasks such as code auto-completion and code recommendation, while MetaGPT can play the roles of product manager, architect, project manager, and engineer, internally supervising code generation to improve the quality of the final output.

The following table shows representative applications of LLM-based autonomous agents:

Evaluation of LLM-based autonomous agents

This article introduces two commonly used evaluation strategies: subjective evaluation and objective evaluation.

Subjective evaluation refers to humans assessing the capabilities of LLM-based agents through means such as interaction and scoring. In this setting, the evaluators are often recruited through crowdsourcing platforms; because some researchers consider crowd workers unreliable due to differences in individual ability, expert annotation is also used for evaluation.

In addition, some current studies use LLM agents themselves as subjective evaluators. In the ChemCrow study, for example, EvaluatorGPT evaluates experimental results by assigning a grade that considers both the successful completion of the task and the accuracy of the underlying thought process. Another example is ChatEval, which assembles an LLM-based multi-agent referee team that evaluates models' generated results through debate.
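The sketch below illustrates the general LLM-as-evaluator idea in miniature; the rubric, the `call_llm` stub, and the score parsing are assumptions rather than the actual ChemCrow or ChatEval implementations.

```python
import re


def call_llm(prompt: str) -> str:
    """Hypothetical judge model; a real system would call an actual LLM here."""
    return "Score: 4. The answer completes the task but the reasoning skips a step."


def llm_judge(task: str, agent_output: str) -> int:
    """Ask an LLM to grade an agent's output on task completion and reasoning quality."""
    prompt = (
        "You are an impartial evaluator.\n"
        f"Task: {task}\n"
        f"Agent output: {agent_output}\n"
        "Rate the output from 1 (poor) to 5 (excellent), considering both whether "
        "the task was completed and whether the reasoning is sound. "
        "Reply as 'Score: <n>. <comment>'."
    )
    reply = call_llm(prompt)
    match = re.search(r"Score:\s*(\d)", reply)
    return int(match.group(1)) if match else 0


print(llm_judge("Plan a three-step synthesis outline.", "Step 1 ... Step 3 ..."))
```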

Objective evaluation refers to using quantitative metrics to assess the capabilities of LLM-based autonomous agents, and it offers several advantages over subjective evaluation. This section reviews and synthesizes objective evaluation methods from the perspectives of metrics, strategies, and benchmarks.
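As a minimal example of objective evaluation, agent outputs can be compared against references with a quantitative metric such as success rate; the benchmark records below are made up purely for illustration.

```python
from typing import Dict, List


def success_rate(results: List[Dict[str, str]]) -> float:
    """Objective evaluation: fraction of benchmark tasks where the agent's answer
    exactly matches the reference (a deliberately simple metric)."""
    if not results:
        return 0.0
    correct = sum(
        1 for r in results if r["prediction"].strip() == r["reference"].strip()
    )
    return correct / len(results)


# Made-up benchmark records purely for illustration.
benchmark = [
    {"prediction": "Paris", "reference": "Paris"},
    {"prediction": "42", "reference": "41"},
    {"prediction": "blue", "reference": "blue"},
]
print(f"Success rate: {success_rate(benchmark):.2f}")  # 0.67
```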

In practice, the two evaluation strategies can also be combined.

Table 3 summarizes the correspondence between previous work and these evaluation strategies:

For more information, please refer to the original paper.
