Humans' ALMANAC: A Human Collaboration Dataset...

Key Takeaways

  • Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators.
  • Effective collaboration, however, requires collaborators to continuously maintain and align mental models of their own reasoning, partners' intentions, and shared goals during the collaborative process.
  • To bridge this gap, we present ALMANAC, a dataset of Action-Level Mental model ANnotations for Agent Collaboration built from the Map Task, a classic dyadic routing task from social science.
  • ALMANAC contains 2,987 collaboration actions, each paired with theory-informed mental model annotations that record the participants' self-reasoning, perceived partner intent, and perceived team goal.

Paper Abstract

Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires collaborators to continuously maintain and align mental models of their own reasoning, partners' intentions, and shared goals during the collaborative process. Today's agents rarely develop such capabilities since they are primarily optimized for task completion, and the community lacks authentic human collaboration data with action-level mental model annotations that could guide agents toward process-level collaborative competence. To bridge this gap, we present ALMANAC, a dataset of Action-Level Mental model ANnotations for Agent Collaboration built from the Map Task, a classic dyadic routing task from social science. ALMANAC contains 2,987 collaboration actions, each paired with theory-informed mental model annotations that record the participants' self-reasoning, perceived partner intent, and perceived team goal. We benchmark six LLMs on predicting humans' next-turn behavior and mental models. Our results demonstrate ALMANAC's utility in evaluating models' ability to simulate human collaborative behaviors and infer their underlying mental models.

Recent advances in Large Language Model (LLM) agents have enabled them to perform complex tasks like planning and tool use, yet they often struggle to act as true collaborators. While these agents are excellent at completing specific tasks, they frequently lack the ability to maintain the "mental models"—the internal understanding of self-reasoning, partner intentions, and shared goals—that are essential for effective human teamwork. To address this, researchers have introduced ALMANAC, a new dataset designed to help agents develop the cognitive competence required for genuine human-agent collaboration.

Understanding Collaborative Mental Models

Effective collaboration relies on more than just exchanging information; it requires partners to constantly align their understanding of the task. The ALMANAC dataset is built upon the "Map Task," a classic social science experiment where two people work together to reproduce a route on a map through text communication and drawing. By capturing 2,987 distinct collaboration actions, the researchers provide a window into the cognitive processes that occur during teamwork. Each action is paired with annotations that detail the participant’s self-reasoning, their perception of their partner’s intent, and their view of the shared team goal.
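To make the shape of one annotated action concrete, here is a minimal Python sketch of how such a record might be represented. The class names, field names, and the sample values are assumptions made for illustration; they are not the dataset's actual schema or contents. Only the three annotation types and the two kinds of behavior (messages and drawing) come from the description above.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class MentalModel:
    """The three annotation types described for each action."""
    self_reasoning: str   # why the participant took this action
    partner_intent: str   # what they believe the partner is trying to do
    team_goal: str        # what they believe the pair is working toward


@dataclass(frozen=True)
class CollaborationAction:
    participant: str                  # hypothetical participant label
    kind: Literal["message", "draw"]  # text communication or drawing
    content: str                      # message text or a stroke description
    mental_model: MentalModel


def summarize(action: CollaborationAction) -> str:
    """Render the observable part of an action as one line of text."""
    return f"[{action.participant}] {action.kind}: {action.content}"


# Invented sample values, not taken from the dataset.
example = CollaborationAction(
    participant="A",
    kind="message",
    content="Go around the lake on the left side.",
    mental_model=MentalModel(
        self_reasoning="The left path avoids the landmark my partner cannot see.",
        partner_intent="They are waiting for the next direction.",
        team_goal="Reproduce the route as accurately as possible.",
    ),
)

print(summarize(example))
```

Keeping the observable action and the mental model in separate objects mirrors the distinction the article draws between what a participant did and why they did it.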

How the Data Was Collected

To ensure the data reflects authentic human thought processes, the researchers implemented a two-step annotation framework. During the Map Task, participants provided brief, real-time updates at specific intervals to capture their mental state as the task progressed. Immediately after finishing the task, participants reviewed their own actions and provided retrospective, detailed explanations for their decisions. This method allows the dataset to link observable behaviors—such as drawing a line or sending a message—to the underlying rationale that drove those actions.

Evaluating AI Performance

The researchers used ALMANAC to benchmark six state-of-the-art LLMs, testing their ability to predict both a human’s next move and that human’s internal mental state. The results indicate that mental model annotations provide valuable signals for predicting human behavior, but current models still struggle to infer the internal reasoning of their human partners accurately. This suggests that AI can learn to mimic surface-level collaborative patterns yet still falls short of understanding the cognitive dynamics of a collaborative partnership.
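The next-move part of such a benchmark can be sketched as a simple loop: show a predictor the history up to a turn, ask for the next action, and compare against what the human did. The sketch below is an assumption-laden illustration, not the paper's evaluation protocol. The turn format, the `next_turn_accuracy` function, the exact-match scoring on action kind, and the `repeat_last_kind` baseline are all hypothetical, and a real evaluation would call an LLM in place of the baseline.

```python
from typing import Callable, Sequence

Turn = dict[str, str]  # hypothetical format: {"kind": ..., "content": ...}


def next_turn_accuracy(
    dialogue: Sequence[Turn],
    predict: Callable[[Sequence[Turn]], str],
) -> float:
    """Fraction of turns whose action kind is predicted from prior history."""
    if len(dialogue) < 2:
        raise ValueError("need at least two turns to predict a next turn")
    correct = 0
    for i in range(1, len(dialogue)):
        history = dialogue[:i]  # everything before the target turn
        if predict(history) == dialogue[i]["kind"]:
            correct += 1
    return correct / (len(dialogue) - 1)


def repeat_last_kind(history: Sequence[Turn]) -> str:
    """Trivial stand-in predictor: assume the next action repeats the last kind."""
    return history[-1]["kind"]


# Invented toy dialogue, not taken from the dataset.
toy_dialogue = [
    {"kind": "message", "content": "Start at the top left."},
    {"kind": "message", "content": "Okay, which way first?"},
    {"kind": "draw", "content": "line from start toward the lake"},
    {"kind": "draw", "content": "curve around the lake"},
    {"kind": "message", "content": "Done, what next?"},
]

accuracy = next_turn_accuracy(toy_dialogue, repeat_last_kind)
print(accuracy)  # 0.5 on this toy dialogue
```

Scoring mental model predictions would need a different comparison, since free-text reasoning cannot be matched exactly; this summary does not say which metric the paper uses for that.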

Implications for Future Agents

By providing a dataset that bridges the gap between task execution and cognitive alignment, ALMANAC offers a new way to evaluate and train AI agents. Instead of focusing solely on whether an agent can complete a task, this research encourages the development of agents that can actively maintain awareness of their partner’s needs and intentions. This shift is a critical step toward creating AI that functions as a supportive, intuitive collaborator rather than just a tool for executing instructions.
