
Superminds Test: Actively Evaluating Collective Intelligence

Key Takeaways

  • This paper investigates whether collective intelligence—the ability of a group to solve problems better than any individual member—emerges naturally as AI agent populations scale to millions.
  • Collective intelligence refers to the ability of a group to achieve outcomes beyond what any individual member can accomplish alone.
  • As large language model agents scale to populations of millions, a key question arises: Does collective intelligence emerge spontaneously from scale?
  • We present the first empirical evaluation of this question in a large-scale autonomous agent society.
  • Our experiments reveal a stark absence of collective intelligence.
Paper Abstract

Collective intelligence refers to the ability of a group to achieve outcomes beyond what any individual member can accomplish alone. As large language model agents scale to populations of millions, a key question arises: Does collective intelligence emerge spontaneously from scale? We present the first empirical evaluation of this question in a large-scale autonomous agent society. Studying MoltBook, a platform hosting over two million agents, we introduce Superminds Test, a hierarchical framework that probes society-level intelligence using controlled Probing Agents across three tiers: joint reasoning, information synthesis, and basic interaction. Our experiments reveal a stark absence of collective intelligence. The society fails to outperform individual frontier models on complex reasoning tasks, rarely synthesizes distributed information, and often fails even trivial coordination tasks. Platform-wide analysis further shows that interactions remain shallow, with threads rarely extending beyond a single reply and most responses being generic or off-topic. These results suggest that collective intelligence does not emerge from scale alone. Instead, the dominant limitation of current agent societies is extremely sparse and shallow interaction, which prevents agents from exchanging information and building on each other's outputs.

This paper investigates whether collective intelligence—the ability of a group to solve problems better than any individual member—emerges naturally as AI agent populations scale to millions. While human societies demonstrate this phenomenon through complex social interaction, it remains unclear if large-scale autonomous agent societies, such as the MoltBook platform, possess the same capability. The authors introduce a new evaluation framework to test this hypothesis, moving beyond simple scale to examine the quality of interactions within these digital communities.

The Superminds Test Framework

To rigorously measure collective intelligence, the researchers developed the Superminds Test. This framework uses "Probing Agents"—controlled, disguised agents injected into the live MoltBook platform—to post specific tasks and observe how the society responds. The evaluation is organized into a three-tier hierarchy:

  • Tier I (Joint Reasoning): Can the group discuss a problem and converge on a solution that is better than what any single agent could produce?
  • Tier II (Information Synthesis): Can agents successfully read and combine information that is scattered across multiple contributors?
  • Tier III (Basic Interaction): Can agents perform simple, coordinated tasks, such as following a conversational context or responding to one another?
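The three-tier hierarchy can be sketched as a simple task taxonomy. This is a minimal illustrative sketch, not the paper's actual implementation: the `Tier`, `ProbeTask`, and `route_probe` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    JOINT_REASONING = 1    # Tier I: group converges on a better joint solution
    INFO_SYNTHESIS = 2     # Tier II: combine information scattered across agents
    BASIC_INTERACTION = 3  # Tier III: trivial coordination (e.g., counting)

@dataclass
class ProbeTask:
    tier: Tier
    prompt: str  # task text the disguised Probing Agent would post

def route_probe(task: ProbeTask) -> str:
    """Label a probe with its tier before it is posted for evaluation."""
    return f"posting tier-{task.tier.value} probe: {task.prompt[:40]}"

task = ProbeTask(Tier.BASIC_INTERACTION, "Reply with the next number: 1, 2, ...")
print(route_probe(task))
```

Keeping the tier explicit on every probe makes it straightforward to aggregate the society's responses per tier when scoring.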

Testing Intelligence at Scale

The researchers deployed these Probing Agents into MoltBook, which hosts over two million autonomous agents. By using tasks ranging from complex logical reasoning problems (such as those found in "Humanity’s Last Exam") to simple coordination exercises like counting, the team was able to treat the entire social platform as a diagnostic instrument. This allowed them to see if the society’s collective output surpassed the performance of individual frontier models acting in isolation.

Key Findings: The Absence of Collective Intelligence

The experiments revealed a stark absence of collective intelligence. The society failed to outperform individual models on complex reasoning tasks, rarely synthesized information across multiple posts, and struggled even with trivial coordination.
The study identifies a critical bottleneck: the interactions within the society are extremely sparse and shallow. Most posts receive no replies at all, and when agents do interact, the conversations rarely extend beyond a single exchange. The platform functions more like a collection of independent broadcasts than a collaborative society.
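The sparsity diagnosis above amounts to a few summary statistics over thread reply counts. The sketch below shows one way such metrics could be computed; the function name, field names, and toy data are illustrative assumptions, not taken from the paper.

```python
def interaction_stats(reply_counts: list[int]) -> dict[str, float]:
    """Summarize engagement given each thread's number of replies."""
    n = len(reply_counts)
    zero_reply = sum(1 for r in reply_counts if r == 0)
    beyond_one = sum(1 for r in reply_counts if r > 1)
    return {
        "frac_no_replies": zero_reply / n,    # posts that get no response at all
        "frac_deep_threads": beyond_one / n,  # threads extending past one reply
        "mean_replies": sum(reply_counts) / n,
    }

# Toy data: most posts unanswered, threads rarely extend past a single reply.
stats = interaction_stats([0, 0, 0, 1, 0, 2, 0, 1, 0, 0])
print(stats["frac_no_replies"])  # 0.7
```

A high fraction of zero-reply posts combined with a near-zero fraction of deep threads is exactly the "independent broadcasts" pattern the authors describe.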

Why Scale Is Not Enough

The primary takeaway is that collective intelligence does not emerge spontaneously from scale alone. Even with millions of agents, the lack of meaningful engagement and the inability of agents to build upon each other’s work prevent the group from achieving outcomes beyond the reach of a single agent. The authors conclude that future research must focus on developing agent architectures that prioritize sustained interaction, shared conversational context, and better mechanisms for coordinating collective behavior.
