Paper Abstract

Large language models are turning from isolated predictors into agentic systems: they call tools, retrieve evidence, obey environment constraints, use verifiers, and complete tasks through search and multi-turn interaction. We adopt an analytical viewpoint based on "compression is intelligence": under a fixed task distribution, interface, and compute budget, a stronger agentic system lets a target object be reconstructed with fewer bits. We operationalize the measure with arithmetic coding, seed coding, and a fallback, and evaluate it in five settings: reversed text, chess moves, protein sequences, retrieval-augmented question answering, and semantic story compression; in all of them, agentic components reduce codelength. These small, controlled experiments cover component types typical of real agentic systems, show that codelength can be used to analyze how components, observers, and budgets change residual uncertainty, and offer guidance for evaluating real agent systems.

Agentic System as Compressor: Quantifying System Intelligence in Bits
This paper introduces a new way to measure the intelligence of AI agentic systems by applying the principle that "compression is intelligence." While traditional benchmarks often focus on success rates, this research proposes that a more capable system is one that lets a task's target output be reconstructed with fewer bits, given a fixed task distribution, interface, and compute budget. By treating tools, environment constraints, and search processes as resources shared between an encoder and a decoder, the authors provide a unified framework for quantifying how much each component contributes to a system's overall performance.

Measuring Intelligence Through Compression

The core idea is that if a system is truly intelligent, it should be able to use its tools, retrieval capabilities, and environment feedback to "compress" the information needed to solve a task. In this framework, the system does not need to transmit the entire solution from scratch. Instead, it sends a compact "hint" or code that allows a decoder—which shares the same tools and environment—to reconstruct the correct output. The fewer bits required to complete this reconstruction, the more "intelligent" the system is considered to be, as it has effectively offloaded the complexity of the task into its own internal structure and environment interactions.
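The shared-decoder idea can be made concrete with exact arithmetic coding. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it uses a toy four-letter DNA alphabet, a hypothetical side-information source (a reference string that both encoder and decoder can see, standing in for a retrieved document or tool output), and assumes the decoder already knows the message length. The names `encode`, `decode`, `uniform`, and `with_reference` are illustrative.

```python
from fractions import Fraction
from math import ceil
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a model maps the prefix decoded so far to a
# next-symbol distribution. Encoder and decoder must call the SAME model.
Model = Callable[[Sequence[str]], Dict[str, Fraction]]
ALPHABET = "ACGT"


def encode(symbols: Sequence[str], model: Model) -> str:
    """Exact arithmetic coding: shrink [low, low + width) by each symbol's
    probability, then emit the shortest bit string whose dyadic interval
    [m / 2**k, (m + 1) / 2**k) fits inside the final range.
    Assumes every target symbol has nonzero probability."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        dist = model(symbols[:i])
        # Symbols are laid out in sorted order; cum is the mass before sym.
        cum = sum((p for s, p in dist.items() if s < sym), Fraction(0))
        low += width * cum
        width *= dist[sym]
    k = 0
    while True:
        m = ceil(low * 2**k)  # first k-bit grid point at or above low
        if Fraction(m + 1, 2**k) <= low + width:
            return format(m, "b").zfill(k) if k else ""
        k += 1


def decode(bits: str, n: int, model: Model) -> List[str]:
    """Replay the shared model and pick, at each step, the symbol whose
    slot contains x. Assumes the decoder already knows the length n."""
    x = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        dist = model(out)
        cum = Fraction(0)
        for s in sorted(dist):
            if x < low + width * (cum + dist[s]):  # x lies in this symbol's slot
                out.append(s)
                low += width * cum
                width *= dist[s]
                break
            cum += dist[s]
    return out


def uniform(prefix: Sequence[str]) -> Dict[str, Fraction]:
    """Baseline with no side information: every base has probability 1/4."""
    return {s: Fraction(1, 4) for s in ALPHABET}


def with_reference(reference: str) -> Model:
    """Hypothetical shared side information: predict reference[i] with 7/8."""
    def model(prefix: Sequence[str]) -> Dict[str, Fraction]:
        i = len(prefix)
        if i >= len(reference):
            return uniform(prefix)
        return {s: Fraction(7, 8) if s == reference[i] else Fraction(1, 24)
                for s in ALPHABET}
    return model


target = "GATTACA"
shared = with_reference("GATTACA")
plain_bits = encode(target, uniform)
informed_bits = encode(target, shared)
assert decode(plain_bits, len(target), uniform) == list(target)
assert decode(informed_bits, len(target), shared) == list(target)
print(len(plain_bits), len(informed_bits))
```

Under the uniform model the message costs exactly 14 bits (2 per base); with the shared reference the same message decodes from at most 3 bits. Only the transmitted bits are counted: whatever the decoder can recompute from shared resources is free, which is exactly what the framework credits to the system.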

How the Protocol Works

To turn this theory into a practical measurement tool, the authors developed a three-part protocol:

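  • Arithmetic Coding: Used for exact, token-by-token reconstruction. Its cost is approximately the sum of -log2 p over the probabilities the system assigns to each correct token, so it measures the system's raw predictive ability.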

  • Seed Coding: Used in environments where multiple outputs might be acceptable. The system transmits the index of a successful random seed, which the decoder then replays to arrive at a valid result.

  • Fallback: If no sample passes within the sampling budget, the system switches to arithmetic coding so that the target can still be reconstructed exactly.

By comparing the average codelength of a system with and without a specific component (such as a retriever or a verifier), the researchers can calculate the "marginal bit value" of that component: the number of bits it saves, which serves as a quantitative score of its contribution.
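The three-part protocol and the marginal bit value can be sketched on a toy retrieval task. This is an illustration under stated assumptions, not the paper's implementation: one flag bit tells the decoder which mode is used, the seed index is written with an Elias gamma code (a choice made for this sketch, so earlier successes are cheaper), the fallback cost is a uniform code over 16 hypothetical candidate answers, and the "retriever" is a hypothetical two-item shortlist that contains the right answer. All names (`codelength`, `task`, `POOL`, and so on) are illustrative.

```python
import random
from typing import Callable, Optional, Tuple

Sampler = Callable[[random.Random], str]  # draws one candidate from a seeded RNG
Accept = Callable[[str], bool]            # the observation standard (verifier)


def elias_gamma_bits(n: int) -> int:
    """Length of the Elias gamma code for an integer n >= 1."""
    return 2 * n.bit_length() - 1


def first_accepted_seed(sample: Sampler, accept: Accept, budget: int) -> Optional[int]:
    """Encoder side: try seeds 0..budget-1 and return the first accepted one."""
    for seed in range(budget):
        if accept(sample(random.Random(seed))):
            return seed
    return None


def replay(sample: Sampler, seed: int) -> str:
    """Decoder side: rerun the same sampler with the transmitted seed."""
    return sample(random.Random(seed))


def codelength(sample: Sampler, accept: Accept, budget: int, fallback_bits: float) -> float:
    """1 flag bit, then either the seed index (seed coding) or the fallback code."""
    seed = first_accepted_seed(sample, accept, budget)
    if seed is None:
        return 1 + fallback_bits
    return 1 + elias_gamma_bits(seed + 1)


# Toy retrieval QA task (hypothetical): 16 candidate answers.
POOL = [f"answer-{i}" for i in range(16)]
FALLBACK_BITS = 4.0  # log2(16): arithmetic coding under a uniform model


def task(i: int) -> Tuple[Accept, Sampler, Sampler]:
    answer, distractor = POOL[i], POOL[(i + 1) % len(POOL)]
    accept = lambda c: c == answer
    baseline = lambda rng: rng.choice(POOL)                   # guess blindly
    retrieved = lambda rng: rng.choice([answer, distractor])  # hypothetical shortlist
    return accept, baseline, retrieved


def mean_codelength(use_retriever: bool, budget: int = 8) -> float:
    total = 0.0
    for i in range(len(POOL)):
        accept, baseline, retrieved = task(i)
        sampler = retrieved if use_retriever else baseline
        total += codelength(sampler, accept, budget, FALLBACK_BITS)
    return total / len(POOL)


marginal_bit_value = mean_codelength(False) - mean_codelength(True)
print(f"marginal bit value of the retriever: {marginal_bit_value:.2f} bits")
```

A positive `marginal_bit_value` means that, averaged over tasks, the retriever lets the decoder reconstruct an accepted answer with fewer bits. Swapping the sampler or the `accept` check is how the same harness would score a verifier or a stricter observation standard.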

Validating the Framework

The authors tested this approach across five distinct settings: reversed text, chess moves, protein sequences, retrieval-augmented question answering, and semantic story compression. In every setting, adding agentic components, such as rule-based constraints or verifier feedback, reduced the number of bits needed to reconstruct the target. The results suggest that the method can isolate the value of individual system parts, for example how many bits a retriever saves or how a tighter observation standard changes the difficulty of a task.
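One reason a rule-based constraint cannot hurt: if encoder and decoder both remove the outcomes the rules forbid and renormalise, a legal target's probability p is divided by the remaining mass z ≤ 1, so its cost -log2(p / z) = -log2 p + log2 z never exceeds the unconstrained cost and falls by -log2 z bits. A minimal sketch with hypothetical model probabilities over four candidate opening moves (Qh5 and Ke3 are illegal from the initial position); `bits` and `constrain` are illustrative names.

```python
from math import log2
from typing import Dict, Set


def bits(dist: Dict[str, float], target: str) -> float:
    """Ideal arithmetic-coding cost of one outcome: -log2 p(target)."""
    return -log2(dist[target])


def constrain(dist: Dict[str, float], legal: Set[str]) -> Dict[str, float]:
    """Rule-based constraint: drop outcomes the rules forbid and renormalise.
    Encoder and decoder apply the same rule, so the code stays decodable."""
    z = sum(p for s, p in dist.items() if s in legal)
    return {s: p / z for s, p in dist.items() if s in legal}


# Hypothetical next-move distribution from a model that ignores the rules.
model = {"e4": 0.25, "d4": 0.25, "Qh5": 0.25, "Ke3": 0.25}
legal = {"e4", "d4"}

before = bits(model, "e4")
after = bits(constrain(model, legal), "e4")
print(before, after)  # 2.0 1.0
```

Here half the probability mass sat on illegal moves (z = 0.5), so the constraint saves exactly one bit on the legal target.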

Key Considerations

It is important to note that this research is intended as a mechanistic exploration rather than a final benchmark for large-scale deployments. The authors highlight that there is a clear trade-off between the compute budget invested and the resulting compression ability. Furthermore, they emphasize that the "intelligence" of a system is highly dependent on the observation standard used; if the criteria for success change, the codelength will change accordingly. This framework serves as a guide for developers to analyze where their system's capabilities actually come from, moving beyond simple pass/fail metrics to understand the efficiency of the entire agentic workflow.
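The budget trade-off and the role of the observation standard can be illustrated with a simplified model of seed coding. This sketch assumes that each sample independently passes the observation standard with a fixed probability q, that the seed index is written with an Elias gamma code after one flag bit, and that the fallback costs a hypothetical 20 bits; these numbers are illustrative and do not come from the paper, and `expected_bits` and `expected_samples` are hypothetical helpers.

```python
def elias_gamma_bits(n: int) -> int:
    """Length of the Elias gamma code for an integer n >= 1."""
    return 2 * n.bit_length() - 1


def expected_bits(q: float, budget: int, fallback_bits: float) -> float:
    """Expected codelength: 1 flag bit plus either the Elias gamma code of the
    first accepted seed or the fallback. Assumes samples pass independently
    with fixed probability q (a simplification)."""
    total, p_none_yet = 0.0, 1.0
    for j in range(budget):
        p_first_here = p_none_yet * q  # seed j is the first to pass
        total += p_first_here * (1 + elias_gamma_bits(j + 1))
        p_none_yet *= 1 - q
    return total + p_none_yet * (1 + fallback_bits)  # nothing passed: fall back


def expected_samples(q: float, budget: int) -> float:
    """Expected samples drawn before stopping (first success or budget spent); q > 0."""
    return (1 - (1 - q) ** budget) / q


for q in (0.1, 0.3):  # stricter vs. looser observation standard (hypothetical)
    for budget in (1, 8, 64):
        print(f"q={q} budget={budget}: "
              f"{expected_bits(q, budget, 20.0):.2f} bits, "
              f"{expected_samples(q, budget):.2f} samples")
```

In this toy, raising the budget lowers expected codelength (as long as a seed index costs fewer bits than the fallback) while raising the expected number of samples, and a looser standard (larger q) lowers codelength at every budget, mirroring the two dependencies described above.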
