Key Takeaways

  • While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents.
  • In multi-agent games, the non-stationarity of other agents poses significant challenges to evaluating the reasoning process and assigning credit across multiple reasoning steps.
  • Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges because they do not incorporate other agents into the reasoning process.
  • In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs' strategic reasoning ability in multi-agent games.
Paper Abstract

While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents poses significant challenges to evaluating the reasoning process and assigning credit across multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges because they do not incorporate other agents into the reasoning process. In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs' strategic reasoning ability in multi-agent games. We introduce a recursive reasoning paradigm in which an agent's reasoning also integrates other agents' reasoning processes. To provide effective reward signals for intermediate reasoning sequences, we employ a centralized Chain-of-Thought (CoT) comparison module to evaluate reasoning quality. Finally, we compute an accurate hybrid advantage and develop a group-relative RL approach to optimize the LLM policy. Experimental results show that Strat-Reasoner substantially improves the strategic abilities of underlying LLMs, achieving a 22.1% average performance improvement across various multi-agent games.

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
Large Language Models (LLMs) are highly capable at logical tasks, but they often struggle in multi-agent games like poker or diplomatic negotiations. In these environments, success depends on how well an agent can anticipate and react to the strategies of others. Because other agents are constantly changing their tactics, traditional single-agent training methods fail to provide the necessary guidance for complex, multi-turn strategic reasoning. Strat-Reasoner is a new reinforcement learning framework designed to solve this by teaching LLMs to "think about what others think," leading to more effective and human-like strategic decision-making.

Recursive Reasoning

The core of the framework is a "Recursive Reasoning" module. Instead of acting in isolation, the agent is trained to follow a structured, multi-step thought process that mirrors the alternating nature of these games. At each turn, the agent is prompted to analyze the opponent’s past intent, predict how the opponent perceives the agent’s current move, formulate its own strategy, and finally predict the opponent’s next move. This "Past-Present-Future" loop ensures that the model’s reasoning is deeply integrated with the game's dynamics rather than being a generic response.
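The four-step loop above can be sketched as a sequence of structured prompts. This is a minimal illustration, not the paper's implementation: the function name, the prompt wording, and the `llm` callable are all assumptions introduced here to show the shape of the "Past-Present-Future" structure.

```python
# Hedged sketch of the recursive reasoning turn. The `llm` argument is any
# callable that maps a prompt string to a text completion; prompts are
# illustrative placeholders, not the paper's actual templates.

def recursive_reasoning_turn(llm, game_state, history):
    """Build one turn of structured, opponent-aware reasoning."""
    # 1. Past: infer the opponent's intent from earlier moves.
    past_intent = llm(f"Given the history {history}, what was the opponent's intent?")
    # 2. Present: predict how the opponent perceives our current position.
    perceived_view = llm(f"Given state {game_state}, how does the opponent view our position?")
    # 3. Strategy: formulate our own move, conditioned on both beliefs.
    strategy = llm(f"Their intent: {past_intent}. Their view of us: {perceived_view}. Our best move?")
    # 4. Future: predict the opponent's likely response to that move.
    predicted_reply = llm(f"If we play '{strategy}', what will the opponent do next?")
    return {"past": past_intent, "present": perceived_view,
            "action": strategy, "future": predicted_reply}
```

Each field of the returned dictionary corresponds to one stage of the loop, so downstream reward modules can score the stages individually.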

Centralized CoT Comparison

To guide the model, Strat-Reasoner uses a "Centralized Chain-of-Thought (CoT) Comparison" module. During training, the system treats the reasoning processes of both agents as global information. It evaluates the ego agent’s performance by checking how well its internal beliefs align with the opponent’s actual thoughts and actions. By comparing the agent’s predictions against the ground truth of the opponent’s behavior, the framework provides fine-grained, turn-by-turn feedback that is much more informative than simply waiting for a win or loss at the end of the game.
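The comparison step can be illustrated with a toy scoring function. The paper presumably uses a far richer judge than this; simple token-overlap (Jaccard) similarity is a stand-in assumption here, chosen only to show how a dense, per-turn alignment signal between predicted and actual opponent reasoning might look.

```python
# Hedged sketch of a centralized CoT comparison reward. Token-overlap
# similarity is an illustrative stand-in for the paper's actual
# evaluation module.

def cot_alignment_reward(predicted_cot: str, actual_cot: str) -> float:
    """Score (0..1) how well the ego agent's prediction of the opponent's
    reasoning matches the opponent's actual chain of thought."""
    pred = set(predicted_cot.lower().split())
    actual = set(actual_cot.lower().split())
    if not pred or not actual:
        return 0.0
    # Jaccard similarity over word sets.
    return len(pred & actual) / len(pred | actual)

def turn_rewards(predictions, opponent_cots):
    """Dense, turn-by-turn rewards: one alignment score per turn, using
    both agents' CoTs as centralized, training-time-only information."""
    return [cot_alignment_reward(p, a) for p, a in zip(predictions, opponent_cots)]
```

Because the opponent's actual CoT is only consulted during training, the deployed agent never needs access to it, which keeps the centralized signal training-only.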

Hybrid Advantage Estimation

One of the biggest challenges in multi-agent reinforcement learning is the high level of uncertainty and the difficulty of assigning credit for a specific action. Strat-Reasoner addresses this by using a "Hybrid Advantage" approach. It combines the immediate, dense feedback from the CoT comparison with the long-term, outcome-based rewards of the game. By using "micro-rollouts"—where the model generates multiple potential reasoning paths in parallel—the framework creates a stable, low-variance baseline that helps the model learn which reasoning steps actually lead to better strategic outcomes.
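A minimal sketch of this idea: blend the dense CoT-alignment reward with the final game outcome, then normalize each rollout's score against a group-relative baseline computed over the parallel micro-rollouts. The blending weight `alpha` and the exact normalization are assumptions, not the paper's reported formulation.

```python
# Hedged sketch of a hybrid, group-relative advantage. Inputs are per-
# rollout scalars over a group of G parallel micro-rollouts.

from statistics import mean, pstdev

def hybrid_advantages(cot_rewards, outcomes, alpha=0.5):
    """cot_rewards: mean CoT-alignment score per rollout (length G).
    outcomes: final game reward per rollout (length G).
    Returns one advantage per rollout, standardized within the group
    so the group mean itself serves as a low-variance baseline."""
    hybrid = [alpha * c + (1 - alpha) * o for c, o in zip(cot_rewards, outcomes)]
    mu, sigma = mean(hybrid), pstdev(hybrid)
    return [(h - mu) / (sigma + 1e-8) for h in hybrid]
```

Standardizing within the group means the advantages always sum to (approximately) zero, so rollouts are pushed apart relative to each other rather than toward an absolute reward scale.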

Performance and Impact

Experimental results demonstrate that Strat-Reasoner significantly enhances the strategic capabilities of LLMs. By moving beyond simple outcome-based learning and incorporating explicit opponent modeling, the framework achieved an average performance improvement of 22.1% across various competitive and cooperative multi-agent games. This suggests that teaching models to explicitly model the cognitive states of others is a powerful way to improve their performance in complex, real-world strategic environments.
