Paper Abstract

Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers built on counterfactual regret minimization, requiring millions of core-hours of training. Large Language Models (LLMs) possess extensive poker knowledge but perform far below solver-based agents when asked to play directly. Traditional rule-based poker agents are interpretable and training-free, but their strategic ceiling remains far below equilibrium play. We introduce PokerSkill, a training-free and solver-free framework that bridges this gap by using detailed rule-based poker skills as a structured action-grounding interface for LLMs. A deterministic context engine analyzes the current state and retrieves only the relevant fragments from a layered skill library, which is entirely designed by human poker experts, constraining the LLM's choice to reasonable actions. Against GTOWizard, a state-of-the-art GTO benchmark, GPT-5.5 XHigh with PokerSkill achieves -57 ± 21 mbb/hand, Claude Opus 4.6 achieves -80 ± 29 mbb/hand and Claude Opus 4.7 achieves -87 ± 64 mbb/hand, reducing losses by 49–61% compared to default-prompt baselines and outperforming the strong bot Slumbot. Our key finding is that rule-based skills alone do not constitute a strong strategy, and LLMs alone cannot play well, but their combination yields an agent that requires neither training nor solver access yet competes with systems built on millions of core-hours of computation. To our knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries. Code is available at this https URL.

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Poker is a notoriously difficult challenge for artificial intelligence. Competitive agents have traditionally required massive computation, often millions of core-hours, to train. While Large Language Models (LLMs) possess extensive poker knowledge, they perform far below solver-based agents when asked to play directly. This paper introduces PokerSkill, a framework that lets LLMs play competitive poker without game-specific training, game-tree traversal, or external solver queries. Acting as a structured interface between the LLM and the game, PokerSkill lets the model draw on its existing knowledge while staying within expert-designed strategic constraints.

Bridging the Gap with Structured Guidance

The core problem identified by the researchers is the "decision-binding problem": even when an LLM understands poker concepts such as pot odds or polarized ranges, it often fails to select the right concept for a specific game state. PokerSkill addresses this with a deterministic "context engine" that analyzes the current game situation (for example board texture, hand strength, and betting history) and retrieves only the relevant expert-designed fragments from a layered skill library. This keeps irrelevant material out of the prompt and constrains the model's choice to reasonable actions.
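To make the retrieval idea concrete, here is a minimal Python sketch of a deterministic context engine. Everything in it is an illustrative assumption: the tag names, the placeholder contents of `SKILL_LIBRARY`, the crude wet/dry rule, and the functions `board_texture` and `retrieve_fragments` are hypothetical and are not taken from the PokerSkill code.

```python
from collections import Counter

RANKS = "23456789TJQKA"

# Hypothetical skill library: each fragment declares the tags it applies to.
# The texts are placeholders, not real expert guidance.
SKILL_LIBRARY = [
    {"tags": {"flop", "wet"}, "text": "<fragment for wet flops>"},
    {"tags": {"flop", "dry"}, "text": "<fragment for dry flops>"},
    {"tags": {"preflop"}, "text": "<fragment for preflop play>"},
]


def board_texture(board):
    """Label a three-card flop as 'wet' or 'dry' with a crude illustrative rule.

    Cards are two-character strings such as 'Ah' or 'Td'.
    Assumption: 'wet' means two or more cards share a suit AND all ranks
    fall within a five-rank window. Real texture analysis is richer.
    """
    suit_counts = Counter(card[1] for card in board)
    values = sorted(RANKS.index(card[0]) for card in board)
    has_flush_draw = max(suit_counts.values()) >= 2
    is_connected = values[-1] - values[0] <= 4
    return "wet" if has_flush_draw and is_connected else "dry"


def retrieve_fragments(street, board=None):
    """Return only the fragments whose tags are all present in the state."""
    state_tags = {street}
    if board:
        state_tags.add(board_texture(board))
    return [f["text"] for f in SKILL_LIBRARY if f["tags"] <= state_tags]


# Usage: a connected two-tone flop retrieves only the wet-flop fragment.
fragments = retrieve_fragments("flop", ["9h", "8h", "7c"])
```

Because the labeling and lookup are pure functions of the game state, the same state always yields the same fragments, which is what makes this part of the pipeline deterministic and inspectable.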

How the Framework Works

PokerSkill functions as a cognitive scaffold for the LLM. At every decision point, the framework performs three key steps:
  1. Context Analysis: The system automatically labels the current state, including the stack-to-pot ratio, position, and action history.
  2. Selective Retrieval: Instead of providing the entire strategy, the system injects only the specific, expert-authored guidelines relevant to the current scenario (e.g., how to play a specific hand on a "wet" board).
  3. Bounded Decision-Making: The system uses an "attack/defense budget" to filter the available actions, ensuring the LLM only chooses from options that are strategically viable.
This approach mimics how human experts think: they do not recalculate game theory from scratch but instead recognize patterns and apply established principles to a limited set of reasonable actions.
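The three steps above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the `GameState` fields, the stack-to-pot threshold of 4, the `GUIDELINES` placeholders, and the interpretation of the "attack/defense budget" as a cap on a per-action aggression cost (`ACTION_COST`) are all hypothetical. The LLM call itself is omitted; the sketch only builds the prompt and the bounded action list.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    pot: float               # chips currently in the pot (assumed > 0)
    effective_stack: float   # smaller of the two remaining stacks
    position: str            # e.g. "button"
    history: list = field(default_factory=list)  # e.g. ["raise", "call"]


# Hypothetical guideline store keyed by a stack-to-pot bucket (placeholder text).
GUIDELINES = {
    "low_spr": ["<expert guideline for low stack-to-pot spots>"],
    "high_spr": ["<expert guideline for high stack-to-pot spots>"],
}

# Hypothetical aggression cost per action; the budget caps this cost.
ACTION_COST = {"fold": 0, "check": 0, "call": 1,
               "bet_small": 1, "bet_large": 2, "all_in": 3}


def analyze_context(state):
    """Step 1: derive deterministic labels from the raw state."""
    spr = state.effective_stack / state.pot
    return {
        "spr": spr,
        "spr_bucket": "low_spr" if spr < 4 else "high_spr",  # assumed threshold
        "position": state.position,
        "facing_raise": bool(state.history) and state.history[-1] == "raise",
    }


def retrieve_guidelines(labels):
    """Step 2: inject only the guidelines matching the current labels."""
    return GUIDELINES[labels["spr_bucket"]]


def filter_actions(legal_actions, budget):
    """Step 3: keep only actions whose cost fits within the budget."""
    return [a for a in legal_actions if ACTION_COST[a] <= budget]


def build_prompt(state, legal_actions, budget):
    """Assemble the bounded prompt that would be sent to the LLM."""
    labels = analyze_context(state)
    allowed = filter_actions(legal_actions, budget)
    lines = [f"Situation: {labels}", "Guidelines:"]
    lines += [f"- {g}" for g in retrieve_guidelines(labels)]
    lines.append("Choose exactly one of: " + ", ".join(allowed))
    return "\n".join(lines), allowed


state = GameState(pot=100, effective_stack=300, position="button", history=["raise"])
prompt, allowed = build_prompt(state, ["fold", "call", "all_in"], budget=1)
```

In this example the all-in option is removed before the model ever sees it, so whatever the LLM answers, its choice stays inside the bounded set.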

Competitive Performance

The researchers tested PokerSkill against GTOWizard, a state-of-the-art GTO benchmark that has previously outperformed strong bots like Slumbot. With PokerSkill, GPT-5.5 XHigh scored -57 ± 21 mbb/hand, Claude Opus 4.6 scored -80 ± 29 mbb/hand, and Claude Opus 4.7 scored -87 ± 64 mbb/hand. The agents still lose to GTOWizard, but their losses are 49–61% smaller than with default prompting, and the abstract reports that they outperform Slumbot. This suggests that, given the right structure, LLMs can compete with systems built on massive computational resources in a complex imperfect-information game without being trained specifically for the task.
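For readers unfamiliar with the unit, mbb/hand is milli-big-blinds won per hand, so a negative value means the agent is losing on average. The Python sketch below shows one way such a rate and a relative loss reduction could be computed. The function names and the toy per-hand results are hypothetical, and the sketch reports a standard error as the uncertainty; the source does not say how its ± values were derived.

```python
import math
import statistics


def mbb_per_hand(results_bb):
    """Mean win rate in milli-big-blinds per hand, plus its standard error.

    results_bb holds the net result of each hand in big blinds.
    """
    scaled = [1000.0 * r for r in results_bb]
    mean = statistics.fmean(scaled)
    stderr = statistics.stdev(scaled) / math.sqrt(len(scaled))
    return mean, stderr


def loss_reduction(baseline_mbb, improved_mbb):
    """Fraction of the baseline loss that was removed.

    Both rates are expected to be negative (the agent loses in both cases).
    """
    return 1.0 - improved_mbb / baseline_mbb


# Toy data, not from the paper: five hands with net results in big blinds.
mean, stderr = mbb_per_hand([1.0, -1.0, -0.5, 0.5, -0.5])

# Toy rates, not from the paper: halving a loss is a 50% reduction.
reduction = loss_reduction(-200.0, -100.0)
```

Poker results have high variance per hand, which is why win rates are reported with an uncertainty and why many hands are needed before two agents can be separated.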

Key Takeaways

The results suggest that the primary barrier to strong LLM poker play is not a lack of knowledge but the difficulty of applying that knowledge at each decision point. By combining the contextual reasoning of LLMs with a deterministic, rule-based "skill library," the authors have created an agent that is interpretable and competitive with solver-based systems. Because the framework requires neither offline learning nor solver access, it is straightforward to reproduce and can improve as base LLMs advance.
