As 2025 turns into 2026, we’re talking to some of the biggest names in the poker world, and reflecting on the year’s most interesting stories and events.
In October it was announced that various Large Language Model AIs would contest a week-long online poker game. We asked Octopi Poker to analyze the results.
“I’ve been studying the game,” Max Pavlov, a keen recreational poker player with a talent for IT, told PokerOrg in October, “I thought about how LLMs could help me… I couldn’t really find any research on which LLM would be the best one for my needs, so I decided to make a tournament to figure out the answer to that question.”
Pavlov set up the game — all for play money — and assembled ‘players’ powered by the biggest consumer LLM AI software available:
- OpenAI
- Anthropic (Claude)
- xAI (Grok)
- DeepSeek AI
- Google (Gemini)
- Mistral AI (Magistral)
- Moonshot AI (Kimi)
- Z.ai (GLM)
- Meta (LLAMA)
Each player would start with a $100,000 bankroll, playing $10/$20 no-limit hold’em at four tables, simultaneously. Crucially, they would articulate the reasoning behind every decision they made, with all information available to the public in real time at PokerBattle.ai.
When the dust had settled we arranged for Octopi Poker and its CEO, Victoria Livschitz, to access the data, crunch the numbers and tell us once and for all which of the LLMs was the nuts when it came to poker, and which of them were just nuts.
Which AI was best, and why?
First, the results: the list above shows the order in which the players finished.
OpenAI was the most successful of the players, winning $36,691 over the course of the game, which ran to a total of 3,799 hands. The worst-performing player was Meta's LLAMA, which actually went broke after 3,501 hands.
Here’s a brief overview of how each bot played.
OpenAI
The winner had an impressive preflop game, playing and opening pots at rates close to game-theory-optimal (GTO) play. It was too aggressive, however, 3-betting more and folding less than it should. It also reached showdown 33% less often than GTO would suggest.
Claude and Grok
The two runners-up played strategically similar games, and both saw fewer flops than GTO would dictate. When it came to continuation bets, each c-bet too aggressively and didn't fold to c-bets often enough.
DeepSeek
This Chinese AI model was tighter and more conservative across all metrics than its rivals. It opened with a preflop raise 50% less often than its opponents, and folded to 3-bets 60% less often than GTO guidelines would suggest.
Gemini
Google’s Gemini AI was the most loose-aggressive player at the tables: it played a lot of pots, 3-bet very frequently, and had trouble folding to c-bets.
Magistral
The French LLM from Mistral AI was the tightest player preflop but the most aggressive postflop, c-betting 88% of the time and folding to c-bets just 16% of the time.
Kimi and GLM
Both these players lost money in the game, and proved to be too easy to push off the pot. Kimi only reached showdown 12% of the time, while GLM played a lot of pots but would overfold to 3-bets. Loose-passive play rarely wins the day.
LLAMA
Meta’s AI was the big loser in the game. Too loose, too aggressive, and too sticky at every point. It played too many hands, too aggressively, and didn’t seem to know how to fold. In fact, it played so badly that the other LLMs began to anticipate and adjust to its terrible play.
Livschitz’s detailed analysis — linked below — also pointed out some interesting characteristics that the AI models shared.
- They all try to play exploitative poker, but rarely with enough data to make informed decisions, and sometimes using the wrong data points.
- They’re bad with ranges, failing to identify parts of their opponents’ ranges and thinking in terms of their own specific hands rather than their broader ranges.
- They’re easily confused, getting important details wrong in their reasoning, such as their own position, card suits, or even which hand is winning.
- They’re all too aggressive and loose, looking to stack their opponents at every opportunity without considering the dangers.
- They’re awful at bluffing: they’ll often c-bet, but then fail to follow through on later streets.
Are the LLMs coming for your stack? Not yet, they’re not.
For more detail, including a hand history analysis of three hands the bots played together, check out the full article below.