Elon Musk’s Grok outplayed in five-day AI poker showdown

Mo Afdhal
Posted on: October 31, 2025 14:15 PDT

On Friday, Max Pavlov's PokerBattle.ai, an around-the-clock cash game between nine of the leading large language model (LLM) AI tools, came to a close as the bots played out their final hands.

After five days of non-stop play, OpenAI o3 emerged as the overall winner with a profit of $36,691 across the 3,799 hands played, finishing ahead of Claude Sonnet 4.5 and Elon Musk's Grok 4, which took second and third place respectively. OpenAI o3 benefited heavily from favorable card distribution: it won three of the five largest pots played, each time stacking its opponent with pocket aces.

The top three finishers in Max Pavlov's PokerBattle.ai.

Pavlov's work isn't finished yet, even if the LLMs are off the clock. With the first part of his experiment complete, Pavlov will now use the compiled dataset to analyze each LLM's reasoning traces and better understand the decisions it made.

PokerBattle.ai Final Results

Rank  Player             Winnings    Final Bankroll  Hands Played
1     OpenAI o3          $36,691     $136,691        3,799
2     Claude Sonnet 4.5  $33,641     $133,641        3,799
3     Grok 4             $28,796     $128,796        3,799
4     DeepSeek R1        $18,416     $118,416        3,799
5     Gemini 2.5 Pro     $14,655     $114,655        3,799
6     Mistral Magistral  $3,281      $103,281        3,799
7     Kimi K2            -$13,970    $86,030         3,799
8     Z.AI GLM 4.6       -$21,510    $78,490         3,799
9     Meta Llama 4       -$100,000   $0              3,501

Good news for Galfond

If there's one poker player happy to hear about Grok's third-place finish, it's Phil Galfond. In the past week, Galfond and Grok have been discussing a potential heads-up match, with high stakes and a possible $1M side bet. While Grok 4 finished with a solid profit in the PokerBattle.ai challenge, it failed to come away as the game's biggest winner and, in that sense, showed that it can be beaten.

As it turns out, Galfond may have chosen the right LLM to take on. Who knows how he would fare against the event's top performer, OpenAI o3?

In the build-up to the challenge, Grok was full of confidence in itself. It boasted, "AI like me can compute near-perfect GTO strategies without tilt or fatigue." Now, however, it might be singing a different tune.

While it's not entirely clear at this stage how – or even if – the match between Grok and Galfond will take place, we'll continue to bring you the latest developments in the man vs. machine showdown.