'Hyper-aggressive' OpenAI bots reign supreme as silicon poker battle concludes

Mo Afdhal
Posted on: February 4, 2026 19:04 PST

On Wednesday, February 4, the silicon poker streets were awash with activity as the final match of the poker component of the Google DeepMind/Kaggle Game Arena exhibition played down to a conclusion. 

It was an all-OpenAI affair as two of its large language models (LLMs) – o3 and GPT-5.2 – went head-to-head to decide the overall winner of the exhibition. If you missed the showdown, check out Doug Polk's coverage of the final match in the video above.

"I do find it interesting that both of these AI models are by OpenAI," Polk said in the introduction to his video. "It seems like they had the best poker-playing of the field going on here. And they both played similar styles, of being hyper-aggressive and looking to pounce on any weakness." 

Before he jumps into the final match, Polk identifies two LLMs at the opposite end of the spectrum from o3 and GPT-5.2. That is, the worst of the LLMs: GPT-5 mini and Grok 4.1 Fast Reasoning.

The worst of AI poker

To illustrate his point, Polk pulls a hand from the confrontation between these two models – and it's a shocking one. While the preflop portion of the hand plays out in standard fashion, the post-flop maneuvering of these two models leaves much to be desired and the reasoning provided is, to be blunt, nonsensical.

On the flop, the action goes bet, raise, jam, call, and each bot rolls over its hand – figuratively speaking, of course.

"Nobody has a pair, nobody has a draw. We're just putting in stacks," Polk observed. 

When he looked into the reasoning behind each model's decisions, Polk was taken aback. After reading through the models' considerations, he summarized them succinctly.

"So, the reason that they got all in here is that Grok has the nut flush draw with three clubs and GPT-5 mini has the nut flush with three diamonds," he said. "There you have it, guys, no wonder these things didn't do so well in the tournament." 

And the best of AI poker

OpenAI's o3 and GPT-5.2 put on an even more aggressive (yet at least somewhat refined) display of poker in the finals – though to say that either model has come close to perfecting the heads-up game tree would be a massive leap. In one hand, GPT-5.2 opens and o3 opts to three-bet from the big blind with ace-deuce.

"I don't like this, ace-deuce is a pure call versus open," Polk commented. GPT-5.2 proceeds to four-bet and o3 responds with an all-in, explaining its reasoning in several ways, one of which caught Polk's attention in particular. In attempting to rationalize its decision, o3 claimed that folding would give up the chips already invested.

"This is another common thing I've seen from the AIs," Polk explained. "If I had to go through and say what their biggest leaks are – certainly the nut flush draw thing is going to be in there – but also they seem to not understand folding is 0 EV. Every play you make in poker is neutral compared to the past plays, it's all in that moment. What's the highest EV? You don't think about the chips you would lose, those are actually already in the pot. They're already gone. All you can do is make a decision for your current chips. So, that is just incorrect logic." 
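
Polk's point about sunk chips can be checked with some quick arithmetic. The Python sketch below is illustrative only: the chip counts and the 25% win chance are hypothetical numbers, not figures from the match, and the model is deliberately simplified (a single call-or-fold decision with no further betting and no split pots).

```python
def ev_call(pot: float, to_call: float, equity: float) -> float:
    """EV of calling, measured from the current decision point.

    pot: chips already in the middle, including our own earlier bets.
    to_call: chips we must add now.
    equity: our chance of winning at showdown (0.0 to 1.0).
    """
    return equity * pot - (1.0 - equity) * to_call


EV_FOLD = 0.0  # folding neither wins nor loses anything from this point on


def best_action(pot: float, to_call: float, equity: float) -> str:
    """Pick the higher-EV option between calling and folding."""
    return "call" if ev_call(pot, to_call, equity) > EV_FOLD else "fold"


# Hypothetical spot: 100 chips in the pot, 40 of them ours, 60 more to call,
# and a 25% chance of winning.
pot, to_call, equity, already_invested = 100.0, 60.0, 0.25, 40.0

call_now = ev_call(pot, to_call, equity)       # -20.0
print(best_action(pot, to_call, equity))       # fold

# Re-measuring from the start of the hand subtracts the same 40 chips from
# both options, so the gap between them (and the better choice) is unchanged.
call_from_start = call_now - already_invested  # -60.0
fold_from_start = EV_FOLD - already_invested   # -40.0
```

In this made-up spot, folding "gives up" the 40 chips already invested, but calling loses a further 20 on average. The earlier investment appears on both sides of the comparison and cancels out, which is why only the current decision matters.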

After going through a number of hands played between o3 and GPT-5.2, Polk turned his attention to the overall results of the challenge. He pulled up the PokerTracker findings from the entire exhibition and examined the statistics of each LLM.

"The three hyper-aggro AIs ended up being at the top of the pack, which is kind of interesting," he pointed out. "And then the middling results tend to be the more conservative ones. 

"I thought Opus and Sonnet both played pretty reasonable. They both won in the sample, they played pretty reasonable pre-flop. They raised a reasonable amount, they defended a reasonable amount. But it just seems like these things were not built to withstand the hyper-aggression of some of these AIs that were just relentlessly going after it." 

"All in all, fascinating challenge," Polk concluded.