More of the world than ever is getting its poker fix down the fiber-optic lines, Covid having driven us indoors. With this, the old fear of bots has begun shaking off the dust of sleep again. Carnegie Mellon, in conjunction with Facebook, has just tested its newest all-purpose AI. Once again, to mix my gaming metaphors, the humans shanked it into the grass.
The new AI, called ReBeL, managed to beat their previous AI's win rate at poker, and did so with "far less domain knowledge than any prior poker AI." That quote comes from the team's paper, published on arXiv last month.
"Domain knowledge" refers to topic-specific knowledge, in this case knowledge about poker. This is as opposed to "general knowledge," which here means the AI's understanding of how to strategize in imperfect-information games.
In the paper, they also demonstrate that the AI can approach a Nash equilibrium in heads-up matches.
#Facebook develops AI algorithm that learns to play poker on the fly.
Facebook researchers have developed a general AI framework called Recursive Belief-based Learning (ReBeL). pic.twitter.com/N0SaKFQ1Ih
— inosocial (@inoSocial_co) July 29, 2020
The prelapsarian lapses
There was a time when no-limit poker was seen as an impassable barrier to AI: too many factors, too much hidden information, too many possible bet sizes, with the decision tree ballooning at each subsequent round of betting.
But then again, there was a time when the walls of Constantinople were considered impassable. And in the end, all it took was the Ottoman Empire getting the hang of gunpowder.
In reality, by the time Facebook’s new killer app came along the saltpeter and sulfur that was going to render no-limit hold’em pregnable was already drying in the sun.
Carnegie Mellon's Libratus AI made fools of four top pros back in 2017, to the tune of 14.8 big blinds per 100 hands. That was playing heads-up, with a duplicate-bridge-style system to minimize variance: the same hand was dealt at separate tables, with the AI getting a different side of the hand at each table.
The researchers at Carnegie Mellon had every reason to feel smug.
Then last year their Pluribus AI demonstrated that it could win in the vastly more complicated format of 6-max. This included matches pitting five copies of Pluribus against a single pro such as Chris "Jesus" Ferguson, a player already famous for his almost automaton-like style, as well as matches where Pluribus was the only non-human player at a six-handed table. It won under both circumstances.
So this is awkward. Remember Libratus (the poker-playing AI that defeated top pros in 2016)? Apparently it’s been repurposed to help the US military with their game theoretic strategies. https://t.co/qZx8CJAyhe
— Liv Boeree (@Liv_Boeree) January 21, 2019
The next-gen ReBeLs
ReBeL is back to heads-up for now, because what it is attempting is rather more complex than what previous AIs did. In games like chess or Go, where information is complete, ReBeL thinks much as AlphaGo does: it uses a combination of reinforcement learning (RL) and search, where the agent looks ahead at possible outcomes and tries to maximize its rewards. Depending on the game, these rewards are points, goals, money, lives saved, or lives ended.
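To make the reward-maximization idea concrete, here is a toy sketch using a multi-armed bandit, far simpler than the self-play training ReBeL or AlphaGo actually uses; the action names and payout numbers are invented for illustration.

```python
import random

random.seed(0)

# Hidden mean rewards for three poker-flavored actions (invented numbers).
actions = {"fold": 0.0, "call": 0.3, "raise": 0.5}

estimates = {a: 0.0 for a in actions}  # the agent's running reward estimates
counts = {a: 0 for a in actions}

for step in range(5000):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(actions))
    else:
        action = max(estimates, key=estimates.get)
    reward = random.gauss(actions[action], 1.0)  # noisy payoff
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best = max(estimates, key=estimates.get)
```

After enough trials the agent's estimates track the true payoffs and it settles on the most profitable action, which is the whole of the "maximize rewards" idea, minus the enormous game trees.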
But when information is concealed, as with an opponent's hand in poker, an entirely different algorithm kicks in, built on what the researchers call a Public Belief State (PBS). A PBS uses multiple AI models to work out what the various players in a game might believe, given the information that is common knowledge. ReBeL uses this to model possible actions and choose its strategies.
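The core of a belief state is a probability distribution over what an opponent might hold, updated as their public actions reveal information. Here is a minimal, hypothetical sketch of that update via Bayes' rule; the hand buckets and likelihoods are made up for illustration and are not ReBeL's actual model.

```python
def update_belief(prior, likelihoods):
    """Bayes update: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    posterior = {hand: prior[hand] * likelihoods[hand] for hand in prior}
    total = sum(posterior.values())
    return {hand: p / total for hand, p in posterior.items()}

# Uniform prior over three coarse hand buckets (illustrative).
belief = {"strong": 1 / 3, "medium": 1 / 3, "weak": 1 / 3}

# Opponent raises; assume a strong hand raises 80% of the time, and so on.
raise_likelihood = {"strong": 0.8, "medium": 0.4, "weak": 0.1}

belief = update_belief(belief, raise_likelihood)
# The belief now shifts toward "strong" (0.8 / 1.3, roughly 0.62), and a
# strategy can be chosen against this whole distribution rather than
# against a single guessed hand.
```

The real system reasons over every player's possible beliefs at once, but the principle is the same: public actions narrow the space of likely private hands.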
The result is a more flexible AI that can calculate strategies on the fly, and it beat Dong Kim by 16.9 big blinds per 100 hands while needing less than five seconds per decision.
This outdoes Libratus and makes all humans look bad.
"While AI algorithms already exist that can achieve superhuman performance in poker," the team wrote, "these algorithms generally assume that participants have a certain number of chips or use certain bet sizes. Retraining the algorithms to account for arbitrary chip stacks or unanticipated bet sizes requires more computation than is feasible in real-time. However, ReBeL can compute a policy for arbitrary stack sizes and arbitrary bet sizes in seconds."
Facebook AI and @CarnegieMellon researchers have built Pluribus, the first AI bot to beat elite poker pros in 6 player Texas Hold’em. This breakthrough is the first major benchmark outside of 2 player games and we’re sharing specifics on how we built it. https://t.co/zId9x4VBqc pic.twitter.com/u89irNcxEK
— Facebook AI (@facebookai) July 11, 2019
An AI for all seasons
Flexibility lends itself to a wide variety of fields. Partly for this reason, Carnegie Mellon and Facebook have released only the Liar's Dice version of ReBeL, not the poker version.
The other reason is that they are worried players might use ReBeL to cheat at online poker. Aren't we all. Liar's Dice is a much less tempting target.
But gaming AIs with this sort of flexibility have a ton of other applications. Google's AI may have publicly beaten the best humans at a board game in the form of AlphaGo, but the same technology has also been used to manage data for the UK's National Health Service and to increase electrical efficiency in Google's server banks.
We'll just have to wait and see. If Carnegie Mellon's computer science department starts showing up to work in Lamborghinis, we'll know they decided to use ReBeL for evil.