Game-theoretic reasoning requires best response in imperfect information games
[Edit: My core intuition here is correct, but I'm not expressing well how my approach to Nash differs from Nash computed by CFR. In fact, because of that, the argument is a bit messy and may sound like I'm mixing up concepts/terminology. I'll rewrite this essay soon.]
[Edit 2: My core intuition here may very well be wrong!]
Game-theoretic thinking around imperfect information games, especially attempts at solving them in computer science, revolves around Nash equilibria and exploitability. Counterfactual Regret Minimization [1] converges both players towards Nash simultaneously. Libratus [2] used this idea to achieve superhuman play in poker. ReBeL [3] added neural networks and public belief states. The core approach is: converge directly towards Nash, minimize exploitability, declare victory.
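To make the "converge both players towards Nash simultaneously" point concrete, here is a minimal regret-matching self-play sketch on rock-paper-scissors. This is a toy stand-in: CFR applies the same regret-matching update at every information set of an extensive-form game, while a one-shot game keeps the sketch short. The iteration count and the asymmetric starting regrets are arbitrary choices of mine, not anything from the cited papers.

```python
# Minimal regret-matching self-play on rock-paper-scissors.
# CFR applies this same update at every information set of an
# extensive-form game; a one-shot game keeps the sketch short.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: payoff to the player choosing a against b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from(regrets):
    """Regret matching: mix in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations=50000):
    # Arbitrary asymmetric start so the dynamics aren't trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from(r) for r in regrets]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected value of each pure action vs the opponent's mix.
            vals = [sum(PAYOFF[a][b] * opp[b] for b in range(ACTIONS))
                    for a in range(ACTIONS)]
            ev = sum(s * v for s, v in zip(strats[p], vals))
            for a in range(ACTIONS):
                regrets[p][a] += vals[a] - ev
                strat_sum[p][a] += strats[p][a]
    # The AVERAGE strategy (not the last iterate) approaches Nash.
    total = sum(strat_sum[0])
    return [s / total for s in strat_sum[0]]

avg = train()
print([round(p, 3) for p in avg])  # approximately [0.333, 0.333, 0.333]
```

Note that both players update simultaneously, and only the time-averaged strategy approaches the equilibrium while the current iterate cycles. That simultaneity is exactly the "converge towards Nash" paradigm this essay is pushing back on.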
This differs from my understanding of how Nash equilibria are "supposed" to emerge. Game theory is the science of playing games optimally, and Nash equilibria don't emerge because I'm trying to play a Nash strategy, but because I play optimally; if my opponent does too, the result is a Nash equilibrium. [Edit: Here I need to define better what I mean by playing "optimally."]
Applying game theory to poker, I estimate my opponent's range and strategy, calculate the EV of my hand (not my range, by the way, contrary to popular belief), and take my highest-EV action. To make assumptions about my opponent's range and strategy, I take my perceived range into account. If I believe my opponent is capable of exploiting me, I play a Nash strategy. But Nash emerges out of best response, not as a goal in itself. [Edit: This is the part I need to clarify. How do I "play a Nash strategy"? Where does it come from? How does that differ from a CFR-computed Nash strategy?]
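As a concrete (and entirely made-up) instance of that best-response calculation, here is a toy river spot: the opponent bets, I assign their betting range a value/bluff split, and I take whichever action has the higher EV. The pot size, bet size, and range composition below are illustrative assumptions, not solver output or real hand data.

```python
# Hypothetical river spot: pot is 100, opponent bets 50, I hold a
# bluff-catcher. The range split is an assumption for illustration.

POT = 100.0
BET = 50.0

# Assumed opponent betting range: (hand class, share of range, my equity vs it)
opponent_range = [
    ("value", 0.70, 0.0),  # I always lose to their value hands
    ("bluff", 0.30, 1.0),  # I always beat their bluffs
]

def ev_call():
    # Calling risks BET to win POT + BET, weighted over the assumed range.
    return sum(prob * (equity * (POT + BET) - (1 - equity) * BET)
               for _, prob, equity in opponent_range)

def ev_fold():
    return 0.0  # EV is measured relative to folding

actions = {"call": ev_call(), "fold": ev_fold()}
best = max(actions, key=actions.get)
print(best, actions["call"])  # call 10.0
```

Here EV(call) = 0.7 * (-50) + 0.3 * 150 = 10, so calling is the best response to this assumed range. Nash-style indifference only enters once I start asking how the opponent adjusts to my perceived range.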
This is what the current game-theoretic literature, at least in ML, is missing. It's impressive to create AI that plays Nash directly, but AI that emerges from this paradigm won't reason well outside of knowing to start a container and blindly solve an equation or run CFR without understanding the underlying game. (This is what happens when I prompt LLMs well and they compute equilibria.) For real game-theoretic reasoning, you need AI that understands best response in imperfect information games. This has been attempted [4], [5], but it hasn't been done. Solving this may very well be the gap to ASI.
[Edit: Regarding the whole "best response" framing, I believe what I'm really looking for is AI that can understand any novel game well enough to reason up, from first principles, a minimax strategy that is equivalent to a Nash strategy.]
[1] Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2007). Regret Minimization in Games with Incomplete Information. NIPS 2007.
[2] Brown, N. & Sandholm, T. (2018). Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374), 418ā424.
[3] Brown, N., Bakhtin, A., Lerer, A., & Gong, Q. (2020). Combining Deep Reinforcement Learning and Search for Imperfect-Information Games. NeurIPS 2020.
[4] Heinrich, J., Lanctot, M., & Silver, D. (2015). Fictitious Self-Play in Extensive-Form Games. ICML 2015.
[5] Heinrich, J. & Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. arXiv:1603.01121.