Game-theoretic reasoning requires best response in imperfect information games
[Edit: My core intuition here is correct, but I'm not expressing well how my approach to Nash differs from Nash computed by CFR. In fact, because of that, the argument is a bit messy and may sound like I'm mixing up concepts/terminology. I'll rewrite this essay soon.]
[Edit 2: My core intuition here may very well be wrong!]
Game-theoretic thinking around imperfect information games, especially attempts at solving them in computer science, revolves around Nash equilibria and exploitability. Counterfactual Regret Minimization [1] converges both players towards Nash simultaneously. Libratus [2] used this idea to achieve superhuman play in poker. ReBeL [3] added neural networks and public belief states. The core approach is: converge directly towards Nash, minimize exploitability, declare victory.
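To make the "converge both players towards Nash simultaneously" point concrete, here is a minimal regret-matching self-play sketch on rock-paper-scissors. This is a toy stand-in: CFR applies the same regret-matching update at every information set of an extensive-form game, while a one-shot game keeps the sketch short. The iteration count and the asymmetric starting regrets are arbitrary choices of mine, not anything from the cited papers.

```python
# Minimal regret-matching self-play on rock-paper-scissors.
# CFR applies this same update at every information set of an
# extensive-form game; a one-shot game keeps the sketch short.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: payoff to the player choosing a against b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from(regrets):
    """Regret matching: mix in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations=50000):
    # Arbitrary asymmetric start so the dynamics aren't trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from(r) for r in regrets]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected value of each pure action vs the opponent's mix.
            vals = [sum(PAYOFF[a][b] * opp[b] for b in range(ACTIONS))
                    for a in range(ACTIONS)]
            ev = sum(s * v for s, v in zip(strats[p], vals))
            for a in range(ACTIONS):
                regrets[p][a] += vals[a] - ev
                strat_sum[p][a] += strats[p][a]
    # The AVERAGE strategy (not the last iterate) approaches Nash.
    total = sum(strat_sum[0])
    return [s / total for s in strat_sum[0]]

avg = train()
print([round(p, 3) for p in avg])  # approximately [0.333, 0.333, 0.333]
```

Note that both players update simultaneously, and only the time-averaged strategy approaches the equilibrium while the current iterate cycles. That simultaneity is exactly the "converge towards Nash" paradigm this essay is pushing back on.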
This differs from my understanding of how Nash equilibria are "supposed" to emerge. Game theory is the science of playing games optimally, and Nash equilibria don't emerge because I'm trying to play a Nash strategy, but because I play optimally; if my opponent does too, the result is a Nash equilibrium. [Edit: Here I need to define better what I mean by playing "optimally."]
Applying game theory to poker, I estimate my opponent's range and strategy, calculate the EV of my hand (not my range, by the way, contrary to popular belief), and take my highest-EV action. To make assumptions about my opponent's range and strategy, I take my perceived range into account. If I believe my opponent is capable of exploiting me, I play a Nash strategy. But Nash emerges out of best response, not as a goal in itself. [Edit: This is the part I need to clarify. How do I "play a Nash strategy"? Where does it come from? How does that differ from a CFR-computed Nash strategy?]
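As a concrete (and entirely made-up) instance of that best-response calculation, here is a toy river spot: the opponent bets, I assign their betting range a value/bluff split, and I take whichever action has the higher EV. The pot size, bet size, and range composition below are illustrative assumptions, not solver output or real hand data.

```python
# Hypothetical river spot: pot is 100, opponent bets 50, I hold a
# bluff-catcher. The range split is an assumption for illustration.

POT = 100.0
BET = 50.0

# Assumed opponent betting range: (hand class, share of range, my equity vs it)
opponent_range = [
    ("value", 0.70, 0.0),  # I always lose to their value hands
    ("bluff", 0.30, 1.0),  # I always beat their bluffs
]

def ev_call():
    # Calling risks BET to win POT + BET, weighted over the assumed range.
    return sum(prob * (equity * (POT + BET) - (1 - equity) * BET)
               for _, prob, equity in opponent_range)

def ev_fold():
    return 0.0  # EV is measured relative to folding

actions = {"call": ev_call(), "fold": ev_fold()}
best = max(actions, key=actions.get)
print(best, actions["call"])  # call 10.0
```

Here EV(call) = 0.7 * (-50) + 0.3 * 150 = 10, so calling is the best response to this assumed range. Nash-style indifference only enters once I start asking how the opponent adjusts to my perceived range.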
This is what the current game-theoretic literature, at least in ML, is missing. It's impressive to create AI that plays Nash directly, but AI that emerges from this paradigm won't reason well outside of knowing to start a container and blindly solve an equation or run CFR without understanding the underlying game. (This is what happens when I prompt LLMs well and they compute equilibria.) For real game-theoretic reasoning, you need AI that understands best response in imperfect information games. This has been attempted [4], [5], but it hasn't been done. Solving this may very well be the gap to ASI.
[Edit: Regarding the whole "best response" framing, I believe what I'm really looking for is AI that can understand any novel game well enough to reason up, from first principles, a minimax strategy that is equivalent to a Nash strategy.]
[1] Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2007). Regret Minimization in Games with Incomplete Information. NIPS 2007.
[2] Brown, N. & Sandholm, T. (2018). Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374), 418ā424.
[3] Brown, N., Bakhtin, A., Lerer, A., & Gong, Q. (2020). Combining Deep Reinforcement Learning and Search for Imperfect-Information Games. NeurIPS 2020.
[4] Heinrich, J., Lanctot, M., & Silver, D. (2015). Fictitious Self-Play in Extensive-Form Games. ICML 2015.
[5] Heinrich, J. & Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. arXiv:1603.01121.