WX Research Blog

I'm moving from game-theoretic reasoning to AI safety

A couple of days ago I was watching my non-computer-scientist wife train a machine learning model on her astrophysics work, and I felt the acceleration. I think about AI, its rate of development, and its societal consequences a lot. But for the first time I felt visceral fear. AGI could happen. And it could be dangerous.

Dario and Demis had a chat at the World Economic Forum a few days ago, and they seem quite confident that AGI is coming within the next five years. Of course, they are incentivized to say so, and they may be biased, but they are two of the people with the clearest view of capabilities and timelines on the planet. They seem to value truth and rigor, so when they're both confident that AGI is coming soon, I give it weight.

Over the last couple of days I've been researching game-theoretic capability in LLMs. My initial sense was that it was lacking. I thought I had some secret game-theoretic understanding that was not yet deployed in model training. I conceived of research directions and experiments. And ultimately I came to the conclusion that game-theoretic reasoning, even in the way that I think of it (not mere algorithmic convergence towards Nash, but reasoning based on genuine understanding of the strategic dynamics of a given game), is downstream of math reasoning. As in, if math reasoning improves, game-theoretic reasoning improves. I still think that a well-designed game-theoretic benchmark could be valuable, but I no longer see it as urgent research that I must uniquely contribute to, or that would greatly advance model capability. I no longer think my contributions here would be all that world-changing. I could perhaps give models a temporary boost in post-training, but that would not unlock new capability, only surface latent capability. "Temporary" because sooner or later, scaling will solve it.

In other words, I no longer believe that game-theoretic reasoning stands in the way of AGI.
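For concreteness, here's what I mean by "mere algorithmic convergence towards Nash". Below is a minimal sketch (my own toy illustration, not tied to any model or benchmark): fictitious play in matching pennies. Each player simply best-responds to the opponent's empirical action frequencies, and play drifts to the 50/50 mixed Nash equilibrium with no understanding of the game's strategic structure at all.

```python
# Toy illustration (my own, hypothetical): fictitious play in matching pennies.
# Each player best-responds to the opponent's empirical action frequencies; the
# empirical frequencies converge to the 50/50 mixed Nash equilibrium without
# anything resembling strategic understanding.
import random

ROUNDS = 10_000
counts = [[0, 0], [0, 0]]  # counts[player][action]: observed plays of heads/tails

def best_response(opp_counts, player):
    heads, tails = opp_counts
    if heads == tails:
        return random.randint(0, 1)  # indifferent: break ties randomly
    if player == 0:  # matcher wins on a match: copy the opponent's modal action
        return 0 if heads > tails else 1
    else:            # mismatcher wins on a mismatch: play the opposite
        return 1 if heads > tails else 0

for _ in range(ROUNDS):
    a0 = best_response(counts[1], 0)  # both respond to pre-round frequencies
    a1 = best_response(counts[0], 1)
    counts[0][a0] += 1
    counts[1][a1] += 1

for p in (0, 1):
    print(f"player {p} empirical P(heads) = {counts[p][0] / ROUNDS:.3f}")  # ~0.5
```

The dynamics find the equilibrium, but nothing in them resembles reasoning about why the opponent acts as they do. That gap between convergence and understanding is what a good benchmark would have to probe.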

I've been speaking to my old professor, who supervised my Bachelor's thesis in game theory, and we've been discussing the idea of doing a PhD together. His expertise lies in behavioral game theory, especially around deception and institutional design. It's funny how the stars align sometimes. To spell it out: our intersection is exactly AI safety.

My skepticism towards model capabilities and their trajectory has given way to awe. I believe AGI is coming. And last time I checked (today), we don't have enough people working on AI safety. It's funny, because when I returned to Germany in 2020 after playing high-stakes poker, the reason I went back to uni to study computer science was that I thought AI safety was crucial and I needed to contribute. Here I am.

We watched Jiro Dreams of Sushi two days ago. It's about a three-Michelin-star chef in Japan who has been doing the same thing almost every day for 80 years: sushi. He just committed to it and never looked back. That feels weird given that it's sushi, but it also feels oddly inspiring.

Since quitting poker, and then quitting poker again (I founded GV [we solved for game-theoretic equilibria in poker] mainly because I wanted to gain entrepreneurial experience; but then it was successful, and I realized I was still in poker…), I've been looking for the thing. I think I'm ready to commit to AI safety and not look back.