Why Anthropic’s Claude still hasn’t beaten Pokémon

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, adding that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.

Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed by the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”


