Google's most costly AI model appears to have achieved a significant milestone: beating a 29-year-old video game.
Last night, Google CEO Sundar Pichai wrote gleefully on X, "What a finish! Gemini 2.5 Pro just finished Pokémon Blue!"
To be clear, the Gemini Plays Pokemon livestream was developed by (in his own words) "a 30 year old software engineer unaffiliated with Google" known as Joel Z. However, Google leaders have encouraged the endeavor.
Logan Kilpatrick, the product lead for Google AI Studio, posted last month that Gemini was "making great progress at completing Pokémon" and had "earned its 5th badge (next best model only has 3 so far, though with a different agent harness)," prompting Pichai to joke, "We are working on API, Artificial Pokémon Intelligence :)"
Why Pokémon? In February, Anthropic highlighted the progress its Claude AI models were making in "Pokémon Red," stating that Claude's "extended thinking and agent training" gave it "a major boost" on "more unexpected" tasks, such as playing a classic game. ("Pokémon Red" and "Blue" are separate versions of a Game Boy title launched in 1996 as part of the long-running Pokémon series.) There's even a Claude Plays Pokemon Twitch channel that Joel Z cited as an inspiration.
Despite its progress, Claude does not appear to have completed "Pokémon Red." Does this imply that Gemini is objectively better at the game? On his Twitch channel, Joel Z cautioned viewers not to treat this as a benchmark for how well an LLM can play Pokémon. Direct comparisons don't hold up, he said, since Gemini and Claude use different tools and receive different information.
And both AI models require assistance to play the game. That's where the aforementioned agent harnesses come in: they provide the models with game screenshots overlaid with additional information, let the model decide how to respond (which may include calling specialized agents), and then press the button that corresponds to the AI's instruction.
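The loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Joel Z's or Anthropic's actual harness: `FakeEmulator`, `Observation`, `run_harness`, and the `tick` overlay field are all hypothetical names, the model is any function that returns a button name, and the calls to specialized agents are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Buttons the harness is willing to press (assumed set for this sketch).
VALID_BUTTONS = {"A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT"}


@dataclass
class Observation:
    screenshot: bytes  # raw frame captured from the game
    overlay: dict      # extra information layered on top of the frame


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only records presses."""
    pressed: List[str] = field(default_factory=list)

    def capture(self) -> bytes:
        # A real harness would return pixel data here.
        return b"frame-%d" % len(self.pressed)

    def press(self, button: str) -> None:
        self.pressed.append(button)


def run_harness(emulator: FakeEmulator,
                model: Callable[[Observation], str],
                steps: int) -> List[str]:
    """Show the model a frame, read its reply, press the matching button."""
    for tick in range(steps):
        obs = Observation(screenshot=emulator.capture(), overlay={"tick": tick})
        choice = model(obs).strip().upper()
        if choice not in VALID_BUTTONS:
            continue  # skip replies that do not name a known button
        emulator.press(choice)
    return emulator.pressed
```

Swapping in a different model, a richer overlay, or extra tools changes what the model sees and can do at every step, which is one reason results from two different harnesses are hard to compare.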
Joel Z said that there were additional "dev interventions" to help Gemini finish the game, but argued that they did not amount to cheating.
"My interventions improve Gemini's overall decision-making and reasoning abilities," he said. "I don't provide explicit hints. There are no walkthroughs or precise directions for certain difficulties like Mt. Moon. The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to earn the Lift Key, which was a problem that was subsequently corrected in Pokemon Yellow."
Additionally, he noted, "Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve."