AI has come a long way, but a recent study from the Hao AI Lab at UC San Diego found something pretty interesting: Super Mario Bros. is tougher for AI than Pokémon. The researchers had several AI models play the game live to see how well they could handle real-time decision-making and strategy.

Claude 3.7 Shines, While GPT-4o and Gemini 1.5 Pro Stumble

In this study, Anthropic’s Claude 3.7 stood out, leaving competitors like Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o in the dust. The researchers used a framework called GamingAgent, which fed each model basic instructions along with screenshots from the game. From those inputs, the model wrote Python code to control Mario. Claude 3.7 handled the tricky game mechanics well, while the other two struggled to keep up with the fast-paced action.
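To make that loop a bit more concrete, here is a minimal sketch of the idea: instructions and a screenshot go in, Python code comes out, and that code is run to produce button presses. This is not the actual GamingAgent code. The names `stub_model`, `run_generated_code`, `agent_step`, and `press` are all made up for illustration, the instruction text is a placeholder, and the "model" is a stub that always returns the same two lines.

```python
from typing import Callable, List, Tuple

# (button name, number of frames to hold it) -- a hypothetical action format
Action = Tuple[str, int]

# Placeholder prompt; the real framework's instructions are not reproduced here.
INSTRUCTIONS = "Move right. Jump when an obstacle is close."


def stub_model(instructions: str, screenshot: bytes) -> str:
    """Stand-in for a real model call: returns Python source as text."""
    return "press('right', 10)\npress('A', 5)\n"


def run_generated_code(source: str) -> List[Action]:
    """Run model-written code with only a `press` helper in scope.

    Note: emptying __builtins__ is NOT a real sandbox; it only keeps
    this toy example tidy.
    """
    actions: List[Action] = []

    def press(button: str, frames: int) -> None:
        actions.append((button, frames))

    exec(source, {"__builtins__": {}}, {"press": press})
    return actions


def agent_step(
    model: Callable[[str, bytes], str],
    instructions: str,
    screenshot: bytes,
) -> List[Action]:
    """One turn: ask the model for code, then turn that code into inputs."""
    source = model(instructions, screenshot)
    return run_generated_code(source)


# An empty bytes object stands in for a captured game frame.
actions = agent_step(stub_model, INSTRUCTIONS, b"")
print(actions)
```

In a real setup, the stub would be replaced by a call to a model API, and the recorded actions would be sent to whatever is running the game.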

Why Reasoning Models Hit a Wall in Real-Time Games

Interestingly, the study showed that reasoning models like OpenAI’s o1 didn’t do as well as their non-reasoning counterparts. Sure, reasoning models are great at tasks needing step-by-step logical thought, but when it comes to real-time decisions? Not so much. Super Mario Bros. requires lightning-fast reactions, and these models took too long to settle on a move, so the game had often moved on by the time they acted.
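A quick back-of-the-envelope calculation shows why thinking time hurts so much. The sketch below assumes the game runs at about 60 frames per second and keeps running while the model thinks. The two latency figures are made-up examples, not measurements from the study, and `frames_elapsed` is just a helper name for this illustration.

```python
FPS = 60  # assumed frame rate; the original NES game runs at roughly 60 fps


def frames_elapsed(latency_seconds: float, fps: int = FPS) -> int:
    """Frames the game advances while the model is still deciding."""
    return round(latency_seconds * fps)


# Hypothetical response times, chosen only to show the scale of the gap.
examples = [
    ("quick model, 0.5 s", 0.5),
    ("slow reasoning model, 5 s", 5.0),
]

for label, latency in examples:
    print(f"{label}: {frames_elapsed(latency)} frames pass before Mario moves")
```

Under these assumptions, the slower model reacts to a screen that is hundreds of frames out of date, which is plenty of time for Mario to walk into an enemy or a pit.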

Are Gaming Benchmarks the Best Way to Measure AI Progress?

This raises some big questions about how we measure AI’s growth. Games like Super Mario Bros. and Pokémon offer tons of data, which can help AI learn the rules. However, experts are starting to wonder whether these gaming challenges really reflect AI’s abilities in the real world, especially when it comes to practical decision-making.

Looking Ahead for AI

While this study sheds light on AI’s struggles in fast-paced games, it also points to the need for more adaptable AI. Future models will need to pair careful reasoning with speedy decision-making if the skills they show in games are going to carry over to real-world tasks.
