Is OpenAI Falling Behind In The Artificial Intelligence 'Arms Race'?

Describing AI development as an "arms race" might seem needlessly bombastic, but there's a reason why this term has entered common usage. It encapsulates the speed and intensity at which companies are developing and deploying AI systems. Everyone has to move fast because their rivals are moving fast, and no one wants to fall behind.

On December 2, 2025, it was widely reported that Sam Altman had declared a "code red" in an internal memo. Google's release of Gemini 3 (including Gemini 3 Pro and 3 Deep Think) on November 18, and Anthropic's release of Claude Opus 4.5 on November 24, led to speculation that OpenAI's ChatGPT was losing its edge. The Atlantic published an article on December 9 saying that OpenAI was "falling behind in the AI race," declaring that "OpenAI has not had a stable or even convincing lead on major AI benchmarks for many months." However, two days is a long time in artificial intelligence, because on December 11, OpenAI released GPT-5.2, and suddenly ChatGPT is riding high once again.

So is ChatGPT the best AI model out there? As you might expect, things are a lot more complicated than that. OpenAI claims GPT-5.2 is better at professional knowledge work, such as multi-step projects, presentations, and spreadsheets. Google's most recent Gemini release was all about multimodality and understanding nuance, while Anthropic's Claude does well at agentic coding and bug fixing. Because different AI systems are good at different things, no single model is best at everything.

How well ChatGPT is doing on benchmarks

When it comes to ranking AI models, you hear a lot of talk about benchmarks. People like to put numbers to things, and so AI researchers and companies put together standardized tests to measure and score how well an AI system performs on particular tasks. Benchmarks aren't without their limitations. High scores don't necessarily mean a better user experience, and models can be optimized to ace tests at the expense of actually being more useful. Nevertheless, benchmarks remain the best data we have to objectively compare AI models.
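To make the percentages in the next section concrete, here is a minimal, hypothetical sketch of how a benchmark harness works: a fixed set of questions with known answers is put to the model, and the headline number is simply the share it gets right. The `ask_model` function and the question set below are placeholders for illustration, not any lab's actual tooling.

```python
# Toy benchmark harness: score = correct answers / total questions.
# ask_model() is a stand-in for a real API call to the model being tested.

def ask_model(question: str) -> str:
    # Placeholder: a real harness would send the question to the model's API.
    return "C"

def run_benchmark(items: list[tuple[str, str]]) -> float:
    """Return the percentage of questions the model answers correctly."""
    correct = 0
    for question, expected in items:
        answer = ask_model(question).strip().upper()
        if answer == expected:
            correct += 1
    return 100 * correct / len(items)

if __name__ == "__main__":
    # Tiny multiple-choice set with known answers (A-D).
    questions = [
        ("Which planet is largest? A) Mars B) Venus C) Jupiter D) Mercury", "C"),
        ("2 + 2 = ? A) 3 B) 4 C) 5 D) 22", "B"),
    ]
    print(f"Score: {run_benchmark(questions):.1f}%")
```

Real benchmarks differ mainly in what fills the question list (graduate-level science for GPQA Diamond, visual puzzles for ARC-AGI) and in how answers are graded, but the headline figure is the same kind of percentage.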

In GPQA Diamond, a complex science-based reasoning test, GPT-5.2 Pro scored 93.2%, better than any other AI model; Gemini 3 Pro comes second with 91.9%. In the ARC-AGI 2 benchmark, ChatGPT fared worse. This test uses visual puzzles that are intended to be intuitive for humans but tricky for AI, a bit like the CAPTCHA puzzles that prove you're not a robot, and you can try them for yourself on the ARC Prize website. Here, Claude Opus 4.5 significantly outperforms all its rivals.

Another benchmark, Humanity's Last Exam (HLE), uses expert-level, open-ended problems that even the cleverest humans struggle with. It's intended to be the last meaningful academic exam humans can set for AI, the idea being that once AI consistently does better than humans here, we won't be able to set meaningful tests to measure artificial intelligence anymore. The AI performing best on HLE at the moment is Gemini 3 Pro with 45.8%. OpenAI claims that GPT-5.2 Pro scores 36.6%, an improvement on GPT-5's 35.2%, but that still puts it in third place behind Gemini and a lesser-known open-source AI, Kimi K2 Thinking, which scores 44.9%.

So is OpenAI in trouble?

Across many benchmarks, OpenAI's ChatGPT is consistently among the top five AI models, and in some specialties it takes the top spot. So saying that it's falling behind seems like a stretch, until you consider how far ahead of its rivals it used to be. Throughout 2023 and most of 2024, it was far more likely to take the lead across benchmarks. And there are other ways to score AI models besides benchmarks, like LMArena, a public platform where users anonymously compare AI models head-to-head.
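For a sense of how anonymous head-to-head votes turn into a leaderboard, here is a simplified, hypothetical Elo-style update of the kind arena platforms have historically drawn on (LMArena's actual methodology is more sophisticated, using Bradley-Terry-style statistical models); the model names are placeholders and this is an illustration of the idea, not the platform's implementation.

```python
# Simplified Elo-style rating from anonymous head-to-head votes.
# Each vote says which of two models a user preferred; ratings shift
# toward the winner, and shift more when the result was an upset.

K = 32  # step size: how strongly a single vote moves the ratings

def expected_win(r_a: float, r_b: float) -> float:
    """Probability model A beats model B under the Elo formula."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict[str, float], winner: str, loser: str) -> None:
    """Apply one head-to-head vote to the ratings in place."""
    e_win = expected_win(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_win)
    ratings[loser] -= K * (1 - e_win)

if __name__ == "__main__":
    # Hypothetical model names; everyone starts at the same rating.
    ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
    votes = [("model_b", "model_a"), ("model_b", "model_c"), ("model_a", "model_c")]
    for winner, loser in votes:
        update(ratings, winner, loser)
    for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.0f}")
```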

The overall favorite on LMArena at the time of writing is Gemini 3 Pro, with ChatGPT coming in eighth. In 2023, ChatGPT consistently came first on the platform (then called Chatbot Arena), and in mid-2024 it still topped the list. By late 2024, however, ChatGPT faced competition from Gemini, which took the lead for the first time. Throughout 2025, the big AI companies have been leapfrogging one another, often taking the lead after a new release only to be eclipsed by a rival's next update.

Companies like Google and Microsoft have an advantage over OpenAI in that they can build Gemini and Copilot into tools people already use, in a bid to increase AI adoption. When it comes to user numbers, though, OpenAI's rivals have a long, long way to go before they're anywhere close to its figures. ChatGPT has 5.6 billion monthly visits and accounts for around 60% of all AI use; its market share is bigger than that of Gemini, Claude, Grok, Copilot, and every other AI tool combined. While it might not be the tech experts' favorite, most people don't actually care about benchmarks, and ChatGPT's number one spot with regular users remains uncontested.