Was Google's Big AI Announcement Ruined By OpenAI's GPT-4o?

Google just gave a keynote presentation where the company outlined all the different ways it plans to integrate its Gemini AI into its various platforms. Its representatives presented Gemini-powered tools for everything from photo and music generation to calendar management and team communication, along with plans to fundamentally change the way people use its search engine, which is the foundation of the entire company. They also announced something called Project Astra, an AI agent with voice and image recognition that can identify visual objects and respond to both questions and requests.

It's clear that this keynote was meant to be a resounding declaration of the place that AI would take in the future of the company. However, it's hard not to wonder if OpenAI took a bit of the wind out of Google's sails before it even had a chance to leave the harbor. OpenAI announced the release of GPT-4o the day before Google's keynote, and it appears to do many of the same things that Project Astra can.

OpenAI released its 'human' AI first

Project Astra is marketed as a "more human" form of artificial intelligence. DeepMind co-founder and CEO Demis Hassabis spoke about the technology during the keynote, calling it "a universal AI agent that can be truly helpful in everyday life." He used the term "multimodal" to describe the way the program takes in information from various sources and uses these separate kinds of input to generate context for answering questions, generating content, and relaying information.

Google showcased Project Astra in motion, as its representatives used their phones to identify a speaker, create alliterative comments about a jar of crayons, and describe bits of code displayed on a monitor. They also had it come up with a band name for a dog and stuffed tiger duo.

All of this is impressive, but it seems to have been undercut by OpenAI beating Google to the punch. According to the company's announcement, "GPT-4o ("o" for "omni") is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs." Sound familiar? OpenAI also claims that the model matches the generative power of GPT-4 Turbo and responds to audio input in an average of 320 milliseconds, which is similar to human conversational response times.