OpenAI took the wraps off ChatGPT’s long-promised video capabilities Thursday, letting users point their phones at objects for real-time AI analysis—a feature that’s been gathering dust since its first demo in May.
Previously, you could interact with ChatGPT using text, charts, voice, or still photos. This feature, released late Thursday, allows ChatGPT to watch you in real time and provide conversational feedback. In my tests, this mode was able to solve math problems, suggest recipes, and tell stories. It even became my daughter’s new best friend, interacting with her while making pancakes, offering suggestions, and encouraging her learning through different games.
The release comes just a day after Google showed its own take on a camera-enabled AI assistant powered by the newly minted Gemini 2.0. Meta’s been playing in this sandbox too, with its own AI that can see and chat through phone cameras.
ChatGPT’s new tricks aren’t for everyone though. Only Plus, Team, and Pro subscribers can access what OpenAI calls “Advanced Voice Mode with vision.” The Plus subscription costs $20 a month, and the Pro tier costs $200.
“We’re excited to announce that we’re bringing video to Advanced voice mode so you can bring live video and also live screen sharing into your conversations with ChatGPT,” Kevin Weil, OpenAI’s Chief Product Officer, said in a video Thursday.
The stream was part of OpenAI’s “12 Days of OpenAI” campaign, which will feature 12 announcements over as many consecutive days. So far, OpenAI has launched its o1 model for all users, unveiled the $200-per-month ChatGPT Pro plan, introduced reinforcement fine-tuning for customized models, released its generative video app Sora, updated its canvas feature, and brought ChatGPT to Apple devices via the tech giant’s Apple Intelligence feature.
The company gave a peek at what the feature can do during Thursday’s livestream. Users activate video mode from the same interface as advanced voice and start interacting with the chatbot in real time. The chatbot shows strong visual understanding and can provide relevant feedback with low latency, which makes the conversation feel natural.
Getting here wasn’t exactly smooth sailing. OpenAI first promised these features “within a few weeks” in May, but the launch was postponed following controversy over advanced voice mode mimicking actress Scarlett Johansson’s voice without her permission. Since video mode relies on advanced voice mode, that apparently slowed the rollout.
And rival Google is not sitting idle. Project Astra just landed in the hands of “trusted testers” on Android this week, promising a similar feature: an AI that speaks multiple languages, taps into Google’s search and maps, and remembers conversations for up to 10 minutes.
However, this feature is not yet widely available, with a broader rollout expected early next year. Google also has more ambitious plans for its AI models: giving them the ability to execute tasks in real time and show agentic behavior beyond audiovisual interactions.
Meta is also fighting for a place in the next era of AI interactions. Its assistant, Meta AI, was featured this September. It shows similar capabilities to OpenAI’s and Google’s new assistants, providing low-latency responses and real-time video understanding.
But Meta is betting on augmented reality to push its AI offering, with “discreet” smart glasses capable of powering those interactions through a small camera built into their frames. Meta calls it Project Orion.
Current ChatGPT Plus users can try the new video features by tapping the voice icon next to the chat bar, then hitting the video button. Screen sharing needs an extra tap through the three-dot menu.
For Enterprise and Edu ChatGPT users eager to try the new video features, January is the magic month. As for EU subscribers? They’ll just have to watch from the sidelines for now.
Edited by Andrew Hayward