OpenAI recently announced powerful new ChatGPT capabilities, including the ability to use pictures in addition to text prompts for conversations with the AI chatbot.
The company offered examples:
“Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow-up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”
(The company also announced that its mobile app would support voice input and output for the chatbot. You’ll be able to talk to ChatGPT, just as dozens of third-party apps already allow. And OpenAI officials also announced that ChatGPT will soon be able to access Microsoft’s Bing search engine for additional information.)
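To make the idea of a “picture prompt” concrete, here’s a minimal sketch of what sending a photo plus a question to a vision-capable chat model can look like using OpenAI’s Python SDK. This is illustrative only: the model name is a placeholder, and image input through the developer API is an assumption on my part rather than something the announcement, which described the consumer ChatGPT app, actually covered.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a local photo (say, a snapshot of a landmark) as a base64 data URL.
with open("landmark.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable chat model
    messages=[
        {
            "role": "user",
            "content": [
                # One text part and one image part in a single user message.
                {"type": "text",
                 "text": "What landmark is this, and what's interesting about it?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Roughly speaking, a consumer “snap a picture and ask a question” interaction maps onto this same pattern: a text part and an image part delivered together as one prompt.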
OpenAI isn’t the only AI company promising picture prompts.
Meta’s new camera glasses
Meta, the company formerly known as Facebook, recently unveiled the second version of its camera glasses, created in a partnership with EssilorLuxottica’s Ray-Ban division. The new specs, which cost $299 and ship Oct. 17, boast more and better cameras, microphones and speakers than the first version, and they enable live streaming to Facebook and Instagram.
Gadget nerds and social influencers are excited about these features. But the real upgrade is artificial intelligence (AI). The glasses contain Qualcomm's powerful new Snapdragon AR1 Gen 1 chip, which means wearers of the Meta Ray-Ban smart glasses can have conversations with an AI assistant via the built-in speakers and microphones. But this is not just any old AI.
In a related announcement, Meta unveiled a ChatGPT alternative called Meta AI that also supports voice chat, with responses spoken in any of 28 available synthetic voices. Meta has been baking Meta AI into all of its platforms, including the glasses, and the assistant will also be able to query Microsoft’s Bing search engine for information more current than what its underlying Llama large language model (LLM) was trained on.
Meta promised a software update next year that will make the Meta Ray-Ban glasses “multimodal.” Instead of interacting with the Meta AI chatbot only through voice, the glasses will gain the ability to accept “picture prompts,” as ChatGPT now does. But instead of requiring you to upload a JPEG, the Meta Ray-Ban glasses will simply grab the image with their built-in cameras.
While wearing the glasses, you’ll be able to look at a building and say: “What building is this?” and AI will tell you the answer. Meta also promised real-time language translation of signs and menus, instructions for how to repair whatever household appliance you’re looking at, and other uses. I expect that it’s only a matter of time before the glasses tell you who you’re talking to through Meta’s powerful face-recognition technology.
In other words, Meta Ray-Bans will effectively become AR glasses with that software update.
Why the future of AR is AI
Augmented reality (AR) is technology that enhances or provides additional information about what we see in physical reality through digital images, sounds and text.
Companies like Apple, Microsoft, and Magic Leap have spent decades (and billions of dollars) inventing systems for displaying high-resolution virtual 3D objects, characters and avatars to wearers of their expensive, heavy and battery-draining AR glasses.
Whenever we in the tech media or tech industry think or talk about AR, we tend to focus on what kind of holographic imagery we might see superimposed on the real world through our AR glasses. We imagine hands-free Pokémon Go, or radically better versions of Google Glass.
But since the generative AI/LLM-based chatbot revolution struck late last year, it has become increasingly clear that of all the pieces that make up an AR experience, holographic virtual objects are the least important.
The glasses are necessary. Android phones and iPhones have had “augmented reality” capabilities for years, and nobody cares because looking at your phone doesn’t compare to just seeing the world hands-free through glasses.
The cameras and other sensors are necessary. It’s impossible to augment reality if your device has no way to perceive reality.
The AI is necessary. We need AI to interpret and make sense of arbitrary people, objects, and activity in our fields of view.
Two-way audio is necessary. The user needs a hands-free way to query and interact with the software in order to control the AR experience.
And, it turns out, the virtual display, virtual data, and virtual objects, while nice to have, aren’t necessary.
The technology we used to think was most important turns out to be least important. We’ve overemphasized the visual quality of the “output” when talking about AR. Conference attendees, demo audiences and early customers have been dazzled by 3D characters jumping around and other pointless bits of content.
What about the content quality and its relationship to the reality being augmented? What really makes AR powerful is when our devices start with a clear understanding of what’s right in front of us, then can give us information, insight and advice about that reality.
It’s become clear that AI is the most indispensable component of general-purpose AR.
This is the opposite of virtual reality, where the visuals are everything and AI isn’t even necessary.
Zuckerberg said at the Meta Ray-Ban announcement: “Smart glasses are the ideal form factor for you to let AI assistants see what you’re seeing and hear what you’re hearing.”
He’s right.
It’s not at all clear that Meta will dominate the future of AR. But what is clear is that AI is the future of AR, and AR is the future of AI. I wouldn’t be surprised if all the leading AI companies, including OpenAI, Microsoft and Google, quickly launched Meta Ray-Ban-like glasses. Because talking is better than typing, and showing is better than talking.
Leaks, patents and reports about Apple suggest the company is working on lightweight, everyday-wear AR glasses to ship years from now, long after its massive, bulky, indoors-only Vision Pro arrives. But it seems to me that Apple is going to miss the boat again, just as it did with the Amazon Echo home virtual assistant appliance. It took Apple more than two years to ship the HomePod after Amazon shipped the Echo. What’s delaying Apple’s all-day glasses is that the company is hung up on presenting compelling visual data in the lenses, rather than focusing on voice conversations and AI-driven camera input.
Nobody seems to be fully registering the implications of last week’s announcements, so I’ll come right out and say it: The Meta Ray-Ban smart glasses announcement means the race to dominate the new platform of AI-driven AR glasses is truly on. This is a whole new compute platform that is going to be huge for both consumers and enterprises. I’ll say it again: The future of AR is AI. And the future of AI is AR.