Convergence - It’s All Coming Together Perfectly
Welcome back to the AI Geekly, by Brodie Woods.
TL;DR Multi-domain technological breakthroughs compound capabilities
This week we’re drawing readers’ attention to the convergence of AI, robotics, and spatial computing. Each of these technologies on its own could be world-changing. We’ve had robots for a long time, but they haven’t been exciting; they’re brainless zombies. Spatial computing (AR/XR headsets) has been around for nearly a decade, but has failed to achieve wide adoption. The missing ingredient: AI.
2024 has been earmarked as 'The Year of the Robot', highlighted by Tesla's Optimus 2.0 and NYU's Dobb·E (December). Stanford's Mobile ALOHA and Figure's Figure-01 are the latest additions, both demonstrating advanced learning capabilities through imitation. Spatial computing, synonymous with AR/XR, is another front where technology is rapidly advancing. This technology promises to transform experiences, enabling convincing telepresence for hybrid work environments and the creation of infinite virtual worlds.
Together, these developments in robotics, spatial computing, and AI portend a once-in-eternity convergence, where multiple technological domains are no longer advancing in isolation but amplifying each other's capabilities: smarter robots can build better AI hardware, and advanced spatial computing accelerates AI engineering workflows. And the wheel spins faster. This synergy is setting the stage for a future where the integration of AI, robotics, and immersive virtual environments will redefine our interaction with technology and the world.
Imitation Game - It’s getting better all the time
What’s Happening in Robotics? We said we expected 2024 to be The Year of the Robot, and we’re off to a great start. Continuing the trend we saw in December with Tesla’s Optimus 2.0 and NYU’s Dobb·E open-source chorebot, two impressive new projects were announced this week.
Mobile ALOHA (Open Source) - Stanford
Autonomously completes complex tasks like cooking, using cabinets, and organizing objects. Not only did it learn to do this from just 50 demonstrations, by co-training its imitation-learning policy on a pool of pre-existing static demonstration data, it was also able to generalize from its training to handle novel situations.
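At its core, this recipe is supervised imitation learning: a policy is fit to predict the demonstrator's actions from observations, with the small set of new demonstrations pooled alongside a larger body of existing ones. A minimal sketch of the idea, using behavioral cloning with a linear least-squares policy on toy data (all names and numbers here are illustrative, not Mobile ALOHA's actual implementation):

```python
import numpy as np

def behavioral_cloning(obs, actions):
    """Fit a linear policy (action = obs @ W) by least squares on demo pairs."""
    # obs: (N, obs_dim), actions: (N, act_dim)
    W, *_ = np.linalg.lstsq(obs, actions, rcond=None)
    return W  # (obs_dim, act_dim)

# Toy "demonstrations": the expert's action is a fixed linear function of the observation.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(4, 2))          # hidden expert mapping (obs_dim=4, act_dim=2)
new_task_obs = rng.normal(size=(50, 4))   # 50 demonstrations of the new task
static_obs = rng.normal(size=(500, 4))    # larger pool of pre-existing demos (co-training data)

# Co-training: pool both datasets and fit a single policy.
obs = np.vstack([new_task_obs, static_obs])
actions = obs @ true_W
policy = behavioral_cloning(obs, actions)

# On this noiseless toy data, the learned policy matches the expert on unseen observations.
novel_obs = rng.normal(size=(5, 4))
assert np.allclose(novel_obs @ policy, novel_obs @ true_W, atol=1e-6)
```

Real systems replace the linear map with a deep network and raw camera observations, but the structure of the problem — supervised learning on pooled demonstration data — is the same.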
Figure-01 (Closed Source) - Figure
Figure’s Figure-01 is the company’s first attempt at a commercially viable humanoid robot. Built on an aluminum frame, the robot is designed to fit easily into a world built by and for humans, rather than one customized for machines. Like Mobile ALOHA, Figure-01 learns through imitation, acquiring new skills in a fraction of the time that traditional robotic programming takes.
What it means: We are witnessing an incredible increase in the learning capabilities of robots — multiple orders of magnitude. In terms of its significance to the development of robots, it is akin to biological evolution’s shift from simple multi-cellular organisms to complex ones able, for the first time, to perceive and interact with the world around them. The implications are broad and deep.
Why it matters: In the near term, with ultra-low unemployment in the U.S. (these jobs numbers are killing my rate cuts!) and high job vacancies, the labor market creates an optimal environment for experimentation with robotic employees for unwanted roles (low-paying, dangerous, or other). In the medium term, this combined with advances in AI will severely disrupt physical work in the way that transformer-based models have disrupted white collar work.
Spatial Computing Sounds a Lot Like AR/XR to Me…
That’s because it is.
What’s Spatial Computing? Spatial computing is Apple’s terminology for the modality unlocked by its upcoming Vision Pro headset (Feb 2 launch). We welcome the fresh terminology. Essentially, it is viewing and interacting with content in an extremely rich, high-pixel-density environment (>4K). We’ll expand the definition a little for our purposes to include rich visual displays that can present comparable experiences.
Vision Pro - Apple
Features an ultra-high-resolution display system (>4K resolution per eye) with 23 million pixels across two micro-OLED displays. The headset creates convincing augmented reality experiences, including interactive media, gaming, productivity applications, and enhanced digital communication.
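As a quick sanity check on the ">4K per eye" claim, the arithmetic works out (the 23-million figure is Apple's stated total; the rest is just standard display math):

```python
total_pixels = 23_000_000    # Apple's stated count across both micro-OLED displays
per_eye = total_pixels // 2  # ~11.5M pixels per eye
uhd_4k = 3840 * 2160         # standard 4K UHD pixel count (~8.3M)

# Each eye gets roughly 1.4x the pixels of a 4K display.
assert per_eye > uhd_4k
```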
Vegas Sphere (MSG Sphere at The Venetian)
While the Vision Pro showcases how advanced display technology has become at the small scale, the Vegas Sphere depicts another approach to the challenge of creating immersive spatial content. Comprising roughly 1.2 million exterior LED pucks, an interior display of approximately 160,000 sq ft, and advanced spatial audio, the Vegas Sphere is impressive in its own right. It’s not hard to picture scaled-down versions that can generate convincing environments.
What it means: AR headsets have been around for some time, so what’s special about Apple? Great question. We expect Apple to do for spatial computing what they did for the smartphone. Their unparalleled commitment to design combined with their technical expertise (esp. re: silicon) is a winning combination for a technology in the right place at the right time.
Why it matters: Spatial computing will create new experiences and opportunities. Convincing telepresence that really feels like presence will enable better collaboration —why not work in the office, from home? True hybrid work will be possible. Longer-term, the combination of AI and convincing virtual environments enables the creation of infinite worlds. Not sure what we do with that exactly, but I’m sure we’ll think of something…
Artificial Intel/lectual Property
↑That’s how tightly connected they are↑
What’s new in AI? Intellectual property is a hot topic. Recent events include:
Apple’s closed-door negotiations with news companies re: using their data to train AI models
OpenAI buying data rights from Axel Springer (Politico and Business Insider)
Unity doing the right thing in rolling out its Muse AI
The New York Times is suing OpenAI for copyright infringement
OpenAI has responded to the suit, claiming NYT’s evidence was cherry-picked from articles that are reproduced at high volume across the internet.
OpenAI is also reportedly interested in buying data rights from the NYT.
JP Morgan’s DocLLM is a flexible, finance-focused large language model specifically trained for improved document processing and analysis, reportedly exceeding the capabilities of GPT-4 on these tasks.
With over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist at North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.