Stolen Thunder

OpenAI cuts the line

Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week we bring you another round of fast-paced AI developments, packaged neatly in a 5-minute(ish) read.

TL;DR Assistant Announcements; More Assistant Announcements; A Little Chaos

This was NOT a quiet week in the world of AI… To say it was eventful would be putting it mildly. OpenAI was in the news twice this week: initially for the exciting release of its new GPT-4o model and associated assistant and, by the end of the week, for the disbandment of its Superalignment team amid high-profile departures from that team. The departing executives chastised management on the way out, criticizing a focus on new products over safety and ethics. In non-OpenAI news (but perhaps still in the shadow of OAI), Google held its major I/O developer conference, where it hoped to allay market/client/general perceptions that it is behind in the great GenAI race. To that end, GOOG announced several new AI models in assorted sizes and its Project Astra augmented-reality AI assistant. Read on below the fold as we walk you through the implications of these developments.

New Model, New Assistant, New High-Watermark
OpenAI announcements wow the world

What it is: Like many in the AI space, we spent Monday afternoon watching and dissecting OpenAI’s livestream of several announcements: the release of its latest public model, dubbed GPT-4o (“o” for “omni,” signifying the model’s multimodal nature); the availability of this newest model to all users for free (sorry, ChatGPT Plus folks); a supercharged AI personal assistant able to respond in real-time(ish) to text, video, image, and sound prompts; and finally, a Mac desktop app that allows OpenAI’s GPT models to interface directly with users’ desktops.

What it means - GPT-4o: There’s a new gold standard for LLMs (Large Language Models) and MMMs (Multimodal Models). This year, as predicted, we’ve seen open models approach, match, and surpass the capabilities of GPT-4 (released over a year ago). Unfortunately, the closed and proprietary nature of OpenAI’s model means that open science (aka open source) researchers are left guessing when it comes to understanding the underlying architecture of OpenAI’s first MMM designed as such from inception (prior MMMs stitched together distinct models). For proponents of the democratization of AI, the pattern of companies benefitting from the public contributions of the open science community to iterate and improve their private models while contributing nothing in return is troubling. That said, the open science engine of innovation has been firing on progressively more cylinders as more altruistic entities (academics, institutions, companies) join the ranks.

What it means - AI for the masses: Gripes about closed models aside, we are pleased with OpenAI’s follow-through on its commitment to make game-changing AI tools available to all. While the value prop for ChatGPT Plus users vs. free users seems to be evaporating, the broader availability of the best AI model in the world is a societal good. Regardless of background, geography, or level of education, anyone with an internet connection will be able to access GPT-4o’s multimodal capabilities. The model’s ability to process text, sound, video, and image content means that even communities with lower levels of literacy can apply and learn from these tools. This may be the best example in recent decades of a rising tide that truly lifts all boats.

What it means - Upgraded AI Assistant: OAI’s new upgraded AI assistant is seriously impressive. In tech, companies and features live or die by their UX (User eXperience). Simple conversational interaction, permitting in-session crossover into video and image ingestion, “feels like magic” and mirrors the futuristic human-machine interfaces imagined in science fiction. Undeniably, the future has arrived. While OAI’s demos focused on simple use cases like real-time translation and math tutoring, the applications are widespread: the tool enables tantalizing use cases ranging from personal and travel to corporate/enterprise. Keep in mind that this is still the first iteration, so we expect limitations, bugs, and, as always, hallucinations while OpenAI works to more fully integrate its existing capabilities (i.e. conversation memory, custom GPTs, Sora, and more) into a singular tool. It’s not hard to imagine a future scenario where users ask their personalized GPT-4o how to complete a highly complex task and, in response, it produces an instructional narrated video, or even instructs an AI-enhanced robot to complete the task on their behalf. Once again, however, the update is bad news for professional translators and tutors, and the companies in those sectors, unless they are quick to adapt and apply these tools themselves.
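For the technically inclined: the text and image sides of this multimodality are already exposed through OpenAI’s public API (the real-time voice interface shown in the demos was, at launch, app-only). Here is a minimal sketch, assuming the OpenAI Python SDK (v1+) and a placeholder image URL, of a toy version of the translation demo:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask GPT-4o to read a (placeholder) image and translate what it sees.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a translator. Answer in French."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does the sign in this photo say?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/street-sign.jpg"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```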


What it Means - Mac Desktop App: The Mac desktop app is like Microsoft’s Copilot solutions, but spanning the whole machine (akin to the Copilot embedded in Windows 11, only for macOS). The app allows GPT-4o to interact directly with a desktop user’s files, the content on their screen, and more. No longer are users limited to a sandboxed environment of interaction, or a “push” setup uploading individual files to interact with AI. Like its new AI assistant, OAI’s desktop app integrates into the lives and workflows of users as a native application. Contrast this with the many limited, embedded AIs released over the past year-plus: Google Gemini for Workspace, Microsoft 365, Airtable, Notion, Zoom, Slack: all are sandboxed where OpenAI’s tool is not. Who will want to use a dozen singular, limited tools when there is a unified single point of contact (one throat to choke) that crosses the boundaries of a user’s entire desktop experience? As with prior announcements from OpenAI: a whole bunch of companies just learned that their business model or feature set has been eaten for breakfast. Thin wrappers around advanced AI models do not a business make.

Why it Matters: According to a recent McKinsey study, nearly 50% of employees use ChatGPT or similar tools in their work. As OpenAI continues to reduce the friction of user-AI interaction, we expect greater integration of AI tools in enterprise and personal settings. This week’s announcements were impressive in this regard, pushing the envelope yet again and putting OpenAI’s competitors on notice. Google, AWS, Apple, and even Microsoft (despite their OAI partnership) will need to seriously step up their game to close the gap (more on GOOG’s efforts below). Since the GenAI frenzy’s starting pistol was fired in 2022, OpenAI has been ahead of the pack, and while it seemed the pack was catching up, OAI has clearly shot ahead once more. We can’t help but consider the possibility that the so-called AI flywheel has already begun to spin, and it may now be impossible for any AI company to catch up to OAI’s accelerating rate of improvement, save for government intervention, which seems more probable given this week’s Superalignment/safety fiasco (covered in greater detail below).

Reaching for the Stars
Google pins hopes on Gemini and Astra

What it is: At its annual developer conference (I/O 2024), Google unveiled several upgrades and expansions to its core AI offerings. Key announcements included improvements to its Gemini AI model, featuring stronger conversational capabilities, real-time visual information processing, and integration into various Google products like Chrome and Android. Additionally, Google showcased new AI-powered features across its product ecosystem, including updates to Google Search, Google Photos, and Google Workspace. From a technical standpoint, we were most impressed by the doubling of Gemini’s context window to 2 million tokens, meaning the model can now consume close to 1.5 million words, more than 60,000 lines of code, 22 hours of audio, or 2 hours of video content (this is starting to get serious).
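To put that 2-million-token figure in perspective, the back-of-envelope arithmetic behind those equivalences looks roughly like this; the conversion ratios below are our own rough heuristics, not official Google figures:

```python
# Back-of-envelope conversions for a 2-million-token context window.
# All ratios are rough heuristics (our assumptions), not official figures.
CONTEXT_TOKENS = 2_000_000

WORDS_PER_TOKEN = 0.75             # ~0.75 English words per token
TOKENS_PER_LINE_OF_CODE = 30       # loose average for a line of source code
TOKENS_PER_AUDIO_HOUR = 90_000     # ~25 tokens per second of audio
TOKENS_PER_VIDEO_HOUR = 1_000_000  # ~1M tokens per hour of video

print(f"~{CONTEXT_TOKENS * WORDS_PER_TOKEN:,.0f} words")                  # ~1,500,000
print(f"~{CONTEXT_TOKENS / TOKENS_PER_LINE_OF_CODE:,.0f} lines of code")  # ~66,667
print(f"~{CONTEXT_TOKENS / TOKENS_PER_AUDIO_HOUR:.0f} hours of audio")    # ~22
print(f"~{CONTEXT_TOKENS / TOKENS_PER_VIDEO_HOUR:.0f} hours of video")    # ~2
```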

What it means: As previously covered, Google has been playing catch-up since late 2022, when it was caught flat-footed by the release of OpenAI’s ChatGPT (despite itself being responsible for the underlying technology). Ever since, Google has been struggling to assuage investor and customer concerns that the company is too far behind to catch its smaller, nimbler competitor. While it can’t turn on a dime, Google has, with this week’s announcements, finally brought itself close to feature parity with OpenAI, and indeed offers its own compelling solutions that may surpass OAI via deep integration of AI into its product suite. Like OAI’s GPT-4o, Google’s upgraded Gemini model aims to provide more natural and efficient multimodal interactions across text, images, sound, and video. The integration of AI into Google Search and other products may make its AI easier to work with, especially as we expect OpenAI’s assistant and Mac client will not have access to a user’s cloud data to the degree that Google does, giving GOOG the upper hand in a world where users’ digital content is increasingly cloud-based.

Why it matters: We think Google is headed in the right direction. While OpenAI strategically announced many similar features the day before I/O, credit is due to Google for its successful turnaround. We’ve heard through back-channels that Google is increasingly a cloud vendor of choice thanks to its robust GenAI platform, including its fully-managed Vertex AI machine learning platform and associated low-code/no-code generative AI studios (note: AWS also has an impressive enterprise offering in Bedrock, SageMaker Canvas, and more). Relative to OpenAI, Google’s enterprise clients now enjoy better pricing, a longer context window, better uptime, more support, a fully-managed platform, better software-stack integration, and more. Depending on the use case, enterprise players of sufficient scale would do well to consider a multi-AI strategy rather than force themselves onto a single model provider, as sketched below. A combination of Google, OpenAI, and Meta (i.e. Llama 3) models offers greater flexibility in use cases, configuration, and reliability than any single model in isolation.
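A minimal sketch of what such a multi-model setup might look like in code. The routing rules and stub completion functions are illustrative assumptions; in practice each stub would wrap the provider’s actual SDK (Vertex AI, the OpenAI client, a hosted Llama 3 endpoint):

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder completion functions standing in for real provider SDK calls.
def call_gemini(prompt: str) -> str:
    return f"[gemini-1.5-pro] {prompt[:40]}..."

def call_gpt4o(prompt: str) -> str:
    return f"[gpt-4o] {prompt[:40]}..."

def call_llama3(prompt: str) -> str:
    return f"[llama-3-70b] {prompt[:40]}..."

@dataclass
class ModelRoute:
    provider: str
    call: Callable[[str], str]

# Illustrative routing: long-context jobs to Gemini, multimodal work to
# GPT-4o, and data-sensitive workloads to a self-hosted Llama 3.
ROUTES = {
    "long_context": ModelRoute("google-vertex", call_gemini),
    "multimodal": ModelRoute("openai", call_gpt4o),
    "sensitive": ModelRoute("meta-llama-3", call_llama3),
}

def complete(task_type: str, prompt: str) -> str:
    route = ROUTES.get(task_type, ROUTES["multimodal"])  # sensible default
    return route.call(prompt)

print(complete("long_context", "Summarize this 1,200-page filing..."))
```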

Superbad
OpenAI’s Superalignment and safety dirty laundry aired

What it is: We should have asked which you wanted first: the good news or the bad news. We gave you the good news above; what follows is all bad news. You know OpenAI’s Superalignment team? The one responsible for ensuring that successively more intelligent AIs remain aligned with the overall goals, aspirations, and interests of humanity? They, uh… disbanded it. But “don’t worry,” they say: safety is now being embedded in every stage of product development (was it not before?). Critics, however, ourselves included, argue that a dedicated team is needed to act as a check against irrational exuberance.

What it means: Look, we get it. Safety isn’t fun. It isn’t sexy. But it is critical, especially for a technology that has rocked industries with its product launches, threatens severe job displacement, and could wreak economic havoc if deployed without purpose. In just a couple of years, several previously viable careers have been commoditized into AI use cases. Design, copywriting, education, translation, music, film, cybersecurity, and more have all been disrupted with little-to-no warning. Frankly, we don’t think this disruption could have been avoided; these are the core capabilities of GenAI. We do, however, think these serve as useful examples of how quickly the status quo can be forever altered. While these events have so far impacted certain groups, we note that as more tools are released, the impact widens and begins to affect broader swaths of society. We would be remiss if we didn’t mention that open source AI, and the democratization it brings, helps to empower the smaller players who are at the greatest risk of disruption.

Why it matters: The way things went down was not pretty. The disbandment of the SA team was largely driven by the departures of several high-profile leaders on the team: Ilya Sutskever, co-founder and Chief Scientist (one of the most respected minds in AI), and Jan Leike, Superalignment co-lead (formerly of DeepMind). Long-standing tensions seem to have mounted to a breaking point. Jan had a lot to say on X, blaming a lack of compute resources (despite a publicly announced 20% commitment) and a preference for launching shiny new products over prioritizing safety. Why not both? As OAI continues its damage-control efforts, we hope this acts as a shot across the bow, reining in some of the danger by drawing attention to these shortcomings. We wouldn’t be surprised to see an uptick in regulation-talk on the back of these events, which may lead to too heavy-handed an approach.


About the Author: Brodie Woods

As CEO of usurper.ai and with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist at leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.