You Gotta Be Fresh
What it will take to make the next gen of AI models
Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week we bring you yet another week of fast-paced AI developments packaged neatly in a 5 minute(ish) read.
TL;DR: Don’t Tip Your Servers; Claude 3.5 Sonnet, less than meets the eye; SuperalignmentMan’s secret identity
This week we have more exciting AI news to share with you. We begin with a high-level analysis of AI model training clusters and what it will take to get to truly next-generation models (and how it’s probably more work than you think). Next we take a peek at Anthropic’s Claude 3.5 Sonnet, their newest, best-in-class model (and take a moment to understand why it’s not all it’s cracked up to be). Finally, we observe yet another shot across OpenAI’s bow by former employees starting a new company dedicated solely to what they have vociferously called out as OAI’s shortcomings. Read on, because this paragraph and the TL;DR (Too Long; Didn’t Read) really won’t make any sense unless you do!
It’s All About the Infrastructure
Analyzing what it takes to get to the next level of AI
What it is: This week, venerable semiconductor analysis publication SemiAnalysis released a deep-dive report discussing the current state and future directions of AI model training, particularly focusing on the infrastructure required for large-scale AI clusters. It highlights the stagnation in AI capabilities since the release of GPT-4, attributing this to the lack of significant increases in compute power dedicated to single models (we concur with this assessment of stagnation and the reasons for it). The article also delves into the technical complexities and challenges of building and maintaining large AI training clusters, including network topology, power requirements, and fault tolerance. It is easy to gloss over these items, but at the current scale of compute, seemingly small issues like these scale up substantially, creating major bottlenecks if not thoroughly planned for.
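To make the fault-tolerance point concrete, here is a quick back-of-envelope sketch in Python. The GPU count and per-GPU reliability figure are our own illustrative assumptions, not numbers from the SemiAnalysis report:

```python
# Back-of-envelope: why fault tolerance dominates at cluster scale.
# All inputs are illustrative assumptions, not figures from the report.

num_gpus = 100_000            # assumed size of a next-gen training cluster
mtbf_per_gpu_hours = 50_000   # assumed mean time between failures, per GPU

# With roughly independent failures, the expected cluster-wide failure
# rate scales linearly with GPU count: failures per hour = N / MTBF.
failures_per_hour = num_gpus / mtbf_per_gpu_hours
failures_per_day = failures_per_hour * 24
minutes_between_failures = 60 / failures_per_hour

print(f"Expected failures per hour: {failures_per_hour:.2f}")          # 2.00
print(f"Expected failures per day:  {failures_per_day:.0f}")           # 48
print(f"One failure roughly every {minutes_between_failures:.0f} min") # 30
```

At those assumed numbers, the cluster loses a GPU every half hour, which is why checkpointing, hot spares, and automated recovery become first-order design concerns rather than afterthoughts.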
What it means: The stagnation in AI capabilities is not due to a lack of innovation but rather the limitations in available compute power and infrastructure. Readers of the Geekly will recall that the three main ingredients to AI models are Compute, Data, and Algorithms. Ergo, limitations on overall compute dedicated to model development hamper progress. The next significant leap in AI is expected to come from training multi-trillion parameter multimodal transformers, which require massive amounts of compute power and sophisticated infrastructure. SemiAnalysis rightly underscores the importance of efficient network design, fault tolerance, and power management in achieving these advancements, as well as the high costs and technical challenges involved.
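To see why compute is the binding constraint, consider the widely used ~6·N·D rule of thumb for dense-transformer training FLOPs (roughly six floating-point operations per parameter per token). The sketch below is illustrative only; the parameter count, token count, cluster size, and per-GPU throughput are all our own assumptions:

```python
# Rough training-compute estimate for a hypothetical multi-trillion
# parameter model, using the common ~6*N*D FLOPs rule of thumb for
# dense transformers. Every concrete number here is an assumption.

params = 2e12        # assumed: 2 trillion parameters
tokens = 40e12       # assumed: 40 trillion training tokens

total_flops = 6 * params * tokens   # ~4.8e26 FLOPs of training compute

num_gpus = 100_000                  # assumed cluster size
flops_per_gpu = 4e14                # assumed sustained throughput (~400 TFLOP/s)

seconds = total_flops / (num_gpus * flops_per_gpu)
days = seconds / 86_400

print(f"Training compute: {total_flops:.1e} FLOPs")
print(f"Wall-clock time at assumed scale: ~{days:.0f} days")  # ~139 days
```

Tweak the assumptions as you like; the takeaway is that even a hundred thousand well-utilized accelerators would spend months on a single multi-trillion parameter run, before accounting for failures and restarts.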
Why it matters: Understanding the infrastructure challenges and requirements for large-scale AI training is critical for the future development of more advanced AI models. As it stands today, no datacenters exist that can support the combined scale, networking, and power requirements needed to train next-gen models: they will need to be built (and several are in progress). As companies race to build more powerful AI clusters, innovations in network design, power management, and fault tolerance will be key to overcoming current limitations. This knowledge is essential for stakeholders in the AI industry, including researchers, engineers, and investors, as it highlights the areas where investment and innovation are most needed to drive the next series of AI advancements. Watch this space closely.
Play me a Tune; Rhyme me a Rhyme
Anthropic’s Claude 3.5 Sonnet model release
What it is: Anthropic has announced the release of Claude 3.5 Sonnet, a new AI model in their Claude 3.5 product line. This model is designed to outperform its predecessors, particularly Claude 3 Opus, in terms of intelligence, speed, and cost-efficiency. Claude 3.5 Sonnet excels in tasks requiring graduate-level reasoning, undergraduate-level knowledge, and coding proficiency. It also introduces new features like Artifacts for dynamic content interaction (Artifacts allow users to modify images and text on the fly from within the chatbot window).
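For readers who want to kick the tires, a minimal call to the model through Anthropic’s Python SDK looks roughly like this (the model version string is the one Anthropic published at launch; the prompt is our own):

```python
# Minimal sketch: querying Claude 3.5 Sonnet via Anthropic's Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # version string from the launch post
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Summarize the trade-offs of scaling AI training clusters."},
    ],
)
print(message.content[0].text)
```

Note that Artifacts is a feature of the Claude.ai web interface rather than the API; the API returns plain message content, as above.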
What it means: Despite the improvements, Claude 3.5 Sonnet does not represent a major leap in AI capabilities. The model's advancements are primarily incremental, focusing on enhanced performance metrics and new features rather than groundbreaking innovations. This stagnation is largely due to the similar levels of compute power applied to training Claude 3.5 Sonnet (per the article we referenced above) as compared to other recent models, which limits the potential for significant breakthroughs. That said, we do expect better performance on so-called “hard” prompts, such as bespoke coding problems, since Anthropic has had access to synthetic data for training, which tends to help most on exactly that class of prompt.
Why it matters: We hate to be Deborah Downers when so many in the tech media are crowing about a model that beats GPT-4o on a handful of benchmarks, but the reality is that beyond the reductions in latency and price (let us be clear, both of these are great for enterprise use cases), Claude 3.5 Sonnet does little to move the needle regarding the state of the art. Rather, the release lays bare the challenge facing the AI industry: achieving substantial advancements without a corresponding increase in compute power. While the model offers improved performance and new functionalities, it underscores the need for more significant investments in infrastructure to enable the next major leap in AI capabilities. We contend that while incremental improvements are valuable, the true potential of AI will only be unlocked with more substantial computational resources and innovative training techniques.
The Superalignment Security Blanket
Questionable business models in the name of safety
What it is: In the next chapter of the ongoing saga of OpenAI, its employees and its alumni, recently departed co-founder Ilya Sutskever has announced the formation of a new company called Safe Superintelligence Inc. (SSI). The company is dedicated to developing a "safe superintelligence," focusing solely on creating an AI model that surpasses human intelligence while ensuring its safety. SSI will not release any products or engage in commercial activities until this goal is achieved, aiming to insulate its mission from short-term commercial pressures.
What it means: The formation of SSI is in stark contrast to the current for-profit operating model of OpenAI. This is by design. Sutskever and team were reportedly very displeased with OAI’s purported focus on product releases over safety (and quite public about it, in fact), which drove the departure of several key safety team leaders (we covered this here) and the eventual dissolution of the company’s Superalignment team. It therefore comes as no surprise to see Ilya and team form a company specifically focused on Superalignment (read: ensuring that the goals and desires of AI match those of humans over the long term).
It’s a bold move; let’s see how it plays out: By not releasing any products, SSI aims to avoid the distractions and compromises that come with commercial pressures, allowing the team to concentrate fully on the technical challenges of ensuring AI safety. This approach contrasts with other AI firms that balance innovation with market demands. We don’t really see how this will work long-term; investors will want to see a return on their dollars, and that’s going to be very hard to do with zero product releases.
Why it matters: SSI’s singular focus on safe superintelligence places them squarely in the camp of the “AI Nervous”, those within the AI community concerned about the potential risks of advanced AI systems. Dedicating resources exclusively to safety, SSI is by far the most conservative AI company we have seen, even relative to Anthropic, whose ex-OpenAI founders likewise formed their company due to concerns about a lack of safety focus at OpenAI. We don’t agree with their business model, because, well, it’s not a business. However, we do appreciate its contribution to the rich tapestry of the overall AI development landscape. Perhaps their submissions to academia (assuming they support at least some Open Science) will enrich the overall quality of AI tools more broadly.
About the Author: Brodie Woods
As CEO of usurper.ai and with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.