AI Geekly: They're Ba-ack!

Fully recharged after a summer break

Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. Fresh off a short, roughly one-month hiatus, we’re back with yet another week of fast-paced AI developments packaged neatly in a five-minute(ish) read.

TL;DR Catching up on some recent developments

It’s been a while! Hope you’ve all been keeping well. Things haven’t slowed down in the world of AI since our last note. There’ve been a number of impressive developments in the last month. We’re going to go with a bit of a freeform format this week, as we have a lot to cover, so please bear with us. Next week we’ll get back to the punchy AI insights we’re known for.

New Kids on the Block
OpenAI and Meta released some impressive new models

GPT-4o mini: We have a new model from OpenAI, GPT-4o mini: a leaner, cheaper, faster model that performs close to GPT-4o at a fraction of the cost and with much quicker response times (great for commercial production use cases). It’s not much to write home about on its own, but it’s good to keep an eye on what the leader of the pack is up to.

Llama 3.1: Another interesting release was Meta’s Llama 3.1 70B and 405B models (70 billion and 405 billion parameters, respectively). This is the model the open-source community has been waiting for. While the 405B is too large in its native form to run on a single node, it can be run locally in quantized (read: shrunken) form. What makes this model truly impressive is that it is the first open-source model to match, and in places surpass, OpenAI’s GPT-4: quite a feat!
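For the curious, here is a back-of-the-envelope sketch of why quantization makes local deployment feasible; the function and figures are our own illustration, not Meta's published numbers, and they ignore activation and KV-cache overhead. Halving the bits used to store each weight roughly halves the memory footprint.

```python
# Rough memory math for a 405B-parameter model at different precisions.
# Illustrative only: counts weight storage, not activations or caches.

def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in gigabytes."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * 1e9 * bytes_per_weight / 1e9

for bits in (16, 8, 4):
    print(f"405B @ {bits}-bit: ~{model_size_gb(405, bits):,.0f} GB of weights")
```

At 16-bit precision the weights alone run to hundreds of gigabytes, which is why the quantized 4-bit variants are what most enthusiasts actually run locally.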

Open Source Catches Up with OpenAI: Longtime readers of the Geekly will recall that we predicted a model like Llama 3.1 in our first note of 2024. What’s impressive isn’t that we were right; it’s how quickly this technology is moving. Open-source releases put bleeding-edge models in the hands of anyone with the technical skill and the desire to tweak them for their own purposes. Of particular interest is the application of these models to enterprise and small-business use cases, where a model can be deployed free of charge (ex-infrastructure) and customized as needed in a completely private, on-premises deployment. This is of particular benefit to those operating in highly regulated industries like finance and healthcare, or simply to those uncomfortable sharing their data assets with third parties.

AI Image Generation Scene in FLUX
New image gen model shakes up the space

What it is: There’s been a major shake-up in the world of image generation. A number of former developers from Stability AI (the makers of Stable Diffusion) have formed Black Forest Labs (BFL) and just released an image generation model that surpasses anything we’ve seen. Dubbed FLUX.1 (and coming in three flavors, two of them open source), the new model beats the latest from Midjourney, DALL-E (OpenAI), and Stable Diffusion by a massive margin in realism, quality, and complexity. It also reliably renders embedded text, long a struggle for image generators. It counts VC heavyweights like a16z and Y Combinator among its investors.

What it means: This represents a massive shift in the AI image generation landscape. BFL has combined experimental techniques like rotary positional embeddings and parallel diffusion transformers to create a faster, more accurate model (albeit one that requires fairly high-end hardware to run). This leap in capabilities could force established players to innovate rapidly or risk falling behind.

Why it matters: FLUX.1’s surprise emergence from stealth to trounce its competitors is part of what makes this space so fun. One minute an AI company is on top of the world, proudly proclaiming its model the best; then, out of nowhere, a dark horse emerges and takes the lead. There are no pauses in AI, no chances to take a break. Every day someone is working to claim the top spot: a scrappy team willing to try new ideas and put fresh research into practice to produce something even better. That spirit of competition and collaboration, on full display here, reminds us of the human element. In the AI world, it’s the Olympics every day.

What’s next: BFL is working on a text-to-video model intended to rival OpenAI’s Sora. If it’s anything like FLUX.1, text-to-video players like OAI, Luma Labs, and Kuaishou (makers of Kling) should treat this as a shot across the bow: BFL is coming for them next.

Keeping it Geekly: We wouldn’t be the AI Geekly if we weren’t using the best AI tech we can get our hands on. Every image in this week’s Geekly (and going forward, until better models come out, which they will) was produced using FLUX.1.

All The Rest
Hesitant to call these leftovers because each is a main course

Exit Stage Left: OpenAI's leadership exodus continues, with only 3 of 11 founders still at the helm. Greg Brockman (President and Chairman) is taking a sabbatical, John Schulman (Safety Lead) is jumping ship to Anthropic, and Peter Deng (VP of Consumer and Enterprise Product, though not a founder) is out the door. It's like watching a high-stakes game of musical chairs, but with AI geniuses instead of kindergarteners. Many in the media have questioned whether a game-changing AI could truly be around the corner in GPT-5 (or whatever they call it); otherwise, why would these experts depart? We think that may be reading too much into it. As with Sam’s dismissal and resurrection in November, sometimes it’s just the human factor.

Groq just dived into its Scrooge McDuck Money Bin: Groq raised $640 mm from a suite of investors to fund its AI chip ambitions, valuing the company at a cool $2.8 Bn. The AI gold rush is definitely still on: The Information reported that Q2/24 saw a record $12 Bn raised, 3x Q1/24 and ahead of the previous $11 Bn record from Q1/23 (of which $10 Bn was MSFT’s investment in OAI). If this is a bubble, there’s air in it yet. The challenge remains generating value from the massive investment in the technology to date, a topic of intense interest as MSFT, GOOG, and AMZN announced their quarterly results last week.

Autobots, Roll Out!: Figure's new humanoid robot, Figure 02, is ready to clock in on the factory floor or at the office. That's a little more painful given last week’s flagging US jobs report… The 02 has better battery life, a slicker design, and serves up the requisite office banter courtesy of its OpenAI-powered brain. Recall that part of the allure of humanoid robots is that, in a world designed for humans, it’s easier to plug in a human-shaped robot than to rebuild an entire factory or process from scratch. Sometimes the bolt-on approach is the best answer.

Forget the Olympics, it’s the Math Olympiad that matters: Harmonic's Aristotle is flexing its mathematical muscles, acing Olympiad-level problems like they're grade-school arithmetic. As readers of the Geekly know, the Large Language Models (LLMs) powering the leading chatbots struggle with mathematics. This puts Aristotle in direct competition with OpenAI’s rumored Strawberry model (formerly Q*), which also excels at logic and mathematics.

Blackwell Delays and Video Challenges: Nvidia's next-gen AI chips are reportedly three months late due to a design flaw. There are also reports that Nvidia’s GPUs are under-specced for video-related AI tasks, where data is measured not in terabytes but in petabytes (1,000 terabytes each). Apparently, our math has been off this whole time: a video is worth a thousand words… While Nvidia’s cards remain the preferred hardware for AI video workloads, this does present an opportunity for AMD, whose latest Instinct MI300X cards carry more VRAM (192 GB) than Nvidia’s current-generation H100 cards (80 GB).
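To put that VRAM gap in perspective, here is a rough sketch of how many cards it takes just to hold a large model's weights; the arithmetic and the 810 GB example are our own illustration (a 405B model at 16-bit precision), not vendor sizing guidance, and real deployments also need headroom for activations and caches.

```python
import math

# Card-count arithmetic for the VRAM figures above (illustrative only).
H100_VRAM_GB = 80     # Nvidia H100
MI300X_VRAM_GB = 192  # AMD Instinct MI300X

def cards_needed(model_gb: float, vram_gb: float) -> int:
    """Minimum cards whose combined VRAM holds the weights alone."""
    return math.ceil(model_gb / vram_gb)

# Example: ~810 GB of weights (405B parameters at 16-bit precision).
print(f"H100s needed:   {cards_needed(810, H100_VRAM_GB)}")
print(f"MI300Xs needed: {cards_needed(810, MI300X_VRAM_GB)}")
```

More VRAM per card means fewer cards, and less of the inter-GPU communication that bottlenecks data-heavy video workloads.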

Before you go… We have one quick question for you:

If this week's AI Geekly were a stock, would you:


About the Author: Brodie Woods

As CEO of usurper.ai, with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.