AI Geekly: Open Source Dark Horse

Promising open-source model appears to unseat OpenAI

Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week we bring you yet another week of fast-paced AI developments packaged neatly in a 5 minute(ish) read.

TL;DR Test Time; Sora Soars!

This week two bleeding-edge AI models were made available, one on purpose, the other a little less on purpose… Alibaba released its impressive open-source QwQ-32B-Preview model, which promptly walked right up to the toughest model on the AI leaderboards (OpenAI’s o1-preview) and socked it right in the eye! Or, in less dramatic terms, it unseated many of the top closed models, including o1-preview and Anthropic’s Claude 3.5 Sonnet, on reasoning benchmarks. This is all the more impressive given that, at 32B parameters, QwQ-32B-Preview is nearly two orders of magnitude smaller than its largest peers while delivering better performance (even more surprising, it still outperforms when quantized/compressed!). That said, we have reservations given the provenance of the model. Next, we look at the leaked OpenAI Sora preview. The company’s impressive video generation model API was leaked (and then promptly sealed back up) in an act of protest. How does it stack up relative to AI video model heavyweights? It’s not even close. Sora’s video generation is amazing: the quality, fidelity, consistency, and fluidity are unrivaled. Read on below!

Big Things Come in Small Packages
Alibaba’s new open QwQ Model beats OpenAI’s best

What it is: Alibaba released QwQ-32B-Preview, a new open-source reasoning-focused large language model (LLM) containing 32.5 billion parameters (for comparison, GPT-4 is reported to be about 1.8 trillion parameters!).

Better than o1-preview: The model demonstrates competitive performance against OpenAI's o1-preview reasoning model (the ChatGPT-maker’s top model), exceeding it on certain benchmarks like AIME and MATH, while incorporating similar "Chain-of-Thought" (CoT) reasoning capabilities (recall that o1-preview’s CoT allows the model to think step-by-step through prompts at run-time to achieve better results).

Hits pretty hard for a little fella: Notably, QwQ-32B-Preview achieves this level of performance despite being trained with significantly less compute than its closed-source counterpart (playing into the recent media narrative that LLM performance improvements from scaling alone have hit a wall).

The quants have it: Furthermore, even highly quantized versions of the model (compressed versions for use on weaker hardware) are competitive, also exceeding OpenAI’s o1 models on certain tests and defying expectations about the performance trade-offs associated with model compression. This is the part that really surprises us: a model that fits on your computer now outperforms the top model from OpenAI on certain benchmarks. This is outstanding! Few would have guessed this time last year (we did) that an open-source model would be released that could so thoroughly trounce GPT-4 (the top dog at the time), but here we are. We’ll say it again: the rate of development in AI is unprecedented.
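To make the quantization idea concrete: quantizing a model means storing its weights on a small integer grid (say, 8 or 4 bits) plus a scale factor, instead of 16- or 32-bit floats, cutting memory several-fold at the cost of a little precision. Below is a minimal sketch of symmetric per-tensor int8 quantization; this is purely illustrative and not the specific scheme used for QwQ-32B-Preview’s quantized releases.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4, 4)).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half the grid spacing (scale / 2)
print("max abs error:", np.abs(w - w_hat).max())
```

At int8, each weight takes 1 byte instead of 4 (float32), a roughly 4x memory reduction; 4-bit schemes halve that again, which is what lets a 32B-parameter model squeeze onto consumer hardware.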

What it means: The release of QwQ-32B-Preview is a significant milestone in AI development, particularly for open source, providing researchers and developers with access to a powerful reasoning model that rivals the capabilities of proprietary, closed-source alternatives.

The promise of open source: The model helps to democratize access to advanced AI technology, accelerating innovation and challenging the dominance of closed-source models. A model that outperforms ones that cost >$100 mm to train is now freely available and can be run on consumer hardware.

A solution for scaling challenges: The model's impressive performance, achieved with relatively modest training resources, also has implications for the future applicability of AI scaling laws and the potential of alternative training strategies. Much ink has been spilled over the past several weeks about the major AI players hitting a supposed wall in scaling models (training time, data size, compute). While we question the degree to which this is true, solutions like CoT, or more broadly “Test Time Compute” (allowing a model to do more work at inference time to get a more accurate result), appear to be yielding dividends.
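One simple flavor of test-time compute is self-consistency: instead of taking a single answer from the model, sample several independent reasoning chains and majority-vote the final answers. The toy sketch below illustrates the idea with a stand-in for the model (a noisy function that is right 60% of the time); `sample_answer` is a hypothetical stub, not a real LLM call.

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> int:
    """Stand-in for one sampled reasoning chain from an LLM.
    Here: a noisy process that returns the right answer 60% of the time."""
    correct = 42
    return correct if rng.random() < 0.6 else rng.choice([41, 43, 44])

def self_consistency(question: str, n_samples: int = 101, seed: int = 0) -> int:
    """Spend more compute at inference time: sample many chains, majority-vote."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))  # the majority vote converges on 42
```

The per-sample accuracy never changes; only the inference budget does. That is the essence of the test-time compute bet: trading more work at run-time for a more reliable answer, rather than training an ever-larger model.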

Reason to be cautious: Notwithstanding the foregoing, we will repeat our prior warnings about using models produced within the jurisdiction of the PRC. There is a risk of a supply chain injection (our just-invented portmanteau of a supply chain attack and an SQL injection [a common hacking technique]) via an LLM potentially influenced by a foreign adversary with a strong track record of cyber-attacks against other nations. As such, we can’t in good conscience recommend QwQ-32B-Preview, or any of the Qwen family of models, for use by enterprises or by others with sensitive data.

Genie’s out of the bottle: We expect to see other model developers release similar Test Time Compute versions of their models with similar bumps in performance (e.g. Meta’s Llama models), so one need not wait long to find suitable alternatives. The battle for model supremacy is never-ending!

Why it matters: QwQ-32B-Preview's competitive performance, open-source accessibility, and efficient design portend a promising new generation of smart, smaller, more accessible LLMs. While the model's current limitations require further refinement (or alternatives, as the supply chain injection risk is unresolvable), its strengths suggest that open-source models are rapidly closing the gap with proprietary systems. The increasing availability of high-performing open-source LLMs, coupled with innovative techniques like test-time compute, makes powerful AI force multipliers more widely available. It will also enable enterprises and small businesses to deploy sophisticated AI solutions more affordably and at scale.

Cut! Cut! CUT!!!
OpenAI’s Sora video generation model leaked

What it is: A group calling itself "Sora PR Puppets" leaked access to OpenAI's Sora video generation API, allowing anyone to generate short videos through a front-end interface hosted on Hugging Face. This act of protest, seemingly driven by dissatisfaction with OpenAI's compensation practices and its control over Sora's early access program, exposed the technology's impressive capabilities (better than anything we’ve seen from Runway, Kling, or open-source LTXV). While access was quickly revoked, the incident briefly provided a glimpse into Sora's functionality, including its speed, resolution, and watermarking.

What it means: On the technology side, it shows just how much better OpenAI’s text-to-video model is relative to competitors (a lot!). The leak also showcases the growing pains of AI development, particularly when it comes to balancing innovation with ethical considerations and community engagement. The group's allegations of unpaid labor and "art washing" are a valid critique. The leaked version of Sora, appearing to be a faster, more refined iteration of the technology, also suggests that OpenAI is making progress despite reported technical challenges and increasing competition in the video generation space.

Why it matters: We’ve been experimenting with the aforementioned open-source LTXV model this week, and we have been impressed. We’ll save our 2025 predictions for our New Year’s edition, but with the rapid pace of advancement in video GenAI, it won’t be long before the quality and capabilities of these models reach a point where they begin to have an impact on the entertainment industry. We’re not far from a world where bespoke feature-length films, complete with script, cinematography, soundtrack, plot, and casting, are created from a single-sentence prompt in moments.


About the Author: Brodie Woods

As CEO of usurper.ai and with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.