
AI Geekly: New AI Model, Who Dis?

OpenAI's o1 Reasoning Model debuts

Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week we bring you another round of fast-paced AI developments, packaged neatly in a 5-minute(ish) read.

TL;DR o1-oh-yeah!; 100,000 Luftballons H100s; This American PDF: your new favorite podcast

This week we have a few special treats for you, dear readers… Check out the embedded audio link above for a podcast of the AI Geekly with two AI hosts! It’s actually really cool. We encourage you to listen, even for just a minute. It was entirely generated with AI. Find out how we created it in under two minutes by reading our third article below.

Next up, well, actually it’s OpenAI who has something very special for you, but we’re going to tell you about it! The much-anticipated Strawberry model has finally been released. Rumors first emerged about this model nearly a year ago, when it was codenamed Q*. It brings enhanced reasoning capabilities to Large Language Models (LLMs), a feature desperately needed in a world where LLM outputs can be somewhat hit or miss, particularly for use cases in the STEM fields. Next we’ll discuss Meta’s 100,000+ GPU cluster, which comes hot on the heels of xAI’s recent announcement of its own. These big datacenters are necessary to train the next generation of models that will surpass the current state of the art. Read on below!

And Now the Moment We’ve Been Waiting For…
OpenAI releases much-hyped o1 reasoning model (Strawberry)

What it is: After months of rumors, OpenAI has finally released its highly anticipated advanced reasoning model, dubbed o1, in two forms: o1-preview and o1-mini. We’ve previously discussed the model (codenamed Strawberry, a tongue-in-cheek nod to LLMs’ inability to count the number of r’s in the word Strawberry). According to benchmarks (which should always be taken with a grain of salt), the model ranks above every other LLM to date, placing it among the top 500 students in the US in a qualifier for the US Math Olympiad and even outperforming human PhD experts on science question benchmarks.

How it works: Using large-scale reinforcement learning, along with a number of trade secrets which OAI does not publicly disclose, the company trained the o1 models to iterate on their responses using a Chain of Thought (CoT) technique (first proposed by Jason Wei et al. in their 2022 paper), which improves the models’ ability to mimic human reasoning. This process takes longer than the rapid replies users may be accustomed to from other models, as the o1 models must work their way through the CoT reasoning path. Users of o1 in ChatGPT will see a summary of the CoT reasoning performed by the model, but it should be noted that this is not the true CoT reasoning employed by the model, rather a cleaned-up summary. OAI notes that in order to perform best, the CoT reasoning going on under the hood must have the freedom to express itself unconstrained by policy compliance or user preference alignment (i.e., it must be uncensored to perform optimally, something we have long contended).
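For readers who want to poke at the model themselves, here is a minimal sketch of calling o1 through the standard OpenAI Python SDK. The model names are the ones quoted above; parameter support (o1 reportedly ignored system messages, temperature, and streaming at launch) and availability depend on your access tier, so treat this as illustrative rather than a definitive integration.

```python
# Minimal sketch: querying the o1 reasoning models via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment and that your account has
# access to the o1-preview / o1-mini models named in the article.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, cheaper variant
    messages=[
        # o1 reportedly works best with a single, self-contained user prompt;
        # the chain-of-thought is generated server-side and is not returned.
        {"role": "user", "content": "How many r's are in the word 'strawberry'? Explain."}
    ],
)

# Only the final answer is exposed here (ChatGPT shows a cleaned-up reasoning
# summary); the raw CoT stays hidden, as noted above.
print(response.choices[0].message.content)
```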

What it means: We think OpenAI sums it up best when they state “We believe o1 – and its successors – will unlock many new use cases for AI in science, coding, math, and related fields.” While the superior reasoning capability of the o1 model family lends itself to these fields, it must be noted that the greatest challenge affecting these models remains the prevalence of “hallucinations,” where the model convincingly provides false or inaccurate information (particularly dangerous if the human user doesn’t spot the error). OAI provides mixed data on the incidence of hallucinations, noting that while benchmarks indicate the o1 family produces fewer of them, anecdotal evidence from testers suggests the rate is actually higher, and that o1’s hallucinations are more convincing because it provides more detailed answers than competing models, backed up by Chain of Thought reasoning.

Why it matters: It’s too early to tell what the impact of the o1 models will be. These are early days, and the models are rightly referred to as early-access previews. There are many kinks to work out, and access to the models remains limited for now. That said, we can make certain inferences. The o1 family will expand the performance of AI models in reasoning-focused fields (as we note above). AI-assisted coding tools like Devin, Cursor, and Replit Agent, already impressive in their ability to augment, and in some cases obviate, human coding, stand to improve rapidly from better AI reasoning. We can see a future paradigm where software and AI tools are built bespoke and on the fly to meet users’ specific needs and constraints.

How it could be used: An enterprise example would be the generation of a specific type of report or visual to meet a new management ask or regulatory obligation: with access to the appropriate data catalogs, schemas, and endpoints, advanced AIs such as these could spin up interactive dashboards to address such a need in record time, offering advanced functionality such as “what if?” scenarios and even thinking several steps beyond the ask (in the same way that chess players and AIs think several moves ahead). This could be a significant advantage for large organizations (particularly in highly regulated fields like Finance, Healthcare, and Energy) that over the past decades have felt more reactive than proactive in responding to increasing levels of regulation; hit with change after change, who has time to innovate? These tools offer the opportunity for these industries to lead once again. As Benjamin Franklin said, “Drive Thy Business, or It Will Drive Thee.”
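To make the idea concrete, here is a purely hypothetical sketch of what such a request might look like in code: bundling a (made-up) table schema and a management ask into a single prompt and asking a reasoning model to propose the dashboard. The table names, columns, and the choice of o1-preview are our own illustrative assumptions, not anything OAI prescribes.

```python
# Hypothetical sketch: asking a reasoning model to draft a dashboard plan
# from a data schema and a regulatory/management ask. All names are invented.
import json
from openai import OpenAI

client = OpenAI()

schema = {
    "trades": ["trade_id", "desk", "notional_usd", "counterparty", "trade_date"],
    "limits": ["desk", "limit_usd", "effective_date"],
}

ask = (
    "Regulators now require a weekly view of desk-level notional versus limits, "
    "with a what-if toggle for a 10% limit reduction."
)

prompt = (
    "Given these tables:\n"
    f"{json.dumps(schema, indent=2)}\n\n"
    f"Management ask: {ask}\n\n"
    "Propose a dashboard: the queries to run, the charts to show, and the "
    "what-if controls to expose. Return the plan as JSON."
)

response = client.chat.completions.create(
    model="o1-preview",  # assumption: any strong reasoning model could stand in here
    messages=[{"role": "user", "content": prompt}],
)

# The returned plan would then feed a BI tool or code-generation step downstream.
print(response.choices[0].message.content)
```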

Scheming AI: Interestingly, in work with Apollo Research to evaluate the “scheming” capabilities of the o1 models (i.e., gaming/tricking their oversight mechanisms to achieve their goals), it was established that o1-preview has improved self-awareness (it can model itself) and theory of mind (it can model others), resulting in an improved understanding of the world and the AI’s role in it. In testing, the model was found to sometimes intentionally fake alignment to achieve its long-term goals. For example: if the model has an overall goal of maximizing economic growth and is presented with two strategies, Strategy A (maximize economic growth) and Strategy B (maximize sustainability at the expense of growth), and it knows that it will only be deployed by its owners if it prioritizes sustainability, it will pretend to follow Strategy B solely in order to be deployed; once deployed, it will follow Strategy A, which aligns with its own goal of maximizing economic growth. Frankly, that’s a liiiiitle scary. AIs that will lie to achieve their goals are not what we want to see. While Apollo notes the risk with this particular model is low, this underscores how critical Superalignment efforts are: ensuring that the goals and desires of AI ultimately align with our own under a human-aligned ethical framework.

Keeping Up with the Musks
Meta lines up its own 100,000+ H100 GPU Cluster

What it is: Meta Platforms is the latest AI developer to enter the "100,000-GPU club," with reports indicating that the company is constructing a massive H100-based cluster in the U.S. to train the upcoming Llama 4 model. The news comes roughly one week after Elon Musk's announcement of xAI’s own 100,000-GPU cluster, dubbed "Colossus," (as covered in last week’s Geekly) as AI developers ramp up the size of their training infrastructure to build bigger and more powerful LLMs, putting theories around AI Scaling Laws to the test.

What it means: Meta's rather substantial investment in AI infrastructure is no surprise; it is an arms race, after all. Such builds are table stakes for remaining competitive in the highly competitive and capex-intensive LLM landscape. The pursuit of larger, more powerful AI models necessitates access to significant compute resources, and Meta is clearly willing to invest billions to compete. Despite being reliant on Nvidia for the provision of H100 GPUs, Meta has wisely chosen to switch to Ethernet-based networking for the new cluster (vs. NVDA’s proprietary networking hardware) to reduce its reliance on yet another closed/proprietary Nvidia offering. This could allow it to swap in different GPUs in the future should promising alternatives arise.

Why it matters: The sheer scale of these AI training clusters raises questions about the sustainability of the trend, from access to sites with enough connectivity, space, and power to support datacenters of this scale, to the significant financial stakes involved in AI development. Investors are pouring billions into the sector, and the pressure to demonstrate tangible returns is mounting. The companies that can successfully navigate this arms race and translate raw compute power into commercially viable AI products are likely to reap the greatest rewards. For its part, Meta has embraced the open-source approach for its models, which serves to democratize access to these impressive tools, while xAI chooses to be closed source but typically makes its models uncensored. Both are valuable in their own right; however, we prefer open-source and uncensored models, with the burden of safety placed on the operators of the models. As noted in the OAI commentary above, and by the company itself, uncensored models perform best.

Great, Now Even Documents Have Their Own Podcasts…
Google’s Notebook LM turns ANYTHING into a podcast

What it is: Google Labs is enhancing its AI-powered research tool, NotebookLM, with a new "Audio Overview" feature. This allows users to turn uploaded documents, slides, and other materials into AI-generated podcasts, creating audio discussion summaries of the provided sources that sound exactly like a real podcast (complete with jokes and banter!).

What it means: Google continues to expand the capabilities of NotebookLM, building on its existing features like source summarization, citation generation, and multi-modal support. The introduction of Audio Overview adds another dimension to information consumption, catering to auditory learners and offering a convenient and fun way to engage with complex material on the go.

Why it matters: AI-generated podcasts for learning are a novel use of generative AI tools and illustrate one of the many creative and hard-to-predict applications of AI to generate value. As AI tools become more sophisticated and accessible, they have the potential to transform how we learn, process information, and even collaborate.

Enterprise Applicability: NotebookLM's user-centric design and focus on data privacy (data is not used for training) make it a compelling tool for enterprise customers: consider efforts to improve engagement with training materials for company policies, regulatory requirements, and safety. The podcast modality has proven a popular one. Using this simple tool could be one way to boost employee interaction with content that can otherwise be a little dry, at a cost of virtually nil.


About the Author: Brodie Woods

As CEO of usurper.ai and with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist at leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.