AI Geekly: Welcome to 2025
The Year of Artificial General Intelligence
Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week’s Geekly provides our 2025 outlook, including expectations for the year ahead in the world of AI.
That special time of year
Last week was that special time of year when we finally stop to take a beat; when our unyielding society’s never-ending obsession with growth for growth’s sake, higher productivity, and squeezing every last drop of blood from every possible stone finally slows down… just for a week! Certainly, it’s a time to rest and recharge our batteries.
For a former equity research analyst, this is also a special time. One where we take a step back. We ingest and think through another year’s worth of measured results. We evaluate the inputs, the variables (both selected and foisted), the sector, the players, the teams, the tools, the concepts, the methods, the outputs, and more. In so doing, we create a model of the world as it sits presently, but that’s not where we stop. We analyze and compile all of these elements, carefully consider our assumptions, and use them to thoughtfully formulate our forecast for the quarters to come.
Easy, right?
That’s all great, except... It’s one thing to try in vain to predict energy prices, industrial activity levels, or widget sales as many analysts covering those sectors do. Making predictions in AI is a biiiit more challenging. The pace, the scale, and the lack of precedents are confounding factors to be sure. It gets murkier as AI improves, particularly when it comes to making predictions around Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) —AI that is on par with humans and AI that far surpasses human capabilities, respectively. This is because once we have achieved these levels of AI (and assuming these AIs can recursively self-improve), the pace at which AI and technology can improve will begin to accelerate so quickly that we can no longer observe or understand it (a Breakout event). This runaway acceleration of AI advancement and the inevitable convergence or conclusion of the human-technology relationship (fingers crossed it’s the former!) culminates in an event called the Technological Singularity (Singularity for short). The Singularity is what is known as an “Event Horizon” —a boundary beyond which we have no measurable data and cannot reasonably predict— a concept originating from the study of black holes. Just as we cannot know what happens to matter or light once it reaches a black hole’s event horizon, we can’t truly know what will happen when we achieve AGI/ASI/Singularity...
Makes it a little hard to make predictions about then, doesn’t it?
Sure does. But, we can still make some deductions, despite what the theorists say. Borrowing Descartes’ deductive reasoning model and foundational use of logical intuition, let’s see what we can infer. We’ll start with the relatively easier piece: 2025. In the weeks ahead, we’ll share some more of our thoughts on what the impacts of the rapid adoption of advanced AI might be on our society.
2025 Predictions
Here we lay out a few of our predictions for 2025 and their underlying rationale. In a piece that will follow in coming weeks, we build on our 2025 convictions to share our longer-term expectations as they relate to AGI, ASI and their profound potential impacts on society. It’s impossible to capture all of this in a short piece —so as always, feel free to reach out to discuss further.
Hallucinations? A mere figment of the imagination…
Source: Vectara
Note: model ranking roughly correlates with release date (newer is better)
2025 may be the year we finally put hallucinations (the frustrating tendency for LLMs to confidently spit out false information) behind us for good. While the incidence of hallucinations has declined over the years (2023’s Google PaLM 2 hallucinated at a rate of 12.1% vs. today’s o1-mini at 1.4%), it remains one of the greatest roadblocks to broader adoption of GenAI, particularly in applications where there is zero margin for error (healthcare, finance, and more). Critical decisions and execution simply cannot be trusted to a solution that will arbitrarily produce incorrect outputs (even with mitigation steps in place, the residual risk must be nil). While promising research improves our understanding of the causes of different types of hallucinations, techniques now incorporated into Test Time Compute (TTC), such as Reflexion and Chain of Thought reasoning, have further reduced hallucinations. Given the sums of capital invested in GenAI over the past several years, AI co’s (and certainly their shareholders) are hyper-fixated on real-world, practical applications of AI. We expect the investment focus on eliminating hallucinations will yield novel remediations, which, coupled with ongoing enhancements, will whittle down the remaining hallucinations.
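For the curious, here is a minimal sketch of what a Reflexion-style, ground-and-verify loop can look like in practice. The `call_llm` helper is a hypothetical stand-in for whatever chat-completion API you use; the point is the pattern of spending extra test-time compute to check a draft answer against source material before accepting it, not the specific prompts.

```python
# Minimal sketch of a Reflexion-style self-check loop (hypothetical `call_llm` helper).
# Spend extra test-time compute verifying a draft answer against the provided source
# text before accepting it, revising when the critique finds unsupported claims.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call (OpenAI, Anthropic, local, etc.)."""
    raise NotImplementedError

def grounded_answer(question: str, source_text: str, max_retries: int = 3) -> str:
    draft = call_llm(
        f"Answer using ONLY the source below. If the answer is not in the source, "
        f"say 'NOT IN SOURCE'.\n\nSOURCE:\n{source_text}\n\nQUESTION: {question}"
    )
    for _ in range(max_retries):
        verdict = call_llm(
            f"Does every claim in the ANSWER appear in the SOURCE? Reply SUPPORTED or "
            f"UNSUPPORTED, then list any unsupported claims.\n\n"
            f"SOURCE:\n{source_text}\n\nANSWER:\n{draft}"
        )
        if verdict.strip().upper().startswith("SUPPORTED"):
            return draft
        # Reflexion step: feed the critique back and try again.
        draft = call_llm(
            f"Revise the ANSWER so it only contains claims supported by the SOURCE.\n\n"
            f"CRITIQUE:\n{verdict}\n\nSOURCE:\n{source_text}\n\nQUESTION: {question}"
        )
    return "NOT IN SOURCE (could not verify an answer)"
```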
BYOAI - Your Own, Personal AI Assistant
Based on some of the projects being spun up in both open- and closed-source communities (in fact, we’re working on this ourselves at usurper.ai!), there is a clear understanding that personal data assets can and should be better managed. With data artifacts spread across dozens of platforms, devices, ecosystems and more, individuals’ digital identity is increasingly fragmented. Tremendous inefficiencies are realized through ad-hoc retrieval of information from disparate, unorganized repositories in our lives —this amounts to ~20% of the average day! Wouldn’t it be better to have a smart AI assistant sitting on top of all of that data? Not only could it retrieve files/docs and answer questions as needed, but imbued with agentic capabilities (discussed below) and the ability to act proactively, it could perform actions on the user’s behalf (with human approval). Many practical examples exist. Document retrieval: consolidating relevant documents at tax time. Optimization: automatic and constant price comparison (commonly purchased items, large investments/purchases, etc.). Information distillation: automatically reading your news sources and summarizing relevant info; smart email filtering that summarizes and prioritizes. We also have some thoughts on multiple modalities here (e.g. cross-platform à la 1-800-ChatGPT) to ease the UX. We know that OpenAI is building a dedicated AI device, which may incorporate said capabilities, while Meta (with its AR Ray-Bans and XR glasses) and smaller players like start-up Field Labs (with their Compass AI necklace) work to refine their approach.
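To make the retrieval piece concrete, here is a minimal sketch of the “find my stuff” half of such an assistant, using nothing but the standard library. It is an assumption-heavy toy (keyword overlap instead of embeddings, plain .txt files, an illustrative `~/Documents` path), but it shows the shape of the indexing-and-retrieval layer an LLM would sit on top of.

```python
# Minimal sketch of the retrieval half of a personal AI assistant: index local text
# files and surface the most relevant ones for a natural-language question.
# In a real BYOAI stack you would swap the keyword scoring for embeddings and put an
# LLM on top to synthesize an answer; paths and thresholds here are illustrative.

import re
from collections import Counter
from pathlib import Path

def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(root: str) -> dict[Path, Counter]:
    """Walk a folder of personal documents and tokenize each plain-text file."""
    index: dict[Path, Counter] = {}
    for path in Path(root).expanduser().rglob("*.txt"):
        index[path] = tokenize(path.read_text(errors="ignore"))
    return index

def retrieve(index: dict[Path, Counter], question: str, k: int = 5) -> list[Path]:
    """Rank documents by simple term overlap with the question."""
    q = tokenize(question)
    scored = sorted(
        index.items(),
        key=lambda item: sum(min(q[t], item[1][t]) for t in q),
        reverse=True,
    )
    return [path for path, _ in scored[:k]]

if __name__ == "__main__":
    idx = build_index("~/Documents")  # illustrative path
    for hit in retrieve(idx, "2024 tax receipts and donation letters"):
        print(hit)
```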
Agents that can actually do things - à la Claude Computer Use
To be perfectly honest, we were a little disappointed with Agents in 2024. Plenty of players talked a big game about agents for the better part of the year, but what they managed to deliver was sub-par, with the exception of Anthropic, who introduced Claude 3.5 Sonnet Computer Use capabilities in late October. Almost everybody else got this wrong somehow… Salesforce released not one, but two versions of its Agentforce agents (glorified chatbots), Microsoft released and re-re-released probably 5-6 different products that it called Copilot or copilots (offering limited functionality), and then at the very end of the year Google and OpenAI both teased their own flavors of agents —Google’s can ONLY use a browser, not the rest of an OS, and OpenAI’s remains limited to macOS and a handful of apps. We don’t really consider those to be Agents. As open-source agents paralleling the cross-platform capabilities of Claude 3.5 Sonnet Computer Use emerge, we expect Anthropic to reduce pricing for its agent tool, leading to wider adoption on both fronts. We presume Google will follow suit and untether its agent from the confines of the Chrome browser, especially if the DOJ forces them to spin it off! (sidebar: yes, because divesting of its web browser will break the back of its search/ad monopoly… very clever DOJ, bravo). So yes. AI agents. Using your computer. For you. Ideally built into the BYOAI mentioned above.
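Stripped of vendor specifics, most of these computer-use agents share the same observe-decide-act loop. The sketch below is our own generic rendering of that loop; `capture_screen`, `propose_action`, and `execute` are hypothetical placeholders rather than any vendor’s actual API (Anthropic’s Computer Use, for instance, pairs a vision-capable model with OS-level screenshot and input tooling along these lines).

```python
# Generic observe -> decide -> act loop behind "computer use" style agents.
# All helpers (capture_screen, propose_action, execute) are hypothetical stand-ins
# for a screenshot utility, a vision-capable model call, and OS input automation.

from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "done"
    payload: dict = field(default_factory=dict)

def capture_screen() -> bytes:
    """Placeholder: return a screenshot of the desktop as image bytes."""
    raise NotImplementedError

def propose_action(goal: str, screenshot: bytes, history: list[Action]) -> Action:
    """Placeholder: ask a vision-capable model for the next UI action toward the goal."""
    raise NotImplementedError

def execute(action: Action) -> None:
    """Placeholder: drive the mouse/keyboard via an OS automation library."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 25) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose_action(goal, capture_screen(), history)
        if action.kind == "done":
            return
        execute(action)  # in practice, gate risky actions behind human approval
        history.append(action)
```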
AI will start pitching in on the AI research
We’ve previously covered a few stories of AI-driven research —that is to say, research actually performed by AI. Examples included Google’s DeepMind lab using a deep learning model to discover millions of new materials and Japan’s Sakana AI with its AI Scientist (to say nothing of the AlphaFolds of the world). While we expect more companies to experiment with AI-run research and development this year, we believe players who have been dedicated to this path for some time already will begin to realize a return on their early efforts and ongoing development (companies like Google and OpenAI), with tangible improvements thanks in large part to AI-driven innovation. Specifically, we expect AI-driven AI research to become more widespread —recursive self-improvement, but with some steps and humans in the middle slowing it down (for now). If you feel like you have difficulty keeping up with the pace of advancement in AI today (and you should, because we sure do!), remember that the current frenetic and rapidly intensifying speed of development is still being driven ~100% by humans. As AI researchers point their increasingly sophisticated models’ capabilities inward and AI begins to develop new techniques and tools to further improve AI/itself, and cut out inefficient steps that slow iteration, the contribution of AI itself will spin the flywheel of innovation ever faster.
Nvidia’s Consumer GPUs will remain in high demand
Cuckoo for Consumer: Consumer GPUs form a core backbone of the Open Source and Local AI communities (aka the GPU Poor). Lots of press is given to Nvidia’s datacenter H100s, B100s, B200s, etc., but the consumer offerings, built using the exact same components (the product of the “silicon lottery” binning process), albeit with limitations, offer compelling capabilities. They enable local users to work with advanced AI models at home / on premises at a fraction of the cost of Nvidia’s premier datacenter and professional tier offerings.
Best in breed GPU specs: Later this month, Nvidia is expected to roll out its hotly anticipated consumer-focused RTX 5000 series, based on the same Blackwell TSMC 4NP (5nm-class) process as its state-of-the-art B100 and B200s (though some lower-tier cards may still lean on the predecessor Ada Lovelace architecture). Nvidia’s current flagship consumer card is the RTX 4090 (released in 2022), clocking an impressive 1,321 AI TOPS from its 16,384 CUDA Cores and 512 Tensor Cores along with 24GB of GDDR6X VRAM on a 384-bit memory bus providing 1.0TB/s of memory bandwidth. From an AI perspective, we care about each of these elements.
🐎Tera Operations Per Second (TOPS): is a loose measurement of potential peak AI inferencing performance —think of it like the “horsepower” of a card.
🏎️Cores: CUDA cores (dedicated to parallel computing) and Tensor cores both drive TOPS —think of the core count like the size of the engine (there’s no replacement for displacement! bigger/more = better!).
📦GB of Video Random Access Memory (VRAM): For models to be performant (i.e. fast enough to be of use) they are typically loaded into a GPU’s VRAM. As we discuss below, by keeping everything on the GPU and within the VRAM envelope of the GPU (or on high-speed inter-GPU connections), models become usable. GPUs can be run in parallel to effectively scale VRAM. So, for example, multiple RTX 4090s with 24 GB of VRAM can be combined to create a server with 48, 72, 96 GB, etc. of VRAM, which can run larger (and more powerful) AI models.
🛣️Memory bandwidth: is a critical metric for AI performance —AI models benefit from lower-latency (higher-speed) data transfer for both training and inference. The further (or slower) data has to travel between compute cores and storage (in terms of architecture), the slower the overall AI workload becomes (weakest link). —Ideally, all requisite components would reside on a single piece of silicon (compute cores, storage, and memory), virtually eliminating transfer inefficiency, but that’s not quite how they are built yet— A bigger memory bus and higher memory bandwidth deliver faster training and inference. For its datacenter and professional tier cards (as well as the RTX 3000 and prior series) Nvidia offers NVLink, a direct GPU-GPU connection that bypasses the motherboard (it’s like an HOV lane instead of a single-lane dirt road via the motherboard), offering speeds well above even the highest motherboard PCI-E specification. With the RTX 4000 series, Nvidia’s Jensen jettisoned NVLink, likely due to the risk of 4090s cannibalizing its datacenter and professional tier sales (which they certainly would). The lack of NVLink is more impactful on the training side (due to scale). (We run through a quick back-of-the-envelope example of the VRAM and bandwidth math right after this list.)
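As promised, here is that back-of-the-envelope math: a rough sketch (our own simplification, not a vendor benchmark) of how VRAM capacity and memory bandwidth translate into “can I run this model, and roughly how fast will it decode.” The ~20% overhead factor and the memory-bound decoding assumption are ours, and the sketch ignores KV cache growth and interconnect penalties.

```python
# Back-of-the-envelope: does a model fit in VRAM, and what does memory bandwidth
# imply for decode speed? Decoding is roughly memory-bound, so an upper bound on
# tokens/sec is bandwidth divided by the bytes of weights read per token.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the model weights in GB at a given quantization."""
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

def fits(vram_gb_per_card: float, cards: int, params_billion: float, bits: float,
         overhead: float = 1.2) -> bool:
    """~20% overhead for KV cache / buffers (rule of thumb, not a guarantee)."""
    return weight_gb(params_billion, bits) * overhead <= vram_gb_per_card * cards

def max_tokens_per_sec(bandwidth_tb_s: float, params_billion: float, bits: float) -> float:
    """Rough memory-bound ceiling: bandwidth / bytes of weights per forward pass."""
    return bandwidth_tb_s * 1e12 / (weight_gb(params_billion, bits) * 1e9)

# A 70B model quantized to 4 bits is ~35 GB of weights:
print(weight_gb(70, 4))                         # ~35.0 GB -> needs 2x 24 GB cards
print(fits(24, 1, 70, 4), fits(24, 2, 70, 4))   # False, True
# RTX 4090-class bandwidth (~1.0 TB/s) caps decode around ~28 tok/s for that model;
# RTX 5090-class bandwidth (~1.8 TB/s) raises the ceiling to ~51 tok/s.
print(max_tokens_per_sec(1.0, 70, 4), max_tokens_per_sec(1.8, 70, 4))
```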
RTX 5090 Consumer GPU Chad: Now that you’re armed with a pretty good understanding of what matters re: GPUs in AI, let’s talk about the RTX 5000s. Because of what we mentioned above re: VRAM, we really only care about the top-end cards, the RTX 5090 and/or RTX 5090 Titan/RTX 5090 Ti/RTX AI Titan (or however they decide to brand it). The top-tier card is expected to be announced at CES 2025 (in a couple of days) and should provide a considerable improvement in AI workloads with its 21,760 CUDA cores (+33%) and 680 Tensor cores (+33%), supporting its 32GB (+33%) of GDDR7 memory on a 512-bit bus (+33%) with 1.8TB/sec memory bandwidth (+80%). It also includes other features like FP4 support and other quality-of-life enhancements we won’t dig into right now.
We’ll take whatever you’ve got: Nvidia’s RTX 3090 (Ti) cards remain in high demand (partly due to the inclusion of NVLink), as do its RTX 4090s, which have been unable to stay on shelves since launch in 2022 as scalpers buy up stock and re-list à la Ticketmaster/StubHub. With the launch of the RTX 5090, we actually expect demand to remain constant for the prior series cards, with 5090s likely to be in short supply and priced at a premium.
There’s no competition: AMD and Intel, in dogged refusal to appease consumers, continue to offer cards with limited VRAM, kneecapping their ability to compete in the space. We expect both are hyper-focused on the datacenter segment and have therefore left their consumer offerings to languish. Should AMD ever decide to put out a 40+ GB VRAM card at a reasonable price, we would expect a groundswell of consumer support. Plentiful, cheap, high-VRAM GPUs would certainly inspire consumers and even some professionals to consider foregoing the convenience of Nvidia’s proprietary (and vendor-locked) CUDA software and “roll their own” ROCm or ONNX solutions. We would encourage those interested in local AI applications to purchase a consumer GPU with at least 24 GB of VRAM to experiment with in 2025 (1× RTX 3090 on a budget, up to 2× RTX 5090s on the high end).
Natural Language to 3D-Print
We did a fair amount of work with 3D printing and GenAI in 2024, including through the use of LLaMA-Mesh (covered in a prior Geekly) and several bespoke LLM-driven workflows. The current state of this application is nascent, but there is potential. All of the components exist for a motivated entity (open or closed source) to focus on training dedicated models specifically for Natural Language to 3D-Print use cases. While non-trivial, training such a model is not only doable, but inevitable (and probably fairly cheap). Given AI’s existentially dire need for more real-world, needle-moving use cases, we expect such a capability to be more fully developed this year. If it isn’t, perhaps we’ll just do it ourselves!
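As a rough illustration of how few pieces are actually missing, here is a minimal sketch of a text-to-STL pipeline in the LLaMA-Mesh spirit: a model emits the mesh as plain OBJ text and trimesh converts it into something a slicer can print. The `call_llm` helper and the prompt are hypothetical; only the mesh-as-text idea comes from LLaMA-Mesh itself.

```python
# Minimal sketch of a natural-language -> 3D-print pipeline.
# A mesh-capable LLM emits plain Wavefront OBJ text; trimesh converts it to STL.
# `call_llm` is a hypothetical stand-in for a model endpoint (e.g. a LLaMA-Mesh
# checkpoint); prompt wording and file names are illustrative.

import io
import trimesh  # pip install trimesh

def call_llm(prompt: str) -> str:
    """Placeholder for a mesh-capable LLM endpoint."""
    raise NotImplementedError

def text_to_stl(description: str, out_path: str = "part.stl") -> None:
    obj_text = call_llm(
        "Output ONLY a valid Wavefront OBJ mesh (v/f lines) for: " + description
    )
    # force="mesh" collapses multi-object OBJ files into a single printable mesh.
    mesh = trimesh.load(io.BytesIO(obj_text.encode()), file_type="obj", force="mesh")
    if not mesh.is_watertight:
        print("warning: mesh is not watertight; it may need repair before printing")
    mesh.export(out_path)  # STL ready for a slicer (e.g. Cura, PrusaSlicer)

# text_to_stl("a 40 mm hexagonal knob with a 6 mm center bore")
```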
Crypto + AI: There’s something there, but crypto is messy
We don’t spend a lot of time on crypto at the AI Geekly. That’s not because of any type of aversion. Indeed, we have been active in the space since 2013. Our quick views are as follows:
🪙Bitcoin: has value as a philosophical representation of pure scarcity. With a hard-coded limit on total supply, it is the only truly finite resource that we cannot simply produce more of (we can mine more gold, we can issue more shares of AAPL, but we will never produce more than 21 mm BTC). It is limited by the current crypto ecosystem (which, rather than reducing costs relative to TradFi, actually increases them!) and technological limitations that prevent it from scaling to the frequency needed to truly act as a backbone of global payments (off-chain solutions exist, but the fact remains).
💎Ethereum: Other than the fact that they are both cryptocurrencies that use a blockchain, Ethereum and Bitcoin are fundamentally different. Ethereum feels less like a store of value and more like an enabler of value —for one, there is no upper limit on the number of ETH that will be issued, and we’ll explain why. Ethereum’s value is derived from its status as the most broadly accepted ecosystem for smart contracts and tokens. Contracts-as-code via ETH’s Solidity programming language makes a ton of sense and in the right circumstances can dramatically reduce risk. The possibilities when combining AI with contracts-as-code are dizzying.
🤡Everything else: Certainly, there will be other applications that make sense in a crypto context —compute-tied crypto that provides credits for training/compute/inference/generations/outputs, etc. could make sense either in the context of hyperscalers in the datacenter world, or as a way to manage unlimited demand for a popular AI tool (or AGI) with limited supply. Beyond this, there are certain to be more rug-pulls, meme coins and other general crypto degenerate behavior and nonsense —irrelevant in this context, but worth mentioning the ugly parts.
And that’s really the extent to which we care about crypto within this context. The other way we can view it is as the tools/language for both money and contracts for AIs. The slow, traditional world lumbers along with paper contracts, escrow companies, market makers, dollars and physical money. For AI it will be instant and digital: programmatically secured escrow, AI market-making, and digital currency.
The Software Engineers are Coming
Over the past three years, U.S. tech companies have laid off over 400,000 employees. While a subset have filled open roles at other firms, retired, or changed fields, it is clear based on the simple data that a healthy number, if not the majority, of these 400,000 remain free agents. While on the one hand this doesn’t really seem sustainable from an economic perspective, what we’re more interested in (in this context) is what happens when you take ~300,000 unemployed, bored tech workers with a pile of cash from working at FAANG companies, and give them the greatest AI software development tools ever made (Cursor, Replit, Windsurf, Dev0, etc.)? Absolutely, some very narrow deep-tech plays that solve niche use cases… But also: perhaps you get a whole bunch of highly motivated, intelligent, talented individuals who begin to produce high-quality, disruptive tools and solutions at scale —a groundswell of innovation. Let’s come back to these folks later when we talk about AGI.
Closed AI is going to get more expensive and more Elysium
We’ve been a little spoiled over the decades with the free (just kidding, you’re the product) model deployed by Silicon Valley. With massive profits from other businesses like ads (or just a pile of VC cash), they’ve been able to offer customers highly desirable products at a loss. With the advent of GenAI, this tradition has continued and we’ve had the luxury of toying with billion-dollar models for free. That’s starting to change. We saw it late last year with the introduction of ChatGPT Pro for $200/month, placing a velvet rope around OpenAI’s most performant o1-pro model. The increase makes sense: the o-family of reasoning models benefits from additional Test Time Compute, which has real dollar costs every time a query is submitted. Not only are the costs higher, but OpenAI and its peers aren’t in this for love of the game (just look at OAI’s plans to convert to a public benefit corporation); they’re here to make money. They need to start showing investors they can. That’s part of the reason for the late-December preview of the o3 reasoning model. Certainly, it is impressive and the implications for our AGI-ASI timeline are meaningful. The fact that much of the focus of the discussion centered around the cost (>$1mm to run the ARC-AGI benchmark), and that the company has carefully shared stories with the press about a $2,000/month option for its top model, betrays their longer-term intentions and messaging: AGI (Closed). Ain’t. Free.
Robots improving at breakneck pace
There’s going to be a lot more rapid advancement in robotics this year as the trend of infusing LLM-powered reasoning into human-shaped robots remains hot hot hot! Surely the Figure01s and Optimii (I think that’s how you pluralize Optimus) will continue to move forward, as will the fast-follow 80/20 versions that are developed in China (80% of the functionality, 20% of the cost). Robotics is one of the key ingredients to impactful AGI. Just as LLMs must leave the chat box and become agents to take digital actions on our behalf to be useful, so too must AI leave the confines of the digital world, becoming a physical agent whose actions manifest in the real world. We invite readers to think about what will happen to the cost of labor over time as robots become better and cheaper.
Base AGI 2025
This is the big one. Foundational AGI in 2025. Wow. Previously we had it pegged at 2027, but that’s what an accelerating pace of development and investment will do (see our coverage of OpenAI’s o3 model). There’s so much more to talk about here when it comes to things like model architectures, whether transformer-based models can be said to be “reasoning”, and broader epistemology.
Quickly, as a reminder: when we discuss AGI in this context, we’re roughly talking about an AI that is about as smart as a competent human or colleague (we said competent, not average, so actually smart) and can generalize well. It’s not a perfect definition, because most of the things AI can already do that humans can do, AI does better, and it’s a little subjective. But maybe a better way to judge it isn’t by listing off capabilities like a new smartphone, but by looking at what types of profound societal impact it would need to have (or be capable of) to constitute AGI. In this context we’ll say: by the end of 2025, we expect there to be a closed AGI capable of doing the work of any white-collar worker (or let’s say 90% of them). Open Source AGI will follow shortly thereafter.
How will we get there? Same way we got here. Creativity, human ingenuity and collaboration. Even though closed-source companies try to keep their advances under wraps, they are still reverse engineered. OpenAI’s work kickstarted the accelerated rise of LLMs in the Open Source community, and its recent work to enhance the reasoning of its o-family models using TTC was rapidly replicated by the OS community within weeks! This is the state of the AI space: the smartest minds on the planet, with the most powerful hardware ever made, using more money than has ever been invested at such a pace, are all essentially working together to solve the same problem, and they’re getting better and better at it.
About the Author: Brodie Woods
As CEO of usurper.ai and with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist at leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.