AI Geekly - It's About Damn Time

Quantifying the impact of GenAI

Brodie Woods
March 11, 2024

It’s About Damn Time

Welcome back to the AI Geekly, by Brodie Woods. We’re back in the saddle in NYC. Great meet-up event on Wednesday with our friends from Developer Capital (shout-out to Max Mccrea and Jordan Steiner for putting on a great event with compelling subject matter). Once again this week, we bring you the core developments in the AI space through a critical capital markets lens, neatly encapsulated for your review below.

TL;DR Buy now, pray later; MSFT calls “shotgun!”; Claude III > Rocky III

This week, we take a look at some pretty impressive developments in GenAI. Readers will recall that last week we focused on the dearth of quantifiable public datapoints on the positive impact of GenAI on companies’ bottom lines. From our perspective, none of the developments over the past few years on the GenAI side (traditional ML is already a proven value-add) are of material significance unless they can move the needle financially for the companies applying them. Klarna, a Swedish “buy now, pay later” player, has documented just such an impact, which we will examine. Next, we take a peek at MSFT’s Copilot for Finance, with which MSFT hopes to make up for the overpromising to date of its Microsoft 365 (formerly Office 365) Copilot product (enterprise customers we’ve spoken with have been underwhelmed) by actually making a meaningful impact on business workflows. Lastly, we look at Claude 3, the most advanced model Anthropic has released to date and currently at the top of the unofficial AI leaderboards.

Karma Klarna’s a Bitch
Predatory Swedish lender puts GenAI to work

What it is: Klarna has been using an OpenAI-powered chatbot in live production for over a month. Here are some stats provided by the company:

Klarna’s AI has had > 2.3 mm conversations, accounting for a full 67% of the company’s total customer service chats.
Relative to human workers, Klarna’s AI does the equivalent work of roughly 700 full-time agents, with customer satisfaction scores on par with humans.
It is more accurate than humans in resolving issues, reducing repeat inquiries by 25%, and it resolves issues in less than 1/5 of the time humans take.
Available 24/7, in 35 languages across 23 markets.
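For a sense of scale, here is a back-of-the-envelope sketch using only the figures above. The derived totals are our own arithmetic, not numbers Klarna has disclosed, and they treat the 67% share as exact.

```python
# Back-of-the-envelope on Klarna's published figures (our arithmetic, not company disclosures).
ai_conversations = 2_300_000   # "> 2.3 mm" conversations handled by the AI assistant
ai_share = 0.67                # share of all customer service chats handled by the AI
fte_equivalent = 700           # full-time agents the AI's work product is equivalent to

total_chats = ai_conversations / ai_share          # implied total chat volume over the period
human_chats = total_chats - ai_conversations       # implied chats still handled by humans
chats_per_fte = ai_conversations / fte_equivalent  # AI conversations per "replaced" agent

print(f"Implied total chats:       {total_chats:,.0f}")
print(f"Implied human-handled:     {human_chats:,.0f}")
print(f"AI chats per agent-equiv.: {chats_per_fte:,.0f}")
```

In other words, roughly 3.4 mm total chats, with each AI "agent-equivalent" handling about 3,300 conversations.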

What it means: About two years ago, Klarna laid off 700 people, the exact number replaced by its AI tool. This may seem like a coincidence, but we think it may have been the beginning of an experiment, the results of which the company has only now published (see the stats above). The timing lines up nicely with the advent and public availability of OpenAI’s GPT series of models and the explosion in their popularity via ChatGPT.

But Why?: Socially, laying off 700 people to replace them with AI isn’t a good look, but keep in mind two things: 1) Klarna is a predatory lender whose business revolves around taking money from people who don’t have it (bank overdraft fees, anyone?), so being a good corporate citizen isn’t exactly its jam; 2) Klarna management has a fiduciary duty to its investors (including retail investors, pension funds, university endowments, ETFs, 401ks, etc.) to maximize returns. So if you’re focused solely on the financial picture, this is a success: the needle-moving, quantifiable return on investment we’ve been looking for.

Why it matters: It confirms our thesis from last week that we’re in the early innings of the GenAI story. No one has been left behind yet, but these are some of the first indications of companies’ experimentation paying off. Identifying use cases and specific applications for GenAI is one of the most difficult problems management teams face: “how exactly do we use this in our business?” Klarna has clearly found an application within its niche, and we expect others to follow suit, at a minimum replicating the customer service chat modality, but also applying GenAI in novel ways beyond it.

Goose or Maverick?
Microsoft’s Copilot for Finance takes flight

What it is: MSFT announced the preview release of Copilot for Finance, another crack at making its AI more useful in day-to-day business workflows. We’ve seen the demo, and we think you should click through it too (it’s ~5 mins). Currently integrated into Excel and Outlook, the AI add-on can perform variance analysis (something players like ThoughtSpot have done for eons) and can draw on data in Outlook and Excel to generate emails guided by user prompts.
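For readers who haven’t lived in a spreadsheet: variance analysis compares actuals against budget (or forecast) line by line and flags gaps above some materiality threshold. A minimal sketch follows; the line items, figures, and 5% threshold are hypothetical, and this is not how Copilot for Finance works under the hood.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    name: str
    budget: float  # planned figure
    actual: float  # reported figure

def variance_report(items: list[LineItem], threshold: float = 0.05):
    """Return (name, variance, variance_pct, flagged) for each line item.

    variance = actual - budget; variance_pct is that gap relative to budget.
    A line is flagged when |variance_pct| exceeds the materiality threshold.
    """
    report = []
    for item in items:
        variance = item.actual - item.budget
        if item.budget == 0:
            # No budget to compare against: flag any nonzero amount
            pct = 0.0
            flagged = variance != 0
        else:
            pct = variance / item.budget
            flagged = abs(pct) > threshold
        report.append((item.name, variance, pct, flagged))
    return report

# Hypothetical quarterly figures, in $ thousands
items = [
    LineItem("Revenue", 1_000.0, 1_030.0),
    LineItem("Cloud costs", 200.0, 245.0),
    LineItem("Travel", 50.0, 48.0),
]
for name, var, pct, flagged in variance_report(items):
    print(f"{name:<12} {var:+8.1f} {pct:+7.1%} {'FLAG' if flagged else ''}")
```

Here only Cloud costs (+22.5% vs. budget) gets flagged. For cost lines a positive variance is unfavorable, so a real report would also track that sign convention per line.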

How Good is it?: Currently, it seems to be narrowly useful, but like all things AI, it’s important to remember: this is the worst it will ever be. Put another way, everything you see today will. only. get. better! Put another another way: pay attention to anything AI can’t do today, because tomorrow that will be what the AI can do, and very well!

What it means: That’ll be about three-fitty... er, $50/month, on top of the $22-$30/month corporate license businesses currently pay (similar pricing to Copilot for Sales). We’ve heard gripes from many a corporate client about the layering of fees for MSFT’s AI offerings; they had hoped these add-ons would be included in one of the several Microsoft 365 subscriptions they already have. Hint: MSFT is spending ~$10 Bn on OpenAI alone; it’s going to need to make that back somewhere…

Why it matters: Microsoft has bet big on GenAI, placing one of the biggest bets in the space by far (~$10 Bn on OpenAI alone), and this bet is far from a sure thing. MSFT will need to convince its customer base, both enterprise and personal, that its AI ‘juice’ is worth the monthly subscription ‘squeeze’ (BTW, these come with annual commitments; no monthly subscription for you!). Once again, this comes down to generating quantifiable value for users in their respective value chains. Jury’s still out on this one.
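To make the ‘juice vs. squeeze’ test concrete, here is a minimal sketch of the per-seat math using the prices quoted above. The $75/hour fully loaded cost of a finance employee is a hypothetical input, not a sourced figure.

```python
# Per-seat economics of the Copilot for Finance add-on, using the prices quoted above.
ADDON_MONTHLY = 50.0                 # add-on price per user per month (as quoted above)
BASE_LICENSE_MONTHLY = (22.0, 30.0)  # existing corporate license range (as quoted above)
MONTHS = 12                          # annual commitment

def annual_cost_per_seat(base_monthly: float) -> float:
    """Total yearly spend per seat: base license plus the add-on, on an annual commitment."""
    return (base_monthly + ADDON_MONTHLY) * MONTHS

def breakeven_hours_per_month(loaded_hourly_cost: float) -> float:
    """Hours of work the add-on must save each month to pay for its own cost."""
    return ADDON_MONTHLY / loaded_hourly_cost

for base in BASE_LICENSE_MONTHLY:
    print(f"Base ${base:.0f}/mo -> ${annual_cost_per_seat(base):,.0f}/yr per seat")

# Hypothetical fully loaded cost of a finance analyst: $75/hour (an assumption)
print(f"Break-even: {breakeven_hours_per_month(75.0):.2f} hours saved per user per month")
```

That works out to $864 to $960 per seat per year. The arithmetic bar looks low; the hard part is demonstrating that the time saved is real and measurable.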

From Clod to Claude
Anthropic’s third iteration heats-up competition

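Note: CoT (Chain of Thought): a prompting technique in which the AI is asked (or shown examples) to work through intermediate reasoning steps before giving its final answer (vs. answering directly)
n-shot: the number of worked examples included in the prompt to show the AI how to complete a similar task (0-shot = no examples).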
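To make the two benchmark terms concrete, here is a minimal sketch of how an n-shot prompt might be assembled, with an optional chain-of-thought cue. The function name, the Q/A layout, and the example questions are our own illustrative choices, not any vendor’s or benchmark’s exact format.

```python
def build_prompt(task: str, examples: list[tuple[str, str]], chain_of_thought: bool = False) -> str:
    """Assemble an n-shot prompt, where n = len(examples).

    Each example is an (input, answer) pair shown to the model before the real task.
    With chain_of_thought=True, the final cue asks the model to reason step by step.
    """
    parts = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    cue = "A: Let's think step by step." if chain_of_thought else "A:"
    parts.append(f"Q: {task}\n{cue}")
    return "\n\n".join(parts)

# A 2-shot prompt with a chain-of-thought cue (hypothetical examples)
examples = [("What is 12% of 250?", "30"), ("What is 15% of 80?", "12")]
print(build_prompt("What is 8% of 1,200?", examples, chain_of_thought=True))
```

With an empty example list this becomes a 0-shot prompt. Benchmark suites that report "n-shot CoT" typically use exemplars whose answers spell out the worked reasoning, rather than the bare answers used here.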

What it is: We’ll be honest: we’ve pretty much ignored Anthropic. The origin story (a bunch of ex-OpenAI-ers who felt the company was too lax on safety tattled to the world and then started their own company) isn’t a particularly compelling one. But its new model, Claude 3, sure is!

What it means: Our critique of the frequently lambasted Google Gemini model was that it took Google 1.5 years to produce a model that barely surpasses GPT-4 (note: OpenAI hasn’t been sitting on its hands here, fellas; GPT-5 is going to make GPT-4 look like a tinker toy, and where does that leave Google et al.?). Claude 3 has done Gemini one better, and with a much smaller team and budget to boot.

Another demo: Seriously, give this a quick watch. It’s less than four minutes; you’ve got the time. It demonstrates the capabilities of Claude 3 better than we can explain them.

Why it matters: Take a look at the table above. The big improvement in Claude 3 is its reasoning capability. Anthropic staff and practitioners in the field have told us that the difference with this model is that it “just gets it”. Contrast this with the experience of using today’s most commonly available models via platforms like ChatGPT, Gemini, Bing Copilot, Perplexity.ai and more, where the AI often fails to grasp the substance of what you’re saying, even when it’s clearly laid out. Claude 3 has a 200k-token context window, which not only puts it among the largest in the industry (Anthropic has a 1 mm-token version under wraps) but also contributes to its ability to comprehend user inputs.
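To put 200k tokens in perspective, here is a rough sketch of checking whether a document fits. The ~4 characters-per-token ratio is a common rule of thumb for English text, not Claude’s actual tokenizer, and the 4,096-token reply reserve is an arbitrary assumption; exact counts require the model’s own tokenizer.

```python
CONTEXT_WINDOW = 200_000  # Claude 3 context window, in tokens
CHARS_PER_TOKEN = 4       # rough heuristic for English text (assumption; tokenizers vary)

def estimated_tokens(text: str) -> int:
    """Crude token estimate: character count divided by the heuristic ratio, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(text: str, reserve_for_reply: int = 4_096) -> bool:
    """True if the estimated prompt size still leaves room for a reply of the reserved size."""
    return estimated_tokens(text) + reserve_for_reply <= CONTEXT_WINDOW

doc = "word " * 150_000  # ~150k words, 750k characters
print(estimated_tokens(doc), fits_in_context(doc))  # → 187500 True
```

By this crude measure, a book-length document of roughly 150k words fits in a single prompt.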

Final thought: We’re thrilled to see more models competing with and besting GPT-4, even if late to the party. In the AI space, as in business and the world, competition is healthy and spurs innovation. As Max Mccrea discussed at our meet-up on Wednesday, let’s try to avoid monopolies here if we can; it will work out better for everyone.

About the Author: Brodie Woods

With over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI Strategist leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.