AI Geekly: Summer Heat

AI development and competition heat up

Welcome back to the AI Geekly, by Brodie Woods, brought to you by usurper.ai. This week we bring you yet another week of fast-paced AI developments packaged neatly in a 5 minute(ish) read.

TL;DR Google’s two-front war; Cosine’s new model is no tangent; Sakana takes AI-authored research papers a step further

This week we have some interesting updates as Google continues to play catch-up to OpenAI and Apple, with a slew of new AI announcements for its Pixel devices and Android OS (contemporaneous with several antitrust findings and suits from the DOJ and the FTC). Even with the positive announcements, this wasn’t the company’s best week. Its new offerings failed to impress, leaving us wondering whether Google has any innovative ideas of its own or whether half-baked versions of recently announced features from Apple and OpenAI are all it can muster. The world of coding AI/agents just got quite a bit more interesting as the folks at Cosine released a hot new software engineering model that outperforms anything we’ve seen to date; teams are about to get more efficient in a way that eclipses GitHub Copilot. Finally, not satisfied with the current dizzying pace of scientific advancement in the 21st century, Japan’s Sakana AI has just taken the wraps off a new, fully autonomous AI scientific research agent that promises to both accelerate scientific research and reduce its cost.

Gemini's Voice: More Hype Than Help?
Google tries to get it right, but the assistant still stumbles

What it is: Google unveiled its latest AI-powered voice assistant and a suite of new Pixel devices at its recent Made by Google event, showcasing capabilities like handling interruptions, maintaining context over long conversations, and integrating with various Google apps. The company emphasized live demos to address past criticisms and highlighted features like on-device processing for sensitive tasks. This was Google’s latest attempt to shake free of some of its public failures in the AI space, clearly taking aim at both Apple’s Apple Intelligence iPhone integration and OpenAI’s voice model. A valiant effort, but while it comes close to matching its peers, it failed to surpass them.

What it means: While Google is pushing hard to position itself at the forefront of AI assistants, the reality might not live up to the hype. Despite the flashy presentation, the new Gemini-powered assistant still faces significant usability issues. Users running Gemini on their Android devices report problems with basic tasks like navigation, list management, and overall reliability, issues that weren't apparent in the carefully orchestrated demos. While Google may have beaten competitors like Apple and OpenAI to market with certain features, the actual user experience seems to fall short of expectations. This disconnect raises questions about the current state of AI assistants and whether they're truly ready for prime time.

That wasn’t supposed to happen: During a live demo, the presenter hit a hiccup: Gemini failed twice to check a calendar before finally succeeding, a perfect metaphor for the current state of AI assistants. They're impressive when they work, but still prone to frustrating failures.

Genie Out of the Bottle
Cosine's AI assistant outperforms the competition

What it is: Cosine, a startup you've most likely never heard of, has released Genie, an AI coding agent that's giving the big players a run for their money. In a head-to-head showdown, Genie outperformed Cognition Labs' much-hyped Devin on key software engineering tasks, scoring 30% on the SWE-Bench (SoftWare Engineering Benchmark) compared to Devin’s paltry 13.8%.

What it means: The AI coding agent space is heating up faster than an overclocked GPU… While Cognition attracted much media attention with Devin's launch, Cosine has quietly been refining its own offering. Genie's secret sauce, according to its creators, is that it's trained to “think” like a software engineer. Specifically, it was trained by watching the workflows of real software engineers (as opposed to simply being trained on high volumes of code, like other tools such as GitHub Copilot). It even behaves the way human coders do: asking clarifying questions, responding to comments on pull requests, and messaging team members in Slack.

Why it matters: Tools like Genie are of significant value in boosting the productivity of software engineers. They also allow less technical users to partner with an AI agent to punch above their weight, using natural language to produce code that would have been beyond their reach just a short while ago. These are powerful enablers that reduce barriers and accelerate development. Remember: this is the worst this AI will ever be; it’s only going to improve. Where will the technology be a year from today?

Human in the loop: While Genie's performance is impressive, let's not forget that the real magic happens when human creativity meets AI agentic workflows. As these tools evolve, the key will be finding the sweet spot where AI amplifies human ingenuity.

AI Gets Its PhD
Sakana's AI Scientist makes the grade with automated AI research

What it is: Sakana AI, a Japanese startup founded by ex-Google researchers, has released "The AI Scientist" - an AI system that can autonomously conduct end-to-end scientific research (we’ve seen similar AIs produced by Google DeepMind). This AI-driven researcher can generate ideas, design and run experiments, analyze results, write papers, and even review its own work. It's like a tireless digital grad student that doesn't need coffee breaks or funding.
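Sakana describes an idea-to-paper pipeline, with each finished paper feeding back into the pool of prior work for "open-ended" iteration. A minimal sketch of that loop might look like the following; every function name here is illustrative stand-in logic, not Sakana's actual code or API:

```python
# Hypothetical sketch of the loop the AI Scientist is described as running:
# idea -> experiment -> analysis -> paper -> self-review, with results
# archived so later iterations can build on earlier ones.

def generate_idea(archive):
    # In the real system an LLM proposes a hypothesis conditioned on
    # prior papers; here we simply number the ideas.
    return f"idea-{len(archive) + 1}"

def run_experiment(idea):
    # Stand-in for designing, coding, and executing an experiment.
    return {"idea": idea, "metric": len(idea)}

def write_paper(results):
    # Stand-in for drafting a full paper from the analyzed results.
    return f"Paper on {results['idea']} (metric={results['metric']})"

def review_paper(paper):
    # The AI Scientist also referees its own output.
    return "accept" if "metric" in paper else "reject"

def research_loop(n_iterations):
    archive = []
    for _ in range(n_iterations):
        idea = generate_idea(archive)
        results = run_experiment(idea)
        paper = write_paper(results)
        verdict = review_paper(paper)
        archive.append((paper, verdict))  # feeds future iterations
    return archive

papers = research_loop(3)
```

The interesting design point, per Sakana's description, is the feedback edge: the archive grows each cycle, so idea generation is conditioned on the system's own prior output, mimicking how a human research community builds on published work.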

What it means: The real promise of Sakana’s AI Scientist is the potential to democratize some types of scientific research. It can iterate on its own ideas, building on previous work in an "open-ended" fashion that mimics the human scientific community. At a cost of around $15 per paper, it offers a highly cost-effective alternative to traditional research methods. The AI Scientist could allow smaller institutions and researchers with limited resources to contribute to cutting-edge science.

Why it matters: While the AI Scientist is impressive, it's not without its quirks. It can't process visual data, so it struggles with the visual elements of papers, and it sometimes makes critical errors in evaluating results. Despite these current, transitory limitations, tools like this could materially accelerate scientific development in some areas. Certainly, we will need AI peer-review tools in the not-too-distant future: the ability of the AI Scientist (and its ilk) to produce research papers at scale could flood the scientific community with low-to-medium-quality AI-generated content, complicating peer review and the validation of scientific findings if review remains purely manual.

Before you go… We have one quick question for you:

If this week's AI Geekly were a stock, would you:


About the Author: Brodie Woods

As CEO of usurper.ai, with over 18 years of capital markets experience as a publishing equities analyst, an investment banker, a CTO, and an AI strategist leading North American banks and boutiques, I bring a unique perspective to the AI Geekly. This viewpoint is informed by participation in two decades of capital market cycles from the front lines; publication of in-depth research for institutional audiences based on proprietary financial models; execution of hundreds of M&A and financing transactions; leadership roles in planning, implementing, and maintaining the tech stack for a broker-dealer; and, most recently, heading the AI strategy for the Capital Markets division of the eighth-largest commercial bank in North America.