5.12.2026

How Sakana trained a 7B model to orchestrate GPT, Claude and Gemini LLMs

Researchers at Sakana AI have introduced the "RL Conductor," a small language model trained via reinforcement learning to automatically orchestrate a diverse pool of worker LLMs. The Conductor analyzes each input, divides the work among the workers, and coordinates the agents as they run.

This automated coordination achieves state-of-the-art results on difficult reasoning and coding benchmarks, outperforming individual frontier models like GPT-5 and Claude Sonnet 4 as well as expensive human-designed multi-agent pipelines. It achieves this performance at a fraction of the cost and with fewer API calls than competitors. RL Conductor is the backbone of Fugu, Sakana AI’s commercial multi-agent orchestration service.
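To make the orchestration idea concrete, here is a minimal sketch of a conductor-style loop. It is not Sakana's implementation: the `Step` type, the `run_plan` helper, and the stub workers are all hypothetical names chosen for illustration, and the plan is hard-coded where the article's system would have a trained model produce it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical worker type: takes a prompt string, returns a response string.
Worker = Callable[[str], str]


@dataclass
class Step:
    worker: str       # name of the worker to call
    instruction: str  # sub-task text assigned to that worker


def run_plan(task: str, plan: List[Step], workers: Dict[str, Worker]) -> str:
    """Run the steps in order, giving each worker the task plus all prior outputs."""
    context = task
    for step in plan:
        reply = workers[step.worker](f"{step.instruction}\n\n{context}")
        context = f"{context}\n[{step.worker}] {reply}"
    return context


# Stub workers stand in for real LLM API calls (assumption: no network access here).
workers = {
    "coder": lambda prompt: "def add(a, b): return a + b",
    "reviewer": lambda prompt: "looks correct",
}

# Hard-coded plan; in the described system the conductor model would generate it.
plan = [
    Step("coder", "Write the function."),
    Step("reviewer", "Review the code above."),
]
result = run_plan("Implement add(a, b).", plan, workers)
print(result)
```

The point of the sketch is only the division of roles: one component decides who does what and in which order, and the workers never talk to each other directly.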

5.11.2026

Hermes Agent NEW Desktop App - The 24/7 Self-Evolving AI Agent!

Hermes Agent is one of the most advanced open-source AI agents right now, and in this video I showcase the brand new Hermes Desktop App that makes running persistent autonomous AI agents dramatically easier.



5.08.2026

How Claude's Design Agents Work

In this video, I look at how Claude's Design Agents system actually works and at some of its key components that you can reuse when building your own vertical agent apps.



5.07.2026

Google’s Gemma 4 AI models get 3x speed boost by predicting future tokens

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google’s take on edge AI could already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess future tokens, which can speed up generation compared with the main model producing every token on its own.
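For readers unfamiliar with speculative decoding, the following toy sketch shows the general draft-and-verify idea in its simplest greedy form. It is an illustration under stated assumptions, not Gemma's MTP method: `speculative_decode`, the `NextToken` type, and the two toy "models" are hypothetical, and a real system would verify all drafted positions in a single forward pass of the large model rather than one call per token.

```python
from typing import Callable, List

# Hypothetical model type: maps a token sequence to the next token (greedy choice).
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify; the output equals what `target` would produce alone."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap drafter guesses up to k future tokens.
        draft: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            draft.append(drafter(seq + draft))
        # 2. The target checks the guesses in order.
        for guess in draft:
            correct = target(seq)
            seq.append(correct)   # always keep the target's own token
            if correct != guess:
                break             # first mismatch: discard the rest of the draft
    return seq


# Toy models over digits: the target counts upward mod 10,
# and the drafter agrees except right after the token 3.
target = lambda s: (s[-1] + 1) % 10
drafter = lambda s: 0 if s[-1] == 3 else (s[-1] + 1) % 10

out = speculative_decode(target, drafter, [0], n_new=6)
print(out)
```

Because the target's token is kept at every position, the drafter can only change the speed, never the text. The speedup in practice comes from accepting several drafted tokens per expensive verification step.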

5.06.2026

OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT

On Tuesday, OpenAI released a new foundation model called GPT-5.5 Instant, which will replace GPT-5.3 Instant as the default ChatGPT model. The company said the model reduces hallucination in sensitive areas such as law, medicine, and finance, while maintaining the low latency of its predecessor.

5.05.2026

OpenAI Just Showed Us What Comes After the Harness. Here's The Layer Almost Everyone's Missing.

OpenAI just open-sourced Symphony, their internal orchestration spec for scaling autonomous coding agents, and it highlights one of the biggest shifts happening in AI engineering right now. As coding agents become more capable, humans become the bottleneck, and the real work moves from writing code to building the scaffolding around the agents.

In this video, I break down the mental models behind agent harness engineering and show you how to think about building reliable autonomous systems at scale. Whether you're trying to scale Claude Code beyond a few chat sessions or designing orchestration into your own AI-powered apps, these frameworks will help you architect systems that actually work in production.



5.04.2026

Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it

One of the key challenges of building effective AI agents is teaching them to choose between using external tools or relying on their internal knowledge. But large language models are often trained to blindly invoke tools, which causes latency bottlenecks, unnecessary API costs, and degraded reasoning caused by environmental noise. 

To overcome this challenge, researchers at Alibaba introduced Hierarchical Decoupled Policy Optimization (HDPO), a reinforcement learning framework that trains agents to balance execution efficiency and task accuracy.

Metis, a multimodal model they trained using this framework, reduces redundant tool invocations from 98% to just 2% while establishing new state-of-the-art reasoning accuracy across key industry benchmarks.
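As a rough illustration of what trading off accuracy against tool use can look like in a reward signal, here is a toy sketch. It is not HDPO's actual formulation, which the summary above does not spell out: the function names, the `needed_calls` label, the penalty value, and the way the redundancy rate is computed are all assumptions made for this example.

```python
from typing import Iterable, Tuple


def shaped_reward(correct: bool, tool_calls: int, needed_calls: int,
                  penalty: float = 0.1) -> float:
    """Toy reward: accuracy comes first, then a penalty per redundant tool call."""
    if not correct:
        return 0.0  # a wrong answer earns no efficiency credit
    redundant = max(0, tool_calls - needed_calls)
    return max(0.0, 1.0 - penalty * redundant)


def redundant_rate(episodes: Iterable[Tuple[int, int]]) -> float:
    """Share of tool calls that were not needed, over (tool_calls, needed_calls) pairs."""
    episodes = list(episodes)
    total = sum(calls for calls, _ in episodes)
    redundant = sum(max(0, calls - needed) for calls, needed in episodes)
    return redundant / total if total else 0.0


# Hypothetical episodes: (tool calls made, tool calls actually needed).
episodes = [(4, 1), (1, 1)]
print(shaped_reward(True, 3, 1), redundant_rate(episodes))
```

Gating the efficiency term on correctness is one simple way to keep an agent from learning to skip tools it actually needs; whether Metis uses this exact scheme is not stated in the source.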