How Claude's Design Agents Work
In this video, I look at how Claude's Design Agents system actually works and the key components you can use to build your own vertical agent apps.
Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Its take on edge AI may already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess several future tokens ahead of time, which can speed up generation compared with standard one-token-at-a-time decoding.
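The idea behind speculative decoding can be shown with a toy sketch: a cheap drafter proposes a block of tokens, and the expensive target model verifies them in one pass, accepting the longest matching prefix. Everything here (the rule-based "models", the function names) is illustrative, not Google's implementation; with a well-matched drafter, far fewer target-model rounds are needed than tokens generated.

```python
ORDER = "abcd"  # toy 4-token vocabulary

def target_next(prefix):
    # Toy "target model": a deterministic next-token rule (cycles a->b->c->d->a).
    return ORDER[(ORDER.index(prefix[-1]) + 1) % len(ORDER)]

def draft(prefix, k):
    # Toy "drafter": cheaply proposes k future tokens in one shot,
    # standing in for an MTP head that predicts several positions at once.
    out, cur = [], prefix[-1]
    for _ in range(k):
        cur = ORDER[(ORDER.index(cur) + 1) % len(ORDER)]
        out.append(cur)
    return out

def speculative_generate(prefix, n_tokens, k=4):
    """Generate n_tokens, counting target-model verification rounds."""
    rounds = 0
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        proposed = draft(out, k)
        rounds += 1  # one target pass verifies the whole k-token draft
        ctx = list(out)
        for tok in proposed:
            truth = target_next(ctx)
            if tok == truth:
                ctx.append(tok)        # draft token accepted
            else:
                ctx.append(truth)      # reject rest, keep target's token
                break
        out = ctx
    return "".join(out[len(prefix):])[:n_tokens], rounds

# Plain decoding would need 8 target calls for 8 tokens; here the
# drafter matches the target, so each round accepts a full k-token block.
text, rounds = speculative_generate("a", 8, k=4)
print(text, rounds)  # bcdabcda 2
```

In practice the drafter is imperfect, so some draft tokens are rejected and the speedup depends on its acceptance rate; the key property is that output is identical to what the target model alone would produce.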
On Tuesday, OpenAI released a new foundation model called GPT-5.5 Instant, which will replace GPT-5.3 Instant as the default ChatGPT model. The company said the model reduces hallucination in sensitive areas such as law, medicine, and finance, while maintaining the low latency of its predecessor.
OpenAI just open-sourced Symphony, their internal orchestration spec for scaling autonomous coding agents, and it highlights one of the biggest shifts happening in AI engineering right now. As coding agents become more capable, humans become the bottleneck, and the real work moves from writing code to building the scaffolding around the agents.
In this video, I break down the mental models behind agent harness engineering and show you how to think about building reliable autonomous systems at scale. Whether you're trying to scale Claude Code beyond a few chat sessions or designing orchestration into your own AI-powered apps, these frameworks will help you architect systems that actually work in production.
One of the key challenges of building effective AI agents is teaching them to choose between using external tools and relying on their internal knowledge. Large language models, however, are often trained to invoke tools indiscriminately, which causes latency bottlenecks, unnecessary API costs, and reasoning degraded by noise from the environment.
To overcome this challenge, researchers at Alibaba introduced Hierarchical Decoupled Policy Optimization (HDPO), a reinforcement learning framework that trains agents to balance both execution efficiency and task accuracy.
Metis, a multimodal model they trained using this framework, reduces redundant tool invocations from 98% to just 2% while establishing new state-of-the-art reasoning accuracy across key industry benchmarks.
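The invoke-or-skip decision can be sketched as a simple gate: answer from internal knowledge when confidence is high, fall back to the tool otherwise. This is purely illustrative, with made-up names and a hand-set threshold; HDPO's point is that the policy is learned with reinforcement learning, jointly optimizing efficiency and accuracy, rather than hard-coded like this.

```python
def gated_agent(question, knowledge, search_tool, threshold=0.7):
    """Answer from internal knowledge when confident; otherwise call the tool.

    `knowledge` maps questions to (answer, confidence) pairs, standing in
    for the model's parametric knowledge plus a self-confidence estimate.
    Returns (answer, number_of_tool_calls).
    """
    answer, confidence = knowledge.get(question, (None, 0.0))
    if confidence >= threshold:
        return answer, 0               # no tool call: low latency, no API cost
    return search_tool(question), 1    # uncertain: pay for the external tool

# Toy usage: one well-known fact, one unknown question.
knowledge = {"capital of France?": ("Paris", 0.95)}
search_tool = lambda q: "searched: " + q

print(gated_agent("capital of France?", knowledge, search_tool))  # ('Paris', 0)
print(gated_agent("obscure fact?", knowledge, search_tool))       # ('searched: obscure fact?', 1)
```

Counting tool calls makes the trade-off measurable: a learned gate like HDPO's drives redundant invocations toward zero while still reaching for tools when internal knowledge is genuinely insufficient.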
Andrej Karpathy (co-founder of OpenAI, former head of AI at Tesla, and now founder of Eureka Labs) talks with Sequoia partner Stephanie Zhan at AI Ascent 2026 about what's changed in the year since he coined "vibe coding." He explains why he's never felt more behind as a programmer, why agentic engineering is the more serious discipline taking shape on top of vibe coding, and why we should think of LLMs not as animals but as ghosts: jagged, statistical, summoned entities that require a new kind of taste and judgment to direct. He also touches on Software 3.0, the limits of verifiability, and why you can outsource your thinking but never your understanding.
Mistral AI, the Paris-based artificial intelligence company valued at €11.7 billion ($13.8 billion), today released Workflows in public preview — a production-grade orchestration layer designed to move enterprise AI systems out of proofs of concept and into the business processes that generate revenue.
The product, which launches as part of Mistral's Studio platform, is the company's clearest articulation yet of a thesis that is quietly reshaping the enterprise AI market: that the bottleneck for organizations adopting AI is no longer the model itself, but the infrastructure required to run it reliably at scale.