AI Blog: June 2026

6.30.2026

Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that's been leading OpenRouter

Chinese delivery app company Meituan officially unveiled LongCat-2.0 on GitHub, Hugging Face, and its native platform, unmasking the model as the computational engine behind "Owl Alpha," the anonymous stealth model that has spent the last two months commanding global developer charts on OpenRouter.

Developed to fundamentally disrupt closed-source enterprise dominance in autonomous software engineering, the 1.6-trillion-parameter Mixture-of-Experts (MoE) system makes a native 1-million-token context window openly available under the permissive, commercially usable MIT license.

6.29.2026

Introducing Ornith 1.0 - Agentic Coding LLMs

Sam Witteveen explores this new family of self-scaffolding models designed to generate their own task-specific harnesses alongside solutions. By utilizing a two-stage reinforcement learning process, these models aim to optimize both the coding environment and agentic trajectories, offering a versatile approach for handling complex local coding tasks without requiring human-authored scaffolds.

6.26.2026

Liquid AI's smallest model yet LFM2.5-230M beats models 4X its size at data extraction, can run 'anywhere'

Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model yet, LFM2.5-230M, and enterprises would do well to consider it for their uses in data extraction and local deployment on smartphones, laptops and robotics.

This is a 230-million-parameter foundation model explicitly designed for on-device agentic workflows, and as Liquid states in its release blog post, that small size makes it possible to run nearly "anywhere." According to Liquid, it also outperforms models more than 4X its size on selected benchmarks, specifically doing better at data extraction than the 800-million-parameter Alibaba Qwen3.5-0.8B (Instruct) and the 1-billion-parameter Google Gemma 3 1B.

6.25.2026

Mistral launches OCR 4, turning document extraction into a full enterprise AI play

Mistral AI on Tuesday released OCR 4, a document intelligence model that moves beyond raw text extraction to return structured representations of entire documents — complete with bounding boxes, block-type classification, and per-word confidence scores. The release marks Mistral's fourth generation of optical character recognition technology in roughly 15 months and lands at a moment when the company's pitch for European AI sovereignty has never been more commercially relevant.

6.24.2026

VibeThinker 3B - Taking on Giant Models

In this video, I look at VibeThinker 3B and how it beats some models 300x its size on certain benchmarks by tuning its reasoning and chain of thought for specific use cases. While the model is not meant for production, it shows what can be done with these techniques.

6.23.2026

Researchers introduce Self-Harness, a framework that lets AI agents rewrite their own rules, boosting performance up to 60%

Researchers at the Shanghai Artificial Intelligence Laboratory have introduced “Self-Harness,” a new paradigm in which an LLM-based agent systematically improves its own operating rules. By examining its own execution traces to apply edits, the system trades manual guesswork for empirical evidence.

Self-improving harnesses can enable development teams to deploy robust custom agents that continually adapt their own execution protocols to overcome model-specific weaknesses.

6.22.2026

Anthropic ships major Claude Design overhaul with design system imports, code round-trips, and a fix for its token-burning problem

Anthropic is shipping a substantially overhauled version of Claude Design that attempts to fix the extreme token consumption issue while simultaneously repositioning the product from a flashy demo into something far more strategically important: a design system compliance layer that connects to code, connects to the tools enterprises already use, and — critically — keeps everything on brand.

6.18.2026

Kimi K2.7 Code: BEST Open Source Model? REALLY Cheap and Beats Opus 4.8 and GPT 5.5?

Kimi K2.7 Code might be one of the most impressive open-weight coding models released so far. In this video, we fully test Moonshot AI’s latest coding-focused model and see how it compares against Opus 4.8 Max, GPT-5.5, Fable 5, Qwen 3.7 Max, Grok 4.3, and other top coding models.

6.17.2026

Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate release of GLM-5.2, a 753-billion parameter open-weights large language model (LLM) engineered specifically to dominate "long-horizon" autonomous coding and engineering tasks.

Available immediately on Hugging Face, the Z.ai API, and more than 20 third-party coding environments, the model boasts a highly stable 1-million-token context window alongside enterprise subscription tiers starting at just $12.60 per month.

6.16.2026

WWDC26: Run local agentic AI on the Mac using MLX

Run AI agents locally with privacy, low latency, and offline access. Dive into how MLX advancements and Mac hardware make powerful agentic workflows possible entirely on-device. You’ll explore code agents such as OpenCode, see how they integrate into Xcode, learn techniques for multi-Mac scaling, and discover how to integrate tools seamlessly — without ever leaving your machine.

6.15.2026

Xiaomi's new open source, agentic AI coding harness MiMo Code beats Claude Code at ultra-long, 200+ step tasks

Xiaomi's MiMo AI team has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant that the Chinese electronics giant says outperforms Anthropic's Claude Code on key agentic coding benchmarks, especially on long-horizon, multi-step tasks (200+ steps) — at least, according to its own internal beta release and survey of 576 developers.

6.12.2026

Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes

Google's DiffusionGemma, released this week, is an open source experimental model that applies diffusion to text generation at production scale. Built on the Gemma 4 backbone and released under the Apache 2.0 license, it is the first diffusion language model natively supported in the open source vLLM inference platform. It generates a 256-token block in parallel rather than sequentially, with every token position attending to every other. Google says DiffusionGemma generates text up to 4x faster than standard models on GPUs.

6.11.2026

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple's third-generation foundation models, announced at WWDC26, break that constraint by moving the weight set off DRAM entirely.

6.10.2026

Anthropic brings Mythos to the masses with Claude Fable 5, its most powerful generally available model ever

Anthropic today launched two new AI models — Claude Fable 5 and Claude Mythos 5 — marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously made available only to participating organizations in its restricted cybersecurity program, Project Glasswing, which it announced two months ago.

The company says Fable 5, which is the version most users and developers will get starting today, exceeds every Claude model it has previously made generally available — featuring stronger performance across software engineering, knowledge work, vision, scientific research and long-running tasks.

6.08.2026

How Claude Code’s lead designer builds with AI

During Dive Club Live in NYC we got to hear from Claude Code’s lead designer, Meaghan Choi. She shared a demo of how the team at Anthropic uses Claude Code and there are a lot of practical takeaways.

6.05.2026

Microsoft launches MXC, an OS-level sandbox for AI agents, with OpenAI and Nvidia already on board

For the past two years, the technology industry has raced to make AI agents more capable — teaching them to write code, navigate software interfaces, manage files, and orchestrate multi-step workflows with increasing autonomy. What the industry has not done, at least not with any consistency, is answer the question that keeps chief information security officers awake at night: what happens when an agent goes wrong?

On Tuesday at its annual Build developer conference, Microsoft offered what may become the definitive answer. The company introduced Microsoft Execution Containers, or MXC — a policy-driven execution layer, built into the Windows operating system itself, that lets developers and IT administrators declare exactly what an AI agent can and cannot access, with those boundaries enforced at runtime by the OS kernel.

6.04.2026

Google's new open source Gemma 4 12B analyzes audio and video, and runs entirely locally on a typical 16GB enterprise laptop

While many open source AI model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more local side of the market. Today, the tech giant released Gemma 4 12B, an 11.95-billion-parameter open-weights model with a permissive Apache 2.0 license, optimized to run locally on a standard enterprise laptop using just 16GB of VRAM or unified memory.

That means those enterprise users looking to keep working with AI while on a flight without WiFi, or trying to keep it offline for security reasons, can now do so far more easily and at far less cost (free to download and operate).
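The 16GB figure can be sanity-checked with some quick arithmetic. The sketch below estimates the memory taken by the weights alone at a few precisions, using the 11.95-billion parameter count reported above. The precisions listed are illustrative assumptions, not a statement of how Google packages the model, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: weights dominate memory; activations and KV cache are ignored.
PARAMS = 11.95e9  # parameter count reported for Gemma 4 12B


def weight_gib(params: float, bits_per_weight: int) -> float:
    """Return the size of the weights in GiB at the given precision."""
    return params * bits_per_weight / 8 / 2**30


# Illustrative precisions (hypothetical choices, not confirmed by the source).
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 22 GiB, which would not fit in 16GB, while 8-bit weights come to roughly 11 GiB and 4-bit weights to under 6 GiB. That suggests some form of reduced precision is involved in hitting the 16GB target, though the announcement as summarized here does not say which.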

6.03.2026

Perplexity AI unveils hybrid local-cloud inference system at Computex 2026

Perplexity AI unveiled what it calls the first hybrid local-server inference orchestrator at Computex 2026 on Monday night, demonstrating software that autonomously decides — in real time and mid-task — which AI workloads stay on a user's device and which get routed to frontier models in the cloud.

6.02.2026

MiniMax M3 IS INSANE! BEST Opensource AI Model!

In this video, I fully test MiniMax M3, the new open-weight frontier model from MiniMax that combines coding, agentic reasoning, multimodal understanding, and long-context capabilities into one model. M3 supports up to a 1 million token context window, is natively multimodal from day one, and delivers some seriously impressive benchmark results across SWE-Bench Pro, BrowseComp, SVG-Bench, KernelBench Hard, OSWorld Verified, and more.

What makes this release even more insane is the pricing. MiniMax M3 is not only competing with models like Opus 4.7 and GPT-5.5, but in several benchmarks it actually beats them while being dramatically cheaper. MiniMax is also offering huge token plans, aggressive API pricing, and open-weight access, making this one of the most accessible frontier-level models available right now.

6.01.2026

Running Local AI on AMD

In this video, we look at running local AI workloads for LLMs, image models, and video models on AMD GPUs and processors.