AI Blog: February 2026

2.27.2026

Perplexity announces “Computer,” an AI agent that assigns work to other AI agents

Perplexity has introduced “Computer,” a new tool that allows users to assign tasks and see them carried out by a system that coordinates multiple agents running various models.

The idea is that the user describes a specific outcome—something like “plan and execute a local digital marketing campaign for my restaurant” or “build me an Android app that helps me do a specific kind of research for my job.” Computer then ideates subtasks and assigns them to multiple agents as needed, running the models Perplexity deems best for those tasks.

2.26.2026

Alibaba's new open source Qwen3.5-Medium models offer Sonnet 4.5 performance on local computers

Alibaba's now famed Qwen AI development team has done it again: a little more than a day ago, they released the Qwen3.5 Medium Model series consisting of four new large language models (LLMs) with support for agentic tool calling, three of which are available for commercial usage by enterprises and indie developers under the standard open source Apache 2.0 license.

But the big twist with the open source models is that they offer comparably high performance on third-party benchmark tests to similarly-sized proprietary models from major U.S. startups like OpenAI or Anthropic, actually beating OpenAI's GPT-5-mini and Anthropic's Claude Sonnet 4.5 — the latter model which was just released five months ago.

And, the Qwen team says it has engineered these models to remain highly accurate even when "quantized," a process that reduces their footprint further by reducing the numbers by which the model's settings are stored from many values to far fewer.

2.25.2026

Mercury 2: The World's Fastest Reasoning Model!

AI just keeps getting wilder! Meet Mercury 2, the first reasoning diffusion LLM — 5x faster than speed-optimized models like Claude 4.5 Haiku & GPT-5.2 Mini.

2.24.2026

Anthropic says DeepSeek, Moonshot, and MiniMax used 24,000 fake accounts to rip off Claude

Anthropic dropped a bombshell on the artificial intelligence industry Monday, publicly accusing three prominent Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — of orchestrating coordinated, industrial-scale campaigns to siphon capabilities from its Claude models using tens of thousands of fraudulent accounts.

2.23.2026

NEW Antigravity AI Studio Release From Google Changes
AI Code Development!

Google just dropped the new Antigravity AI Studio update, and it’s transforming the way we develop AI code! In this video, I break down all the latest features, show real examples of building AI applications faster than ever, and explain how this release could change the AI coding landscape forever.

2.20.2026

Google launches Gemini 3.1 Pro, retaking AI crown with 2X+ reasoning performance boost

Late last year, Google briefly took the crown for most powerful AI model in the world with the launch of Gemini 3 Pro — only to be surpassed within weeks by OpenAI and Anthropic releasing new models, s is common in the fiercely competitive AI race.

Now Google is back to retake the throne with an updated version of that flagship model: Gemini 3.1 Pro, positioned as a smarter baseline for tasks where a simple response is insufficient—targeting science, research, and engineering workflows that demand deep planning and synthesis.

Google Just Solved The Greatest Limitation of AI Agents

Browser agents are broken. Not because the models are bad, but because the entire internet was built for human eyes, not machines. WebMCP changes the approach entirely: instead of making agents better at reading websites, it makes websites better at talking to agents.

2.19.2026

New agent framework matches human-engineered AI systems

Agents built on top of today's models often break with simple changes — a new library, a workflow modification — and require a human engineer to fix it. That's one of the most persistent challenges in deploying AI for the enterprise: creating agents that can adapt to dynamic environments without constant hand-holding. While today's models are powerful, they are largely static.

To address this, researchers at the University of California, Santa Barbara have developed Group-Evolving Agents (GEA), a new framework that enables groups of AI agents to evolve together, sharing experiences and reusing their innovations to autonomously improve over time.

2.18.2026

Anthropic's Sonnet 4.6 matches flagship AI performance at
one-fifth the cost, accelerating enterprise adoption

Anthropic on Tuesday released Claude Sonnet 4.6, a model that amounts to a seismic repricing event for the AI industry. It delivers near-flagship intelligence at mid-tier cost, and it lands squarely in the middle of an unprecedented corporate rush to deploy AI agents and automated coding tools.

2.17.2026

OpenAI deploys Cerebras chips for 'near-instant' code generation in first major move beyond Nvidia

OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads.

2.16.2026

The Rise of WebMCP

In this video, I look at Web MCP and how it could have a huge impact on how agents operate on the web and work with websites.

2.13.2026

MiniMax's new open M2.5 and M2.5 Lightning near state-of-the-art while costing 1/20th of Claude Opus 4.6

Chinese AI startup MiniMax, headquartered in Shanghai, has sent shockwaves through the AI industry today with the release of its new M2.5 language model in two variants, which promise to make high-end artificial intelligence so cheap you might stop worrying about the bill entirely.

It's also said to be "open source," though the weights (settings) and code haven't been posted yet, nor has the exact license type or terms. But that's almost beside the point given how cheap MiniMax is serving it through its API and those of partners.

2.12.2026

z.ai's open source GLM-5 achieves record low hallucination rate and leverages new RL 'slime' technique

Chinese AI startup Zhupai aka z.ai is back this week with an eye-popping new frontier large language model: GLM-5.

The latest in z.ai's ongoing and continually impressive GLM series, it retains an open source MIT License — perfect for enterprise deployment – and, in one of several notable achievements, achieves a record-low hallucination rate on the independent Artificial Analysis Intelligence Index v4.0.

With a score of -1 on the AA-Omniscience Index—representing a massive 35-point improvement over its predecessor—GLM-5 now leads the entire AI industry, including U.S. competitors like Google, OpenAI and Anthropic, in knowledge reliability by knowing when to abstain rather than fabricate information.

2.11.2026

'Observational memory' cuts AI agent costs 10x and outscores
RAG on long-context benchmarks

RAG isn't always fast enough or intelligent enough for modern agentic AI workflows. As teams move from short-lived chatbots to long-running, tool-heavy agents embedded in production systems, those limitations are becoming harder to work around.

In response, teams are experimenting with alternative memory architectures — sometimes called contextual memory or agentic memory — that prioritize persistence and stability over dynamic retrieval.

One of the more recent implementations of this approach is "observational memory," an open-source technology developed by Mastra, which was founded by the engineers who previously built and sold the Gatsby framework to Netlify.

2.10.2026

I tried a Claude Code rival that's local, open source,
and completely free

I was curious if Block's Goose agent, paired with Ollama and the Qwen3-coder model, could really replace Claude Code.

Goose, developed by Jack Dorsey's company Block, is an open-source agent framework, similar to Claude Code. Qwen3-coder is a coding-centric large language model similar to Sonnet-4.5.

2.09.2026

GPT-5.3 Codex Is INSANE! OpenAI’s BEST Model
Might Beat Opus 4.6?

GPT-5.3 Codex is blowing minds in early 2026! In this video, we put OpenAI’s fastest, most interactive coding AI to the test and compare it against Claude Opus 4.6—one of the deepest reasoning, long-horizon planning models out there.

2.06.2026

OpenAI’s GPT-5.3-Codex drops as Anthropic upgrades Claude

OpenAI on Wednesday released GPT-5.3-Codex, which the company calls its most capable coding agent to date, in an announcement timed to land at the exact same moment Anthropic unveiled its own flagship model upgrade, Claude Opus 4.6. The synchronized launches mark the opening salvo in what industry observers are calling the AI coding wars — a high-stakes battle to capture the enterprise software development market.

Anthropic releases Opus 4.6 with new ‘agent teams’

On Thursday, Anthropic released the latest version of Opus — its most advanced model and a particularly important model for Claude Code. Opus 4.5 was only released last November, and with 4.6, the company has sought to broaden its model’s capabilities and appeal, allowing for a greater variety of uses and customers.

Perhaps the most notable addition to the newest version of Opus is the inclusion of what the company calls “agent teams” — teams of agents that can split larger tasks into segmented jobs.

2.05.2026

Apple integrates Anthropic’s Claude and OpenAI’s Codex into Xcode 26.3 in push for ‘agentic coding’

Apple on Tuesday announced a major update to its flagship developer tool that gives artificial intelligence agents unprecedented control over the app-building process, a move that signals the iPhone maker's aggressive push into an emerging and controversial practice known as "agentic coding."

Xcode 26.3, available immediately as a release candidate, integrates Anthropic's Claude Agent and OpenAI's Codex directly into Apple's development environment, allowing the AI systems to autonomously write code, build projects, run tests, and visually verify their own work — all with minimal human oversight.

2.04.2026

Qwen3-Coder-Next offers vibe coders a powerful open source,
ultra-sparse model

Chinese e-commerce giant Alibaba's Qwen team of AI researchers has emerged in the last year as one of the global leaders of open source AI development, releasing a host of powerful large language models and specialized multimodal models that approach, and in some cases, surpass the performance of the proprietary U.S. leaders such as OpenAI, Anthropic, Google and xAI.

Now the Qwen team is back again this week with a compelling release that matches the "vibe coding" frenzy that has arisen in recent months: Qwen3-Coder-Next, a specialized 80-billion-parameter model designed to deliver elite agentic performance within a lightweight active footprint.

2.03.2026

OpenAI Codex App: Claude Cowork Killer?

Explore the OpenAI Codex app, a potential command center for AI agents. This video demonstrates the app's features, including multi-project handling and built-in voice dictation. A firsthand setup and usage walkthrough on macOS is also included.

2.02.2026

I use the ‘potato’ prompt with ChatGPT every day — here is how it finds the holes in my logic

I’ve spent years using AI as a sounding board, but I recently realized I was using it wrong. I was asking ChatGPT to "help me plan" or "summarize my ideas," which usually resulted in the AI simply agreeing with me. It was a digital echo chamber.

Now, when I have a new strategy, a pitch or a controversial argument, I don't ask for a "critique." I type one word.

Potato.