The last six months in LLMs in five minutes

TL;DR

Over the past six months, LLMs have seen rapid progress, with multiple models overtaking each other in performance, especially in coding tasks. New projects like OpenClaw gained prominence, and models became more capable of complex tasks, signaling a significant inflection point.

Over the past six months, the large language model (LLM) landscape has changed quickly: several models traded the top spot, and coding and creative capabilities improved markedly. Many now call this the November 2025 inflection point, when model performance and application scope expanded dramatically.

In November 2025, the previously top-ranked model, Claude Sonnet 4.5, was overtaken by GPT-5.1, Gemini 3, and GPT-5.1 Codex Max, with Anthropic’s Claude Opus 4.5 eventually reclaiming the crown. This competition among major AI labs underscored a period of intense focus on model performance, especially in coding tasks, where reinforcement learning techniques from 2025 significantly improved code quality and reliability.

During this time, new projects like Warelay, later renamed OpenClaw, captured attention. OpenClaw, a “personal AI assistant” project, grew rapidly in popularity, with users running it on Mac Minis at home and likening their assistants to digital pets. Meanwhile, new releases included Gemini 3.1 Pro and Google’s Gemma 4 series of open-weight models, with Gemma 4 standing out as one of the most advanced open-weight releases from a US-based lab. Chinese AI lab Z.ai also released GLM-5.1, a large 1.5TB model capable of impressive creative outputs, including animated pelicans on bicycles.

Why It Matters

This period marks a turning point in AI development, where coding agents became reliable enough for daily use, and accessible models started outperforming expectations. The rapid model competition and technological advances signal a new era of AI tools that are more capable, versatile, and integrated into everyday workflows. These developments impact industries from software development to creative arts, and influence the future direction of AI research and deployment.

Background

Prior to this period, the AI community had been gradually improving LLM performance, but November 2025 is recognized as an inflection point due to the rapid succession of model improvements and the rise of specialized coding agents. The focus shifted from just scaling models to enhancing their utility and reliability, particularly in coding tasks. The emergence of projects like Warelay/OpenClaw and the release of high-capability open models reflect this shift. The competitive landscape intensified, with labs worldwide striving to outdo each other in both raw performance and practical applications.

“The past six months have seen a remarkable acceleration in LLM capabilities, especially in coding, with multiple models swapping rankings and new projects gaining rapid traction.”

— Simon Willison

“People are buying Mac Minis to run their Claws — they’re the new digital pets.”

— Drew Breunig

“Here’s a video of an animated pelican riding a bicycle, plus other animals on vehicles — AI labs are paying attention.”

— Jeff Dean (Google)

What Remains Unclear

While these developments are well documented, it remains unclear how the models will evolve in the coming months, especially in reliability, safety, and broader adoption. The long-term impact of new projects like OpenClaw and the true capabilities of large models like GLM-5.1 are still being evaluated, and competition among labs continues to intensify.

What’s Next

Expect further model releases and improvements, with AI labs likely to focus on making models more reliable, safe, and accessible. The next milestones include broader deployment of coding agents in real-world workflows and potential breakthroughs in multimodal capabilities. Monitoring how these models integrate into industry and daily life will be key.

Key Questions

Which model is currently considered the best?

There is no definitive answer; rankings have shifted multiple times over the past six months, with models like GPT-5.1, Gemini 3, and Claude Opus 4.5 all holding top spots at different times.

What is OpenClaw, and why is it significant?

OpenClaw is a project that developed a personal AI assistant, gaining rapid popularity and demonstrating the increasing accessibility and utility of advanced AI models in everyday tasks.

How have coding capabilities changed recently?

Reinforcement learning techniques have significantly improved code quality, making coding agents reliable enough for daily use without extensive fixes, marking a major milestone in AI-assisted programming.

Are there concerns about AI safety or reliability?

While capabilities have advanced, questions remain about the safety, reliability, and ethical deployment of these models, which are ongoing areas of research and discussion.

Author

Artificial Intelligence
