
TL;DR

A recent whitepaper from Google highlights that in AI-driven software development, the model accounts for only about 10% of system behavior. The focus is shifting toward harness design and verification, which are now the key to building reliable AI systems.

The whitepaper's central claim is that the model accounts for only about 10% of overall system behavior in AI-assisted software development. The real skill, it argues, now lies in designing the harness, verification, and context management around the model. That matters because it redefines where organizations should invest to improve AI reliability and effectiveness.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that 85% of professional developers use AI coding agents regularly, with 51% using them daily. Despite the hype around models, the document stresses that model improvements alone are insufficient for system robustness. Instead, the harness (prompts, tools, rules, and observability) accounts for roughly 90% of system behavior. As evidence, the paper cites harness experiments, including figures from LangChain and a result on Terminal Bench 2.0 in which an agent moved from outside the Top 30 to the Top 5 by changing only the harness while keeping the same model. These findings suggest that organizations can significantly influence system performance through configuration and scaffolding, rather than relying solely on the latest models.

At a glance

Type: report

When: published May 2026

The development: Google’s new whitepaper argues that the core of AI software engineering is less about the AI model and more about the harness and verification processes.

The Model Is Only 10% — The New SDLC With Vibe Coding

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
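To make the distinction concrete, here is a minimal Python sketch of a CI-style gate that runs both kinds of check. Everything in it is hypothetical and chosen for illustration: `slugify` stands in for deterministic code, `summarize` stands in for a model call, and the keyword-coverage metric and the 0.8 threshold are assumptions, not anything the whitepaper prescribes.

```python
import re


def slugify(title: str) -> str:
    """Deterministic code: the same input always gives the same output."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def summarize(text: str) -> str:
    """Hypothetical stand-in for a model call (a real one varies run to run)."""
    return text.split(".")[0] + "."


def keyword_coverage(output: str, required: list[str]) -> float:
    """Toy eval metric: fraction of required keywords found in the output."""
    hits = sum(1 for kw in required if kw.lower() in output.lower())
    return hits / len(required)


def run_tests() -> bool:
    # Tests: exact assertions on deterministic behavior.
    return slugify("The Model Is Only 10%") == "the-model-is-only-10"


def run_evals(threshold: float = 0.8) -> bool:
    # Evals: score outputs over a small dataset and gate on the average.
    cases = [
        ("The harness shapes agent behavior. More text follows.", ["harness", "agent"]),
        ("Evals score model output. They complement tests.", ["evals", "model"]),
    ]
    scores = [keyword_coverage(summarize(text), kws) for text, kws in cases]
    return sum(scores) / len(scores) >= threshold


def ci_gate() -> bool:
    """Ship only when both the tests and the evals pass."""
    return run_tests() and run_evals()
```

The design point is that the two checks answer different questions: the test asserts an exact value, while the eval accepts any output that scores above a threshold, which is the only workable contract for non-deterministic output.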

The idea worth building your strategy around

Agent = Model + Harness

MODEL: ~10%

HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)

~90% is your surface area, not the provider’s

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
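The "Agent = Model + Harness" equation can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the paper's implementation: `Harness`, `Agent`, `stub_model`, and the `CALL <tool> <arg>` reply convention are all hypothetical names invented here. The point it shows is that the same model behaves differently once the harness supplies a missing tool.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # prompt in, text out


@dataclass
class Harness:
    """Hypothetical harness: everything around the model that you control."""
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    rules: list[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        parts = [self.system_prompt]
        parts += [f"RULE: {rule}" for rule in self.rules]
        parts += [f"TOOL: {name}" for name in self.tools]
        parts.append(f"TASK: {task}")
        return "\n".join(parts)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        reply = self.model(self.harness.build_prompt(task))
        # If the model requests a tool, the harness executes it.
        if reply.startswith("CALL "):
            name, _, arg = reply[5:].partition(" ")
            tool = self.harness.tools.get(name)
            return tool(arg) if tool else f"error: no tool named {name}"
        return reply


def stub_model(prompt: str) -> str:
    """Hypothetical fixed model: uses a tool only if the prompt lists it."""
    if "TOOL: word_count" in prompt:
        task = prompt.rsplit("TASK: ", 1)[1]
        return f"CALL word_count {task}"
    return "I cannot count words reliably."


# Same model, two harnesses.
bare = Agent(stub_model, Harness("You are a helpful assistant."))
equipped = Agent(
    stub_model,
    Harness(
        "You are a helpful assistant.",
        tools={"word_count": lambda text: str(len(text.split()))},
        rules=["Prefer tools over guessing."],
    ),
)
```

Calling `bare.run("own your scaffolding")` returns the model's refusal, while `equipped.run("own your scaffolding")` returns `"3"`. Nothing about the model changed; the failure in the first case is a configuration failure of the kind the quote describes, a missing tool.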

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding (Low CapEx · High OpEx)

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering (High CapEx · Low OpEx)

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing, with cheap models for the easy work.
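The CapEx versus OpEx trade-off reduces to a simple break-even calculation. The sketch below assumes a linear cost model (a fixed upfront cost plus a constant cost per feature), and every dollar figure is made up for illustration; only the 3–10× per-feature range comes from the paper, and the example uses 5×.

```python
import math
from typing import Optional


def total_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Linear cost model: upfront investment plus a per-feature running cost."""
    return capex + opex_per_feature * features


def crossover_features(
    capex_a: float, opex_a: float, capex_b: float, opex_b: float
) -> Optional[int]:
    """Smallest feature count at which approach B is strictly cheaper than A.

    Returns None if B never becomes cheaper (its per-feature cost is not lower).
    """
    if opex_b >= opex_a:
        return None
    # B is cheaper when capex_b + opex_b * n < capex_a + opex_a * n.
    break_even = (capex_b - capex_a) / (opex_a - opex_b)
    return max(0, math.floor(break_even) + 1)


# Illustrative numbers only: vibe coding costs nothing upfront but 5x per feature.
VIBE = {"capex": 0.0, "opex": 500.0}
AGENTIC = {"capex": 8000.0, "opex": 100.0}

n = crossover_features(VIBE["capex"], VIBE["opex"], AGENTIC["capex"], AGENTIC["opex"])
```

With these assumed numbers the two approaches cost the same at 20 features, and the agentic approach is cheaper from feature 21 onward. Real projects will not be linear, so treat the function as a way to reason about the shape of the trade-off rather than as a forecast.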

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Impact of Harness and Verification on AI Development

This shift matters because it redirects the strategic focus from chasing the latest AI models to building better scaffolding, verification, and context management. Organizations that understand this can achieve more reliable AI systems at lower costs, avoiding the high expenses associated with model updates and retraining. It also emphasizes that cost efficiency in AI development depends more on how systems are configured and maintained than on the raw power of the models used.
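One configuration lever the paper's summary names is intelligent model routing: send easy work to cheap models and reserve expensive ones for hard tasks. The Python sketch below shows the idea with a deliberately crude heuristic. The tier names, prices, keyword list, and token limit are all hypothetical placeholders, and a production router would likely use a better difficulty signal than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, for illustration only


CHEAP = ModelTier("small-model", 0.10)
STRONG = ModelTier("large-model", 2.00)

# Assumed signals that a task needs the stronger model.
HARD_SIGNALS = ("refactor", "architecture", "security", "migration", "concurrency")


def route(task: str, context_tokens: int, max_cheap_tokens: int = 4000) -> ModelTier:
    """Send easy, small-context work to the cheap tier; escalate the rest."""
    text = task.lower()
    looks_hard = any(signal in text for signal in HARD_SIGNALS)
    if looks_hard or context_tokens > max_cheap_tokens:
        return STRONG
    return CHEAP


def estimated_cost(task: str, context_tokens: int) -> float:
    """Rough input cost of a task under the routing decision."""
    tier = route(task, context_tokens)
    return tier.cost_per_1k_tokens * context_tokens / 1000
```

For example, `route("Rename a variable in utils.py", 800)` picks the cheap tier, while `route("Refactor the auth module", 800)` escalates to the strong one. Because the routing rule lives in the harness, it can be tuned or replaced without touching any model.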

Evolution of AI Development Strategies

The paper builds on recent trends where AI tools like ChatGPT, Codex, and Claude have become integral to software workflows. Since early 2026, the industry has seen a surge in AI adoption, with more companies recognizing that model improvements are incremental. The concept of “vibe coding” has been criticized for its lack of discipline, leading to a renewed emphasis on structured engineering practices. Prior to this, the focus was predominantly on model capabilities, but recent experiments demonstrate that configuration and scaffolding can outperform raw model improvements in system performance and reliability.

“The biggest shift in software engineering isn’t a new language or framework; it’s moving from writing code to expressing intent and trusting machines to interpret that intent.”
— Addy Osmani

Unconfirmed Aspects of Cost and Performance Gains

While experiments demonstrate the importance of harness and configuration, it remains unclear how universally these findings apply across different AI tasks and industries. The long-term effect on model development is also uncertain: model architectures are still changing quickly, and some experts question whether future models might shift this balance.

Expected Focus on Configuration and Verification Tools

Organizations will likely prioritize developing robust harnesses, verification frameworks, and context management strategies. Future research and development may focus on creating standardized tools for configuration, testing, and observability to maximize AI systems’ reliability and cost-efficiency. Monitoring how these practices influence AI system performance in real-world deployments will be critical in the coming months.

Key Questions

Why is the model only 10% of the system behavior?

The whitepaper argues that the model itself accounts for only a small part of how an AI system behaves. Most of the system’s reliability depends on the harness around it: how the model is integrated, configured, and verified through scaffolding, prompts, and tools.

How can organizations improve their AI systems based on this insight?

By investing in better harness design, context engineering, and verification processes, organizations can enhance AI reliability without solely relying on upgrading models.

Does this mean model improvements are no longer important?

Model improvements remain valuable but are now only part of a broader strategy. The whitepaper emphasizes that configuration and verification are more impactful for system behavior and cost efficiency.

What are the risks of focusing too much on harness and verification?

Overemphasizing configuration without understanding the underlying models could lead to brittleness if models evolve or change unexpectedly. Balance between model development and system engineering is still necessary.

Source: ThorstenMeyerAI.com
