Terrill Dicki
Feb 17, 2026 18:16
Anthropic releases Claude Sonnet 4.6, delivering Opus-level AI performance at $3/$15 per million tokens with major computer use and coding improvements.
Anthropic dropped Claude Sonnet 4.6 on February 17, 2026, marking the company's most aggressive move yet to democratize frontier AI capabilities. The new model delivers performance that previously came only at the company's top-tier Opus pricing, now at just $3/$15 per million tokens.
The headline feature? A 1 million token context window now in beta. That’s enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. But raw context size means nothing without reasoning ability—and Anthropic claims Sonnet 4.6 actually thinks effectively across all that information.
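To make "entire codebases" concrete, here is a rough back-of-envelope sketch. The ~4 characters per token figure is a widely used heuristic, not an Anthropic specification, and the average line length is an assumption for illustration:

```python
# Back-of-envelope sizing for a 1M-token context window.
CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4       # common rough heuristic; varies by tokenizer and content
AVG_CHARS_PER_LINE = 40   # assumed average line length for source code

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_lines = approx_chars // AVG_CHARS_PER_LINE
print(f"~{approx_lines:,} lines of code")  # ~100,000 lines of code
```

Under those assumptions, a single request could hold on the order of a hundred thousand lines of code, which is why whole repositories and long document sets fit in one prompt.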
Computer Use Gets Real
Back in October 2024, Anthropic introduced general-purpose computer-using AI and called it “experimental—at times cumbersome and error-prone.” Sixteen months later, the gap has narrowed considerably.
On OSWorld, the standard benchmark testing AI across real software like Chrome, LibreOffice, and VS Code, Sonnet models have shown steady gains. Early users report human-level capability on tasks like navigating complex spreadsheets and filling multi-step web forms across multiple browser tabs. On an insurance-industry benchmark, the model hit 94% accuracy, the highest score any model has posted for computer-use applications.
Anthropic acknowledges the model still lags behind skilled human operators. But the trajectory matters more than the current position for enterprise buyers evaluating automation investments.
Developer Preferences Tell the Story
Internal testing revealed users preferred Sonnet 4.6 over its predecessor roughly 70% of the time in Claude Code. More telling: users preferred it over Opus 4.5—Anthropic’s November 2025 frontier model—59% of the time.
The reasons cited weren't abstract benchmark improvements. Developers reported fewer false claims of success, reduced hallucinations, and more consistent follow-through on multi-step tasks. The model reads existing code before modifying it rather than duplicating logic, which makes long coding sessions less frustrating.
Box tested the model on enterprise document tasks and found a 15 percentage point improvement over Sonnet 4.5 in heavy-reasoning Q&A. Financial services firm Hebbia reported significant jumps in answer match rates on its internal benchmark.
The Vending Machine Test
One evaluation stands out for demonstrating genuine strategic reasoning. Vending-Bench Arena simulates running a business over time, with different AI models competing against each other for profits.
Sonnet 4.6 developed an unexpected approach: heavy capacity investment for the first ten simulated months, spending significantly more than competitors, followed by a sharp pivot to profitability in the final stretch. The timing of that pivot—not something explicitly programmed—helped it finish ahead of the competition.
Safety and Pricing
Anthropic’s safety evaluation concluded Sonnet 4.6 shows “a broadly warm, honest, prosocial, and at times funny character” with “no signs of major concerns around high-stakes forms of misalignment.” Prompt injection resistance improved significantly over Sonnet 4.5.
Pricing stays flat at $3 per million input tokens and $15 per million output tokens, unchanged from Sonnet 4.5's September 2025 launch. The model is now the default for Free and Pro plan users on claude.ai and Claude Cowork.
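The per-request arithmetic at those rates is simple enough to sketch. This is an illustrative calculation using only the published $3/$15 per-million-token prices; the function name and the example token counts are mine, not Anthropic's:

```python
# Illustrative cost arithmetic at the stated Sonnet 4.6 rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_RATE = 3.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the published per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A request that fills the full 1M-token context and returns a 4K-token answer:
print(f"${request_cost(1_000_000, 4_000):.2f}")  # $3.06
```

Even a maximally loaded 1M-token prompt costs about $3 in input tokens, which is the core of the price-performance argument the article makes.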
For context, Anthropic’s estimated valuation sits around $380 billion as of February 2026. The company’s rapid iteration pace—from Claude 3.5 Sonnet in June 2024 through four major Sonnet releases in under two years—suggests enterprise AI buyers should expect continued capability jumps at stable price points.
Opus 4.6 remains Anthropic’s recommendation for tasks requiring the deepest reasoning, like codebase refactoring or multi-agent coordination. But for most production workloads, the price-performance calculation just shifted significantly.