AI Weekly — Apr 27, 2026
Week of April 27, 2026
The Big Picture
The week's biggest story isn't a model. It's the realization that the AI industry has stopped competing on intelligence and started competing on the full stack — chips, pricing, infrastructure, governance, and increasingly geopolitics. GPT-5.5 and DeepSeek V4 landed within hours of each other, Anthropic locked in roughly $65 billion in pledged capital, Google split its TPU into two specialized chips, and the White House accused China of industrial-scale AI theft on the same Friday DeepSeek shipped a model tuned for Huawei silicon. Underneath the headlines, the quieter shift: how we measure AI is breaking down at the same moment the stakes are getting real.
What Just Shipped
- GPT-5.5 (OpenAI): Launched Thursday with stronger coding and computer-use capabilities. OpenAI reports 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval across 44 professions. Codex now controls browsers, edits spreadsheets, and runs an auto-review mode where a second agent checks the first agent's work.
- DeepSeek V4 Pro and Flash (DeepSeek): Two MIT-licensed open-weight models — V4-Pro at 1.6 trillion parameters (49B active), V4-Flash at 284 billion (13B active) — with 1-million-token context and pricing roughly 90% below Western frontier models.
- TPU 8t and 8i (Google): The first time Google has split its flagship TPU into two specialized chips: 8t for training, 8i for inference. Google claims up to 80% better inference price-performance over the seventh-generation Ironwood.
- Qwen3.6-27B (Alibaba): A 27-billion-parameter Apache 2.0 model that beats Alibaba's own 397B predecessor on major coding benchmarks. Runs locally on a MacBook Pro.
- GPT-Image-2 (OpenAI): Released earlier in the week with thinking and non-thinking variants, dramatic gains in text rendering and editing, and the #1 ranking across all of Arena's image generation leaderboards.
- Kimi K2.6 (Moonshot AI): A trillion-parameter open-weight model with 384 experts, 256K context, and native multimodality.
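To put the "active" figures for the DeepSeek models in perspective, here is a minimal sketch that computes what share of each model's weights is used per token. It assumes "active" means parameters used per token, as is usual for mixture-of-experts models; the function and variable names are illustrative.

```python
# (total parameters, active parameters) as quoted in the list above.
MODELS = {
    "DeepSeek V4-Pro": (1.6e12, 49e9),
    "DeepSeek V4-Flash": (284e9, 13e9),
}

def active_fraction(total, active):
    """Share of the model's weights used for a single token (assumed meaning of 'active')."""
    return active / total

fractions = {name: active_fraction(*sizes) for name, sizes in MODELS.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

On these figures, both models touch under 5% of their weights per token, which is part of why headline parameter counts say little about serving cost.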
This Week's Stories
GPT-5.5 Is a Model and a Platform Bet
The most important thing about GPT-5.5 isn't the benchmark numbers — it's what OpenAI built around them.
Codex now has browser control, spreadsheet and document editing, OS-wide dictation, and an auto-review mode where a guardian agent checks the primary agent's work before anything ships. Greg Brockman called it a step toward a "super app" — combining ChatGPT, Codex, and an AI browser into one unified service. Workspace agents started rolling out to ChatGPT Business and Enterprise on Wednesday, with scheduled runs, Slack integration, team sharing, and admin controls.
If this works, the lock-in won't come from benchmarks; it'll come from the AI already knowing how your company works. Once a finance team's weekly close runs through a scheduled Codex agent that knows your spreadsheets, your reviewers, and your Slack channels, switching costs become organizational, not technical. The failure mode is also visible: API pricing doubled to $5 input and $30 output per million tokens. If enterprises don't get matching productivity gains, the math falls apart fast, and DeepSeek V4-Pro is right there waiting at roughly a third of the input price and a ninth of the output price.
The signal to watch: whether enterprises on Q3 earnings calls start describing AI spend as a line item with stable contracts, or as something they're constantly renegotiating.
DeepSeek V4 Is a Pricing Attack, a Hardware Strategy, and a Geopolitical Statement
Hours after GPT-5.5 dropped, DeepSeek answered with V4-Pro and V4-Flash.
V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens. V4-Pro is $1.74 input and $3.48 output. GPT-5.5, by comparison, is $5 and $30. Both V4 models have a 1-million-token context window. Both are MIT-licensed. Both are tuned to run on Huawei's Ascend NPUs — the architectural choice that turns a model release into a sovereignty argument.
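To make the price gap concrete, here is a minimal sketch that prices one workload at the per-million-token rates quoted above. The workload size (2M input tokens, 0.5M output tokens) and the function name are illustrative assumptions, not vendor figures, and real bills also depend on caching discounts and pricing tiers that are not modeled here.

```python
# Per-million-token list prices (USD) as quoted in this issue.
PRICES = {
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "DeepSeek V4-Pro": {"input": 1.74, "output": 3.48},
    "DeepSeek V4-Flash": {"input": 0.14, "output": 0.28},
}

def workload_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the quoted list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2M input tokens, 0.5M output tokens.
costs = {m: workload_cost(m, 2_000_000, 500_000) for m in PRICES}
baseline = costs["GPT-5.5"]
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f} ({cost / baseline:.1%} of GPT-5.5)")
```

On this particular mix, V4-Pro comes to about a fifth of the GPT-5.5 bill and V4-Flash to under 2%. The ratio shifts with the input/output split, because the gap on output prices is much wider than the gap on input prices.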
The technical work is genuinely novel. DeepSeek's new compressed sparse attention reduces KV cache memory by roughly 90% at long context, meaning long agentic tasks that cost $100 on Western APIs might run for $5 on V4. DeepSeek itself acknowledges V4 trails the top closed models by three to six months — but an open-weight model that's 90% as good at 10% of the price changes the economics for every developer on earth.
What changes if this works: the pricing floor for frontier-class inference collapses, Western SaaS pricing models built on token costs come under pressure, and "sovereign AI" stops being a slogan and becomes a procurement category. What failure looks like: V4 throughput stays compute-constrained, Ascend 950 supernodes slip past their second-half ship date, and DeepSeek remains a benchmark headline rather than production infrastructure. The signal is volume: if Fireworks and Together see V4 traffic grow 5x by May, the answer is in.
Google Splits Its Brain in Two — and Bets $40 Billion on Anthropic
Google had a busy week, and the two biggest moves are connected.
At Cloud Next, Google unveiled the eighth-generation TPU as two purpose-built chips: TPU 8t for training, TPU 8i for inference and reinforcement learning. Training and inference have fundamentally different needs — training wants raw throughput, inference wants low latency at scale. Building two chips instead of one is Google's clearest answer yet to Nvidia's dominance.
Then came the money. Alphabet committed $10 billion now, at a $350 billion Anthropic valuation, with $30 billion to follow if Anthropic hits performance targets. Google Cloud will provide a fresh 5 gigawatts of capacity over five years. Combined with Amazon's pledge of up to $25 billion, Anthropic now has roughly $65 billion in committed capital and 10 gigawatts of reserved compute — and an annualized revenue run rate that, according to industry reporting, has gone from $1 billion at the start of 2025 to $30 billion by April 2026.
Numbers like that make an IPO look not just possible but inevitable.
The White House Accused China of AI Theft — the Day DeepSeek V4 Shipped
Washington's warning and DeepSeek's launch landed on the same Friday.
The State Department, per Reuters, sent a diplomatic cable to embassies worldwide instructing staff to warn foreign governments about alleged IP theft by DeepSeek and other Chinese AI firms — naming Moonshot AI and MiniMax alongside DeepSeek. Two days earlier, the White House Office of Science and Technology Policy published a memo accusing Chinese entities of running "deliberate, industrial-scale campaigns" to distill American frontier AI systems.
The distillation accusation — that Chinese labs train on the outputs of American models to skip years of research — is serious but contested. What's harder to contest is the result. He Hui, director of semiconductor research at Omdia, told Reuters: "Huawei's Ascend chips are the country's best homegrown alternative to Nvidia, and supporting DeepSeek V4 shows that top Chinese AI models can now run on Chinese hardware."
The U.S. export control strategy was built on the assumption that cutting off chips would slow Chinese AI. V4's launch is evidence that Chinese labs adapted rather than slowed down.
The Benchmark Everyone Uses to Buy AI Coding Tools Is Now Officially Broken
While everyone watched the model launches, the standard everyone uses to evaluate them collapsed.
SWE-bench Verified — the leaderboard cited in nearly every AI coding announcement — is showing test overfitting rates above 30%, according to an arXiv paper published this month. OpenAI has stopped reporting Verified scores after finding training-data contamination across every frontier model.
The numbers that make this concrete: Claude Opus 4.5 scores 80.9% on SWE-bench Verified. The same model, on SWE-bench Pro using standardized scaffolding on tasks it couldn't have seen during training, scores 45.9%. That's a 35-point drop on the same model doing the same kind of task. And the model may matter less than you think — Morph found that swapping between top frontier coding models produced a roughly 1% score change on SWE-bench Pro, while swapping the agent scaffold produced a 22% swing.
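The gap between the two leaderboards can be expressed both in percentage points and as a share of the original score. The sketch below uses only the two scores quoted above; the variable names are illustrative.

```python
# Scores quoted above for Claude Opus 4.5 (percent of tasks resolved).
verified = 80.9   # SWE-bench Verified
pro = 45.9        # SWE-bench Pro, standardized scaffolding

absolute_drop = verified - pro            # percentage points
relative_drop = absolute_drop / verified  # share of the Verified score lost

print(f"absolute drop: {absolute_drop:.1f} points")
print(f"relative drop: {relative_drop:.1%}")
```

Read in relative terms, the model loses a little over two-fifths of its Verified score when moved to the scaffold-controlled benchmark.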
For enterprise teams making procurement decisions based on leaderboard numbers, this is a quiet crisis. The number a vendor put in their sales deck may be measuring how well they tuned their test harness, not how well the model writes code. Independent evals are becoming the real scarce asset — and the labs that earn trust on reproducible, scaffold-controlled benchmarks will have a quiet but durable advantage.
A 23-Year-Old With a ChatGPT Subscription Just Cracked a 60-Year-Old Math Problem
Liam Price is 23 and has no advanced mathematics training. He just helped close a 60-year-old conjecture left behind by Paul Erdős, the Hungarian mathematician who died in 1996 and whose open problems are tracked at erdosproblems.com.
The solution came in response to a single prompt to GPT-5.4 Pro. Stanford mathematician Jared Lichtman and Fields Medalist Terence Tao verified and refined the proof. The catch — and it's an important one — is that the raw output was, in Lichtman's description, "actually quite poor," requiring an expert to extract the key insight. What the model found was a 90-year-old technique (Markov chains with von Mangoldt weights) that no human researcher had applied to this problem class — what Tao calls a "previously undescribed connection."
The story isn't "AI solves math." It's the workflow it represents: amateur exploration plus professional validation, with AI doing the combinatorial search across a vast space of known techniques. Tao has suggested AI is "better suited for being systematically applied to the 'long tail' of obscure Erdős problems, many of which actually have straightforward solutions." If that holds, the long tail of mathematics — small, specialized conjectures nobody has time to revisit — is about to get worked through at unprecedented speed.
⚡ What Most People Missed
Google revealed 75% of its in-house code is now AI-generated. Disclosed quietly at Cloud Next while everyone was looking at the chips. The most concrete data point we have on what "AI-native engineering" looks like at scale — and it's coming from a company with 180,000 employees.
Meta is recording employee keystrokes and screenshots to train AI agents. Through a tool called Model Capability Initiative, Meta is collecting training data to teach AI models how humans interact with computers. Some legal experts say this subjects white-collar workers to real-time surveillance once limited to gig workers. European law would likely prohibit it under GDPR.
The first hard number on white-collar employment is in. S&P 500 employment fell by 400,000 in 2025 to 28.1 million, the first annual decline since 2016, following eight consecutive years of growth. The number comes from Bank of America strategist Michael Hartnett, derived from aggregated public filings. Causality is genuinely hard to establish, but the direction is consistent across every source that has looked at the question. The "AI might affect jobs" debate is now an "AI is affecting jobs" data point.
The White House pushed an Anthropic researcher out of a key AI safety job days after he was hired. The Department of Commerce had picked Collin Burns to lead the Center for AI Standards and Innovation. He was forced out within days. The Washington Post reported it Thursday. It's a signal about who will actually be writing U.S. AI safety standards going forward — and it isn't the people who came from frontier labs.
The EU quietly opened a consultation on measuring AI energy use. The European Commission's AI Office is taking input through May 15 on how to measure the energy consumption and emissions of AI models. Once policymakers move from vague sustainability language to asking how to measure, they're laying groundwork for procurement checklists, disclosure templates, and compliance work. Energy accounting is moving upstream into AI governance.
📅 What to Watch
- If DeepSeek V4 API traffic on Fireworks and Together grows 5x by May 1, open-weight agents have started undercutting U.S. labs on cost alone — and enterprise contracts signed at today's pricing become renegotiation candidates.
- If Anthropic files IPO paperwork by Q3, the secondary market's roughly $1 trillion implied valuation gets tested against public reality, and every other AI company's valuation framework moves with it.
- If Microsoft Build (May 19–22) ships a major Copilot or MAI model announcement, Microsoft is signaling it intends to compete with OpenAI and Anthropic on models, not just distribute them — which would reshape its partnership with OpenAI.
- If Huawei confirms wide availability of Ascend 950 supernodes by end of Q2, China has officially established a parallel, self-sufficient AI hardware supply chain — and the export-control strategy enters a new phase.
- If FERC issues new interconnection standards by its June deadline, every U.S. data center currently under construction has its timeline and economics rewritten in a single regulatory action.
- If publishers and journals start running agentic reproducibility tools on submissions, research groups will have to standardize data and code pipelines, demand for reproducibility engineers will rise, and reproducibility scores become part of hiring, funding, and promotion decisions.
The Closer
A 23-year-old solved a problem Erdős left behind, in the year humanoid robots ran a half-marathon faster than humans, in the week DeepSeek shipped a 1.6-trillion-parameter model tuned for Chinese chips built to replace the ones America tried to keep China from buying. The export controls coincided with the emergence of cheaper Chinese AI on Chinese hardware, and the State Department answered with a strongly worded cable on the same Friday. Onward.
If you know someone trying to make sense of any of this, send it their way.