AI Daily — Apr 25, 2026
Photo: lyceumnews.com
Saturday, April 25, 2026
The Big Picture
Two days, two flagship models, two competing visions of where this industry goes next. OpenAI shipped GPT-5.5 on Thursday as a closed, premium-priced step toward agentic autonomy; DeepSeek answered Friday with V4, a 1.6T-parameter MoE under an MIT license, runnable on Huawei Ascend chips, with a 1M-token context and an attention design that cuts KV-cache memory to roughly a tenth of V3.2's. The frontier and the open tier are now separated by months, not generations, and the competition has stopped being about scores. It's about who controls the substrate.
What Just Shipped
- DeepSeek V4 Pro & Flash (DeepSeek): 1.6T total / 49B active (Pro) and 284B / 13B active (Flash), MIT-licensed, 1M-token context, hybrid thinking modes, runnable on Huawei Ascend.
- GPT-5.5 (OpenAI): 1M-token context, 82.7% on Terminal-Bench 2.0, $5/$30 per million input/output tokens, helped optimize its own inference stack during training.
- Nemotron 3 Super (NVIDIA): 120B-parameter open hybrid MoE with 1M-token context, tuned explicitly for long-horizon agent memory and multi-step planning.
- Gemma 4 E2B IT (Google): Open model with 131K context, optimized for IT operations workloads.
- Trinity Large Preview (xAI): Preview release with limited public detail in trackers as of this morning.
Today's Stories
GPT-5.5 Lands as a Model and a Platform Bet
OpenAI shipped GPT-5.5 on Thursday with the kind of numbers that usually generate their own news cycle: 82.7% on Terminal-Bench 2.0 (Claude Opus 4.7 sits at 69.4%), 84.9% on GDPval across 44 professions, 78.7% on OSWorld-Verified for actual computer use. Per Artificial Analysis, GPT-5.5 medium scores the same as Claude Opus 4.7 max on their Intelligence Index at roughly a quarter of the cost — about $1,200 versus $4,800. API pricing landed at $5 per million input tokens and $30 per million output, double GPT-5.4's rate.
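To make the per-token pricing concrete, here is a minimal sketch of what a single call would cost at the quoted rates. The function name and the 200K-in / 20K-out workload are illustrative assumptions, not figures from OpenAI.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 5.0, out_price: float = 30.0) -> float:
    """Dollar cost of one call; prices are USD per million tokens.

    Defaults are the GPT-5.5 API rates quoted above ($5 in / $30 out).
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agent step: a 200K-token prompt and a 20K-token response.
cost = request_cost(200_000, 20_000)
print(f"${cost:.2f} per call")
```

At these rates the output side dominates quickly: the 20K output tokens cost $0.60 against $1.00 for ten times as many input tokens.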
But the model is half the story. OpenAI bundled GPT-5.5 with a Codex overhaul that turns it into the company's superapp foundation: browser control, Sheets/Slides, Docs/PDFs, OS-wide dictation, and an "auto-review" guardian agent that, OpenAI says, reduces human approvals on long-running tasks. OpenAI says the model helped optimize its own inference infrastructure during training, yielding a 20%+ token-generation speedup versus GPT-5.4.
If this works, OpenAI could own the desktop the way Microsoft once did, with agents that operate your software rather than chat about it. One early signal that it is working would be enterprise Codex revenue rotating away from per-seat ChatGPT Enterprise contracts. An early signal that it isn't would be the same complaint developers had about GPT-5.4: that the model is exploratory and drifts off-task without tight instruction. Cursor's CEO called 5.5 "noticeably more persistent." We'll know in a quarter whether that holds in production.
DeepSeek V4 Ships on Huawei Silicon, Hours After GPT-5.5
DeepSeek released V4 Pro (1.6T total / 49B active) and V4 Flash (284B / 13B active) Friday, both MIT-licensed, both with 1M-token context, both with base and instruct checkpoints. The 58-page technical report was described as "the best model paper of the year" by multiple independent researchers. Per Artificial Analysis, V4 Pro scores 52 on the Intelligence Index — second among open-weight reasoning models behind Kimi K2.6 — with V4 Flash priced at $0.14 / $0.28 per million tokens.
The architecture is the news. New compressed and hybrid attention mechanisms (CSA and HCA) cut 1M-token inference to roughly 27% of the FLOPs and 10% of the KV-cache memory of DeepSeek V3.2. The KV cache for a 1M-token sequence drops from 83.9 GiB to 9.62 GiB — an 8.7× reduction. Long-context economics that were theoretical in December are operational now.
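The reduction factor can be checked from the two cache sizes alone. A minimal sketch, using only the figures quoted above; the function name is illustrative:

```python
def reduction(before_gib: float, after_gib: float) -> tuple[float, float]:
    """Return (reduction factor, fraction of the original memory remaining)."""
    return before_gib / after_gib, after_gib / before_gib


# KV cache for a 1M-token sequence, as quoted above: V3.2 vs. V4.
factor, remaining = reduction(83.9, 9.62)
print(f"{factor:.1f}x smaller, {remaining:.1%} of the original")
```

The remaining fraction comes out a little above a tenth, which is consistent with the "roughly 10%" summary figure.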
The geopolitical layer matters more than the benchmarks. V4 was co-designed for Huawei Ascend chips, and DeepSeek told reporters Pro pricing could fall sharply once Ascend 950 supernodes scale in the second half of 2026. If this succeeds, China demonstrates frontier-adjacent open models trained on domestic silicon, and the chip-export-control regime starts to look like a slowdown rather than a moat. If it fails, the bottleneck reveals itself: V4 Pro throughput is currently capped by high-end compute supply, and Teortaxes argued the architecture may be too complex for most labs to reproduce — meaning openness in license, not openness in practice.
Anthropic Hits a $1 Trillion Implied Valuation on Secondary Markets
On Forge Global, Anthropic shares are trading at roughly a $1 trillion implied valuation. OpenAI on the same platform sits at $880 billion. Three months ago, Anthropic closed a $30 billion Series G led by GIC and Coatue at $380 billion post-money, which means secondaries are now pricing the company at roughly 2.6 times its last primary round.
The mechanism is revenue. Anthropic's annualized run rate went from roughly $9 billion at the end of 2025 to $30 billion by March 2026, per multiple secondary-market trackers — a 233% jump in a single quarter, driven by enterprise Claude Code adoption. Caplight reports Anthropic share interest spiked over 650% in twelve months; Rainmaker Securities' Glenn Anderson described $960 billion as "unthinkable a month earlier" before getting bid up within hours. Anthropic is reportedly working with Goldman Sachs and JP Morgan on an IPO targeting late 2026 at $400–500 billion — well below the secondary mark, which is how private market dynamics usually correct.
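The two headline ratios in this story are simple to verify. A quick sketch using only the figures quoted here (in billions of dollars); the helper names are illustrative:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def multiple(current: float, reference: float) -> float:
    """How many times larger current is than reference."""
    return current / reference


# Run rate: roughly $9B at end of 2025 to $30B by March 2026.
run_rate_growth = pct_change(9, 30)
# Valuation: roughly $1,000B on secondaries vs. $380B post-money Series G.
secondary_vs_series_g = multiple(1000, 380)

print(f"run-rate growth: {run_rate_growth:.0f}%")
print(f"secondary vs. Series G: {secondary_vs_series_g:.2f}x")
```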
If this holds, Anthropic becomes the first AI lab to IPO with revenue that justifies frontier-lab valuations on something other than narrative. If it doesn't — if Claude Code growth slows, or if GPT-5.5's coding gains pull enterprise contracts back — the gap between secondary pricing and IPO pricing becomes the cautionary chart of 2026.
⚡ What Most People Missed
The DeepSeek V4 Flash KV-cache math is the actual breakthrough. A 1M-token context that fits in under 10 GiB of memory means long-context agents become economically viable on hardware that doesn't cost a small country's defense budget. Several researchers flagged Flash, not Pro, as the disruptive SKU.
OpenAI quietly open-sourced a privacy filter. A 1.5B-total / 50M-active MoE token classifier under Apache 2.0, with a 128K context, designed for cheap PII redaction across enterprise log corpora. It's a real infrastructure release dressed up as a side project, and it solves a problem every agent pipeline has.
Google quietly announced its 8th-gen TPU at its Cloud Next event. TPU 8t for training, TPU 8i for inference, with 1,152 TPUs per pod and, by Google's claim, nearly 3× the compute per pod of Ironwood. A million TPUs in a single cluster is now the stated ceiling.
Hugging Face shipped ML Intern. An open-source CLI agent that researches papers, runs experiments on HF datasets and jobs, and iterates up to 300 steps. Not a hosted service — a local tool. The agent-as-junior-employee abstraction is becoming a commodity.
📅 What to Watch
- If Anthropic's IPO prices below $500 billion while secondaries hold near $1 trillion, expect a wave of secondary-market repricing across the AI private cap stack.
- If Codex auto-review adoption drives measurable reductions in human code review hours at large enterprises, Microsoft's Copilot positioning could shift materially toward a margin-focused product line.
- If DeepSeek V4 Pro pricing drops sharply in H2 once Ascend 950 supernodes scale, the chip-export-control thesis loses its primary economic argument.
- If a third lab beyond Cursor and Cognition ships a domain-specific coding model that users actively prefer to frontier defaults, the agent-lab playbook could become the dominant startup pattern of 2026.
- If GPT-5.5's "model helped optimize its own inference" result is reproduced by any other lab, the recursive-self-improvement debate stops being theoretical and starts being a procurement question.
The Closer
GPT-5.5 helping rewrite its own inference scheduler, DeepSeek V4 fitting a million tokens into 9.6 GiB on Huawei silicon, and Anthropic's secondary shares getting bid up faster than Rainmaker brokers can return calls. Somewhere in Palo Alto, a venture associate is updating a deck that already needed updating before lunch.
Stay sharp out there.
Forward this to the friend who keeps asking you what actually changed this week.