The Lyceum: AI Daily — May 28, 2026
Thursday, May 28, 2026
The Big Picture
The enterprise AI bill is coming due, and Uber's COO just said the quiet part into a podcast microphone: he cannot draw a line between the company's surging AI usage and anything more useful shipping to customers. Meanwhile, Google made its fastest model the default for a billion people, a security flaw surfaced in Starlette, the web framework underneath much of the Python AI serving stack, and a CMU/Maryland preprint argues that language models need sleep. Literally. The throughline today is accountability: faster models, cheaper tokens, and louder questions about whether any of it is worth what it costs.
What Just Shipped
- Gemini 3.5 Flash (Google DeepMind): Now the default model in the Gemini app and AI Mode in Search globally; per Google, beats Gemini 3.1 Pro on coding and agentic benchmarks at roughly 4x the speed.
- Claude Sonnet 4.6 (Anthropic): Rolling out with a 1-million-token context window in beta.
- Lyria 3 (Google DeepMind): Google's newest music generation model.
- Pomelli (Google Labs): New tool for generating studio-quality marketing assets.
- Grok 4.2 Beta (xAI): Beta release; no capability specifics from primary sources yet.
- Seed 2.0 (ByteDance): Released in Pro, Light, and Mini variants.
- Seedance 2.0 (ByteDance): Video generation model rolling out in the US inside Runway and CapCut.
- Gemma 4 (Google DeepMind): New open model family.
Today's Stories
Uber Burned Its Entire 2026 AI Budget in Four Months, and the COO Can't Say What It Bought
The most honest sentence in enterprise AI this year came from a podcast. Uber's president and COO Andrew Macdonald, speaking on the Rapid Response podcast (as reported by Fortune on Tuesday), said the link between Uber's rising Claude Code usage and shipped consumer features "is not there yet." He went further: it's "very hard to draw a line" between AI usage stats and producing meaningfully more useful features.
The backdrop, per AI Magazine citing The Information: Uber has already exhausted its 2026 AI budget, with R&D expenses hitting $3.4 billion in 2025. Roughly 11% of Uber's live backend code updates are now written entirely by AI agents, about 1,800 code changes per week ship without direct human input, and nearly 95% of Uber engineers use AI tools monthly. CTO Praveen Neppalli Naga said the company is "back to the drawing board" on AI budgeting.
The output is real. The business value is unproven. That's a fundamentally different problem from the one AI vendors are selling against, and Macdonald is the first major-company COO to say it into a microphone instead of a footnote. Watch whether other enterprise COOs adopt similar language on Q2 earnings calls. If they do, AI coding tools get repriced from "productivity multiplier" to "cost center under review," and Anthropic and OpenAI face their first real pressure to ship consumption-smoothing pricing, exactly the trigger we've been tracking since Microsoft's blowout last week.
Google Made Its Fastest Model the Default for a Billion People
Forget the benchmark sheet. The move that matters: Google didn't just release Gemini 3.5 Flash last week; it replaced the model a billion people use every day. Flash is now the default in the Gemini app and in AI Mode in Search globally. The new Gemini Spark agent, which works around the clock to take actions on a user's behalf, is also built on 3.5 Flash, with a beta for Google AI Ultra subscribers in the US starting next week.
Per Google's own benchmarks (not independently verified): 76.2% on Terminal-Bench 2.1, 1,656 Elo on GDPval-AA, 83.6% on MCP Atlas — all reportedly beating Gemini 3.1 Pro while running faster and cheaper. Per DataCamp's coverage, Gemini 3.5 Pro is in internal use and slated to roll out next month.
Here's the consequence nobody is naming: when your fastest, cheapest, most agentic model becomes the default for Search, every AI Mode query runs on an agent-grade model whether the user knows it or not. The failure signal would be a quiet rollback: if AI Mode answer quality drops noticeably and Google moves users back to a heavier model, the agentic-by-default bet won't have survived contact with the search ad business.
DuckDuckGo Had Its Best Week in Years Amid Google's AI Search Push
A telling irony, per PC Gamer's traffic analysis: Google announced that users love AI Mode, and the week after, DuckDuckGo — which markets itself explicitly as an AI-free alternative — saw nearly 28% more visits. Nikkei flagged the related dynamic in Japan: users no longer visiting source websites because AI answers intercept the click, and publishers watching referral traffic erode in real time.
The backlash to AI search is now measurable, not anecdotal. The structural question for Google: if AI Mode cannibalizes the click-through model that funds the whole operation, the company has a revenue problem it hasn't publicly addressed. Watch the next earnings call for any disclosure on AI Mode's effect on search ad revenue per query — its absence is itself the signal.
YouTube Will Now Auto-Label AI-Generated Videos — Disclosure or Not
YouTube is done waiting for creators to self-report. The platform announced it will automatically apply AI-generated content labels whether or not the creator discloses AI use. Until now, most platform AI-disclosure policies relied on creator honesty, which produced predictably uneven results.
The shift moves enforcement from the creator to the platform, and signals YouTube believes its detection tech is good enough to run at scale. The risk is the inverse problem: false positives on legitimate creative work create a new class of complaint that didn't exist before. Watch whether Meta, TikTok, or X follow with similar automated enforcement within 60 days — if they do, automated labeling becomes the default infrastructure response to synthetic media, and creator self-reporting effectively ends as a regulatory primitive.
The Open-Weight Race Has a New Leaderboard — and China Is Running It
Alibaba's Qwen 3.5-397B-A17B, a 397-billion-parameter sparse mixture-of-experts model that activates only 17 billion parameters per token, now ranks #3 on Artificial Analysis's open-weight Intelligence Index, behind GLM-5 and Kimi K2.5. Per Alibaba, it's 60% cheaper to run than its predecessor, decodes 19x faster than Qwen3-Max at 256K context, and runs at roughly 1/18th the cost of Gemini 3 Pro. Weights are openly available on Hugging Face under Apache 2.0. One caveat from Artificial Analysis: its hallucination rate is high relative to leading open-weight peers.
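For readers new to sparse mixture-of-experts, here is a toy sketch of the basic idea: a router scores every expert for each token, but only the top-k experts actually compute. The sizes (`NUM_EXPERTS`, `TOP_K`, `DIM`) and the softmax-over-chosen-experts gating are illustrative assumptions, not Qwen's published configuration. Note that a model's headline "active" count also includes layers every token passes through, such as attention, so 17B out of 397B (about 4.3%) is only the headline ratio, not the share of experts used.

```python
import math
import random

# Toy sparse mixture-of-experts (MoE) layer. All sizes are illustrative
# assumptions, not Qwen 3.5's real configuration.
NUM_EXPERTS = 8   # total experts held in memory
TOP_K = 2         # experts that actually run for each token
DIM = 4           # hidden size

rng = random.Random(0)


def make_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


router = make_matrix(NUM_EXPERTS, DIM)                    # one score per expert
experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]


def moe_forward(token):
    """Route one token vector to its top-k experts; return (output, chosen ids)."""
    scores = matvec(router, token)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the chosen experts only (shifted by the max for stability).
    top_score = max(scores[i] for i in chosen)
    gates = [math.exp(scores[i] - top_score) for i in chosen]
    total = sum(gates)
    out = [0.0] * DIM
    for gate, i in zip(gates, chosen):
        expert_out = matvec(experts[i], token)            # only k experts compute
        out = [o + (gate / total) * y for o, y in zip(out, expert_out)]
    return out, chosen


output, chosen = moe_forward([1.0, 0.5, -0.3, 2.0])
print("experts used:", chosen, "of", NUM_EXPERTS)
print(f"headline active ratio: {17 / 397:.1%}")  # 17B active of 397B total
```

The cost advantage comes from the loop: memory holds all eight toy experts, but compute per token scales with `TOP_K`, which is why total and active parameter counts can diverge so sharply.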
The model you can download and self-host is now competitive with the model you have to rent. That changes the vendor lock-in math for enterprise procurement, and the national security math for Washington. The signal to watch: a US enterprise disclosing Qwen 3.5 in a production stack. That would turn the AI cost war from a procurement decision into a Commerce Department file in the same quarter, the trigger we've been tracking since the DeepSeek-via-Baidu/Alibaba rollouts earlier this week.
"BadHost" Vulnerability Hits Starlette — vLLM, FastAPI, and MCP Servers All Affected
Many AI infrastructure teams may not have seen this one yet. CVE-2026-48710, dubbed "BadHost," was discovered in the Starlette web framework during an X41 D-Sec audit sponsored by OSTIF. By manipulating the HTTP Host header, attackers can bypass path-based authentication middleware and reach protected routes without valid credentials.
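The description above doesn't spell out the exact parsing flaw, so what follows is a hypothetical illustration of the general bug class, not Starlette's code and not the confirmed mechanics of CVE-2026-48710. The idea: a middleware that rebuilds the request URL from the client-controlled Host header and parses the path back out can end up checking a different path than the one the router dispatches. The function names and the `/admin` prefix are invented for the example.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL illustration of a Host-header path-confusion bug class.
# This is not Starlette's code and not the confirmed root cause of CVE-2026-48710.
PROTECTED_PREFIX = "/admin"   # simple prefix check, kept naive on purpose


def naive_auth_allows(host_header: str, raw_path: str, has_token: bool) -> bool:
    # Flawed: rebuilds the URL from the client-controlled Host header, then
    # parses the path back out. A Host value containing "/", "?" or "#" can
    # shift which part of the string urlsplit treats as the path.
    checked_path = urlsplit(f"http://{host_header}{raw_path}").path
    if checked_path.startswith(PROTECTED_PREFIX):
        return has_token
    return True


def safer_auth_allows(host_header: str, raw_path: str, has_token: bool) -> bool:
    # Decide on the same raw path the router dispatches; never consult Host.
    if raw_path.startswith(PROTECTED_PREFIX):
        return has_token
    return True


# The router would serve /admin/keys in every case; only the checked path differs.
honest = naive_auth_allows("api.example.com", "/admin/keys", has_token=False)
spoofed = naive_auth_allows("api.example.com/public#", "/admin/keys", has_token=False)
fixed = safer_auth_allows("api.example.com/public#", "/admin/keys", has_token=False)
print(honest, spoofed, fixed)
```

With the spoofed header, the rebuilt string becomes `http://api.example.com/public#/admin/keys`, so the naive check sees `/public` and waves the request through. The design point is that authorization should key off the exact path the router uses. A sketch like this is no substitute for upgrading to whatever Starlette release addresses the CVE.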
The blast radius is wide: FastAPI services, vLLM and LiteLLM inference servers, Model Context Protocol (MCP) servers, OpenAI-compatible APIs, and custom AI frameworks. Starlette sits underneath FastAPI, which sits underneath a meaningful fraction of production AI deployments. MCP servers — the tool-use layer connecting agents to external services — are explicitly named.
This is the kind of vulnerability that hits quietly because it lives in the plumbing, not the model. Watch whether enterprise AI security teams start requiring framework-level audits before agent deployments — that's a new procurement gate that didn't exist last quarter, and if it appears, agent rollout timelines slip industry-wide.
"Language Models Need Sleep" — A CMU/Maryland Preprint Is Quietly Trending
A preprint from Carnegie Mellon and the University of Maryland is sitting at 210+ points on Hacker News, which is unusual for an architecture paper. The proposal: LLMs should periodically convert recent context into persistent fast weights through offline "sleep" passes, then clear the key-value cache — preserving low-latency inference during "wake" while consolidating memory during "sleep."
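To make "convert context into fast weights, then clear the KV cache" concrete, here is a toy sketch that assumes the simplest possible fast-weight memory: a linear associative matrix updated with outer products (W += v·kᵀ), in the spirit of classic fast-weight work. This is a stand-in chosen for clarity, not the preprint's actual consolidation procedure, and `SleepyMemory` and its methods are hypothetical names.

```python
# Toy "wake/sleep" memory. ASSUMPTION: consolidation is a plain linear
# associative memory (fast weights updated with outer products). This is a
# stand-in for illustration, not the preprint's actual method.
DIM = 3


class SleepyMemory:
    def __init__(self):
        self.kv_cache = []                                     # wake: exact recent context
        self.fast_weights = [[0.0] * DIM for _ in range(DIM)]  # consolidated memory

    def wake_step(self, key, value):
        """During 'wake', just append to the KV cache (cheap, exact)."""
        self.kv_cache.append((key, value))

    def sleep(self):
        """Fold each cached (key, value) pair into W += value * key^T, then clear the cache."""
        for key, value in self.kv_cache:
            for i in range(DIM):
                for j in range(DIM):
                    self.fast_weights[i][j] += value[i] * key[j]
        self.kv_cache.clear()

    def recall(self, query):
        """Read from consolidated memory: W @ query."""
        return [sum(self.fast_weights[i][j] * query[j] for j in range(DIM)) for i in range(DIM)]


memory = SleepyMemory()
memory.wake_step([1.0, 0.0, 0.0], [0.5, 0.2, 0.1])   # orthogonal (one-hot) keys
memory.wake_step([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
memory.sleep()
print(len(memory.kv_cache), memory.recall([1.0, 0.0, 0.0]))
```

With orthogonal keys, as here, recall after sleep is exact and the cache is empty, so "wake" inference no longer pays for the old context. Overlapping keys interfere with one another, which is one reason practical designs need something more elaborate than a plain sum of outer products.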
The empirical claim, per the paper: increasing sleep duration directly improves accuracy, learning speed, and reasoning depth, decoupling memory capacity from reasoning compute. The authors test this on synthetic tasks (cellular automata, multi-hop graph retrieval) and on a math reasoning task where standard transformers and SSM-attention hybrids fail.
It's a preprint on synthetic benchmarks, not a deployed system. But it speaks directly to the agent token-burn problem: agents reasoning over long contexts are exactly the workload "sleep" targets. If a major lab cites this architecture in a model card within 90 days, the long-context scaling story stops being about buying more memory and starts being about scheduling downtime — which would be a genuinely strange thing to say about software, and the first architectural answer to Uber's bill.
⚡ What Most People Missed
- Mozilla shipped Cq, a Stack Overflow for AI coding agents: A shared memory layer that lets agents reuse prior tool-use traces and failed-attempt knowledge instead of retrying from scratch. The companion hosted service, cq exchange, went from 2 to 1,100+ GitHub stars after Hacker News traction. This is a framework-layer answer to the retry-loop token burn that inflates agent bills like Uber's, and it shifts power from model vendors toward orchestrators.
- VIPER-MCP found 106 zero-days across 39,884 MCP repos: A preprint scanned open-source Model Context Protocol repositories and reports 106 zero-day vulnerabilities, with 67 CVEs already assigned. The agent tool layer itself is becoming the attack surface, not just the model prompt. Combined with BadHost above, that makes two MCP-layer security stories in 24 hours.
- PJM can now curtail data centers under DOE emergency order: Under a May 18 Department of Energy order, the biggest US grid operator can throttle data centers with backup generation as a last resort before rolling blackouts. Compute is starting to inherit power-market rules — financing a campus isn't the constraint anymore; the grid telling you to back off mid-training run is.
- Sichuan's ¥400 billion AI+ industrial target for 2030: Per Sina Finance, the provincial government announced its "AI+ Innovation Project No. 1," which targets a ¥400 billion (roughly $55 billion) AI+ industry by 2030. China's AI buildout is now happening at the provincial level, not just through national champions, which suggests infrastructure and talent pipelines are being replicated across many regions at once. [Source: Sina Finance — Chinese]
- MiMo cut prices by up to 99%: Xiaomi's MiMo model joined DeepSeek in slashing inference prices, with Chinese coverage framing it as a permanent reduction rather than a promotional discount. The Chinese price war is now structural, not tactical. [Source: Sohu — Chinese]
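The Cq item above describes agents reusing prior tool-use traces and known failures instead of retrying from scratch. As a hypothetical sketch of that idea only (this is not Cq's API or data model; `SharedAgentMemory`, `run_agent`, and the normalized-string task key are invented for illustration), a shared store can let a second agent skip dead ends or reuse a working trace outright:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# HYPOTHETICAL shared agent memory. Not Mozilla Cq's API or data model.


@dataclass
class TaskRecord:
    failed: Set[str] = field(default_factory=set)   # approaches known to fail
    solution: Optional[List[str]] = None             # tool-call trace that worked


class SharedAgentMemory:
    def __init__(self) -> None:
        self._records: Dict[str, TaskRecord] = {}

    @staticmethod
    def _key(task: str) -> str:
        # Naive normalization; real systems need much better task matching.
        return " ".join(task.lower().split())

    def lookup(self, task: str) -> Optional[TaskRecord]:
        return self._records.get(self._key(task))

    def record_failure(self, task: str, approach: str) -> None:
        self._records.setdefault(self._key(task), TaskRecord()).failed.add(approach)

    def record_success(self, task: str, trace: List[str]) -> None:
        self._records.setdefault(self._key(task), TaskRecord()).solution = list(trace)


def run_agent(task: str, approaches: List[str],
              try_approach: Callable[[str], Optional[List[str]]],
              memory: SharedAgentMemory) -> Tuple[Optional[List[str]], int]:
    """Return (trace, attempts_made); each attempt stands in for costly model calls."""
    record = memory.lookup(task)
    if record is not None and record.solution is not None:
        return record.solution, 0                    # reuse: zero new attempts
    attempts = 0
    for approach in approaches:
        if record is not None and approach in record.failed:
            continue                                 # skip known dead ends
        attempts += 1
        trace = try_approach(approach)
        if trace is None:
            memory.record_failure(task, approach)
        else:
            memory.record_success(task, trace)
            return trace, attempts
    return None, attempts


def fix_ci(approach: str) -> Optional[List[str]]:
    # Toy environment: only the "use-venv" approach works.
    return ["pip install -e .", "pytest"] if approach == "use-venv" else None


# Usage: two agents share one memory; the second reuses the first one's result.
memory = SharedAgentMemory()
first = run_agent("Fix failing CI", ["use-system-python", "use-venv"], fix_ci, memory)
second = run_agent("fix  failing ci", ["use-system-python", "use-venv"], fix_ci, memory)
print(first, second)
```

The savings come from the two early exits: a cached solution costs zero new attempts, and known-failed approaches are skipped. The naive normalization in `_key` is the weak point; deciding that two agents face "the same task" is the hard problem any real system has to solve.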
📅 What to Watch
- If Anthropic or OpenAI announces consumption-smoothing or capped pricing for agent tiers within 30 days, Uber's COO will have forced the first defensive pricing move from a Western frontier lab — and the cost war crosses the Pacific.
- If a US enterprise discloses Qwen 3.5, DeepSeek, or MiMo in production within 30 days, the AI cost war becomes a Commerce Department file in the same quarter.
- If Gemini 3.5 Pro ships in June, as DataCamp reports, Google will have turned over both its Flash and Pro tiers within about a month, with the new Flash already beating the previous Pro on Google's own benchmarks, and the speed/cost curve for agentic AI will have bent faster than anyone modeled.
- If LangChain or LlamaIndex integrates Cq-style shared agent memory within 60 days, the retry-loop token problem gets solved at the framework layer before model vendors fix their pricing — and orchestrators capture margin that labs assumed was theirs.
- If a major MCP-using vendor (Anthropic, Cursor, Cline) publishes an emergency advisory tied to BadHost or VIPER-MCP within a week, the agent tool layer enters its first real security incident cycle.
The Closer
A COO confessing into a podcast that Uber blew through its 2026 AI budget and can't say what it bought; a paper arguing language models need sleep, presumably with little weighted blankets; and a vulnerability called BadHost sitting quietly in the plumbing under a large share of the AI agents running in production right now. The honest summary of enterprise AI in May 2026 is that we built models that code faster than humans, then realized we forgot to build the part that tells us whether the code was worth writing. Onward.
Forward this to the engineering lead who keeps asking why the AWS bill grew a leg.