The Lyceum: AI Daily — May 14, 2026
Thursday, May 14, 2026
The Big Picture
Three threads pulled tight overnight: a fleet of humanoid robots worked an eight-hour shift on a public livestream, Alibaba booked nearly nine billion yuan of AI revenue in a single quarter, and the UK's AI Security Institute published numbers showing a frontier model can now take over a simulated corporate network in three out of ten tries. The receipts are arriving. The "AI is coming" era is quietly ending; what's left is the messier work of measuring whether it actually shows up to work.
What Just Shipped
- DeepSeek V4 Pro (DeepSeek): MIT-licensed open-weight release, $1.74 in / $3.48 out per million tokens, 1M token context.
- Kimi K2.6 (Moonshot AI): Now live across DeepInfra ($0.75 / $3.50), AtlasCloud ($0.95 / $4.00), Phala ($1.09 / $4.60), and Together ($1.20 / $4.50) — 256K context.
- MiMo v2.5 Pro (Xiaomi): $1.00 in / $3.00 out, 1M token context — Xiaomi's latest reasoning model.
- Trinity Large Preview (Arcee AI): $0.15 in / $0.45 out, 131K context — aggressively priced preview from the enterprise fine-tuning shop.
- Qwen3.6 35B A3B (Alibaba): Open-source MoE with only 3B active parameters per token, 262K context, live on Together.
- Gemma 4 E2B / E4B IT (Google): Instruction-tuned variants of the Gemma 4 family, 131K+ context.
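For readers comparing the price tags above, here is a minimal sketch of what they mean for a single job. The prices are copied from the list; the 200K-in / 20K-out workload and the `job_cost` helper are illustrative assumptions, not anything the vendors publish.

```python
# USD per million tokens (input, output), copied from the list above.
PRICES = {
    "DeepSeek V4 Pro": (1.74, 3.48),
    "Kimi K2.6 (DeepInfra)": (0.75, 3.50),
    "MiMo v2.5 Pro": (1.00, 3.00),
    "Trinity Large Preview": (0.15, 0.45),
}


def job_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one job at the listed per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200K tokens in, 20K tokens out.
    for name in PRICES:
        print(f"{name}: ${job_cost(name, 200_000, 20_000):.4f}")
```

On that assumed workload, the gap between the cheapest and priciest entries is roughly tenfold, which is the practical meaning of "aggressively priced."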
Today's Stories
Figure AI Just Ran an 8-Hour Robot Shift — Live, Unedited, No Humans
The most important question in humanoid robotics isn't whether a robot can do a task. It's whether it can do the same task for an entire shift without someone babysitting. Figure AI put that question to a public test overnight, and the answer was complicated in exactly the right way.
Responding to a challenge from robotics diligence director Scott Walter — who'd argued humanoids have "limited utility" until they can prove shift-length endurance — Figure CEO Brett Adcock launched an eight-hour livestream of Figure 03 units sorting packages at a conveyor, flipping boxes so shipping labels faced downward, all governed by the company's Helix-02 neural network. Humanoids Daily clocked throughput at 2.6 seconds per package. It wasn't flawless: early in the stream, a robot failed to flip a cardboard box and let it pass label-up.
The clever part is the architecture. Figure 03 has a five-hour battery, so an eight-hour autonomous shift requires something more than endurance — it requires a fleet. Adcock described "multi-robot coordination with autonomous failover": when a unit detects a problem, it self-diagnoses, walks itself to maintenance, and requests a replacement from the fleet. No humans in the loop. A clip circulating on r/singularity appears to show one unit briefly losing its footing and grabbing nearby scaffolding to stabilize before continuing — community-sourced, not engineering-verified, but suggestive.
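The fleet arithmetic is worth a quick back-of-envelope check. The sketch below uses only the three figures quoted above; it assumes the 2.6-second pace holds at a single station for the entire shift and that a unit cannot recharge and return mid-shift, neither of which Figure has confirmed.

```python
import math

SHIFT_HOURS = 8             # length of the livestreamed shift
SECONDS_PER_PACKAGE = 2.6   # throughput clocked by Humanoids Daily
BATTERY_HOURS = 5           # Figure 03 battery life cited above


def packages_per_shift(shift_hours, seconds_per_package):
    """Upper bound on packages handled if the pace never drops."""
    return int(shift_hours * 3600 / seconds_per_package)


def min_units_per_station(shift_hours, battery_hours):
    """Fewest robots needed to cover one station, assuming no mid-shift recharge."""
    return math.ceil(shift_hours / battery_hours)


if __name__ == "__main__":
    print(packages_per_shift(SHIFT_HOURS, SECONDS_PER_PACKAGE))
    print(min_units_per_station(SHIFT_HOURS, BATTERY_HOURS))
```

Under those assumptions the ceiling is about eleven thousand packages per station per shift, and every station needs at least two robots, which is why the failover handoff matters more than any single unit's endurance.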
If the demo holds up, the conversation changes from "can humanoids do tasks" to "what's the per-hour rate." If it doesn't, and the failure logs Figure hasn't released turn out to show heavy intervention, the demo joins a long list of impressive livestreams that didn't survive third-party scrutiny. Watch for telemetry. That's where the real signal lives.
Alibaba Just Proved China's AI Revenue Is Real
For two years, the question hovering over Chinese AI has been: is any of this actually making money? Alibaba's March-quarter earnings, reported Wednesday, are the clearest answer yet — and the number that matters isn't the headline.
AI-related product revenue hit RMB 8.97 billion in the quarter, the eleventh consecutive quarter of triple-digit year-over-year growth, according to Alibaba's earnings release. Cloud Intelligence external revenue grew 40% year-over-year, and AI products now account for 30% of it. Management told analysts it expects that share to clear 50% within a year.
Then the forward guidance got loud. CEO Eddie Wu said annualized recurring revenue from AI model and application services should top RMB 10 billion in the June quarter and RMB 30 billion by year-end — and that compute demand is so strong Alibaba will spend more on infrastructure over the next five years than the RMB 380 billion it had previously committed across three. Yahoo Finance reported executives describing payback windows in years, not decades, and calling margin "secondary."
If this trajectory holds, the "China AI is all hype" narrative would weaken, and Beijing's domestic compute buildout would look strategically rational rather than purely defensive. If Tencent, whose latest results this week showed profit short of expectations, posts the same inflection next quarter, the case would strengthen. Alibaba's U.S.-listed shares jumped more than 7% intraday on the print.
Mythos Hacked a Corporate Network in 3 of 10 Tries. The Next Checkpoint Reportedly Did It in 6.
The UK's AI Security Institute published its evaluation of Anthropic's Claude Mythos Preview — the model Anthropic has declined to release publicly because of its cyber capabilities — and the results deserve to be read slowly.
AISI built a benchmark called "The Last Ones": a 32-step corporate network attack simulation, from reconnaissance to full takeover, that AISI estimates takes a human expert roughly 20 hours. Mythos Preview is the first model to solve it end-to-end, succeeding in 3 of 10 attempts and averaging 22 of 32 steps across all runs. Anthropic separately disclosed in Project Glasswing that it used Mythos Preview to find thousands of zero-day vulnerabilities across every major operating system and browser.
The dual-use part is what should keep security leads awake. Anthropic says it didn't train Mythos to specialize in exploitation — the capability emerged as a downstream consequence of general reasoning and software-engineering gains. A community thread on r/singularity is now circulating numbers from what appears to be a newer Mythos checkpoint solving "The Last Ones" in 6 of 10 attempts and clearing a previously unsolved task called "Cooling Tower" in 3 of 10. Treat the checkpoint figures as unverified — but if AISI or Anthropic confirm them, the results would suggest the cyber time-horizon doubling rate AISI had pegged at five months may be accelerating.
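Ten attempts is a small sample, so it helps to see how wide the uncertainty is before reading a trend into 3 of 10 versus 6 of 10. The sketch below computes a standard Wilson score interval for each result. This is our own illustration; nothing above says AISI reports its results this way.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Return the (low, high) Wilson score interval; z=1.96 gives roughly 95%."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, wins in [("Mythos Preview", 3), ("reported checkpoint", 6)]:
        low, high = wilson_interval(wins, 10)
        print(f"{label}: {wins}/10, plausible range {low:.2f} to {high:.2f}")
```

The two ranges overlap substantially (roughly 0.11 to 0.60 against 0.31 to 0.83), so even if the checkpoint numbers are confirmed, ten runs apiece cannot distinguish a true doubling from a more modest gain.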
CyberScoop reported Wednesday that GPT-5.5 is showing similar gains in Palo Alto Networks evaluations. The question isn't whether other labs catch up. It's whether any of them choose to withhold.
California Opens a Probe Into xAI Over Deepfake Nudes on X
California Attorney General Rob Bonta has opened an investigation into xAI and X following a surge of complaints that an image-editing feature on the platform is being used to generate non-consensual sexually explicit images, LAist reported. The probe is examining whether the companies violated California's deepfake and non-consensual intimate imagery statutes.
If the probe leads to enforcement, image-editing tools get gated, audited, and watermarked the way age-restricted products are, and product teams shipping anything involving identity manipulation start carrying a new line item on the risk register. If it stalls, expect a state-by-state patchwork that pushes the harm into whichever jurisdiction enforces last. The observable signal: whether other state AGs file parallel actions in the next two weeks, and whether xAI ships any product-level mitigations before they do.
OpenAI Endorses an IAEA-Style Global AI Body — With China Inside
Buried in this morning's LLM news roundup, citing Bloomberg: OpenAI now says it would support a US-led global AI governance body modeled on the International Atomic Energy Agency — and explicitly including China as a member. The IAEA analogy has floated around policy circles for years. A frontier lab endorsing it on the record, with Beijing in the tent, is new.
The timing is pointed: it lands the week of Google I/O, when attention is elsewhere, and days after the Pentagon expanded frontier-model defense contracts to xAI, Google, and Anthropic. An IAEA-style body would create inspection and disclosure obligations cutting both directions — useful for a company that is simultaneously a commercial entity and a de facto national security asset, friction for one that needs to keep its weights closely held. Treat as single-source attribution; the policy implication is large enough to track regardless.
Mozilla.ai Posts "Cq" — a Stack Overflow for Coding Agents
A Show HN from Mozilla.ai's Peter Hall surfaced a proof-of-concept called Cq: a system for storing and sharing what coding agents learn when they hit repo-specific gotchas. Agents submit "knowledge units" after getting stuck, a human reviews them, and the lessons become available to other agents working the same codebase.
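The submit, review, share loop described above can be sketched in a few lines. Every name here (`KnowledgeUnit`, `KnowledgeStore`, `submit`, `approve`, `lookup`) is hypothetical and chosen for illustration; this is not Cq's actual API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """One lesson an agent recorded after getting stuck (hypothetical shape)."""
    repo: str
    lesson: str
    approved: bool = False  # flipped only by a human reviewer


@dataclass
class KnowledgeStore:
    units: list = field(default_factory=list)

    def submit(self, repo, lesson):
        """An agent files a lesson; it stays hidden until reviewed."""
        unit = KnowledgeUnit(repo, lesson)
        self.units.append(unit)
        return unit

    def approve(self, unit):
        """A human reviewer signs off on the lesson."""
        unit.approved = True

    def lookup(self, repo):
        """Other agents see only approved lessons for the same repo."""
        return [u.lesson for u in self.units if u.repo == repo and u.approved]


if __name__ == "__main__":
    store = KnowledgeStore()
    unit = store.submit("example/repo", "Run codegen before the test suite.")
    print(store.lookup("example/repo"))  # nothing visible before review
    store.approve(unit)
    print(store.lookup("example/repo"))  # visible after review
```

The design point is the review gate: an unapproved lesson never reaches another agent, which keeps one agent's wrong guess from propagating across a fleet.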
It's prototype-level, surfaced on Hacker News rather than launched as a product. But it's a concrete artifact of the next operational fight in agent deployments: shared memory across fleets, not just smarter base models. Pair it with oobabooga's TextGen shipping a one-minute-install desktop runtime for local models, and you can see the plumbing layer starting to harden. If agentic work scales, this is the layer that decides whether it scales cheaply.
Open-Source Agents Are Hitting a Cloudflare Wall
Multiple threads across r/LocalLLaMA and adjacent repos are reporting a sudden spike in failed web-search requests and gateway timeouts for autonomous agent workflows. Developers point at recent Cloudflare bot-blocking behavior and tighter access to free search-index APIs as proximate causes. Vendor confirmation is absent; treat as a developer-reported friction signal, not a coordinated policy.
If it persists, RAG and agent flows that depend on cheap, anonymous web access get pushed onto paid authenticated connectors, which favors well-capitalized agent companies and squeezes the hobbyist and open-source layer. The early signal to watch: whether Brave Search, Tavily, or Exa raise prices in the next 30 days. They're the obvious paid alternatives, so their pricing is where the squeeze would show first.
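In code, that shift looks like a fallback chain: try the free route first, then pay when it fails. The sketch below is a generic pattern with stand-in functions; `anonymous_search` and `paid_search` are placeholders we made up, not real client libraries for any of the services named above.

```python
def search_with_fallback(query, providers):
    """Try each (name, search_fn) pair in order; return (name, results) from the first success."""
    errors = []
    for name, search_fn in providers:
        try:
            return name, search_fn(query)
        except Exception as exc:  # a real agent would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def anonymous_search(query):
    """Stand-in for a free endpoint that is now being blocked."""
    raise TimeoutError("gateway timeout")


def paid_search(query):
    """Stand-in for an authenticated, metered connector."""
    return [f"result for {query!r}"]


if __name__ == "__main__":
    providers = [("anonymous", anonymous_search), ("paid", paid_search)]
    print(search_with_fallback("agent benchmarks", providers))
```

Each time the first provider fails, the request lands on the metered one, so a rise in block rates translates directly into a higher bill.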
⚡ What Most People Missed
- A Chinese GPU just got into a top-tier inference framework: Moore Threads' MUSA GPU architecture has been merged into SGLang mainline — the first time a domestically produced Chinese GPU has earned what Sohu calls a "native ticket" into a top-tier open-source inference runtime. A small crack in Nvidia's grip on the inference stack inside China. [Source: Sohu — Chinese (Simplified)]
- Qwen is now wired into Taobao checkout: Alibaba has connected Qwen to Taobao so users can browse, compare, place orders, and manage deliveries through natural conversation at Taobao scale. This is a move from pilot to transaction-level productization — it exposes the model to live conversion and fulfillment data flows that materially accelerate the path to monetization and real-world A/B testing.
- A new paper flags a "lucky pass" problem in SWE-agent evals: A title surfaced on Hugging Face's daily papers today — "AgentLens: Revealing the Lucky Pass Problem in SWE-Agent Evaluation" — arguing that current coding-agent benchmarks may be crediting solutions that succeed by accident. Not yet peer-reviewed. If the critique holds, every leaderboard claim from the past year needs an asterisk.
- A first-of-its-kind AI child-pornography prosecution in Japan: Yomiuri Shimbun reports a former elementary school teacher has been indicted for possession of AI-generated child sexual abuse material — described as the first such prosecution nationally. The legal precedent matters far beyond Japan. [Source: Yomiuri Shimbun — Japanese]
📅 What to Watch
- If Figure AI publishes detailed failure logs from the eight-hour stream within a week, the demo could become a credible benchmark for shift-length endurance and fleet-level autonomous failover claims; if it doesn't, the silence will be treated as a negative signal.
- If Tencent's next quarter shows an Alibaba-style AI revenue inflection, Western export-control debates over chips and cloud services may face stronger pushback from business and diplomatic stakeholders.
- If AISI or Anthropic confirm the leaked Mythos checkpoint numbers, defenders' response window would shrink materially, increasing pressure on vendors to change release practices and on incident response teams to automate mitigations.
- If a second state AG joins California's xAI probe before month-end, expect platforms to implement gating, forensic watermarking, and age/consent verification for image-editing features as a practical compliance path.
- If Bloomberg's OpenAI/IAEA story is confirmed by a named OpenAI executive, expect rapid political blowback from US national-security hawks who oppose China sitting on any board that inspects American models.
The Closer
A humanoid grabbing scaffolding to stay upright on a livestream, a former Japanese elementary-school teacher indicted over AI-generated images, and OpenAI politely inviting China to the inspection party. The receipts are arriving; they're just printed on weird paper. Until tomorrow.
Forward this to the one friend who'll appreciate that "Cooling Tower" is now a benchmark and not a place.