The Lyceum: AI Daily — May 15, 2026
Friday, May 15, 2026
The Big Picture
Three humanoid robots named Bob, Frank, and Gary worked a 30-hour shift this week with no humans on the floor, Anthropic published a 2028 scenario paper that's really a chip-export lobbying brief in research clothes, and arXiv decided that hallucinated citations are now a one-year ban. Today's news isn't one breakthrough — it's the moment several adjacent systems (factories, academic publishing, geopolitics, corporate desktops) all started reacting to AI capability that already shipped. The machines got more useful. The institutions around them are finally starting to grow teeth.
What Just Shipped
- DeepSeek V4 Pro (DeepSeek): 1M-token context window, priced at $1.74 in / $3.48 out per million tokens — one of the most aggressive frontier-tier price points to date.
- DeepSeek V4 Flash (DeepSeek): Open-source, non-reasoning MIT-licensed sibling to V4 Pro with the same 1M context window.
- Kimi K2.6 (Moonshot AI): 256K context window, propagating across multiple provider listings this week.
- MiMo v2.5 Pro (Xiaomi): 1M-token context — Xiaomi's deepest push yet into frontier-scale models.
- Qwen3.6 35B A3B (Alibaba): A new Mixture-of-Experts variant with a 262K context window.
- NVIDIA Nemotron 3 Super (NVIDIA): 120B parameters total, only 12B active at inference, with a 1M-token window — built for long-context efficiency.
- Codex Windows Sandbox (OpenAI): A purpose-built local execution environment that finally makes Codex agents safe to run on corporate Windows machines.
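To put the DeepSeek V4 Pro price point in per-request terms, here is a rough sketch using only the two prices listed above. The workload (a full 1M-token prompt with a 4,000-token reply) and the function name request_cost are illustrative assumptions, not anything DeepSeek publishes, and real bills may differ with caching or tiered pricing.

```python
# Prices listed above for DeepSeek V4 Pro, in USD per million tokens.
INPUT_PRICE_PER_M = 1.74
OUTPUT_PRICE_PER_M = 3.48


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: fill the whole 1M-token window, get a 4,000-token reply.
full_context_cost = request_cost(1_000_000, 4_000)
print(f"${full_context_cost:.2f}")  # prints $1.75
```

At these list prices, the cost of a maxed-out context is dominated by the input side; the reply adds a little over a cent.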
Today's Stories
Bob, Frank, and Gary Just Worked 30 Hours Straight
The question in humanoid robotics has never been whether a robot can do the task. It's whether it can keep doing it, alone, for long enough to matter. Figure AI just answered.
On May 13, Figure put three Helix-02-powered humanoids on a package-sorting line for what was billed as an eight-hour shift, processing barcoded packages at speeds the company says match human throughput. Then the robots kept going. By May 14, they were closing in on 30 hours of continuous, fully autonomous operation — no teleoperation, every action generated onboard by Helix-02, per Figure's own statements. The internet named them Bob, Frank, and Gary; Figure added the name tags.
The capability worth flagging isn't endurance — it's recovery. If a robot gets stuck, Helix-02 triggers an autonomous reset and resumes. If a unit needs maintenance, it walks itself off the floor while another robot takes over. According to livestream data cited by Maeil Business Newspaper, a unit named Gary sorted 10,000 packages in 7 hours 44 minutes before retreating to a charger, with Frank seamlessly stepping in.
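For scale, the Gary figure above converts to a per-package pace with simple arithmetic. This is a back-of-the-envelope sketch that takes the cited numbers (10,000 packages, 7 hours 44 minutes) at face value and assumes an even pace with no pauses, which the livestream data may not support.

```python
# Figures cited above: 10,000 packages in 7 hours 44 minutes.
PACKAGES_SORTED = 10_000
ELAPSED_SECONDS = (7 * 60 + 44) * 60  # 464 minutes = 27,840 seconds

# Average pace, assuming the work was spread evenly over the whole run.
seconds_per_package = ELAPSED_SECONDS / PACKAGES_SORTED
packages_per_minute = PACKAGES_SORTED / (ELAPSED_SECONDS / 60)

print(f"{seconds_per_package:.2f} s per package")        # prints 2.78 s per package
print(f"{packages_per_minute:.1f} packages per minute")  # prints 21.6 packages per minute
```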
What's still unverified: independent confirmation. A live r/singularity thread is arguing whether the run is truly hands-off or whether human operators are doing silent recoveries off-camera — that's a Tier 3 community signal, not evidence. If this is what Figure says it is, autonomous fleet behavior is the milestone, and an announcement by a commercial customer is the next observable signal. If it isn't, expect a journalist or competitor to surface the asterisk within weeks. Either way, every warehouse operator in America just put their labor attorney on speed dial.
Anthropic Published a Geopolitics Paper Disguised as Research
Anthropic dropped a scenario paper this week framed as 2028 forecasting. It's really a lobbying brief, and the timing is the tell.
The paper sketches two futures. In one, the US tightens chip export controls, disrupts Chinese distillation attacks (where labs illicitly extract capabilities from leading American models), and locks in a 12–24 month lead in frontier AI by 2028. In the other, those loopholes stay open, China closes the gap, and authoritarian regimes use frontier models to industrialize censorship, surveillance, and offensive cyber operations. Anthropic also flags near-term systemic risks — automated spear-phishing at scale, AI-driven market manipulation — rather than dwelling on classic existential-AGI framing.
Here's context the paper doesn't mention: media have reported congressional activity on chip export controls in early 2026, and U.S. and Chinese delegations were reportedly discussing AI guardrails at a Beijing summit the same week, per MarketScreener citing Bessent. If this paper is cited in policymaker testimony in the next sixty days, Anthropic will have demonstrated that frontier labs can move chip policy directly. If it isn't, it's a well-written white paper that landed in a busy news week.
arXiv Just Made Hallucinated Citations a Career-Ending Mistake
If you use AI to draft research papers and don't check every citation, you have a serious new problem.
arXiv began enforcing a blunt policy this week: submit a paper with AI-hallucinated citations — references to papers that don't exist — and you're banned for a year. After that, all future arXiv submissions must first clear peer review at a reputable venue. For AI researchers who rely on arXiv for the preprint-first workflow that defines the field, that's a career-grade penalty.
The severity matches the scale. Hallucinated citations have risen tenfold since 2023, hitting 1 in every 277 papers in early 2026. At NeurIPS 2025, GPTZero scanned 4,841 accepted papers and found 100+ fabricated citations across 53 of them — each of which had cleared three human reviewers, because reviewers almost never check whether cited papers actually exist. A separate large-scale audit on arXiv estimated 146,932 hallucinated citations in 2025 alone.
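The two rates quoted above are easier to compare as percentages. The sketch below uses only the numbers in the paragraph; note the assumption baked in: the arXiv figure and the NeurIPS figure come from different samples and may not measure the same thing, so the side-by-side is for scale, not a like-for-like comparison.

```python
# Figures quoted above.
NEURIPS_PAPERS_SCANNED = 4_841
NEURIPS_PAPERS_FLAGGED = 53
ARXIV_ONE_IN_N = 277  # "1 in every 277 papers" in early 2026

# Share of scanned NeurIPS 2025 papers with at least one fabricated citation.
neurips_share = NEURIPS_PAPERS_FLAGGED / NEURIPS_PAPERS_SCANNED
neurips_one_in_n = NEURIPS_PAPERS_SCANNED / NEURIPS_PAPERS_FLAGGED

# The arXiv rate expressed as a share.
arxiv_share = 1 / ARXIV_ONE_IN_N

print(f"NeurIPS 2025: {neurips_share:.2%} (about 1 in {neurips_one_in_n:.0f})")
# prints NeurIPS 2025: 1.09% (about 1 in 91)
print(f"arXiv early 2026: {arxiv_share:.2%}")
# prints arXiv early 2026: 0.36%
```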
The asymmetry is what to watch: well-resourced labs with citation-checking workflows will be fine. Solo researchers and early-career authors — disproportionately represented in the empirical data — carry the most exposure. If NeurIPS and ICML adopt similar enforcement before ICML 2026 opens in Seoul on July 7, AI-assisted writing becomes a measurable career risk across the field. If they don't, arXiv becomes a quarantine zone while the conferences remain leaky — and the policy will have been theater.
OpenAI Quietly Fixed the Boring Problem Holding Back Coding Agents
The most useful AI news of the day is the least glamorous. OpenAI published a Windows sandbox for Codex — a purpose-built local execution environment that lets the agent run code on a Windows machine without either trusting it with weak isolation or forcing the user through awkward workarounds.
This matters: agentic AI is only useful when it can touch tools, files, and workflows without turning a laptop into a crime scene. Sandboxes are the seatbelts. They're what turn "the model can write code" into "the model can reliably do work on the operating system 80% of corporate desktops actually run." If OpenAI follows this with broader Codex rollout assuming local execution by default, coding agents quietly cross from enthusiast toy to corporate IT reality. The signal to watch: enterprise Codex case studies citing on-machine execution rather than cloud sandboxes.
Figure's Real Story Is the Factory, Not the Livestream
Buried beneath the 30-hour livestream is the number that actually changes the industry: Figure's BotQ manufacturing facility has gone from one Figure 03 robot per day to one per hour — a 24x throughput improvement in under 120 days — with over 350 third-generation units delivered, per TechTimes citing Figure's own disclosures.
Endurance demos are marketing. Production capacity is the constraint that determines whether humanoids become an industry or stay a YouTube genre. At one unit per hour, Figure would produce roughly 8,760 robots annually (24 × 365) if it runs flat-out, a scale that crosses from pilot deployments into something real industries can actually plan around. The watch item: whether a Fortune 500 logistics operator names Figure as a contracted supplier in the next two quarters. That's the moment humanoid robotics stops being a demo category.
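The production math is worth checking by hand. A minimal sketch, assuming round-the-clock operation every day of the year as the ceiling; the utilization parameter is a hypothetical knob added for illustration, not a number Figure has disclosed.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

# Rates reported above: one robot per day before, one per hour now.
old_units_per_day = 1
new_units_per_day = 1 * HOURS_PER_DAY

speedup = new_units_per_day / old_units_per_day
max_units_per_year = new_units_per_day * DAYS_PER_YEAR


def annual_output(utilization: float) -> int:
    """Units per year at a given fraction of flat-out capacity (hypothetical)."""
    return round(max_units_per_year * utilization)


print(speedup)             # prints 24.0
print(max_units_per_year)  # prints 8760
print(annual_output(0.5))  # prints 4380
```

Flat-out, the ceiling is 8,760 units a year; any downtime, shift gaps, or yield loss pulls the real number below that.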
Google Just Bought 500 MW of Texas Sun for 15 Years
DatacenterDynamics reports Google signed a 15-year power purchase agreement with Linea Energy for 500 megawatts of solar in the ERCOT market, explicitly tied to data center operations in Texas. On its own, it's a power deal. In context (last November, Google committed $40 billion to Texas cloud and AI infrastructure through 2027), it's another data point that AI capacity is now being telegraphed through electricity contracts before it shows up in product roadmaps.
Five hundred megawatts isn't sustainability branding; it's utility-scale, long-duration procurement, the kind of move you make when you expect sustained load. If a second hyperscaler announces a comparable Texas PPA before Q3, ERCOT becomes the de facto frontier of AI infrastructure planning, and every state grid operator outside Texas starts negotiating from a weaker position.
Physical AI Is Now an Insurance Underwriting Category
While the public watches humanoid demos, the insurance industry is figuring out how to price them. Coverage that surfaced this week describes underwriters designing policies for "physical AI" (humanoid robots, autonomous vehicles, AI-driven machines) and asking questions manufacturers have largely deflected: who's liable when an agent misclassifies an obstacle, how do you audit decision logs when a fleet causes a multi-million-dollar loss, and what override mechanisms must exist for coverage to apply?
The mechanism worth watching: insurance requirements routinely become de facto safety standards before regulators catch up. If carriers refuse to underwrite humanoid deployments without telemetry, explainability, and human override, those become operational requirements regardless of what any federal regulator does. If a major carrier publishes physical-AI policy templates this year, that's the moment safety standards start getting written in actuarial tables instead of legislative hearings. (Reporting surfaced via a Google News aggregator — track the underlying outlet for primary detail.)
⚡ What Most People Missed
- Xiaomi is vertically integrating AI from model to appliance: Xiaomi president Lu Weibing confirmed a new air conditioner line with edge AI chips for local sensor processing plus cloud-connected large models — the same week MiMo v2.5 Pro shipped with a 1M context window. This is more than another model release: a consumer electronics giant putting silicon and frontier models inside ordinary appliances shortens the route from research to mass-market deployment and changes competitive dynamics with Western hardware-software stacks. [Source: Phoenix News — Chinese]
- The FDA is already running AI-triaged inspections: The agency completed dozens of one-day inspection pilots driven by an AI risk model by late April, with investigator findings feeding back into the model — a closed regulatory loop. Most resulted in "No Action Indicated" outcomes, which is exactly what would make industry stop complaining about it.
- Houston Methodist published an AI platform that proposed a real drug candidate: iS2C2 combines mathematical modeling with LLM reasoning to decode cellular communication; applied to bone-cancer metastasis data, it surfaced an existing breast-cancer therapy as a candidate to block early spread. Peer-reviewed, dateable, testable in a lab.
- Vulnerability discovery just spiked, and defenders are behind: VulnCheck documents a wave of AI-assisted CVE reports affecting Mozilla, Microsoft, Apache, Curl, and Palo Alto. A community-sourced r/singularity claim that Anthropic's withheld Claude Mythos Preview helped produce a macOS kernel exploit on Apple M5 in days is unverified Tier 3 — but the broader pattern is real, and patching backlogs are growing.
- GitHub Trending is converging on agent memory: Multiple agent-framework projects are climbing the rankings together, all focused on persistent memory and reusable skill scaffolding. Developers have stopped arguing about base models and started solving the operational bottleneck that actually breaks multi-step agent workflows.
📅 What to Watch
- If Figure announces a named Fortune 500 logistics customer within 60 days, humanoid robotics crosses from livestream to contract — and major logistics firms may be forced to renegotiate labor agreements, accelerate capital allocation for automation, and commit to multi-year deployment plans.
- If NeurIPS or ICML adopts arXiv's hallucination policy before ICML 2026 opens July 7, AI-detection tools — notoriously prone to false positives — become career-defining for early researchers without institutional citation-checking pipelines.
- If a second hyperscaler signs a 500+ MW Texas PPA this quarter, ERCOT becomes the planning constraint for US AI infrastructure, and state grid regulators outside Texas lose negotiating leverage.
- If a major insurance carrier publishes physical-AI underwriting templates this year, robot safety standards get written by actuaries before they get written by legislators.
The Closer
Three robots named Bob, Frank, and Gary worked a 30-hour shift while their human counterparts slept; a frontier AI lab published a chip-policy lobbying brief and called it research; and academic publishing finally noticed its citations were being written by something that couldn't read them. The robots have names now, the safety researchers have lobbyists, and the citations have a one-year ban — somewhere in there is a sentence about who's still doing original work, but Bob's on a smoke break. That's the briefing.
Forward this to the friend who keeps asking what's actually happening in AI this week — Bob, Frank, and Gary would want them to know.