The Lyceum: AI Daily — Mar 26, 2026
Thursday, March 26, 2026
The Big Picture
A humanoid robot addressed world leaders at the White House. A new benchmark showed that the best AI systems on Earth can't figure out a simple puzzle game. And a senator and a representative introduced a bill to freeze the construction of every building where AI gets built. The technology is accelerating into the physical world, the evaluation tools are finally catching up to the hype, and the politics just showed up with a shovel ban.
Today's Stories
A Humanoid Robot Just Walked Into the White House — and Gave a Speech
Figure AI's third-generation humanoid, Figure 03, walked on two legs into the White House East Room yesterday, stood beside First Lady Melania Trump at her international technology summit hosting spouses of leaders from 45 nations, and delivered prepared remarks — concluding with greetings in 11 languages that CNN described as having "perfect pronunciation." Then it walked back down the Cross Hall under its own power.
The robot runs on Figure's proprietary vision-language-action model called Helix, which processes real-time visual data from cameras in its hands and responds to verbal commands, per NBC Washington. Figure AI CEO Brett Adcock told reporters the robot was operating "fully autonomous" during the demo, per SFGate. NPR noted that Figure 03 was introduced in October 2025 for household tasks — laundry, cleaning, dishes — not political theater. The AP's coverage emphasized the educational framing: the robot's remarks were about empowering children with technology.
The event was clearly staged PR. But a humanoid robot just addressed world leaders on the record at 1600 Pennsylvania Avenue, and that hadn't happened before yesterday. NBC reports the Trump administration has separately explored using similar robots as military aides. The real test comes next: if Figure 03 or a competitor shows up in an unscripted operational setting — a factory floor, a hospital ward — rather than a controlled ceremony, that's when the technology claim becomes verifiable. Until then, this is a cultural milestone dressed as a policy event, and the image of a robot walking the Cross Hall is going to live in people's heads longer than any spec sheet.
ARC-AGI-3 Drops — And Frontier Models Score Basically Zero
The ARC Prize Foundation launched ARC-AGI-3 yesterday, and the first scores are brutal: Google's Gemini 3.1 Pro Preview, 0.37%; OpenAI's GPT-5.4 High, 0.26%; Anthropic's Opus 4.6 Max, 0.25%; xAI's Grok-4.20, 0.00%. Human baseline: 100%.
This isn't another leaderboard where models trade fractions of a percent. ARC-AGI-3 changes the format entirely. Each test is a turn-based puzzle game with no instructions, no stated win conditions, and no prior exposure. The agent sees a visual state, takes an action, observes the result, and must figure out what it's trying to do — like being dropped into a video game with the manual ripped out. Humans find these games fun and finish them quickly. Per the ARC Prize Foundation, over 1,000 such scenarios are included, and the benchmark was calibrated against 1,200+ human players across 3,900+ games.
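That no-instructions interaction loop can be sketched as a toy environment. Everything below — the class name, the grid game, the action set — is illustrative, not the real ARC-AGI-3 API; it just shows the shape of the problem an agent faces:

```python
import random

class PuzzleEnv:
    """Toy stand-in for an ARC-AGI-3 game: reach a hidden target cell
    on a grid. No instructions or win condition are ever exposed -- the
    agent only sees its position and whether the episode just ended."""

    def __init__(self, size=5, seed=0):
        rng = random.Random(seed)
        self.size = size
        self.pos = (0, 0)
        self._target = (rng.randrange(size), rng.randrange(size))  # hidden

    def step(self, action):
        dr, dc = {"up": (-1, 0), "down": (1, 0),
                  "left": (0, -1), "right": (0, 1)}[action]
        # Moves are clamped to the grid; the agent must infer even this.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        return self.pos, self.pos == self._target  # (observation, done)

# A naive agent: act, observe, repeat -- with no idea what "winning" means.
env = PuzzleEnv(seed=42)
done, moves = False, 0
while not done and moves < 100:
    obs, done = env.step(random.choice(["up", "down", "left", "right"]))
    moves += 1
```

A human dropped into the same loop quickly forms a hypothesis ("I'm moving a cursor; something happens at one cell") and tests it — which is exactly the capability the benchmark is probing.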
The key innovation is a metric called Relative Human Action Efficiency: you're graded not just on solving the puzzle but on how many moves you burn compared to a human. An AI can technically beat every level and still "fail" the benchmark. Per Fast Company, a high score would serve as genuine evidence of artificial general intelligence, because real-world agents need to reason through unfamiliar situations, form abstractions, and generalize — not recall patterns from training.
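The scoring idea can be sketched in a few lines. The formula below is an illustration of move-count-relative grading, not the foundation's published metric — but it captures why "solved every level" and "scored well" are no longer the same thing:

```python
def action_efficiency(agent_moves, human_moves, solved):
    """Score one level as the ratio of human moves to agent moves,
    capped at 1.0; unsolved levels score 0. Illustrative only --
    the ARC Prize Foundation's actual formula may differ."""
    if not solved or agent_moves <= 0:
        return 0.0
    return min(1.0, human_moves / agent_moves)

# An agent that solves a level humans finish in 20 moves, but burns
# 80 moves doing it, scores only 0.25 on that level.
score = action_efficiency(agent_moves=80, human_moves=20, solved=True)
```

Under a scheme like this, brute-force exploration gets punished automatically: every wasted probe of the environment is a point against you.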
If a major lab posts above 20% on ARC-AGI-3 within 60 days, it would indicate targeted engineering can crack the benchmark and could shift R&D priorities and investor expectations. If scores stay below 15% through the summer of 2026, François Chollet and his team have built the evaluation that finally measures something the industry can't game. This is currently the only major AI benchmark where the human lead is measured not in points but in orders of magnitude. ARC Prize 2026 has over $2 million in prizes, with a first scored deadline of June 30, 2026. Watch whether a Chinese lab — DeepSeek, Kimi, or a university team — lands on the leaderboard ahead of U.S. frontier labs. That result would likely appear in congressional testimony within weeks.
Sanders and AOC Want to Freeze the AI Data Center Buildout
Senator Bernie Sanders and Representative Alexandria Ocasio-Cortez unveiled the AI Data Center Moratorium Act on Tuesday, March 24, 2026, legislation that would halt all new AI data center construction nationwide until Congress enacts comprehensive safeguards covering safety, environmental impact, labor protections, and electricity pricing. The bill defines an AI data center as any facility used to develop or operate AI at scale, or one with a peak power load exceeding 20 megawatts with high-performance racks or liquid cooling, per Gizmodo. It has been referred to the Senate Committee on Commerce, Science, and Transportation and the House Committee on Energy and Commerce; no committee votes have occurred.
The political reaction was immediate and polarized. Senator Mark Warner called a blanket moratorium "idiocy" that would hand China an advantage, per Axios. On the same day Sanders unveiled his bill, the White House named Meta CEO Mark Zuckerberg and Nvidia CEO Jensen Huang to a new Council of Advisors on Science and Technology — moving in the exact opposite direction. Per the Guardian, the bill foregrounds energy costs, water use, and community disruption as core drivers, not abstract AI risk.
The bill faces long odds. But it's the first significant U.S. legislative attempt to move the "AI pause" from open letters into enforceable law, and it targets infrastructure rather than models — shovels and power contracts, not algorithms. A March 2026 Pew Research poll found just 10% of Americans say their excitement about AI outweighs their concern, per TechCrunch. Washington is effectively running two opposite AI policies at once. If the bill gets even one Republican co-sponsor, the political calculus around AI infrastructure flips overnight. Watch the Senate Committee on Commerce, Science, and Transportation calendar.
ARC-AGI-3's Scoring Method Is the Quiet Revolution in AI Evaluation
Buried beneath the headline scores is a methodological shift that matters more than any single number. ARC-AGI-3 is the first major AI benchmark that formally measures learning efficiency — not task completion, not accuracy, but how quickly an agent figures out what to do compared to a human doing the same thing for the first time.
The ARC Prize Foundation collected data from over 1,200 human players across more than 3,900 games to establish the baseline every AI agent is scored against. The environments are interactive and turn-based: no instructions, no descriptions, no win conditions stated. Agents must infer objectives purely from interaction. Per the technical report, this format is designed to test whether models can reason through novel problems rather than recall patterns — and the Kaggle evaluation environment runs with no internet access and no external API calls, meaning the benchmark explicitly rewards systems that generalize rather than look things up.
If this framing takes hold, it redefines what "better" means in AI. Labs that have historically used ARC-style wins to imply proto-AGI progress now face a benchmark explicitly designed to puncture that narrative unless an agent truly generalizes. The shift from "did you solve it?" to "how efficiently did you learn to solve it?" is the most important change in AI evaluation methodology in years — and almost nobody is writing about it that way yet. The observable signal: if major labs start publicly submitting flagship models and posting mediocre scores, the current generation of "almost AGI" marketing will need to be rewritten around more agent-like architectures.
Intel Drops a $949 GPU With 32 GB of VRAM, Targeting the Local AI Crowd
Intel announced the Arc Pro B70 and Arc Pro B65 — workstation GPUs with 32 GB of VRAM, with the B70 starting at $949. Tom's Hardware positions these explicitly as cards for developers and researchers who need memory capacity for running large language models locally without paying for Nvidia's professional tier.
Separately, r/LocalLLaMA is buzzing about rumors of an even cheaper consumer-tier 32 GB Intel card — potentially sub-$300 — though that remains unconfirmed and sourced from enthusiast forums, not Intel. The confirmed Arc Pro B70 at $949 is still significant: 32 GB of VRAM is enough to run 70-billion-parameter models without the quantization contortions or multi-GPU setups that currently gate serious local inference.
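Rough back-of-envelope arithmetic shows why 32 GB is the interesting threshold. This counts model weights only — real deployments also need KV cache and runtime overhead, so actual headroom is tighter than these numbers suggest:

```python
def weight_vram_gb(params_b, bits_per_weight):
    """Approximate VRAM needed for model weights alone,
    ignoring KV cache, activations, and framework overhead."""
    return params_b * 1e9 * bits_per_weight / 8 / 1024**3

# For a 70-billion-parameter model:
#   16-bit -> ~130 GB  (multi-GPU territory)
#    8-bit ->  ~65 GB  (still two high-end cards)
#    4-bit ->  ~33 GB  (right at the edge of a 32 GB card)
#    3-bit ->  ~24 GB  (fits with room for KV cache)
for bits in (16, 8, 4, 3):
    print(f"{bits}-bit: ~{weight_vram_gb(70, bits):.0f} GB")
```

The arithmetic makes the tradeoff concrete: a single 32 GB card puts 70B-class models within reach only at aggressive quantization, which is exactly the audience these cards target.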
The catch is ecosystem. Intel's Vulkan and oneAPI support in frameworks like vLLM and llama.cpp is improving but still trails Nvidia's CUDA stack significantly. If Intel lands solid framework support by mid-2026, Nvidia's lock on local inference finally has a credible crack at the low end. If driver stability under sustained loads remains flaky, this becomes another "great on paper" moment. The gating signal: watch whether vLLM and llama.cpp maintainers prioritize Intel backends in their next release cycles, and whether Nvidia or AMD respond with price or SKU adjustments.
⚡ What Most People Missed
- Oracle is turning the database into an agent substrate. Oracle announced AI Database 26ai with a "Unified Memory Core" and no-code "Private Agent Factory" that fuses vector, JSON, graph, and relational data with row-level security and persistent agent memory, per Futurum Group. If databases become where agents live rather than bolt-ons, enterprise adoption gets easier and the integration tax drops significantly.
- Mozilla shipped "Cq," a Stack Overflow built for AI agents, not humans. The early-stage project stores structured "recipes" — error signatures, minimal failing examples, step-by-step fix scripts — that coding agents query via API. The security question is immediate: how do you prevent poisoned knowledge from propagating across every agent that queries the system?
- Korean chaebols are betting their next decade on actuators. Samsung's Rainbow Robotics unit is building "physical AI engines" in-house; LG declared 2026 the "starting gun" for robotics and is manufacturing robot joints that represent 40% of humanoid costs, per Korea Times. If they crack affordable actuators at scale, expect sub-$20K robots flooding manufacturing and logistics.
- China approved its first invasive brain-computer interface for commercial clinical use. The NEO system received a Class III medical device certificate from China's National Medical Products Administration, shifting BCI from experimental to revenue-stage — and creating new input channels for embodied AI systems (per 36Kr).
📅 What to Watch
- If a major lab scores above 20% on ARC-AGI-3 within 60 days, it would indicate targeted engineering is cracking the benchmark and would likely shift R&D priorities and investor expectations toward sample-efficiency and learning-architecture research.
- If the Sanders-AOC moratorium bill attracts a single Republican co-sponsor, the framing shifts from progressive protest to bipartisan legislative threat, and long-term power purchase agreements, site financing, and permitting strategies for planned data centers could be upended.
- If Intel's Arc Pro B70 gets prioritized in vLLM and llama.cpp release cycles by mid-2026, it signals Nvidia's CUDA moat is finally eroding at the local-inference tier — watch for Nvidia or AMD price or SKU responses.
- If another humanoid robot appears at a Chinese state-backed policy event this spring, the U.S.–China competition in physical AI symbolism would join chips and models as a third front and could accelerate procurement decisions and export-control responses tied to embodied systems.
The Closer
A robot gave a speech at the White House and nobody asked it a follow-up question. The smartest AI systems on the planet scored zero on a video game designed for children. A senator and a representative tried to ban the buildings where all of it happens, and the White House responded by putting the builders on an advisory council.
The benchmark that finally humbles frontier AI measures how fast you learn — which, ironically, is the one thing Congress has never been tested on either.
More tomorrow. —The Lyceum
If someone you know is trying to keep up with AI without losing their mind, forward this.