The Lyceum: Agentic AI Weekly — May 05, 2026
Week of May 5, 2026
The Big Picture
This was the week the agent economy started looking like a real economy, with real bills, real regulators, and real backseat drivers. Uber blew through its entire 2026 AI budget by April after its engineers embraced Anthropic's Claude Code. Sierra raised $950 million on the strength of agents that already handle billions of customer conversations. And the Trump White House, which spent its first year cheering AI on from the sidelines, is now reportedly weighing an executive order to vet frontier models before they ship. The "move fast" phase isn't dead, but it just got an invoice and a permit office.
What Just Shipped
- Microsoft Agent 365 (GA) (Microsoft): Generally available May 1; observes and secures both Microsoft and third-party agents across an organization, including detection of unsanctioned tools.
- Agentforce Operations (Salesforce): Pushes Agentforce out of the front office and into approvals, audits, and onboarding workflows.
- OpenAI on AWS Bedrock (OpenAI + AWS): Limited preview bringing OpenAI models, Codex, and managed agents inside Amazon Bedrock — usage counts toward AWS commitments.
- Project Glasswing with Claude Mythos Preview (Anthropic): Defensive-only deployment to 12 partners including Microsoft, Apple, and CrowdStrike; backed by up to $100M in usage credits.
- Jido 2.0 (Jido): Elixir-based agent framework targeting fault tolerance and high concurrency for production-grade agent fleets.
- GitHub Official MCP Server (GitHub): Production-ready MCP server with OAuth, allowlisting, and host compatibility for VS Code, Cursor, and Claude Desktop.
This Week's Stories
Uber Burned Its Entire AI Budget by April — and That's Actually a Good Sign
Most enterprise AI stories complain about adoption being too slow. Uber's story is the opposite — and it's more instructive.
CTO Praveen Neppalli Naga acknowledged at a TechCrunch event that Uber had already spent its full annual AI budget with eight months left on the calendar. Anthropic's Claude Code, rolled out in December 2025, spread through Uber's 5,000-engineer organization faster than finance had modeled. Usage doubled by February 2026. By March 2026, 84% of developers were classified as agentic coding users. By April 2026, the money was gone.
The production numbers are striking: roughly 10% of all code at Uber is now generated autonomously as of April 2026, and one team built a hotel-booking integration in six months that would normally have taken a year. Visa, on a similar trajectory, consumed roughly 1.9 trillion tokens in March 2026 alone — about double its February 2026 figure. At Disney, one user reportedly prompted Claude around 460,000 times over nine working days, a rate only achievable through agents running in the background.
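The Disney figure is easier to appreciate as a rate. Here is a quick back-of-envelope calculation that assumes an 8-hour working day (our assumption; the reporting does not say how the nine days were counted), plus the February Visa volume implied by "about double":

```python
# Back-of-envelope rates for the usage figures quoted above.
# Assumption (ours, not the reporting's): an 8-hour working day.
DISNEY_PROMPTS = 460_000
WORKING_DAYS = 9
HOURS_PER_DAY = 8  # assumed

per_day = DISNEY_PROMPTS / WORKING_DAYS
per_hour = per_day / HOURS_PER_DAY
per_second = per_hour / 3600

# Visa: ~1.9 trillion tokens in March, "about double" February's figure.
VISA_MARCH_TOKENS = 1.9e12
visa_feb_estimate = VISA_MARCH_TOKENS / 2

print(f"Disney: {per_day:,.0f}/day, {per_hour:,.0f}/hour, {per_second:.2f}/second")
# Disney: 51,111/day, 6,389/hour, 1.77/second
print(f"Visa, implied February: ~{visa_feb_estimate / 1e12:.2f} trillion tokens")
# Visa, implied February: ~0.95 trillion tokens
```

Even if all nine days are counted as full 24-hour days, that still works out to roughly one prompt every 1.7 seconds, around the clock.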
The question this forces: does AI spend belong on the cloud-infrastructure line of the budget, with its own governance, or under software-as-a-service, where overruns of this size would get someone fired? If usage keeps compounding, expect CFOs to push for hard token caps per team, and watch for the first public company to flag AI compute as a material risk in an earnings call. That will be the moment this stops being a TechCrunch quote and starts being an SEC disclosure.
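What might a "hard token cap per team" look like in practice? One answer is a check against the team's remaining allowance before every request goes out. This is a minimal sketch. The class, team names, and limits are hypothetical and do not correspond to any vendor's billing API.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push a team past its hard cap."""


@dataclass
class TeamTokenBudget:
    """Hypothetical per-team hard token cap (not any vendor's real API)."""
    caps: dict[str, int]                             # team -> max tokens this period
    used: dict[str, int] = field(default_factory=dict)

    def charge(self, team: str, tokens: int) -> int:
        """Record usage and return the team's remaining tokens, or refuse."""
        cap = self.caps.get(team, 0)                 # unknown teams get no budget
        spent = self.used.get(team, 0)
        if spent + tokens > cap:
            raise BudgetExceeded(f"{team}: {spent + tokens:,} tokens exceeds cap of {cap:,}")
        self.used[team] = spent + tokens
        return cap - self.used[team]


budget = TeamTokenBudget(caps={"payments": 1_000_000, "search": 250_000})
print(budget.charge("payments", 400_000))  # 600000
```

The cap is "hard" because the request is refused up front instead of being reconciled on the invoice afterward. The trade-off is that a team that runs out simply stops, which forces the finance-versus-engineering conversation to happen before the money is gone.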
Sierra Raised $950 Million as Agents Are Already Picking Up the Phone
If you want to know where the agent economy is heading, follow the money — and nearly a billion dollars just moved in one direction.
Sierra raised $950 million at a $15.8 billion post-money valuation, led by Tiger Global and Google's GV, with Benchmark, Sequoia, and Greenoaks participating. The company was founded three years ago by Bret Taylor and Clay Bavor. According to TechCrunch, Sierra has gone from four design partners to serving over 40% of the Fortune 50, with agents now handling billions of customer interactions a year for Prudential, Cigna, Blue Cross Blue Shield, Rocket Mortgage, and roughly a third of the world's largest banks. ARR grew from $100 million in November 2025 to $150 million in February 2026.
The detail that matters most isn't the round size — it's the pricing model. Sierra charges only when its agent successfully resolves an interaction. No resolution, no charge. That bet is the entire thesis: if outcome-based pricing becomes the standard, every enterprise AI vendor will eventually have to defend their per-seat or per-token bill against a competitor saying "we only charge when it works."
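This toy comparison makes the pricing difference concrete. Every price and volume in it is invented for illustration, since the reporting does not give Sierra's actual rates. Note that the outcome-based bill depends entirely on whether each interaction is labeled resolved.

```python
# Toy comparison of outcome-based vs per-seat billing.
# All prices and volumes are hypothetical.

def outcome_based_bill(outcomes: list[bool], price_per_resolution: float) -> float:
    """Charge only for interactions the agent resolved (True)."""
    return sum(outcomes) * price_per_resolution


def per_seat_bill(seats: int, price_per_seat: float) -> float:
    """Charge a flat amount per seat, regardless of results."""
    return seats * price_per_seat


# 10,000 interactions, 70% resolved (hypothetical).
outcomes = [True] * 7_000 + [False] * 3_000
print(outcome_based_bill(outcomes, price_per_resolution=1.50))  # 10500.0
print(per_seat_bill(seats=50, price_per_seat=200.0))             # 10000.0
```

Under outcome-based pricing, a lower resolution rate cuts the bill automatically. Under per-seat pricing, the customer pays the same either way. That asymmetry turns the definition of "resolved" into the real negotiating point.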
What to watch: whether Sierra's outcome-based pricing spreads, and whether Ghostwriter — Sierra's agent-that-builds-other-agents — turns enterprise deployment from an engineering project into a procurement one. Failure looks like enterprises balking at the metering or finding that "successful resolution" is harder to define than the contract suggests.
Microsoft Just Turned Agent Governance Into a Product Category
If your company has started experimenting with agents, there's a decent chance you already have a "who approved this thing?" problem. Microsoft's May 1 general-availability launch of Agent 365 is essentially an admission that enterprises have stopped just testing agents and started trying to count them.
Agent 365 gives IT and security teams a single place to observe and secure agents across the organization — Microsoft's own and third-party. The most revealing detail is that Microsoft explicitly calls out "shadow AI," noting that customers in its Frontier program can detect whether unsanctioned tools are running inside the organization. A June 2026 update to Defender will add asset mapping showing which devices agents run on, which Model Context Protocol servers they touch, and what identities and cloud resources they can reach.
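The asset mapping Microsoft describes boils down to joining what telemetry observes against what has been approved. Here is a minimal sketch of that join. The agent names, device IDs, and MCP server names are made up, and the code is illustrative only, with no connection to the actual interfaces of Agent 365 or Defender.

```python
# Hypothetical inventory check: flag agents and MCP servers seen in telemetry
# that are not on an approved list. Illustrative only; not Agent 365's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedAgent:
    name: str
    device: str
    mcp_servers: tuple[str, ...]


def find_shadow_ai(observed, approved_agents, approved_mcp):
    """Return (unsanctioned agent names, unsanctioned MCP servers), sorted."""
    rogue_agents = sorted({a.name for a in observed if a.name not in approved_agents})
    rogue_mcp = sorted({s for a in observed for s in a.mcp_servers if s not in approved_mcp})
    return rogue_agents, rogue_mcp


observed = [
    ObservedAgent("sales-copilot", "laptop-17", ("crm-mcp",)),
    ObservedAgent("homebrew-scraper", "laptop-42", ("crm-mcp", "pastebin-mcp")),
]
agents, servers = find_shadow_ai(
    observed,
    approved_agents={"sales-copilot"},
    approved_mcp={"crm-mcp"},
)
print(agents, servers)  # ['homebrew-scraper'] ['pastebin-mcp']
```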
That's not a feature list. It's a map of what enterprises are afraid of. Agents are becoming a new class of privileged user — and the companies selling visibility into their behavior just identified a new market category. If rivals start shipping near-identical "identity and security for agents" suites in the next two quarters, the experiment phase is over.
The White House Is Quietly Reversing Course on AI Oversight
Six months ago, the Trump administration's AI position was simple: get out of the way and let American companies win. That position is changing.
According to a New York Times report relayed by Reuters and others on Monday, the White House is considering an executive order that would establish a working group of tech executives and government officials to examine pre-release review procedures for new AI models. Senior officials reportedly briefed executives from Anthropic, Google, and OpenAI last week. A White House official told reporters that "any policy announcement will come directly from the president" and called the discussions "speculation."
The timing isn't subtle. It follows Anthropic's release of Claude Mythos Preview, which Anthropic says found thousands of zero-day vulnerabilities in every major operating system and browser — including a 27-year-old flaw in OpenBSD. When a single agent can autonomously surface decades-old holes in critical infrastructure, the question of who gets to release these systems stops being theoretical.
What changes if this happens: every frontier lab gains a new variable in its release calendar, and the open-weight community gains an existential one. The hardest enforcement question is whether vetting applies only to closed APIs or also to model weights anyone can download. Watch for the executive order's text — and whether Anthropic, Google, and OpenAI publicly support it. Quiet support would be the strongest signal yet that incumbents see regulation as a moat.
Anthropic's Most Powerful Model Just Found Bugs That Survived 27 Years of Human Review
The most consequential agent deployment of the past month isn't about customer service or coding. It's about a model Anthropic decided was too dangerous to release publicly and handed instead to twelve security partners.
Through Project Glasswing — a coalition that includes Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, and Palo Alto Networks — Claude Mythos Preview has, according to Anthropic, found thousands of zero-day vulnerabilities. The standout examples: a 27-year-old remote-crash vulnerability in OpenBSD, an operating system that's spent decades cultivating a reputation as one of the most security-hardened in the world, and a 16-year-old flaw in FFmpeg in a code path that automated fuzzers had hit five million times without catching. Anthropic has committed up to $100 million in Mythos Preview usage credits and $4 million in direct donations to open-source security organizations.
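How can a fuzzer run a code path five million times and still miss the bug? Reaching a line of code is not the same as reaching the exact state that breaks it. A toy probability model makes the point. It assumes, purely for illustration, that the crash needs one specific 32-bit value and that each execution draws that value uniformly at random. It does not describe the actual FFmpeg flaw.

```python
import math

# Toy model (an assumption, not the real FFmpeg bug): the path is reached on
# every input, but the crash needs one specific 32-bit value out of 2**32.
TRIGGER_SPACE = 2**32
EXECUTIONS = 5_000_000

# Probability that uniformly random 32-bit values hit the trigger at least once:
# 1 - (1 - p)**n, computed with log1p/expm1 to stay accurate for tiny p.
p_single = 1 / TRIGGER_SPACE
p_found = -math.expm1(EXECUTIONS * math.log1p(-p_single))

print(f"{p_found:.4%}")  # 0.1163%
```

Under those assumptions, five million executions find the bug only about 0.12% of the time. Real coverage-guided fuzzers are far smarter than uniform random input, but the FFmpeg example shows the same gap between "path covered" and "bug triggered."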
The uncomfortable implication is that the same capability that finds these bugs could exploit them, which is precisely why Anthropic isn't releasing Mythos publicly. The "defensive-only" deployment model is itself the news here: a frontier lab choosing curated partner access over a commercial launch, betting that capability gating becomes a legitimate governance pattern. If other labs follow, expect a tier of frontier models that simply never reach an API. Either way, once the partners start publishing findings, expect a long, awkward conversation about liability for bugs that were known and left unpatched.
The Pentagon Just Made Frontier AI a Classified-Network Utility
Consumer AI stories ask whether a bot can book dinner. Defense asks whether it can help make decisions on classified networks without creating a national-security headache.
On May 1, the U.S. Department of Defense signed agreements with Nvidia, Microsoft, AWS, and Reflection AI to deploy their AI on classified networks for what the department called "lawful operational use." The Pentagon also disclosed that more than 1.3 million Defense personnel had used GenAI.mil, its secure enterprise generative AI platform, as of May 1, 2026. That doesn't mean a million autonomous agents are loose in the SCIF, but it does suggest the experimentation phase is over.
The deliberate vendor diversification — following earlier deals with Google, SpaceX, and OpenAI — is the strategic signal. The Defense Department is building a bench, not a marriage. The thing to watch: how the department defines "lawful operational use" in practice — task boundaries, human approvals, and which workflows are agent-eligible. Vague answers mean the policy is still being written. Specific answers mean it's already operational.
The Harness Is Now the Product
A quiet but important shift in how developers evaluate coding agents surfaced this week. A Hacker News post titled "Agentic Coding Is a Trap" argued that letting an agent loose on your codebase creates an illusion of speed while quietly accumulating technical debt. It turned into a proper public debate when one commenter replied with his own production logs: sometimes the agent saved him three hours, sometimes it cost him four.
The more interesting consensus emerging from the practitioner community: agent performance is a joint property of model × harness × context strategy. The "harness" is the scaffolding around the model: how it retrieves code, ranks it, compresses it, handles errors, and routes between tools. According to community testing reported in Latent Space, prompt and middleware changes alone moved gpt-5.2-codex from 52.8% to 66.5% on Terminal-Bench 2.0. The model didn't change. The wrapping did.
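For scale, the Terminal-Bench jump from 52.8% to 66.5% is 13.7 percentage points, roughly a 26% relative improvement with the model held fixed. Here is a deliberately tiny harness that makes the idea concrete: it retrieves candidate files, ranks them, compresses them into a context budget, and then calls the model. Keyword overlap stands in for real retrieval and truncation stands in for real compression. `call_model` is a hypothetical placeholder for any LLM API. None of this reflects how LangGraph, deepagents, or Cursor actually work.

```python
from typing import Callable


def score(query: str, snippet: str) -> int:
    """Rank by naive keyword overlap (real harnesses use embeddings, ASTs, etc.)."""
    return len(set(query.lower().split()) & set(snippet.lower().split()))


def build_context(query: str, repo: dict[str, str], budget_chars: int) -> str:
    """Retrieve and rank files, then pack the most relevant into a size budget."""
    ranked = sorted(repo.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    parts: list[str] = []
    used = 0
    for path, text in ranked:
        if score(query, text) == 0 or used >= budget_chars:
            break                                    # irrelevant file or budget spent
        chunk = f"# {path}\n{text}\n"
        chunk = chunk[: budget_chars - used]         # crude "compression": truncate
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)


def run_agent(query: str, repo: dict[str, str],
              call_model: Callable[[str], str], budget_chars: int = 2_000) -> str:
    """One harness pass: build context, then hand the prompt to the model."""
    prompt = f"Task: {query}\n\nRelevant code:\n{build_context(query, repo, budget_chars)}"
    return call_model(prompt)


def echo_model(prompt: str) -> str:
    """Stub 'model' that returns its prompt, so the context can be inspected."""
    return prompt


repo = {
    "billing.py": "def charge invoice tokens team",
    "ui.py": "def render button color",
}
out = run_agent("charge team tokens", repo, echo_model)
print("billing.py" in out, "ui.py" in out)  # True False
```

Replacing `score` with a better ranker, or `build_context` with smarter compression, changes what the model sees without touching the model. That is the whole argument in miniature.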
That reframes the competitive map. If the best context pipeline matters more than the best model, the moats shift from labs to whoever owns the orchestration layer — LangGraph, deepagents, Cursor, or whatever ships next. Watch whether enterprise RFPs start asking about harness architecture. If they do, the model providers just got commoditized in slow motion.
⚡ What Most People Missed
- DeepClaude lets you run Claude Code on DeepSeek for 17x cheaper: GitHub user "aattaran" published a tool this week that swaps Claude Code's backend for DeepSeek V4 Pro using nothing but environment variables. The repo has nearly 1,000 stars, DeepSeek has published official docs for the pattern, and the deeper signal is that the agent loop and the model layer are separating in practice — not just in theory.
- Anthropic published real autonomy data: Between October 2025 and January 2026, the 99.9th-percentile turn duration in Claude Code sessions nearly doubled, from under 25 minutes to over 45 minutes. The increase is smooth across model releases, suggesting it reflects growing user trust rather than capability jumps. Agents are being handed longer, more autonomous work, and the trend is now backed by empirical data.
- The "distillation panic" is creating bad-policy risk: Researcher Nathan Lambert argues that conflating "distillation attacks" by a few Chinese labs with distillation as a general technique, which virtually every AI lab uses (U.S. labs included), could push Congress toward sweeping rules that effectively cripple academic AI and open-weight ecosystems. Legislative proposals are being discussed in Washington, but the bad behavior actually at issue is API jailbreaking; distillation itself is a standard training method.
- Europe opened a live consultation on Android agent interoperability: The European Commission is consulting through May 13, 2026 on Digital Markets Act measures that would require Alphabet to let rival AI agents act more like the system assistant — wake-word access, app context, hardware responsiveness. If adopted, mobile agent access shifts from a platform favor to a regulatory right in Europe.
- China's AI weekly usage passed the U.S. for the second week in a row: Chinese-language reporting flagged this week shows Chinese platforms now lead in weekly active model usage, with Tencent's Hunyuan 3 preview ranking first among free models. The benchmark gap remains narrow, while the deployment gap has flipped in China's favor.
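For the DeepClaude item above: the general pattern is to point Claude Code at an Anthropic-compatible endpoint through environment variables. The sketch below builds that environment in Python. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are settings that Claude Code documents. The DeepSeek endpoint URL is our reading of DeepSeek's Anthropic-compatibility docs, and the model ID is a placeholder, so check both against current documentation. We have not verified which variables DeepClaude itself sets.

```python
import os
from typing import Optional

# Endpoint is our reading of DeepSeek's Anthropic-compatible API docs; verify it.
DEEPSEEK_ANTHROPIC_URL = "https://api.deepseek.com/anthropic"


def deepseek_env(api_key: str, model: str,
                 base: Optional[dict] = None) -> dict:
    """Return an environment that points Claude Code at DeepSeek.

    `model` is a placeholder: use whatever model ID DeepSeek currently documents.
    """
    env = dict(os.environ if base is None else base)   # copy, don't mutate
    env.update({
        "ANTHROPIC_BASE_URL": DEEPSEEK_ANTHROPIC_URL,  # where requests are sent
        "ANTHROPIC_AUTH_TOKEN": api_key,               # sent as a bearer token
        "ANTHROPIC_MODEL": model,                      # which model to request
    })
    return env


# Launching would look like this (not executed here):
#   import subprocess
#   subprocess.run(["claude"], env=deepseek_env(os.environ["DEEPSEEK_API_KEY"], "<model-id>"))
```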
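For the distillation item above: in its textbook form (Hinton, Vinyals, and Dean, 2015), distillation trains a smaller "student" model to match a larger "teacher" model's temperature-softened output distribution. Below is a minimal sketch of that loss in plain Python. It shows the generic technique, not any particular lab's training pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

Identical logits give zero loss, and the loss grows as the student's distribution drifts from the teacher's. This textbook form assumes access to the teacher's full output distribution.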
📅 What to Watch
- If Salesforce starts publishing hard production metrics from named Agentforce Operations customers — not the vendor-marketing 427% engagement figure from Engine — back-office agent execution becomes a verifiable product category that procurement teams can benchmark against contractual SLAs.
- If the White House executive order on model vetting covers open-weight releases, the U.S. open-source AI ecosystem faces an enforcement problem with no clean answer.
- If a Fortune 500 company flags AI compute overruns as a material risk in an earnings call, expect pointed investor questions and, potentially, new risk-factor disclosures in SEC filings covering token-use exposure and operational dependence on third-party models.
- If Anthropic's Glasswing partners begin disclosing the vulnerabilities Mythos found, every major software vendor faces pressure to run autonomous audits — and a new liability question for known-but-unpatched bugs.
- If GitHub's June 1 switch to usage-based billing triggers a visible developer revolt, expect rivals such as Cursor and Claude Code to delay similar moves and quietly absorb margin pressure.
- If AWS expands the OpenAI-on-Bedrock preview to general availability quickly, cloud procurement — not vendor websites — becomes the dominant channel for enterprise agent adoption.
The Closer
A coding assistant ate Uber's annual AI budget in four months, an Anthropic model found a bug that had been hiding in OpenBSD since the second Clinton administration, and a White House that spent a year telling AI to run free is now drawing up a guest list for who gets to enter the room. Somewhere in there, a developer rerouted Claude Code to a Chinese model with three environment variables and called it a Tuesday.
Catch you next week.
Forward this to the colleague who keeps asking what an "agent" actually is — they'll get further in five minutes here than in a quarter of vendor demos.