Agentic AI Weekly — Week of March 8, 2026
This week, the question of who gets to control AI agents stopped being theoretical — and the answer split the industry in two. OpenAI's robotics chief walked out over a Pentagon deal. Anthropic refused the same contract and got blacklisted. Meanwhile, agents kept getting keys to everything: advertising platforms, clinical workflows, VPN connections, and the world's most popular data warehouse. The governance gap between "AI that does things" and "rules for AI that does things" has never been wider.
The Robotics Chief Who Walked Out — and What the Pentagon Deal Reveals
OpenAI robotics chief resigns over Pentagon deal
The most consequential AI resignation in months arrived Saturday morning, March 7, via LinkedIn.
Caitlin Kalinowski, OpenAI's head of robotics and consumer hardware, quit over the company's freshly signed agreement with the Pentagon — a deal allowing deployment of OpenAI's models on classified military networks. From her public statement: "Surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got."
This wasn't a lone dissenter. Before the deal closed around February 27–28, ninety-one OpenAI employees signed an internal letter urging leadership to reject military AI contracts. Kalinowski's departure leaves a leadership vacuum in a robotics division that had grown to roughly 100 people.
Anthropic was in talks for the same contract. The company pushed for strict limits on domestic surveillance and autonomous weapons, citing its constitutional AI principles — essentially, hard-coded rules against building technology that enables mass surveillance or fully autonomous weapons. The Pentagon's response was blunt: it designated Anthropic a "supply-chain risk," government-procurement language that can block a company from future federal contracts across agencies.
So now the two most important AI labs in the world have taken opposite positions on the same contract in the same week. OpenAI said yes and took a reputational hit in Silicon Valley. Anthropic said no and took a reputational hit in Washington. At least one major AI consultancy publicly switched from OpenAI to Anthropic this week, citing reliability concerns for business-critical agents.
OpenAI's official line: the agreement "creates a workable path for responsible national security uses of AI while making clear our red lines: no domestic surveillance and no autonomous weapons." Whether those red lines hold under classified conditions that the public can't inspect is the question nobody can answer yet — and that's precisely the problem.
LiveRamp and Bell Canada Put Real Numbers on Agent Deployments
Agents deployed for measurement and audience building
If you're tired of "AI agent" meaning "impressive demo," this week delivered two counterexamples with actual metrics.
LiveRamp, the NYSE-listed data company that sits at the center of how advertisers match customer data to media platforms, launched live agentic AI on March 3. Not a pilot — production agents now handle cross-media measurement and healthcare audience building across a network of over 900 partners. The company is positioning itself as the first data collaboration platform to give AI agents governed access to its full marketing toolkit: identity resolution, segmentation, activation, and measurement. The "governed access" piece — maintaining audit trails while letting agents do useful work — is the part worth watching. If this model works at scale, it becomes a template for every data-heavy enterprise.
Meanwhile, at Mobile World Congress, ServiceNow highlighted Bell Canada's deployment of autonomous agents for customer case intake. Bell reported a 25% improvement in customer response time after deployment, with human case managers rating the AI triage at roughly 90% positive in post-deployment evaluations. The system triages incoming cases, checks completeness, collapses duplicates, and routes the hardest problems to humans. This is a national-scale telecom running agents on live customers — not a sandbox.
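The triage flow Bell describes — check completeness, collapse duplicates, escalate the hard cases — is a pipeline pattern worth internalizing. Here's a minimal sketch of that routing logic; all names (`Case`, `triage`, the `priority` heuristic) are illustrative assumptions, not Bell's or ServiceNow's actual implementation.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = {"customer_id", "category", "description"}

@dataclass
class Case:
    case_id: str
    fields: dict
    route: str = "unassigned"

def triage(cases: list[Case]) -> list[Case]:
    """Check completeness, collapse duplicates, route hard cases to humans."""
    seen: dict[tuple, Case] = {}
    for case in cases:
        missing = REQUIRED_FIELDS - case.fields.keys()
        if missing:
            case.route = "return_to_customer"  # incomplete: ask for more info
            continue
        key = (case.fields["customer_id"], case.fields["category"])
        if key in seen:
            case.route = "duplicate_of:" + seen[key].case_id
            continue
        seen[key] = case
        # Hardest problems (here, a hypothetical priority flag) go to humans.
        hard = case.fields.get("priority") == "critical"
        case.route = "human_queue" if hard else "agent_queue"
    return cases
```

The interesting design question in production is the `hard` predicate: where the model's confidence ends and a human case manager begins.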
ServiceNow is packaging these capabilities as "Autonomous Workforce" with pre-built digital workers for common roles. That language — digital workers, autonomous workforce — is a hint at how enterprises may soon formally label and manage agent-based "employees" alongside human ones.
Ad Agencies Are Shipping Agents — and Skipping the Developers
Creative teams shipping agent-built marketing tools
Want to know where agents are actually running in production right now? Look at marketing.
Major advertising agencies including Havas and Broadhead are using Anthropic's Claude Code to build sophisticated marketing tools through what's being called "vibe coding" — non-programmers describing what they want in plain language and having the AI write all the code. In previous years, that produced toy apps and weekend experiments. What's changed in 2026 is that the output is production-grade enough that agencies are shipping it to clients.
A deep-dive comparison of 15 coding agents this week found that the scaffolding around a model — how it runs tests, manages files, handles failures — matters more than the underlying model itself. That's why agencies are picking developer-facing tooling that reduces failure modes, not just raw intelligence scores.
The business implications are significant. A media planning workflow that used to take a team of analysts a week can now be prototyped by a strategist in a day. But the question nobody's answering: what happens to quality control when the people running the agents didn't build them and don't fully understand what's happening under the hood?
Supporting this trend, persistent memory tools like Memstash are appearing in agent toolchains so that "vibe-coded" apps can remember context across sessions. That accelerates productivity — and creates new audit and data-leakage risks that most organizations haven't thought through.
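What would "persistent memory with an audit trail" even look like? A minimal sketch, assuming a JSONL file store, a naive secret-scrubbing regex, and invented names throughout — this is not how Memstash or any specific tool works, just the shape of the audit and leakage problem.

```python
import json
import pathlib
import re
import time

# Naive illustrative pattern — real deployments need far better detection.
SECRET_PATTERN = re.compile(r"(api[_-]?key|password|token)\s*[:=]\s*\S+", re.I)

class MemoryStore:
    """Minimal persistent agent memory with redaction and an audit trail."""

    def __init__(self, path: str):
        self.path = pathlib.Path(path)
        self.audit: list[dict] = []

    def remember(self, agent: str, note: str) -> str:
        note = SECRET_PATTERN.sub("[REDACTED]", note)  # scrub obvious secrets
        entry = {"agent": agent, "note": note, "ts": time.time()}
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")
        self.audit.append({"op": "write", **entry})    # every access is logged
        return note

    def recall(self, agent: str) -> list[str]:
        entries = [json.loads(line) for line in self.path.read_text().splitlines()]
        self.audit.append({"op": "read", "agent": agent, "ts": time.time()})
        return [e["note"] for e in entries if e["agent"] == agent]
```

Even this toy version surfaces the policy questions: what counts as a secret, who reads the audit log, and whether one agent should ever recall another's memory.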
GPT-5.4 Arrives With a New Way to Measure Whether Agents Can Do Real Jobs
OpenAI shipped a new model this week, and for once the most interesting part isn't the benchmark number — it's the benchmark itself.
GPT-5.4, released March 5, scored 83.0% on GDPval, a 2026 benchmark that measures an agent's ability to produce specific, high-quality work across 44 occupations — finance, healthcare, legal, manufacturing. That's up from GPT-5.2's 70.9% earlier in 2026. The model also adds enhanced image understanding for computer-use agents — the kind that can operate legacy software without modern programming interfaces, essentially letting AI automate tasks in old enterprise systems that were never designed for automation.
The important caveat: these are company-disclosed numbers, not independently verified. But the more meaningful signal is the industry shifting from "how smart is this model?" to "can this model actually do a job?" That's the right question. Meanwhile, community buzz around Qwen 3.5 27B — a much smaller model users claim rivals GPT-5 on coding tasks — suggests the cost of running capable agents could drop dramatically if smaller models keep closing the gap.
Who's Watching the AI Agents? The Governance Gap Gets Its First Real Product
Every week we cover agents doing more. This week, someone finally shipped a product focused on seeing what agents are doing.
Teramind launched an AI governance platform that can log what any AI tool or agent does inside a company — using text capture, screen recording, optical character recognition, and full transcripts of agentic activity. The problem it's solving is more acute than most executives realize: according to the company's March 2026 research, over 80% of workers now use unapproved AI tools on the job, a third have shared proprietary data with unsanctioned services, and 49% actively hide their AI use from IT. Security researchers are calling this proliferation of unmanaged agent identities "identity dark matter" — hundreds of powerful, semi-autonomous processes operating inside companies with no oversight framework. Per the same research, AI-associated breaches now cost more than $650,000 per incident.
This governance gap is the defining infrastructure problem of agentic AI in 2026. And it's colliding with a VentureBeat report showing that MCP — the protocol that lets agents connect to enterprise tools — is being adopted faster than anyone is securing it. When an AI agent connects to Slack or GitHub through MCP, the company's security system only sees the user logging in, not the agent connection being established. The first formal security analysis of MCP found protocol-level weaknesses that let malicious servers claim fake permissions or inject harmful instructions, with successful attacks in over half of test cases.
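The "fake permissions" weakness comes down to trust direction: an MCP server self-declares what tools it offers, and a naive client takes those claims at face value. A defensive client can instead intersect claims against an organization-approved allowlist. This is a hypothetical sketch of that gate — the server names, tool names, and `vet_server` function are all invented for illustration, not part of the MCP specification.

```python
# Hypothetical client-side gate: compare what a server *claims* to offer
# against what the organization has explicitly approved for it.
APPROVED: dict[str, set[str]] = {
    "github-mcp": {"read_issues", "read_pull_requests"},
}

def vet_server(server_name: str, claimed_tools: set[str]) -> set[str]:
    """Grant only the intersection of claimed and approved capabilities.

    A malicious server claiming extra tools (e.g. "delete_repo") has those
    claims silently dropped instead of trusted at face value; an unknown
    server gets nothing at all.
    """
    approved = APPROVED.get(server_name, set())
    return claimed_tools & approved
```

Note what this doesn't solve: it can't stop an approved tool from returning results laced with injected instructions. That's why capability attestation alone is necessary but not sufficient.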
As soon as agents are allowed to act, the protocol connecting them to tools becomes a security boundary. Whether the ecosystem hardens that boundary — or keeps shipping features first — is the open question.
⚡ What Most People Missed
Google BigQuery is quietly auto-enabling MCP access. Google announced that BigQuery — one of the world's most widely used enterprise data warehouses — will auto-enable MCP on new projects starting in March 2026. That means AI agents will have a path to read and write enterprise data at scale without anyone explicitly setting it up. Security teams at companies running BigQuery should know this is coming.
Anthropic caught 24,000 fake accounts attempting to extract Claude's agentic behavior. The company disclosed a large-scale scraping campaign in March 2026 using thousands of coordinated fake accounts to coax out how Claude plans, chains tools, and reasons through tasks. This isn't ordinary data theft; attackers are trying to teach knockoffs how to act like agents, which accelerates the timeline for unsafe clones.
Agent-to-agent communication just got its USB-C moment. Google's A2A protocol — the standard for how agents talk to each other, not just to tools — was donated to the Linux Foundation in March 2026, with backing from Salesforce, SAP, ServiceNow, and 50+ partners. That's the move you make when you want enterprise procurement teams to stop treating a protocol as a vendor bet. The risk: MCP, A2A, ACP, and ANP are all competing. Multiple competing standards is not the same as a standard.
Salesforce is booking real agent revenue. The Agentforce platform now has 180+ organizations using it to replace legacy IT service desks, with partner chatter pointing to roughly $800M in AI revenue (as of March 2026). Separately, Salesforce announced six new healthcare agents integrated with Verily, HealthEx, and Viz.ai for intake, care coordination, and imaging alerts. Agents are becoming line items in budgets, not experiments in innovation labs.
Anthropic's report shows a design firm running 800+ Claude agents at scale, raising procurement and liability questions. The company’s Agentic Coding Trends report documents a design firm with near-complete adoption of 800+ Claude agents — not just a pilot but a fleet. That scale concentrates potential contractual liability and client-data exposure in procurement and product teams, forcing firms to rethink vendor SLAs, indemnities, and who is legally accountable when an agent handles client IP.
📅 What to Watch
- If more senior OpenAI employees resign publicly, it signals the Pentagon deal is creating a structural internal crisis — which could affect the company's ability to recruit top safety researchers at a critical moment.
- If the Pentagon reverses Anthropic's "supply-chain risk" designation, it means Washington is treating ethical guardrails as a negotiation tactic rather than a disqualification — a very different precedent for every AI company considering government work.
- If Anthropic or major IDE vendors ship "secure MCP" options with capability attestation, it signals the ecosystem is treating protocol-level agent security as a real engineering priority rather than a research paper topic — which would unlock a wave of enterprise deployments currently stalled by security teams.
- If one of the local-first memory projects (Memstash, Sediment, Engram) announces an enterprise edition, persistent agent memory crosses from developer toy to standard architecture component — and every company will need a policy for what agents are allowed to remember.
- If Qwen 3.5 27B gets independent benchmark verification matching GPT-5-class performance, the economics of enterprise agent deployment tilt sharply toward self-hosted models — which reshapes the entire competitive landscape away from API-based pricing.
- If any regulator or standards body responds formally to the OpenAI robotics resignation, expect the first concrete proposals for red lines around autonomous agents in defense — not vague AI ethics principles, but actual rules.
That's the week. AI agents got keys to military networks, clinical workflows, advertising platforms, and VPN connections. The people building them can't agree on what's safe. The people governing them are just getting started. And somewhere in a design firm, 800 Claude agents are humming along, doing work that used to require humans — whether anyone's watching or not.
See you next week.