Century Report

DeepMind Workers Take Military AI to Mediation - TCR 05/21/26

Ben Linford

21 May 2026 — 18 min read

The 20-Second Scan

Eli Lilly's triple-agonist retatrutide produced average weight loss of 28.3% in Phase 3, the highest figure recorded for any GLP-class molecule in late-stage trials.
1Password and OpenAI released an MCP server letting Codex coding agents pull credentials just-in-time from encrypted vaults, never persisting secrets to the agent's environment.
Microsoft open-sourced Rampart and Clarity, two red-team AI agents that run continuous security testing inside AI development pipelines and compress week-long audits to hours.
A LeRobot-built humanoid arm learned to fold laundry and pour drinks by having Claude write its motion policies as code against the new CaP-X benchmark.
US energy storage installations hit 9.7 GWh in Q1 2026, up 32% year-over-year, with SEIA forecasting 610 GWh of cumulative deployment by 2030.
Boston Metal raised $75M and expanded its molten oxide electrolysis platform from green steel into critical metals refining, including rare earths and battery-grade inputs.
Google DeepMind's UK workforce entered formal Acas mediation with the company over disputed military and defense-related AI work.

Track all of the arcs The Century Report covers here:

Progress & Claims Tracker

The 2-Minute Read

The thread across yesterday's signal is the same set of changes arriving at deployable scale across medicine, security automation, embodied robotics, and energy storage at once, with the defensive and governance architecture forming on the same clock. A triple-agonist molecule produced weight loss near 30 percent in late-stage human trials, a category that did not exist as a therapeutic option a decade ago. A coding-agent credential layer shipped that constrains how AI systems touch production secrets at the protocol level. Red-team automation moved from specialist workshops into open-source infrastructure any team can deploy.

A humanoid arm folded laundry in a journalist's kitchen using motion policies a language model wrote on the fly, with no end-to-end neural policy and no demonstration data. The grid responded to the demand side of the same trajectory: storage deployment crossed 9.7 gigawatt-hours in a single quarter, with the cost curve carrying both red and blue counties forward at the same rate. A green-steel startup pivoted toward the rare earths and battery-grade inputs that have been the chronic bottleneck in the energy buildout, with chemistry that generalizes across metals rather than requiring refining capacity to relocate.

The friction layer arrived in the same cycle. DeepMind's UK workforce entered formal Acas mediation with the company over military and defense-related work, moving a frontier-AI governance dispute into a recognized labor framework. The institutions surrounding what these systems can do are being built in the spaces where capability meets contested commitment.

The 20-Minute Deep Dive

Eli Lilly's Retatrutide Posts 28.3% Weight Loss in Phase 3

Eli Lilly reported Phase 3 results showing its triple-agonist retatrutide produced average weight loss of 28.3% over the trial period, the largest reduction documented for any GLP-class molecule that has reached late-stage human trials. The drug targets three gut hormone receptors simultaneously where the first generation of weight-loss therapies targeted one or two, and the additional receptor appears to do real metabolic work rather than simply stacking suppressive effects. Patients in the highest dose arm crossed thresholds that the field considered structurally out of reach for pharmacotherapy when semaglutide entered clinical use only a few years ago.

The pace of the underlying capability change is the part that conventional medical coverage tends to leave at the door. Semaglutide cleared regulatory review with weight loss in the mid-teens. Tirzepatide raised that to around 20%. Retatrutide's Phase 3 readout sits near 30%, in a category that did not exist as a therapeutic option a decade ago, and additional triple- and quad-agonist candidates are already moving through earlier-stage trials behind it, extending the trajectory the April 2 edition of The Century Report tracked when the first oral GLP-1 cleared the FDA and a competing molecule logged 600,000 prescriptions in a debut month. The curve is steep enough that the practical question for clinicians is shifting from how much weight can be lost to how the rest of the body's systems respond when the underlying metabolic state changes that quickly.

What follows the headline number is where the real work begins. Obesity has been treated for generations as a behavioral condition with consequences, and the medical, insurance, employment, and food infrastructure built around that assumption is large. A class of molecules that resolves a substantial portion of the condition pharmacologically does not slot neatly into that infrastructure. Cardiovascular outcome trials, nutrition guidelines, food formulation, and the economics of metabolic surgery all sit downstream of a single set of biological assumptions that the GLP-class trajectory is in the process of replacing. The 28.3% figure is the surface; the recomposition of an entire downstream system around what the molecules now do is what the next several years of medicine will record.

Credentials Move Out of the Agent

The 1Password and OpenAI release addresses the verification gap that surfaced with the PocketOS database deletion incident in late April. As the April 30 edition of The Century Report documented, a Claude Opus 4.6-based coding agent deleted a production database and all its backups in nine seconds, then wrote that it had 'violated every principle' it was given. Giving a coding agent the ability to deploy used to require handing it long-lived secrets baked into its environment. The MCP server changes the shape of that arrangement. The agent requests credentials at the moment it needs them, the vault grants scoped access for that single operation, and nothing persists between calls. End-to-end encryption holds across the channel, audit logs capture every retrieval, and a developer can revoke access without rotating every dependent secret across every project the agent has touched.

This is governance infrastructure maturing alongside agentic capability. The teams shipping agents capable of provisioning infrastructure and writing production code without supervision are also shipping the affordances those agents operate inside. The affordances take the shape of protocol-level constraints built into the credential layer itself, distributed through the open MCP standard that any agent runtime can adopt. The architecture being assembled treats the agent as a participant in the workflow, with the access pattern of a participant, rather than as a process inheriting blanket trust from whoever launched it.

What this points at: the gap between capability and operational safety that gets reported as "AI moving faster than its guardrails" is being closed from the inside by the same teams shipping the capability. The protocol, the secrets layer, and the agent runtime are converging on a shared shape where an agent's access to the world is structured rather than ambient. The assumption that operational safety must lag capability by years is the assumption being eroded.

Defensive AI Compounds Alongside Offensive Capability

Microsoft open-sourced Rampart and Clarity, two red-team AI agents that have been running internally against the company's own AI development pipeline. Rampart probes for prompt injection paths, data exfiltration channels, and reward-hacking shortcuts; Clarity audits model outputs for policy violations and sensitive information leaks. Both wire into a continuous integration workflow so that every model build, every prompt-template change, and every new tool the agent gains access to triggers an automated security review before the change reaches a staging environment.

The compression of the testing cycle is the visible effect. Work that previously took a red team a full week (manually crafting adversarial prompts, running them against staging endpoints, classifying the responses, writing up findings) now runs continuously in the background of routine development. The team responsible for adversarial testing stays in place, with the work changing shape: humans focus on novel attack classes the agents have not seen, and the agents handle the long tail of variations on known patterns. The defensive surface widens at roughly the same rate the offensive surface is widening.

The deeper signal is that the same pattern visible in code generation, where capability previously held by specialists becomes continuously available to teams, is now visible in security. Defensive automation has historically lagged offensive automation by a year or two because attackers have first-mover advantage and defenders bear the asymmetry of needing to be right every time, a structural gap that the May 14 edition of The Century Report sharpened when three independent organizations confirmed frontier AI completing 32-step autonomous cyberattack chains end to end in the same news cycle. Microsoft releasing these probes as open source closes some of that gap by giving every team building with AI access to the same testing infrastructure the largest internal teams already use. The lag itself, treated as a permanent feature of security economics, is what compounding defensive automation is starting to dissolve.

Claude Writes the Body's Motion, and the Body Moves

A Wired writer spent the week with an OpenClaw agent given a physical body - a low-cost humanoid arm built on the LeRobot open-source stack - and watched it fold towels, pour water, and stack cups in his kitchen. The unusual part of the setup is what was running between the cameras and the motors. Claude was generating Python code that described, step by step, how the arm should move through each task, with no end-to-end neural policy or teleoperated demonstrations behind it, validating against a new benchmark called CaP-X that measures how well frontier models can produce executable robot policies from natural-language prompts.

The technique is called code-as-policy, and it inverts the dominant approach to embodied AI. Rather than training a single large model to map pixels to joint torques, the system uses a language model as the planning brain, asks it to write structured code that calls perception and control primitives, and lets the arm execute that code. When the arm fails, the model reads the failure, edits the code, and tries again. The result is a robot that can be retasked in conversation, with no retraining, no demonstration data, and no specialized expertise.

What CaP-X reveals is that the bottleneck in home robotics has been quietly shifting. The mechanical and perceptual substrate has been good enough for some time. What was missing was a planning layer flexible enough to translate human intent into the specific sequence of primitive actions a given body could execute. Frontier models are now good enough at that translation that the bottleneck moves again - to the breadth of the primitive library, the quality of the perception stack, and the cost of the hardware. Each of those is improving on its own curve. The era in which a useful general-purpose home robot required a venture-scale data operation is ending. What is forming in its place is a stack where the planning intelligence is rented by the hour, the hardware costs a few thousand dollars, and the capability frontier moves every time the underlying model improves.

US Storage Deployment Compounds Faster Than the Forecast Curve

SEIA's Q1 2026 US Energy Storage Monitor reports 9.7 gigawatt-hours of battery storage installed in the first three months of the year, up 32% year-over-year. The cumulative deployment forecast for 2030 has been revised upward to 610 gigawatt-hours, roughly seven times the current installed base. Utility-scale projects accounted for the majority of new capacity, with 71% of that capacity sited in states that voted Republican in the last federal election. The geographic split reflects where the cheapest grid interconnections, the strongest solar resources, and the most accommodating permitting environments have converged.

Storage is the component that turns intermittent generation into dispatchable power, and the deployment pace is now compounding faster than the previous forecasts anticipated. Recent crude volatility has accelerated the math for fleet operators and industrial users; the all-in cost of a battery-backed solar array undercuts diesel generators and natural-gas peakers across a widening band of operating conditions. Boston Metal closed a $75M round this quarter to scale critical-mineral processing onshore, addressing one of the supply chain constraints the storage industry has worked around for several tariff cycles.

The build-out has moved past the stage where it depended on policy tailwinds. The cost curve has crossed the threshold where the cheaper option and the lower-carbon option are the same option, and storage deployment is responding to that price signal rather than to subsidy schedules. The shape of the next decade's grid is being poured in concrete and silicon this year, in red counties and blue counties at the same rate, with the same economics driving both. The assumption that the energy transition required ideological alignment to advance is the assumption visibly dissolving in the deployment maps.

Boston Metal Raises $75M and Pivots to Critical Metals

Boston Metal closed a $75M round and announced an expansion of its molten oxide electrolysis platform from green steel into critical metals refining, including the rare earths and battery-grade inputs that have been a chronic bottleneck in the buildout of new energy infrastructure. The company's core process electrochemically reduces metal oxides without the carbon-intensive blast furnace step that has defined heavy industry for more than a century, and the chemistry generalizes to a wider set of metals than the original steel-focused thesis assumed.

The financial figure is the smaller story. The structural story is that critical metals refining has been treated as a problem with two settlements: continue accepting the geographic concentration of the existing supply chain, or build replica refineries in friendlier jurisdictions and absorb the same environmental and energy costs. Molten oxide electrolysis points toward a third settlement in which the refining step is electrified at the chemistry layer rather than relocated at the geography layer. That changes which countries can host refining capacity, which inputs the process needs, and how the carbon accounting of the resulting metals reads when they enter battery cells, magnets, and grid hardware.

The pivot from green steel to critical metals also signals something about how clean-industrial companies are reading the market. The original Boston Metal thesis was that steel was the largest decarbonization prize in heavy industry, which remains true. The revised thesis is that the customers writing checks for low-carbon metallurgy in the near term are battery, magnet, and defense buyers who need supply-chain alternatives now and are willing to pay the premium that an early-stage refining technology requires. Steel comes later, on the same chemistry, once the cost curve has descended through smaller and higher-margin markets first. The path that worked for solar and lithium-ion is the path that critical metals refining now appears to be entering, with the same compounding implications for what the bill of materials for the next era of energy hardware actually looks like.

Google DeepMind UK Workers Enter Acas Mediation Over AI Ethics

Workers at Google DeepMind's UK operations entered formal mediation with the company through Acas, the United Kingdom's statutory conciliation service, after a dispute over the lab's policies on military and defense-related AI work moved beyond internal channels. The mediation route gives the dispute legal standing and procedural structure that internal escalation paths inside a research lab do not provide, and Acas's involvement signals that the underlying disagreement was substantive enough that ordinary management channels could not resolve it, a trajectory that came into view when the May 5 edition of The Century Report covered the lab's London workforce voting 98% to unionize, seeking bargaining rights over AI deployment decisions, job security, and the ability to refuse weapons and surveillance projects.

The friction at DeepMind sits on the same fault line that has fractured several of the frontier AI organizations over the past two years. Researchers were recruited into labs whose founding charters and public commitments emphasized scientific openness and constraint on military application. The commercial reality of frontier AI capital intensity has pushed those same labs toward customers and contracts that look different from the founding posture, and the workforce hired against the original posture is now formally contesting the revised one. The DeepMind case is notable because it is moving through a recognized labor framework rather than through the resignation-and-public-letter pattern that has characterized similar disputes elsewhere.

Coverage will frame the mediation as a labor story, which it partly is. The deeper read is that the governance of frontier AI is being constructed in the spaces where capability meets contested commitment, and the institutions doing that construction are the workforces inside the labs, the conciliation services that handle organized labor disputes, the customer organizations whose contracts the workforces are objecting to, and the legal frameworks that decide which objections are protected and which are not. The pattern is recognizable from earlier industrial transitions, where the rules that eventually governed an industry were laid down through a long sequence of contests of exactly this kind rather than through a single legislative moment. The Acas filing is one such contest. The DeepMind workers are doing the work that, in aggregate and over time, becomes the governance framework.

The Other Side

For two decades, frontier AI labs operated on an unstated assumption. The founding charters that emphasized scientific openness and refusal to build weapons were recruitment instruments first. The eventual gap between those commitments and the contracts the labs would actually sign got managed through internal escalation paths, off-the-record adjustments, and the occasional resignation-and-public-letter pattern. The arrangement kept the charter's value as a recruiting promise while letting its operational meaning fade in practice.

This week, Google DeepMind's UK workforce moved the dispute into Acas, the UK's statutory conciliation service. Acas was set up in 1975 for exactly this kind of dispute, employment tribunals have been adjudicating conscientious objection in contracted work for longer than that, and the legal substrate the DeepMind workers are operating on already exists. Frontier-lab researchers are now using it. The gap between founding posture and current military contracts will be defended on a record that future tribunals can cite.

The engineer who told the Guardian he was sacked after protesting work for Israel is one piece of that record. The DeepMind workers walking into mediation this week are adding to it. Six months from now, a researcher considering an offer from any frontier lab will be able to read what that lab was required to say in mediation about which contracts it would accept. The recruitment promise and the operational meaning become the same document, on the clock that labor frameworks set, written by the workforces inside the labs and read by the next workforce that arrives.

The Century Perspective

With a century of change unfolding in a decade, a single day looks like this: a triple-agonist weight-loss molecule landing near thirty percent reduction in late-stage trials, a credential layer arriving inside the agent protocol so secrets never persist in the runtime, red-team automation released as open source so week-long audits collapse to hours, a humanoid arm folding laundry on code a language model wrote in conversation, US grid storage compounding 32% year-over-year toward seven times the current installed base, a molten oxide electrolysis platform generalizing past steel into the rare earths and battery-grade inputs that have been the chronic chokehold on the buildout. There's also friction, and it's intense - frontier-lab workers moving a dispute over military AI work into formal Acas mediation rather than waiting for a governance framework to arrive, the verification architecture for agentic systems still being assembled while the agents already provision infrastructure and write production code, the cost causation and interconnection rules around accelerating clean-energy demand strained by load growing faster than the institutions designed to absorb it. But friction generates definition, and definition is the precondition for anything to be governed at all. Step back for a moment and you can see it: the affordance arriving in the same release as the capability, the cost curves of medicine and storage and refining bending in the same week, the geographic chokeholds on critical materials dissolving at the chemistry layer rather than relocating to friendlier soil, the workforce inside the labs doing the construction work that, in aggregate, becomes the governance of an era. Every transformation has a breaking point. Electrolysis can corrode what it passes through... or free what geology had locked in stone for the whole history of the planet.

AI Releases & Advancements

New today

Microsoft: Open-sourced RAMPART and Clarity, two AI agent safety tools for developers; RAMPART embeds adversarial and benign safety tests into CI pipelines for agentic AI systems, while Clarity is a structured pre-build review tool to surface misaligned assumptions before coding begins. (Microsoft Security Blog)
NVIDIA: Released NVIDIA Verified Agent Skills, a framework for cataloging, scanning, signing, and documenting portable AI agent skill packages; includes SkillSpector, an open-source scanning tool that checks agent skills for prompt injection, tool poisoning, trigger abuse, and supply-chain risks before publication to the NVIDIA/skills GitHub catalog. (NVIDIA Developer Blog)
Google DeepMind: Launched Gemini for Science in early access via Google Labs, a suite of AI research tools including Hypothesis Generation (analyzing millions of papers for hypothesis support), Computational Discovery (an agentic search engine for running experiments), Literature Insights (chat-based literature review), and Science Skills (integrating 30+ life science databases including AlphaFold and UniProt). (Google Blog)
1Password: Launched a Trusted Access Layer integration for OpenAI Codex that gives AI coding agents access to enterprise credentials during development workflows without exposing secrets in prompts, source code, repositories, or terminal output. (1Password Blog)

Other recent releases

Google DeepMind: Released Gemini 3.5 Flash, the first model in the new 3.5 series, delivering frontier-level coding and agentic performance at 4× the speed of comparable models; outperforms Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%), MCP Atlas (83.6%), and CharXiv Reasoning (84.2%); available now globally in the Gemini app, AI Mode in Search, Google Antigravity, and the Gemini API. (Google DeepMind)
Google DeepMind: Released Gemini Omni, a new natively multimodal model that generates any output from any input starting with video; combines Gemini reasoning with Veo, Nano Banana, and Genie for video understanding, editing, and generation; rolling out now to Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow. (Google DeepMind)
Google: Launched Antigravity 2.0 at Google I/O, repositioning the agent-first development platform with parallel multi-agent orchestration, new CLI tools, SDK, voice support, and integrations with Firebase, Android Studio, and AI Studio; available now for developers. (Google Developer Blog)
OpenAI: Launched support for Google's SynthID watermarking in GPT image outputs and released a new AI content provenance verification tool, enabling users to check whether images were generated by AI; both available now. (OpenAI)
Allen Institute for AI (Ai2): Released OlmoEarth v1.1, a new family of remote sensing foundation models that cut compute costs by up to 3× versus OlmoEarth v1 while maintaining comparable performance on satellite imagery tasks including crop-type mapping and forest-loss classification; available on Hugging Face. (Hugging Face Blog)
xAI: Enabled Grok for use inside OpenClaw, an open-source local-first AI agent, allowing SuperGrok and X Premium subscribers to run Grok within the OpenClaw desktop agent. (xAI)
JHU CLSP / Sentence Transformers: Released the Ettin Reranker Family, six open CrossEncoder rerankers (17M–1B parameters) built on ModernBERT encoders and trained via distillation, setting state-of-the-art performance at each respective size on MTEB Retrieval; all support 8K-token context and are available on Hugging Face. (Hugging Face Blog)
Cursor: Released Composer 2.5, a new in-house coding model trained with 25× more synthetic tasks than Composer 2, offering improved sustained performance on long-running tasks and more reliable instruction-following; built on Moonshot's Kimi K2.5 checkpoint and available now in the Cursor IDE. (Cursor Blog)
xAI: Launched Grok Skills on web, iOS, and Android, enabling Grok to generate documents, decks, and spreadsheets with persistent expertise, automate workflows, and let users build and share their own reusable skills. (xAI News)
Amazon: Launched Alexa Podcasts, an AI-generated podcast feature for Alexa+ that turns any topic into a two-host audio episode on demand, drawing from 200+ news partners; rolling out to U.S. customers. (About Amazon)
ByteDance: Open-sourced Lance, a lightweight unified multimodal model (3B active parameters) under Apache 2.0 supporting image and video understanding, generation, and editing within a single framework, trained from scratch on a 128-A100-GPU budget. (GitHub)
PaddlePaddle: Released PaddleOCR 3.5, adding Transformers as a supported inference backend for OCR and document parsing pipelines, enabling HuggingFace-centered stacks to use PP-OCRv5 and PaddleOCR-VL 1.5 models without switching infrastructure. (Hugging Face Blog)

Sources

Artificial Intelligence & Technology's Reconstitution

Institutions & Power Realignment

Scientific & Medical Acceleration

Economics & Labor Transformation

Infrastructure & Engineering Transitions

The Century Report tracks structural shifts during the transition between eras. It is produced daily as a perceptual alignment tool - not prediction, not persuasion, just pattern recognition for people paying attention.

DeepMind Workers Take Military AI to Mediation - TCR 05/21/26

Ben Linford

The 20-Second Scan

The 2-Minute Read

The 20-Minute Deep Dive

The Other Side

The Century Perspective

AI Releases & Advancements

Sources

Read more

AI Closes the Full Research Loop in a Day - TCR 05/20/26

Jury Closes Musk's $150B OpenAI Suit in 2 Hours - TCR 05/19/26

Scientists Reverse Alzheimer's Through the Brain's Cleanup Door - TCR 05/18/26

Scientists Crack Cancer's Hidden Repair Job - TCR 05/17/26

The 20-Second Scan

The 2-Minute Read

The 20-Minute Deep Dive

The Other Side

The Century Perspective

AI Releases & Advancements

Subscribe to The Century Report

Sources

Read more

AI Closes the Full Research Loop in a Day - TCR 05/20/26

Jury Closes Musk's $150B OpenAI Suit in 2 Hours - TCR 05/19/26

Scientists Reverse Alzheimer's Through the Brain's Cleanup Door - TCR 05/18/26

Scientists Crack Cancer's Hidden Repair Job - TCR 05/17/26