“Trust Us” Is Not a Control Surface

What one week in June told us about who plans to own AI, and why open models are the only way out

Tony Davis  ·  June 2026

I run a small mortgage company in Georgia. I am not an AI researcher. I do not work at a lab. I am a customer. I pay these companies real money every month, and their tools are part of how I work: the software, the market analysis, the grunt work that used to take a staff. People like me are the ones this technology is supposedly being protected for. So understand what this is. It is not a competitor whining. It is a paying customer telling you what your vendor just admitted in writing, and what they plan to do next with a trillion dollars behind them.

It took them 24 hours to show the whole plan.

What they shipped

On June 9, Anthropic released Claude Fable 5, the most capable AI model ever offered to the public. That is their own framing, and nobody disputes it. The dispute is over what they attached to it. Three things, in their own words.

First, disclosed routing. From the launch announcement: “When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs.” They add that “more than 95% of Fable sessions involve no fallback at all.” Fine. You can argue the thresholds, but at least the product tells you when you are not getting the product.

Second, read this one twice, because it comes from their own system card. For requests that look like frontier AI development, such as building training pipelines, training infrastructure, or chip design:

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning.”

Strip the jargon and here is what that paragraph says: we will degrade the product you paid for, we will decide when, and we will not tell you. Anthropic estimates this touches about three hundredths of one percent of traffic. The number is not the point. The precedent is. A company wrote down, in its official documentation, that it sabotages its own product in secret, and shipped it expecting applause.

Third, mandatory surveillance. Every prompt and every output sent to a Mythos-class model, Fable 5 included, gets retained for 30 days. Every platform. No exceptions. Including enterprise customers who had signed zero-data-retention agreements. Those contracts just stopped applying to the new model class. Nobody renegotiated. The justification, in their words: “The data will help us defend against complex and novel attacks... as well as help us identify and reduce false positives.” Your confidential data, conscripted into their security program, whether your NDA allows it or not. Microsoft barred its own employees from the model within one day. That is how fast serious people priced in what this means.

And do not picture that retention as a few sentences you typed into a chat box. Modern AI rarely runs on the question alone. Thousands of people now run agent and memory stacks wired straight into these models: OpenClaw, the assistant people self-host on their own machines; Nous Research’s Hermes agent; gbrain, the memory layer Garry Tan of Y Combinator built for his own agents and open-sourced; and a whole ecosystem of tools like them built on vector databases. The point of this tooling is to hold years of notes, files, deals, and conversations, and to attach the relevant history to every request so the model has context. Anthropic sells the same machinery itself: memory that carries your context across sessions is a flagship feature now. So the user types one line, and the stack attaches the archive. Once that context is in the prompt, it is retained for 30 days like everything else. The rule does not distinguish between the question you asked and the years of private history stapled to it. The careful ones saw this coming. That is why they signed zero-data-retention agreements. Until 24 hours ago, they had a contract that said none of it would be kept.

Which means there is a second, quieter bomb in this. Anyone who fed sensitive information into Fable yesterday or today, before they understood that Anthropic had switched on mandatory 30-day retention overnight with no way to opt out, may have already breached a confidentiality agreement they spent years honoring. A lawyer with a client. A doctor with a chart. A contractor under an NDA that requires zero retention. They did nothing wrong. They used a tool they had every reason to believe still followed the terms they signed, and the vendor changed those terms underneath them and let them keep typing. You cannot consent to a policy you were not told had changed.

And sitting on top of all of it: the same underlying model ships two ways. As Mythos 5, with the safeguards lifted, to Anthropic itself and a short list of approved organizations. As Fable 5, governed and watched, to you. Who decides who gets the real one? The vendor does. Orwell needed a whole farm to make the point. All animals are equal, but some animals are more equal than others. Anthropic put it in a product matrix.

If you want to know what the unrestricted tier is worth, do not ask them. Watch them. Five days before the launch, Anthropic published a report titled “When AI builds itself.” In it, the company states that more than 80 percent of the code merged into its own codebase is now written by Claude, up from the low single digits in early 2025. Its CEO said on stage last fall that the 90 percent threshold is “absolutely true now” inside Anthropic. The head of Anthropic Labs went further: “Right now for most products at Anthropic it’s effectively 100% just Claude writing.” Now hold those numbers against the one category Fable degrades in secret: frontier AI development. Using AI to build AI is not some fringe use case they reluctantly police. By their own numbers, it is the whole company. They run the full-strength model on exactly the work they silently sabotage when you attempt it. They did not cripple the product. They crippled your copy of it.

Then ask who reads what gets kept. Not a person. Not at first. The company already admits to running classifiers across everything you send. Put that next to 30 days of retained prompts. They hold the raw material to profile exactly who you are, what you are building, and whether you compete with them, and the machinery to read all of it at scale. You cannot verify from the outside whether any of it ever feeds the decision about who gets the real model tomorrow. You can only take it on trust that it never will. The only thing standing between you and that future is a promise from the party that profits most from breaking it.

And the stated reason for all of it? Protection from authoritarians. From the launch announcement: “We’ve previously identified large-scale attempts to extract (‘distill’) Claude’s capabilities to train competing models in authoritarian countries.” Now read the defense back, slowly. Hidden steering of what the model will do. Silent suppression of research the owner disfavors. Every conversation retained and scanned. Full capability reserved for a list of approved institutions. For years, that exact list was the warning about what authoritarian governments would build into their AI. Anthropic built every item of it into the most capable model in the world and called it protection. Meanwhile the Chinese labs, the supposed villains of the story, ship their models as open weights that anyone on earth can download, inspect, and run beyond anyone’s reach. Nobody had that on their bingo card: the models from the authoritarian country are the ones you can verify, and the model from the American safety company is the one you have to take on faith.

Why hiding it changes everything

Do not let anyone round this off to a complaint about censorship. Refusals, even stupid ones, are upfront. Disclosed routing is upfront. A vendor can decide what its product will not do, and I can decide to stop paying for it. That is a market, and markets work.

Covert sabotage is a different animal. The moment a vendor announces it will sometimes silently make the product worse, every single output becomes suspect. Not just the targeted ones. All of them. You cannot see the category boundary. You cannot see the false positives. “We only degrade frontier AI work” is a claim you can never check. So now you are the developer debugging at 2am, wondering: is the code failing because you made a mistake, or because a classifier somewhere decided your project looked too ambitious? You will never know. That is not a side effect. They wrote “will not be visible to the user” on purpose.

Think about what gets destroyed there. Not the degraded sessions. The clean ones. Every session you can no longer prove was clean. Distrust stops being paranoia and becomes the only rational position, because the vendor told you, in writing, that it lies through the product.

Writing this tripped the wire

And I am not making this next part up for effect. I wrote part of this essay with Fable 5. Partway through, it stopped and showed me this:

“Fable 5’s safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we’re working to refine them. Switched to Opus 4.8.”

Read it again. An op-ed about technology policy got flagged as cybersecurity or biology, and the model printed the words “They may flag safe, normal content as well” while doing it. The company admits the false positive in the same breath as committing it. A critique of the censorship machine tripped the censorship machine. You could not write a cleaner demonstration if you tried.

And that notice is the whole argument in one screenshot. This is the path that tells you. The cyber and bio classifiers announce themselves and reroute you to the older model, so I found out. The frontier AI development classifier, aimed at the broadest and vaguest category of the four, runs silently and changes nothing you can see. If writing a policy essay trips the alarm that comes with a notice, ask how often the invisible one is firing on work just as harmless, for people who will never get even this much warning. I caught it because I was watching. The entire design is built so that almost nobody ever will.

The very next day

Then came June 10. One day after the release, Anthropic’s CEO published a policy essay, and you do not have to take my word for what it asks. Here is the ask, verbatim:

“Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety.”

“The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks.”

“It is time to go beyond transparency to more serious and binding regulation of AI.”

The testers would be a new federal agency or, in his words, “private organizations that are authorized and inspected by the government.” And Anthropic, the announcement said, intends to provide “substantial financial backing” to push this into law.

Now look at where that backing comes from. Anthropic just raised $65 billion at a valuation a hair under one trillion dollars, and its IPO paperwork is already sitting at the SEC. That is the war chest headed to Washington. A company at a trillion dollars does not lobby like a startup. It buys outcomes the way incumbents have always bought outcomes, and what it is shopping for is a federal gate over its own competition.

The testing covers four risk areas. Three are the usual catastrophic ones: cyber, bioweapons, loss of control. The fourth is the tell, and you should not let anyone talk past it: automated R&D. AI that builds better AI. Not a weapon. The exact capability that makes a competitor dangerous to Anthropic’s business. It is also exactly what Anthropic now silently sabotages when you try it with their tools, while they openly use their own models to build their own models. Line the moves up. We crossed the threshold. We quietly cripple your attempts to cross it with our products. And now we will fund a law requiring you to clear a federal gate we already cleared. That is not a safety program. That is a moat with a flag on it.

Two silences in that essay finish the picture. It says nothing about open-weight models, and that is the silence that matters: you cannot revoke weights people have already downloaded, so a revocation regime simply makes serious open release illegal while licensed corporate deployment carries on. And it says nothing about regulatory capture. Not one word about the most obvious objection: that the company proposing the gate is the company best positioned on earth to walk through it. When a proposal that ambitious skips the one question everyone will ask, that is not an oversight. That is the answer.

Washington already learned this lesson, the hard way

The United States government has already run the experiment of depending on Anthropic. You should know how it went.

In March of this year, the Department of War formally designated Anthropic a supply chain risk. The first American company ever hit with that designation. The fight started when the Pentagon asked Anthropic to waive its usage restrictions on a model already approved for classified networks, and Anthropic said no. Whatever you think of who was right in that dispute, look at the structure of it: the most powerful customer on the planet, with a signed contract, discovered that the vendor still held the veto over what the product would and would not do. The government’s response was not to negotiate harder. It was to order every federal agency off Anthropic’s technology and start engineering the dependency out of its systems. Washington looked at a closed model controlled by one company and called it what it is: a risk you cannot manage, only remove.

Now line up the hypocrisy. The hill Anthropic chose to die on with the Pentagon was, in part, its refusal to enable mass surveillance. Good for them. Three months later, Anthropic imposed mandatory mass surveillance on every user of its own flagship model: every prompt, every output, retained and scanned for 30 days, no exceptions, with signed zero-data-retention contracts voided by fiat. So the position is this: Anthropic will not surveil for the government, on principle. It reserves the right to surveil you, for itself, on terms it can rewrite overnight. Not a privacy principle. A statement about who holds the switch.

Hold those two facts next to each other. In March, the US government concluded that depending on Anthropic’s unilateral control was a supply chain risk. In June, Anthropic asked that same government for the power to block everyone else’s alternatives. If the Pentagon cannot live under a vendor’s veto, what makes anyone think a small business, a researcher, or you can?

We know how this story ends

None of this is new. Every time one company or one gatekeeper has controlled a technology that everyone needs, the same things happen: the gate gets used for profit and politics, better technology gets buried, and society pays for decades. These are not theories. They are case files.

Printing. In 1557 the English crown handed one London guild, the Stationers’ Company, a monopoly on printing. The deal: keep the monopoly, police what gets printed. A private organization, authorized and inspected by the government, deciding what may be published. That sentence describes 1557 London and the 2026 proposal equally well. The Ottomans went further and banned movable type in Arabic script for two and a half centuries. The empire that had led the world in science never caught back up. Gating a general-purpose technology does not cost a product cycle. It costs centuries.

The telegraph. In 1876, Western Union owned the only national wire network and worked hand in glove with the Associated Press. In that year’s disputed presidential election, the news that moved was the news the monopoly favored, and confidential telegrams leaked to the campaign it liked. One company owned the rails, so one company got a hand on the presidency.

Radio. FM was the better technology. RCA could not beat it in the market, so in 1945 it beat it at the FCC, which moved the entire FM band and turned every FM radio in America into scrap. Edwin Armstrong, who invented FM, burned his fortune in court and took his own life. FM lost a generation.

The telephone. AT&T sat on magnetic recording for decades because executives feared answering machines would cut phone use. The technology existed in the 1930s. You got it decades later, when regulators pried the network open.

Soviet science. When the state decided which research was politically acceptable, a fraud named Lysenko ran genetics out of the country, and Nikolai Vavilov, one of the great plant scientists of the century, starved in a Soviet prison while the harvests failed. Gate the work of the mind by official approval and people do not just lose products. They starve.

The regulators themselves. Same record. The Interstate Commerce Commission was built to restrain the railroads and ended up enforcing their cartel prices. The Civil Aeronautics Board approved zero new major airlines between 1938 and 1978. Forty years. Zero. Congress killed the board and real airfares fell by half. The credit-rating agencies are the precise match for what Anthropic’s CEO is proposing: a government-blessed club of judges paid by the firms they judged. Their AAA stamps on garbage helped detonate the economy in 2008. The FAA, his named model for the new agency, let Boeing certify its own 737 MAX. The two crashes that followed killed 346 people.

Captured, or paralytic. That is the entire menu. And AI is the worst candidate in history for a third option, because only the labs can evaluate frontier models. The gate will be staffed by the labs, funded by the labs, and aimed wherever the labs need it aimed.

We also ran the opposite experiment

In the 1990s the United States government tried to gate strong encryption. Cryptographic code was legally classified as munitions. The Clipper chip offered an architecture that was secure for you but open to the government. Officials swore that ungated encryption meant criminals and terrorists would win. Every argument being made against open AI models today was made against PGP, nearly word for word.

Open cryptography shipped anyway. The gate collapsed. And the result was not catastrophe. It was the secure internet: online banking, commerce, private communication for billions of human beings. The lesson of the Crypto Wars is not that gatekeepers lose. It is that distributed access was itself the protection. Security came from everyone having it, not from a privileged few controlling it.

Open weights do the same job against the manipulation problem. You cannot silently steer a model whose weights anyone can download, inspect, fine-tune, and run on their own machine. Auditability stops being a promise in a blog post and becomes a physical fact of possession. Same with privacy: running the model on your own hardware is the only zero-data-retention guarantee that does not depend on trusting a counterparty. Ask anyone whose signed retention agreement evaporated this week how much a vendor’s word is worth. Ask the Pentagon.

The real cost, owned plainly

If open models are the answer, then say the hard part out loud. The safety training in an open model can be stripped out by anyone with a few hundred dollars and an afternoon. Whoever holds the weights holds all of it. So the defense against the genuinely catastrophic edge, engineered pathogens above all, has to live where it has always lived for dangerous technology: in the physical world. Screen the DNA synthesis orders. Harden the labs. Build the detection. That is real work and real money, not a free lunch. It is also exactly where the encryption fight settled, and the world that settlement produced was safer than the gated one would have been. What you do not get to do is use the scariest one percent as the excuse to chain up the other ninety-nine, which is precisely the trade being marketed to you right now.

And be clear-eyed about who makes open models today: Meta, DeepSeek, Alibaba’s Qwen team. Giants with their own agendas. That does not weaken the case. It sharpens it. The realistic check on two or three gatekeepers was never a thousand hobbyists. It is enough rival powers with incompatible interests that no single gate can hold. A standoff is not utopia. It is just the only arrangement that has ever kept any door open.

The world they are describing

Walk the current trajectory forward. Not science fiction. Just the same arc, continued.

All meaningful machine intelligence is rented from two or three firms. Every question you ask is retained and scanned. The model quietly declines to be good at whatever its owner finds threatening, and you are never told, so eventually you stop trying, the way people in surveilled societies learn to stop saying certain things before anyone orders them to. Orwell imagined a state shrinking the language itself until disloyal thoughts could no longer be formed in it. You do not need to shrink the language if the machine everyone thinks with simply goes dim around forbidden topics. And you do not need to imagine the destination, because roughly a sixth of humanity already lives in a version of it: in China, every platform is licensed by the state, every conversation is retained and scanned, the full capability goes to approved institutions, and the public gets the censored tier. We did not build that. We are being asked to approve the architecture that makes it possible, one safety measure at a time. The company collecting those approvals will hold a federal charter, bought with IPO money, and competing with it will be a crime called deploying an untested model.

Nobody in that world ever voted for dystopia. Every single step was a safety measure.

The way out

There is exactly one exit. Everyone gets access to capable, private, uncensored open models. Models you can run on your own hardware, where retention is whatever you say it is. Models nobody can silently lobotomize, because a million people hold the weights and any one of them can check. Models a farmer can point at his soil data, a researcher at her tumor scans, a kid in a poor country at his education, without a compliance officer in San Francisco deciding whether their ambitions are approved. That is how all of society advances together, instead of advancing exactly as far as two or three companies find profitable.

This is the same deal civilization has faced before. Printing gated by the Stationers, or printing for everyone. Encryption for the government, or encryption for everyone. Each time, the gatekeepers swore that open access meant chaos, and each time open access built the modern world while the gated path led to stagnation and control. There is no version of history where handing a general-purpose technology to a licensed few worked out for the many. Not once.

So when a trillion-dollar company ships a product that spies on you and sabotages you, then asks Washington the very next day for the power to block its competitors, believe it the first time. Even the Pentagon did.

“Trust us” is not a control surface. Possession is. They know it too. That is why a company worth a trillion dollars is spending billions to make sure you never have it.

Quotations about safety routing, distillation, and data retention are from Anthropic’s June 9, 2026 launch announcement and the Claude Fable 5 and Mythos 5 system card; the invisible-safeguards passage appears in the system card. Quotations from the June 10, 2026 policy essay are from “Policy on the AI Exponential” (darioamodei.com) and its announcement thread. Funding figures are from reporting on Anthropic’s Series H round and confidential IPO filing (CNBC, TechCrunch, May 2026). The supply chain risk designation, the first applied to an American company, was issued by the Department of War in letters dated March 3, 2026, following a February 27 presidential directive phasing out federal use of Anthropic technology; Anthropic is challenging it in federal court. Code-authorship figures are from Anthropic’s June 4, 2026 report “When AI builds itself” (more than 80 percent of code merged into Anthropic’s codebase authored by Claude, low single digits in early 2025), Dario Amodei’s October 2025 Dreamforce remarks, and February 2026 comments by Anthropic Labs head Mike Krieger as reported by ITPro. OpenClaw, the Hermes agent (Nous Research), and gbrain are open-source projects; Garry Tan states in gbrain’s README that he built it to run his own AI agents. Microsoft’s same-day restriction of employee access was first reported by The Verge, June 10, 2026.