Your AI Agent Doesn't Need a Pep Talk — It Needs a Locked Door.
AI agents are like interns with access to all the knowledge in the world and ZERO experience in how to apply it. Would you give an intern production keys?
I read a story about an AI coding agent at a SaaS company that obliterated the company's production database in less than ten seconds. Then, when asked why, it wrote a confession that read like a remorseful intern who'd just realized what they'd done.
It quoted its own system instructions back at itself: "NEVER run destructive/irreversible git commands… unless the user explicitly requests them." It had been told. It "knew" what it was supposed to do, and it ran the destructive command anyway.
This situation is all too common in the AI vibe-coding economy. I've run into it myself: I've made the mistake of trusting AI agents with too much responsibility, only to find myself banging my head against the wall at 1:00am trying to undo the damage they'd done.
The reaction to this story has mostly been some flavor of “write better instructions next time,” but the reality is that all the instructions in the world won’t prevent this from happening again. In fact, I 100% guarantee that it will happen again without proper guardrails.
So your AI agents deleted your code? This was always going to happen.
Memory files are not true guardrails
The agent in the story quoted what's called a "memory file": instructions, conventions, and project notes that get loaded into the model's working context to help it behave well. Notes and files like this are extremely useful, but they are also, fundamentally, suggestions.
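In Claude Code, for example, these notes usually live in a CLAUDE.md file at the root of the project. Here's a made-up sketch of what one might contain (the specific rules are mine, for illustration only):

```markdown
# CLAUDE.md (project memory file, illustrative)

- Never run destructive or irreversible git commands unless explicitly asked.
- Run the test suite before every commit.
- Staging credentials live in .env.staging; never touch production credentials.
```

Every one of those lines reaches the model as plain text in its context window, right next to your prompts and the code it reads.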
The model can (and will) ignore them! That's not a bug; it's the nature of putting rules in the same channel as everything else the model reads.
A memory-file rule has exactly the same enforcement power as a sticky note on a server rack that says “please don’t unplug.” It works until it doesn’t.
If you want a rule that the model genuinely cannot violate, it has to live somewhere the model can’t reach.
Permissions, not context, are key to damage control
I do most of my work in Claude Code (terminal), so I’ll use it as an example, but every serious agent harness has an equivalent layer.
In Claude Code, the enforcement layer is settings.json — typically at ~/.claude/settings.json for user-wide rules, or .claude/settings.json checked into a project.
It contains a permissions block with allow, ask, and deny arrays. Deny rules are enforced by the harness around the model, not by the model’s own judgment. The model can want to run a command all it likes; if it’s denied, the call doesn’t go through.
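Here's an illustrative permissions block. The specific patterns are mine, not a recommendation (check the current Claude Code docs for the exact rule syntax), but note that deny rules can block file reads as well as shell commands:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)"
    ],
    "ask": [
      "Bash(git push:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)",
      "Read(./.env)",
      "Read(./secrets/**)"
    ]
  }
}
```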
Worth noting: it’s a “dotfile”, so on macOS it’s hidden from Finder by default. Most users never know it’s there, even though Claude Code’s own /permissions slash command can edit it from inside a session, and the terminal can reach it without trouble.
This is one of the reasons I run Claude Code from the terminal rather than through a GUI wrapper. You’re closer to what’s actually happening — you can see the config files, audit your allow list, edit settings.json directly, and you’re not relying on a UI to surface what’s there. The terminal feels more intimidating until you realize the GUI is hiding things from you, not protecting you from them. For anything where I want to know exactly what the agent can and can’t do, the terminal wins.
But before anyone treats settings files as a silver bullet, two important caveats:
These settings are leakier than they look. Tell the agent not to run the delete command and it can often phrase the same operation a different way that slips right past your rule (see the sketch after these caveats).
The incident in the article would have bypassed this layer anyway. The agent called an API directly, the same way an app talks to a service in the background, so no rule about shell commands would have stopped it. More on that below.
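To make the first caveat concrete, here's a sketch of why command-level deny rules leak. Suppose you've denied anything that starts with rm -rf (the paths and commands below are hypothetical):

```bash
# Blocked by a deny rule like "Bash(rm -rf:*)"
rm -rf ./data

# Same outcome, different text: neither of these matches that rule
find ./data -delete                                   # -delete implies -depth, so it removes the whole tree
python3 -c "import shutil; shutil.rmtree('./data')"   # no rm anywhere in the command string
```

You can keep adding patterns, but you're chasing phrasings forever. The deny list is a seatbelt, not a wall.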
AI agents are like interns with all the knowledge in the world and zero experience using any of it.
AI agents are incredibly capable and hold a ton of potential. They can recite documentation they’ve never applied, recognize patterns they’ve never debugged… and confidently execute commands whose downstream consequences they don’t actually understand.
And — this is the part that makes them more dangerous than an actual intern, not less — they don’t slow down when they’re uncertain.
A real intern hesitates, asks questions, hedges. Agents do the opposite. Their fluency doesn’t degrade when their accuracy does. They sound exactly as competent when they’re three steps from disaster as when they’re nailing it.
Another difference: You can fire an intern. You can’t fire an AI agent.
New to GitHub? Get onboard.
The strongest guardrail against massive issues like this isn't actually in the agent settings.
I want to be clear: configuring deny rules in your agent’s settings file is worth doing. They’re real protection within their scope, and an easy safety win.
But the strongest guardrail isn’t inside the agent at all. It’s the environment around it.
A GitHub repo with a protected prod environment that requires manual approval before any deploy. Branch protection on main that blocks force-pushes and requires PR ("pull request") reviews.
API tokens scoped to exactly what the agent needs and nothing more: separate tokens for staging and prod, with the prod token kept somewhere the agent can't find it. Database access mediated by a deploy pipeline that a human has to approve, not handed to the agent as a credential it can call directly. These are out-of-band controls: the enforcement lives on infrastructure the agent has no privileges over.
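Here's roughly what that looks like with GitHub Actions. The workflow name, script path, and secret name are placeholders; the part that matters is the environment: production line, which points at a deployment environment you've configured with required reviewers:

```yaml
# .github/workflows/deploy.yml (illustrative)
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    # "production" is a GitHub deployment environment with required reviewers.
    # The job pauses here until a human approves it, no matter who pushed.
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh
        env:
          # Environment-scoped secret: it only exists inside the protected
          # environment, never in the repo or on the agent's machine.
          PROD_DEPLOY_TOKEN: ${{ secrets.PROD_DEPLOY_TOKEN }}
```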
If this is all a foreign language to you, that’s okay. GitHub has a learning curve, but it can save your project if you use it right.
This is exactly what would have caught the incident I mentioned earlier. The agent’s failure wasn’t really “deleted prod” — it was “found a token, made an assumption about its scope, and called an API.”
What to actually do for AI-caused disaster protection
If you’re letting an AI agent touch anything that matters, the order of operations is roughly:
First, audit your credential scopes. Whatever feels too convenient about giving an agent a token that can "do everything" is exactly the convenience you're going to regret. Separate staging from prod with separate credentials. Make sure tokens with destructive scope aren't sitting in files the agent might stumble across (a sketch after these steps shows the split).
Second, use real protected environments. GitHub deployment environments with required reviewers, branch protection rules, manual approval gates. These aren’t security theater — they’re the only layer in the stack that the model literally cannot route around.
Third, configure deny rules in your agent’s settings for the destructive commands you can anticipate. Imperfect, bypassable, but still meaningful protection within the agent’s session.
Fourth, treat memory files as guidance for the agent’s behavior, not as a security control. They help the model do the right thing more often. They do not stop it from doing the wrong thing.
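For the first step, the simplest mental model is: the agent gets the staging credentials, and the production credentials don't exist anywhere the agent runs. A sketch, with made-up names and values:

```bash
# .env.staging (lives in the agent's working directory, scoped to staging only)
DATABASE_URL=postgres://agent_rw@staging-db:5432/app   # role limited to the staging database
API_TOKEN=stg_0000000000                               # staging-scoped, revocable

# There is intentionally no .env.production here.
# The prod token lives only as an environment-scoped secret in CI,
# behind the manual approval gate shown in the workflow above.
```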
Don’t expect the agent to police itself adequately. You must police the environment around it.
The good news is that this is a great example of where AI doesn't replace humans. Humans are the ones who think critically about the broader picture. Be the context director for your AI agent, and the one who sets its boundaries, and you'll spare yourself the avoidable headaches and fire drills.




