Tags: ai, agents, guardrails, enterprise

Guardrails the model can't talk its way around

Prompts are suggestions. When you need an AI agent to actually stay within limits, the rules have to live outside the model — in the system that runs its actions. Here's how that works in FlowDrop.

A note on FlowDrop in this series. The agent and controls described here are real, not hypothetical — they run on a working operator agent built on the FlowDrop platform: the visual editor together with its execution backend, which you can run hosted or self-host on your own infrastructure, including on-premise. FlowDrop orchestrates the workflow and gives each control a place to live; how strict you make each one is yours to define.

Ask most teams how they keep their AI agent safe, and you’ll hear some version of: “we told it not to, in the system prompt.”

That works right up until it doesn’t. A prompt is an instruction to a model that’s free to misunderstand it, forget it deep in a long conversation, or be argued out of it by a clever user. You’re asking the thing you’re trying to control to police itself.

For a hobby project, fine. For anything that touches customers, money, or records, that’s not a control — it’s a hope.

The fix follows directly from the previous post: the model only proposes a tool and explains why — FlowDrop is what actually runs it. That hands you somewhere to put real controls: outside the model, in the orchestration, where the agent can’t argue with them.

Three controls do the work. Each one lives somewhere the model can’t reach.

[Figure: Controls that bind. Three gates the model can't argue with. The AI model proposes → 1. Allowed actions (only what you switched on) → 2. Policy check (screened in and out) → 3. Human approval (for what matters) → the action runs. These live outside the model, not in a prompt it can ignore.]

Control 1: decide what the agent can do at all

The strongest limit is the simplest: an agent can only take the actions you’ve actually given it. Not the actions it can describe, or imagine, or be tricked into wanting — the ones wired into this agent and no others.

In FlowDrop, that list is something you set explicitly, and it’s the whole boundary. A read-only assistant and a full content manager can be the same agent with a different set of actions switched on:

  • A support helper that can look things up but never change them.
  • An onboarding guide that can show options and previews but not touch live data.
  • An operator that can create, update, and delete — because that’s its job, and it’s gated accordingly.

You’re not writing a paragraph asking the model to behave. You’re deciding, up front, what’s even on the table. Everything else is simply unreachable.
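As a rough illustration of the idea (not FlowDrop's actual API), an allow-list can be as small as a dictionary lookup that sits between the model's proposal and the code that runs it. Every name here (`Agent`, `ToolCall`, `read_page`, `delete_page`) is hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


class ActionNotAllowed(Exception):
    """Raised when the model proposes an action this agent was never given."""


@dataclass(frozen=True)
class ToolCall:
    # What the model proposes: a tool name, arguments, and its stated reason.
    name: str
    args: Dict[str, Any]
    reason: str = ""


class Agent:
    def __init__(self, name: str, actions: Dict[str, Callable[..., Any]]):
        self.name = name
        # Copy the mapping so later changes to the caller's dict don't widen the boundary.
        self._actions = dict(actions)

    def run(self, call: ToolCall) -> Any:
        # The lookup is the enforcement: no entry, no execution.
        handler = self._actions.get(call.name)
        if handler is None:
            raise ActionNotAllowed(f"{self.name} has no action named {call.name!r}")
        return handler(**call.args)


# Hypothetical actions over an in-memory "site".
PAGES = {"home": "Welcome"}


def read_page(slug: str) -> str:
    return PAGES[slug]


def delete_page(slug: str) -> str:
    PAGES.pop(slug)
    return f"deleted {slug}"


# Same agent code, different sets of actions switched on.
support_helper = Agent("support_helper", {"read_page": read_page})
operator = Agent("operator", {"read_page": read_page, "delete_page": delete_page})
```

However persuasive the proposal, `support_helper.run(ToolCall("delete_page", {"slug": "home"}))` raises `ActionNotAllowed`, because the delete handler was never wired into that agent.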

Control 2: check each tool call — before and after

For the actions that are allowed, you still get to inspect each one. Because the model only proposes a tool call and FlowDrop runs it, there are two natural moments to step in:

  • Before it runs — does this proposed action pass your content and safety rules? If not, block it before anything happens.
  • After it returns — is the result safe to use or show the customer? If not, redact it or stop.

This is your organisation’s policy running on both sides of every tool call. You decide what the rules are and how strict to be; FlowDrop makes sure they actually run.

And that’s the whole point — the check runs because it’s wired into the workflow, not because a prompt politely asked the model to remember it.
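One way to picture the two checkpoints is a wrapper that every tool call has to pass through. This is a minimal sketch under stated assumptions, not how FlowDrop implements it: the two rules (no access to a "billing" path, redact email addresses in results) and all function names are invented for the example.

```python
import re
from typing import Any, Callable, Dict


class PolicyViolation(Exception):
    """Raised when a proposed call fails the check that runs before execution."""


# Hypothetical rule: this agent must never be pointed at the "billing" area.
def check_before(tool: str, args: Dict[str, Any]) -> None:
    if "billing" in str(args.get("path", "")):
        raise PolicyViolation(f"{tool} may not touch {args['path']!r}")


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


# Hypothetical rule: redact email addresses before a result is shown or reused.
def check_after(result: str) -> str:
    return EMAIL.sub("[redacted]", result)


def run_with_policy(tool: str, args: Dict[str, Any], handler: Callable[..., str]) -> str:
    check_before(tool, args)      # block before anything happens
    result = handler(**args)      # only reached if the pre-check passed
    return check_after(result)    # screen the result on the way out


calls = []  # records which handler calls actually ran


def read_record(path: str) -> str:
    calls.append(path)
    return f"{path}: contact jane@example.com"
```

Because the handler is only reachable through `run_with_policy`, a blocked call never touches the underlying system, and an allowed call never returns an unscreened result.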

Control 3: put a human in the loop for what matters

Some actions are too consequential to automate, no matter how good the model is. For those, the best guardrail is a person — and the gap between deciding and doing is exactly where they fit.

Our operator agent uses this directly. Before it changes anything on a live site, it says plainly what it intends to do, then pauses and waits for the owner to approve or decline. Only after a real decision does it continue. If the owner declines, it doesn’t retry; it asks what they’d prefer.

Notice this isn’t the model choosing to ask permission. The system won’t let the action through until a human clears it. That’s the difference that matters.
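The pause-then-decide behaviour can be sketched as a gate that parks a proposed action and runs it only when a human decision arrives. This is an assumption-laden illustration: `ApprovalGate`, `Pending`, and `set_title` are hypothetical names, and a real workflow would persist pending items rather than hold them in memory.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Pending:
    id: int
    description: str            # what the agent says it intends to do
    action: Callable[[], Any]   # the deferred change itself
    status: str = "waiting"     # waiting -> approved | declined


class ApprovalGate:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: Dict[int, Pending] = {}

    def propose(self, description: str, action: Callable[[], Any]) -> Pending:
        # Nothing runs here; the action is only parked until a human decides.
        item = Pending(next(self._ids), description, action)
        self._pending[item.id] = item
        return item

    def decide(self, pending_id: int, approved: bool) -> Any:
        # pop() means each proposal can be decided exactly once: no silent retry.
        item = self._pending.pop(pending_id)
        if not approved:
            item.status = "declined"
            return None
        item.status = "approved"
        return item.action()


# Hypothetical live site and a change the agent might propose.
site = {"title": "Old title"}


def set_title(new_title: str) -> str:
    site["title"] = new_title
    return "updated"


gate = ApprovalGate()
```

The only code path to `set_title` runs through `decide(..., approved=True)`, so the model has no way to skip the approval step, whatever it outputs.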

Why these beat a prompt

Each of these controls shares one property: the model can’t override them, because they don’t live inside the model.

The risk | Where the control lives
The agent reaches for something it never should | the fixed list of tools it’s allowed
A tool call or its result breaks policy | a check before and after every call
A high-stakes action runs unsupervised | a human approval step

And because they’re part of how the workflow is built — not settings buried in a vendor’s dashboard — they’re things you can review, change deliberately, and prove were in place.

When your risk team asks “what was this agent allowed to do on the day of the incident?”, you can show them exactly — instead of pointing at a prompt and hoping.
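One possible way to answer that question, sketched here under assumptions rather than taken from FlowDrop, is to keep an append-only history of the allowed-action list and look up the revision in force on a given day. The class names, the dates, and the people named in the usage lines are all made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Revision:
    effective: date
    allowed_actions: FrozenSet[str]
    changed_by: str


class AllowListHistory:
    def __init__(self) -> None:
        self._revisions: List[Revision] = []

    def record(self, revision: Revision) -> None:
        # Append-only and in date order, so answers about the past never change.
        if self._revisions and revision.effective <= self._revisions[-1].effective:
            raise ValueError("revisions must be recorded in increasing date order")
        self._revisions.append(revision)

    def allowed_on(self, day: date) -> Optional[Revision]:
        # Find the last revision whose effective date is on or before `day`.
        dates = [r.effective for r in self._revisions]
        i = bisect_right(dates, day)
        return self._revisions[i - 1] if i else None


# Hypothetical history for one agent.
history = AllowListHistory()
history.record(Revision(date(2024, 1, 10), frozenset({"read_page"}), "alice"))
history.record(Revision(date(2024, 3, 1), frozenset({"read_page", "update_page"}), "bob"))
```

With that history, `history.allowed_on(date(2024, 2, 15))` returns the January revision, which shows that only `read_page` was switched on that day and who made the change.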

When you self-host FlowDrop on your own infrastructure, these guardrails enforce your policies, in your environment.

Safety stops being something you outsource to a model’s good intentions, and becomes something your system guarantees.

Enterprise

Need guardrails you can prove to an auditor?

FlowDrop is open source and yours to self-host. When you need a managed platform, custom integrations, or enterprise support, the team behind it — Factorial.io — can build and run it with you.


Previously: When an AI agent calls its own tools, you lose control.

Next in this series → Every conversation your agent has is your data. Do you own it? — why the agent’s memory belongs in your database, not a black box.