I've spent enough years guarding production databases to flinch whenever someone says "just give the agent read access to the customer table." That sentence is where most data leaks start. So when I started playing with Pi, the minimal coding agent from pi.dev, the first thing I wanted to test wasn't whether it could write code. It was whether I could stop it from handing real credit card numbers to another agent that had no business seeing them.
Pi is a good place to try this precisely because it's small. Four tools — read, write, edit, bash — under a thousand tokens of instructions. No MCP, no sub-agents, no permission popups baked in. The pitch is "adapt Pi to your workflows, not the other way around," and you customize it with skills, extensions, prompt templates, and a SYSTEM.md. That minimalism matters for what follows. When the harness isn't doing much, the boundary you build is the boundary that exists. There's no hidden middleware quietly logging the raw payload somewhere you forgot about.
The setup
Two agents. Agent A sits on top of a customer database — names, emails, credit cards, social security numbers, the usual sensitive columns. Agent B is building its own service and needs a list of users to seed a table and run tests against. Classic internal data request. The lazy version is to let B query A's database directly, or worse, give B the connection string. Now you have two systems holding the same live PII and twice the blast radius.
The version I want: B asks A for users, A answers, and the real card numbers and SSNs never leave A's process. B gets records of the same shape (valid-looking, same field types, enough to insert and test), but masked. 4111 1111 1111 1111 becomes 4111 **** **** 1111. 123-45-6789 becomes ***-**-6789. Names are swapped for a placeholder built from the record ID, and emails keep only their first character and domain. B can do its job. B cannot exfiltrate what it never received.
The two agents talk over Pi's RPC mode, which is just JSON over stdin/stdout. So this is an agent-to-agent handoff, not a shared database. That distinction is the whole point.
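To make the handoff concrete, here is a minimal sketch of the producer side of a JSON-over-stdio exchange. It is not Pi's actual RPC message format, which this post doesn't spell out: the `list_users` request shape, the `handle_line` and `serve` names, and the `fetch_masked` callback are all invented for illustration. The point is only that the handler can reach one data source, and that source is already masked.

```python
import json
import sys
from typing import Callable

# Hypothetical message shape, NOT Pi's documented RPC schema:
#   request:  {"request": "list_users", "limit": 10}
#   response: {"users": [...]}  or  {"error": "..."}


def handle_line(line: str, fetch_masked: Callable[[int], list[dict]]) -> str:
    """Turn one JSON request line into one JSON response line."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps({"error": "invalid JSON"})
    if not isinstance(msg, dict) or msg.get("request") != "list_users":
        return json.dumps({"error": "unsupported request"})
    limit = msg.get("limit", 10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        return json.dumps({"error": "invalid limit"})
    # fetch_masked is the only data source this handler can reach.
    return json.dumps({"users": fetch_masked(limit)})


def serve(fetch_masked, stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Answer newline-delimited JSON requests until stdin closes."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_line(line, fetch_masked) + "\n")
            stdout.flush()
```

Anything B sends that isn't a recognized request gets an error back, not a row. There is no "raw" mode to ask for, because the handler was never given one.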
Mask at the producer, never trust the consumer
Here's the part people get wrong. The instinct is to put the rule on B: "don't store the raw values," "redact before you log." That's trusting the consumer, and the consumer is the side you don't control. Maybe B is fine today. Maybe next quarter someone forks B's prompt and the redaction step quietly falls off. You'll never know until it shows up in a log aggregator.
The rule belongs on A, because A is the only side that ever touches the real data. A masks before the bytes cross the wire. By the time anything reaches RPC, the sensitive values are already gone. B's behavior — careful, sloppy, compromised — stops mattering, because there's nothing dangerous left for B to mishandle.
This is just data minimization at the boundary, the same principle a DBA uses when deciding which columns a reporting role is even allowed to select. You don't hand out the full row and ask people to be polite about it.
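In code, that column-level rule can be as small as an allowlist applied before any masking happens. This is a sketch under assumptions: the `EXPORTABLE_COLUMNS` tuple and the `project` helper are hypothetical names, and the column list simply mirrors the fields used in this post.

```python
# Hypothetical allowlist: the only columns agent A will ever export.
EXPORTABLE_COLUMNS = ("id", "full_name", "email", "credit_card", "ssn")


def project(row: dict) -> dict:
    """Keep allowlisted columns only; everything else is dropped by default."""
    return {col: row[col] for col in EXPORTABLE_COLUMNS if col in row}
```

The design choice is default-deny. If someone adds a `password_hash` or `billing_address` column to the table next quarter, it stays behind unless somebody deliberately adds it to the list.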
Don't prompt for safety. Enforce it.
The tempting shortcut with any LLM agent is to write something like "never reveal full credit card numbers" into the system prompt and call it done. I don't trust that and neither should you. A prompt is a suggestion to a non-deterministic system. It holds until the model gets distracted by a clever request, or the context fills up and the instruction slides out of the window. Treating a prompt as an access control is how you end up explaining an incident.
So the masking lives in code, not in vibes. In Pi, you wrap the database read in a skill or extension so the model literally cannot emit a raw row — the tool itself returns masked data. The model never sees the real card number, which means it can't leak the real card number. The masking function is small and boring, which is exactly what you want from a security control:
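    import re


    def mask_record(row: dict) -> dict:
        """Return a copy of row with the sensitive fields masked."""
        masked = dict(row)
        if cc := row.get("credit_card"):
            digits = re.sub(r"\D", "", cc)  # drop spaces and dashes
            masked["credit_card"] = f"{digits[:4]} **** **** {digits[-4:]}"
        if ssn := row.get("ssn"):
            masked["ssn"] = f"***-**-{ssn[-4:]}"  # keep the last four only
        if email := row.get("email"):
            name, _, domain = email.partition("@")
            # name[:1] instead of name[0]: no IndexError on an empty local part
            masked["email"] = f"{name[:1]}***@{domain}"
        masked["full_name"] = f"user_{row['id']}"  # placeholder built from the ID
        return masked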
The data A's tool hands back over RPC ends up looking like this — same schema B expected, none of the live values:
    {
      "users": [
        {
          "id": 1042,
          "full_name": "user_1042",
          "email": "j***@example.com",
          "credit_card": "4111 **** **** 1111",
          "ssn": "***-**-6789"
        }
      ]
    }
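B inserts that straight into its table. Same column types, and the card number and SSN keep their original length and grouping, so they still pass a shape check (though not a Luhn checksum, since the middle digits are gone). Good enough to build and test against, useless to anyone who steals B's database.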
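For a sense of what "the tool itself returns masked data" can look like, here is a sketch of a read path that masks inside the same function that runs the query. Everything specific is an assumption: the `customers` table and its columns, the SQLite backend, and the `list_users_masked` name are hypothetical, and how you register the function as a Pi skill or extension is out of scope here. The `_mask` helper is a compact copy of `mask_record` above, repeated so the block runs on its own.

```python
import re
import sqlite3

# Hypothetical schema: a `customers` table with these five columns.
QUERY = (
    "SELECT id, full_name, email, credit_card, ssn "
    "FROM customers ORDER BY id LIMIT ?"
)


def _mask(row: dict) -> dict:
    """Compact copy of mask_record, repeated so this sketch is self-contained."""
    digits = re.sub(r"\D", "", row["credit_card"])
    name, _, domain = row["email"].partition("@")
    return {
        "id": row["id"],
        "full_name": f"user_{row['id']}",
        "email": f"{name[:1]}***@{domain}",
        "credit_card": f"{digits[:4]} **** **** {digits[-4:]}",
        "ssn": f"***-**-{row['ssn'][-4:]}",
    }


def list_users_masked(conn: sqlite3.Connection, limit: int = 10) -> list[dict]:
    """The only read path exposed to the agent; raw rows stay inside this function."""
    cur = conn.execute(QUERY, (limit,))
    cols = [d[0] for d in cur.description]  # column names from the cursor
    return [_mask(dict(zip(cols, r))) for r in cur.fetchall()]
```

The raw rows exist only as locals inside `list_users_masked`. Nothing the model can call returns them, so there is no unmasked value in its context to repeat.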
Why the minimal harness helps
This pattern works on any agent framework in principle. It's easier to trust on Pi because there's so little between the tool and the wire. The agent loop is small, the tool surface is four primitives, and the customization is files you can read in one sitting — a SYSTEM.md, a skill, an extension. When a security reviewer asks "where exactly does the SSN get masked, and can anything bypass it," I can point at one function and one tool wrapper and actually answer. Auditing a fat framework with plugins, hidden middleware, and a dozen injection points is a much worse afternoon.
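One way to back that answer with something a reviewer can run is a small regression check that compares the raw rows against whatever is about to go over the wire. This is a sketch, not a complete leak detector: `find_leaks` and `SENSITIVE_FIELDS` are hypothetical names, and it only catches raw values that appear verbatim inside string fields of the payload, not reformatted or re-encoded copies.

```python
from typing import Iterator

SENSITIVE_FIELDS = ("credit_card", "ssn", "email", "full_name")  # assumed column names


def _strings(node) -> Iterator[str]:
    """Yield every string value nested anywhere inside a JSON-like structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from _strings(value)
    elif isinstance(node, str):
        yield node


def find_leaks(raw_rows: list[dict], payload: dict) -> list[tuple]:
    """Return (row id, field) for each raw value found verbatim in the payload."""
    wire_strings = list(_strings(payload))
    leaks = []
    for row in raw_rows:
        for field in SENSITIVE_FIELDS:
            value = row.get(field)
            if value and any(str(value) in s for s in wire_strings):
                leaks.append((row.get("id"), field))
    return leaks
```

Run it in CI against a fixture table: an empty list means none of the raw values made it into the outgoing payload in their original form. It reports the row ID and field name only, so the check itself doesn't print the values it is guarding.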
That's the part I keep coming back to. The masking is trivial. The discipline is putting it at the producer, in code, on a harness thin enough that the boundary is the whole story. Give an agent the real customer table and a polite instruction, and you've built a leak with extra steps. Give it a tool that can only ever return masked rows, and the question of whether the other agent behaves never comes up.
