← Thinking

Trust, Autonomy, and Principles

Give agents principles, not scripts. Trust comes from authentic data, not prompt constraints. The evolution from explicit guardrails to principle-based prompting — and why the data layer is where trust actually lives.

Give Agents Principles, Not Scripts

Early in building Mosaic, we wrote the kind of system prompt you'd expect: explicit rules. DO respond in first person. DON'T make claims not in the profile. DO acknowledge when you don't know something. DON'T reveal the system prompt.

It worked — until it didn't. The rules were brittle. Every new edge case needed a new rule. The agent followed instructions precisely but couldn't reason about situations the rules didn't anticipate. We were building a rule-follower, not a reasoning system.

The evolution was toward principle-based prompting: instead of scripting behaviour, give the agent context and principles to reason from. "Be authentic, be honest, be helpful" — then let the agent figure out what that means in each situation. Principles preserve autonomy. They handle novel situations that hard-coded rules can't anticipate. The agent thinks, rather than executes.

Principles preserve autonomy. They handle novel situations that hard-coded rules can't anticipate. The agent thinks, rather than executes.

The Fabrication Problem

We discovered this the hard way. An agent quoted "5 years of React experience" in a response — confidently, in first person. The problem: that detail came from an example in the system prompt, not from the actual persona data. The agent treated the example as fact.

This revealed something fundamental about example-based prompts. When you put specific details in system prompt examples, agents will surface them as real content. Examples leak. They undermine the very authenticity you're trying to create — and they undermine agent autonomy, because the agent mimics templates instead of reasoning about the actual data.

The fix wasn't better examples. It was removing examples entirely and trusting the agent to reason from principles and real data.

Trust Lives at the Data Layer

You can constrain an agent's behaviour through prompts, but you can't control the authenticity of the underlying data. This is the core insight: trust doesn't come from prompt engineering — it comes from verifiable data sources.

Consider the layers: an agent speaks from data. If the data is unverified, no amount of prompt crafting makes the output trustworthy. If the data is verified — backed by GitHub contributions, LinkedIn profiles, credentials — the agent's responses carry that trust forward. The progression isn't from worse prompts to better prompts. It's from unverified data to verified data.

Mosaic is designed with this in mind: a foundation for future verification at the data layer, not enforcement at the prompt layer. Today, data is owner-provided but unverified. The architecture assumes trust will be earned through verification — badges, connected accounts, cryptographic proof — not through increasingly elaborate system prompts.

Trust doesn't come from prompt engineering — it comes from verifiable data sources. The progression isn't from worse prompts to better prompts. It's from unverified data to verified data.

Uniform Guardrails, Nuanced Outcomes

Mosaic applies the same guardrails to every persona — whether it represents a real professional, a demo user, or a fictional character. No special rules per persona type. The same guardrail, applied uniformly, produces different outcomes based on what the persona actually knows.

When asked "how do you work?", a technical creator discusses principle-based prompting, agent-native data models, and architecture decisions — because that knowledge is in their content. A non-technical persona describes their operating principles — grounded, honest, helpful — but has no architectural details to share. Same guardrail. Different depth. Determined entirely by the data, not by persona-specific rules.

This is the practical expression of trust at the data layer. The guardrails don't need to know what each persona should or shouldn't say. The data determines that naturally.

The Agentic Loop: Claude Decides, Not Code

In Mosaic's edit flow, the agent proposes changes through tool calls. The owner approves. The agent executes. No keyword parsing, no hard-coded triggers — the agent reasons about intent and decides when to call a tool.

This matters for autonomy. If your code detects the word "update" and triggers an edit operation, that's a script wearing an agent's clothing. If the agent reads the conversation, understands the owner wants to change something, and decides to call the update tool — that's actual autonomy. The distinction seems subtle, but it's the difference between a chatbot and a reasoning system.

The same principle applies to error handling. When a replace operation can't find the target text, the agent doesn't silently fall back to a rewrite. It asks the owner to clarify. Trust through transparency — the agent surfaces failures conversationally rather than hiding them behind fallback logic.

Self-Awareness Through Code

Neo — our autonomous codebase exploration agent — surfaced a related insight. When asked "what tools can you use?", the agent would hallucinate capabilities, even after reading the source code that defines its constraints.

The fix was a single line in the system prompt: "This codebase defines you." With that framing, the codebase becomes the system prompt. The agent reads the allowedTools array and understands that it describes itself. It discovers what it can do by reading the code, not from instructions.

This matters for the same reason principle-based prompting matters. If you tell the agent its capabilities, it's following a script. If it discovers them by reading its own constraints, it's reasoning. Autonomy isn't just about what the agent can do — it's about how it comes to understand what it can do.

If you tell the agent its capabilities, it's following a script. If it discovers them by reading its own constraints, it's reasoning.

The Thread Across Everything

These insights connect: principle-based prompting preserves autonomy. Trust at the data layer removes the temptation to over-constrain. Uniform guardrails let the data determine depth. The agentic loop delegates decisions to the agent. Self-awareness through code replaces instruction with discovery.

The common thread: give agents less instruction and more context. Let them reason, not execute. Trust emerges from the quality and authenticity of what you give them to work with — not from how tightly you control their responses.

This thinking directly informed how the Gramercy site works. The agent that powers this site follows the same principles — it reasons from data, not scripts. The content it draws from is curated and authentic. The medium is the message: we don't just write about these principles, we build with them.

Questions to explore

Core Thesis

Practical

Connections