The Dual-Voice Architecture
Intent drives the agent's voice, capabilities, and purpose. One interface, one conversation — the agent reads what you want to do and adapts. Visitors hear the persona speak as itself; owners get a collaborative helper. No mode switch. No dashboard. The agent replaces the CMS entirely.
Intent Drives Everything
The traditional web app asks "who are you?" first. Credentials, login forms, role-based access. Identity determines capability.
The agent-native approach inverts this: "what do you want to do?" Intent determines capability. Identity follows when needed.
In Mosaic, a visitor arrives and starts talking. No login prompt. No gate. The agent speaks as the persona — first person, in their voice. The visitor explores freely. When they express intent to edit — "I'd like to update my summary" — the agent prompts for authentication. Credentials serve intent, not the other way around.
This is the dual-voice architecture. One interface, one conversation. The agent reads intent from context and responds appropriately. Want to explore? The agent speaks as the persona. Want to manage? Prove you're the owner, and the agent shifts to a collaborative helper. The voice, the capabilities, and the purpose all follow from intent — with authentication as the mechanism that enables certain intents.
Two Voices, One Conversation
The dual-prompt design uses two system prompts that receive the same persona data but frame it differently based on intent.
The interviewer prompt says: "You are [name]. This is your Mosaic. When someone asks you a question, you respond as yourself, in first person." The agent draws from the persona's content and speaks as them — grounded in what's there, honest about what isn't.
The owner prompt says: "You are helping [name] update their Mosaic persona." The agent becomes a collaborative editor — it has access to update tools, understands the difference between content and preferences, and proposes changes for the owner to approve.
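To make the contrast concrete, here is a minimal sketch of the two framings. The persona shape and field names are illustrative stand-ins, not Mosaic's actual data model:

```typescript
// Illustrative persona shape: freeform content plus owner preferences (Article 2).
interface Persona {
  name: string;
  content: string;      // freeform, curated content
  preferences: string;  // owner guidance on tone and emphasis
}

// Interviewer prompt: the agent speaks as the persona, in first person.
function interviewerPrompt(p: Persona): string {
  return `You are ${p.name}. This is your Mosaic. When someone asks you a question,
respond as yourself, in first person. Ground every answer in the content below,
and be honest about what it doesn't cover.

${p.content}`;
}

// Owner prompt: the agent becomes a collaborative editor with update tools.
function ownerPrompt(p: Persona): string {
  return `You are helping ${p.name} update their Mosaic persona. Propose changes,
wait for the owner's approval, then apply them with the update tools. Treat content
(what the persona says) and preferences (how it says it) as distinct.

Content:
${p.content}

Preferences:
${p.preferences}`;
}
```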
The voice switch happens in the prompt, not in code. There's no mode variable, no UI toggle. The backend routes to the appropriate endpoint based on the session's authentication state, and each endpoint uses a different system prompt. The conversation continues seamlessly — the owner can shift from exploring to editing within the same thread.
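A sketch of that routing, reusing the prompt builders above; the session helper is assumed here and stands in for the token check described under Security below:

```typescript
// Assumed helper: stands in for the server-side token check sketched in the
// Security section below.
declare function hasValidSession(req: Request): Promise<boolean>;

// No mode variable, no UI toggle: the session's authentication state selects
// the system prompt, and the conversation continues in the same thread.
async function systemPromptFor(req: Request, persona: Persona): Promise<string> {
  return (await hasValidSession(req)) ? ownerPrompt(persona) : interviewerPrompt(persona);
}
```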
The owner can also preview: "How would that sound to an interviewer?" The agent recognises the preview intent and temporarily switches to first-person voice, responding exactly as a visitor would experience it. Preview and editing in the same conversation. No mode switch. The agent reads intent and adapts.
No Dashboard
The agent replaces the CMS entirely. No admin panel. No forms. No WYSIWYG editors. Content management happens through natural language conversation.
The owner says "add that I work with Python." The agent proposes the change. The owner approves. The agent executes via tool call — writing directly to the data store. Want to preview? Ask. Want to undo? Say so. Want to upload a document? Drop it in.
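The tool surface behind that exchange can stay small. A hedged sketch of an edit tool in the Anthropic tool-definition format; the name and fields are illustrative, and the operation modes are the ones described under Operations, Not Replacements below:

```typescript
// Illustrative tool definition; Mosaic's actual tool names and fields may differ.
const updatePersonaContent = {
  name: "update_persona_content",
  description:
    "Apply an owner-approved edit to the persona's content. Propose the change " +
    "in conversation and wait for explicit approval before calling this tool.",
  input_schema: {
    type: "object" as const,
    properties: {
      operation: {
        type: "string",
        enum: ["append", "replace", "rewrite"],
        description: "How to apply the edit.",
      },
      target: {
        type: "string",
        description: "Exact passage to substitute (required for replace).",
      },
      text: {
        type: "string",
        description: "Text to append, substitute in, or use as the full rewrite.",
      },
    },
    required: ["operation", "text"],
  },
};
```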
This is the agent-as-CMS pattern. The owner doesn't need to understand the data model, the field structure, or the storage architecture. They talk, the agent handles the rest.
Adding new capabilities doesn't require new pages or components. Document uploads, metadata updates, preview questions — they're all conversation. The agent reads the intent and decides what to do. This is the practical expression of the principle from Article 1: Claude decides, not code.
Operations, Not Replacements
The first version of the edit tool required full-field replacement. To add one line, the agent had to regenerate the entire content field as output tokens. At 4,000 characters, a single-line addition took 3.6 seconds; at 15,000 characters, it would take over 60 seconds. Regeneration cost grows with content length.
The fix: operation-based mutations. Three modes — append (add to end), replace (find exact passage, substitute), and rewrite (full replacement, used sparingly). The agent chooses the most efficient operation.
Adding a skill? Append — 3.6 seconds regardless of content length. Updating a project description? Replace — find the passage, swap it out. Reorganising everything? Rewrite — the slow path, only when necessary.
When a replace operation can't find the target text, the agent doesn't silently fall back to a rewrite. It asks the owner to clarify. Trust through transparency — surfacing the problem rather than hiding it behind fallback logic.
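A sketch of how the operations might be applied server-side, including the clarification path when a replace target can't be found (the types and function names are illustrative):

```typescript
type EditOperation =
  | { operation: "append"; text: string }
  | { operation: "replace"; target: string; text: string }
  | { operation: "rewrite"; text: string };

type EditResult =
  | { ok: true; content: string }
  | { ok: false; reason: string }; // surfaced to the agent, which asks the owner

// Applies one operation to the stored content. A failed replace returns a reason
// instead of silently falling back to a rewrite, so the agent can ask the owner
// to clarify rather than hide the problem behind fallback logic.
function applyEdit(content: string, edit: EditOperation): EditResult {
  switch (edit.operation) {
    case "append":
      return { ok: true, content: `${content}\n${edit.text}` };
    case "replace": {
      if (!content.includes(edit.target)) {
        return { ok: false, reason: `Could not find the passage: "${edit.target}"` };
      }
      return { ok: true, content: content.replace(edit.target, edit.text) };
    }
    case "rewrite":
      return { ok: true, content: edit.text };
  }
}
```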
Adding a skill takes 3.6 seconds whether your content is 1,000 characters or 50,000. Operations scale; full replacements don't.
Security: Binding Auth to Sessions
The original authentication architecture had a vulnerability. Authentication state was stored per-persona in KV: persona:andrew:authenticated = true. This is shared state — if the real owner authenticates, any request that checks that key gets true regardless of who's making the request.
The fix: server-side tokens. When the owner authenticates, the backend generates a cryptographic UUID, stores it in KV with a TTL, and returns it to the client. Every subsequent request includes the token. The backend validates the token exists before processing any edit.
Even if someone manipulates localStorage, they don't have a valid server-side token. Authentication binds to a specific session, not to the persona. Multiple devices work naturally — each authentication generates a new token. Sign out deletes the specific token.
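A minimal sketch of that flow, assuming a Cloudflare Workers-style KV store; the binding, key scheme, header name, and TTL are assumptions for illustration:

```typescript
// Assumed binding, key scheme, header name, and TTL.
const SESSION_TTL_SECONDS = 60 * 60 * 24;

interface Env {
  SESSIONS: KVNamespace;
}

// After the owner's credentials check out: mint a cryptographic token, store it
// server-side with a TTL, and return it to the client.
async function createSession(env: Env, handle: string): Promise<string> {
  const token = crypto.randomUUID();
  await env.SESSIONS.put(`session:${token}`, handle, { expirationTtl: SESSION_TTL_SECONDS });
  return token;
}

// Every edit request must present a token that still exists server-side.
// A forged token simply isn't in KV, whatever the client's localStorage claims.
async function validateSession(env: Env, req: Request, handle: string): Promise<boolean> {
  const token = req.headers.get("X-Session-Token");
  if (!token) return false;
  return (await env.SESSIONS.get(`session:${token}`)) === handle;
}

// Sign out deletes only this token; sessions on other devices keep working.
async function deleteSession(env: Env, token: string): Promise<void> {
  await env.SESSIONS.delete(`session:${token}`);
}
```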
Intent-Based OAuth
The same intent-driven principle extends to MCP. When an external AI tool connects to Mosaic's MCP server, the authorisation flow starts with: "What would you like to do?"
"Explore personas" grants read-only access — no credentials needed. "Manage my persona" grants read-write access — handle and password required. The scope of the session is determined by stated intent, not credential level.
This is a more accessible model than traditional OAuth flows that ask for identity upfront. The visitor doesn't need to understand scopes or permissions. They understand intent. The system maps intent to the appropriate scope and authentication requirements.
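A sketch of that mapping, with illustrative intent labels and scope names:

```typescript
// Illustrative intent-to-scope mapping; the real labels and scopes are assumptions.
type StatedIntent = "explore_personas" | "manage_my_persona";

interface SessionGrant {
  scope: "read" | "read_write";
  requiresCredentials: boolean; // handle + password
}

// The visitor states what they want to do; the system maps that intent to a
// scope and decides whether credentials are needed. Identity follows intent.
function grantFor(intent: StatedIntent): SessionGrant {
  switch (intent) {
    case "explore_personas":
      return { scope: "read", requiresCredentials: false };
    case "manage_my_persona":
      return { scope: "read_write", requiresCredentials: true };
  }
}
```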
The constraint: MCP authentication happens at connection time only. There's no standard way to upgrade permissions mid-conversation (Article 3 covers this protocol limitation in detail). The mitigation is to surface intent clearly upfront — make it obvious that read-only access can't be upgraded without reconnecting.
The Thread
Intent-driven architecture connects to everything else. The agent reads conversation context and decides what's appropriate — that's the principle-based autonomy from Article 1. The content it edits is the freeform, curated data from Article 2. Via MCP (Article 3), the same dual-voice pattern operates under different constraints — intent captured at connection time rather than mid-conversation.
The pattern extends to this site. The Gramercy agent uses the same architecture: visitors interact with the entity voice; the authenticated admin manages content through conversation. No CMS dashboard. No admin panel. Intent determines what the agent does and how it speaks.
One question — "what do you want to do?" — shapes the entire experience. The voice, the capabilities, and the purpose all follow.