Architecture

An agent that executes untrusted tool calls shouldn't share a process with your keys, your log, or anyone else's conversation. So edges (chat, web, email…) ride one public API into a control plane that owns the signed event log, every conversation runs its turns in its own sandboxed harness, and any state can be rebuilt by replaying the log.


The two planes

The control plane runs as any number of replica pods. It owns the public API, the Kubernetes reconciler, and the durable event log — plus everything built on the log: replay, summarization, atomic-batch persistence, the approval RPC, and the read-only forensics UI.

The execution plane is one harness pod per conversation. The harness runs the run_turn loop, executes pure tools and MCP calls, pauses for approvals, and hands off to other agents. Every turn-loop LLM call and external MCP call goes through the harness; the control plane reaches the provider only for transcript summarization and classification.

A client calls AgentService on the control plane. The control plane persists events to the log in atomic batches, then calls HarnessService on the harness, propagating the traceparent header across the hop. The harness reaches the LLM provider and external MCP servers.

One turn, end to end

A single turn touches every part of the system. A message travels from an edge (chat, web, email…), through the control plane, reconciler, event log and harness, and back.


Client → Control plane. The client opens a streaming connection; the edge verifies the caller and derives a stable conversation id from its native identifiers.
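One way to derive a stable id is a deterministic hash over the edge name and the edge's native identifiers. The sketch below assumes a hypothetical `conversation_id` helper and uses FNV-1a. Neither the function nor the hash comes from the system described here. The sketch deliberately avoids `std::collections::hash_map::DefaultHasher`: its algorithm may change between Rust releases, so its output would not stay stable.

```rust
// FNV-1a (64-bit) constants from the published FNV specification.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

fn fnv1a(hash: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(hash, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Hypothetical helper: derives a conversation id from an edge name
/// (e.g. "email") and that edge's native identifiers (e.g. a thread id).
/// Each part is length-prefixed so ("ab", "c") and ("a", "bc") differ.
pub fn conversation_id(edge: &str, native_ids: &[&str]) -> String {
    let mut h = FNV_OFFSET;
    for part in std::iter::once(edge).chain(native_ids.iter().copied()) {
        h = fnv1a(h, &(part.len() as u64).to_le_bytes());
        h = fnv1a(h, part.as_bytes());
    }
    format!("conv-{h:016x}")
}
```

FNV-1a is not collision-resistant. A deployment that must withstand crafted identifiers would likely swap in a cryptographic hash, and the shape of the function would stay the same.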


The run_turn loop

Inside the harness, run_turn is the agent. The model can do only two things: emit text or request a tool. The loop runs the tool, feeds the result back, and repeats, up to MAX_STEPS. Two reserved exits suspend the turn for the control plane to resolve: a __handoff_to call, and a tool that needs approval.
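The loop can be sketched as follows. This is a minimal illustration, not the harness's code. `ModelAction`, `TurnOutcome`, the closure parameters, and the value of `MAX_STEPS` are all hypothetical, and the transcript is reduced to plain strings.

```rust
/// Hypothetical step budget; the real MAX_STEPS value is not given here.
pub const MAX_STEPS: usize = 8;
const HANDOFF_TOOL: &str = "__handoff_to";

/// The only two things the model can do.
pub enum ModelAction {
    Text(String),
    ToolCall { name: String, args: String },
}

/// How a turn ends: finished, suspended for the control plane, or out of steps.
#[derive(Debug, PartialEq)]
pub enum TurnOutcome {
    Done(String),
    Handoff { target: String },
    NeedsApproval { tool: String, args: String },
    StepLimit,
}

/// Sketch of run_turn. `model`, `run_tool`, and `needs_approval` are
/// hypothetical stand-ins for the LLM call, tool execution, and approval policy.
pub fn run_turn(
    transcript: &mut Vec<String>,
    mut model: impl FnMut(&[String]) -> ModelAction,
    run_tool: impl Fn(&str, &str) -> String,
    needs_approval: impl Fn(&str) -> bool,
) -> TurnOutcome {
    for _ in 0..MAX_STEPS {
        match model(transcript.as_slice()) {
            // Emitting text ends the turn.
            ModelAction::Text(text) => return TurnOutcome::Done(text),
            // Reserved exit 1: suspend so the control plane can hand off.
            ModelAction::ToolCall { name, args } if name == HANDOFF_TOOL => {
                return TurnOutcome::Handoff { target: args };
            }
            // Reserved exit 2: suspend until a human decides.
            ModelAction::ToolCall { name, args } if needs_approval(name.as_str()) => {
                return TurnOutcome::NeedsApproval { tool: name, args };
            }
            // Otherwise run the tool and feed the result back to the model.
            ModelAction::ToolCall { name, args } => {
                let result = run_tool(name.as_str(), args.as_str());
                transcript.push(format!("tool {name}({args}) -> {result}"));
            }
        }
    }
    TurnOutcome::StepLimit
}
```

Both reserved exits return before any tool runs. This matches the description above: the harness only suspends, and the control plane resolves the handoff or approval.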

Replay and single-writer

Any fresh replica can pick up a conversation by replaying its partition of the log. A coordination.k8s.io/v1 Lease guarantees one writer at a time, and a shared PVC gives every replica the same view. The log itself is a Commonware journal on that shared volume.
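The single-writer rule comes down to a check each replica makes against the Lease before appending. The sketch assumes a hypothetical `LeaseView` that flattens the Lease's `holderIdentity`, `leaseDurationSeconds`, and `renewTime` into Unix seconds. It shows only the decision. In a real cluster, the takeover is made atomic by the API server rejecting a stale update (a `resourceVersion` conflict).

```rust
/// Hypothetical, simplified view of a coordination.k8s.io/v1 Lease.
/// The real object uses holderIdentity, leaseDurationSeconds and a
/// MicroTime renewTime; here times are flattened to Unix seconds.
#[derive(Clone, Debug, PartialEq)]
pub struct LeaseView {
    pub holder_identity: Option<String>,
    pub lease_duration_secs: u64,
    pub renew_time_secs: u64,
}

impl LeaseView {
    fn expired(&self, now_secs: u64) -> bool {
        now_secs >= self.renew_time_secs.saturating_add(self.lease_duration_secs)
    }

    /// A replica may append to the log only while it holds an unexpired lease.
    pub fn may_write(&self, me: &str, now_secs: u64) -> bool {
        self.holder_identity.as_deref() == Some(me) && !self.expired(now_secs)
    }

    /// Returns the lease `me` would write back to take or renew it, or None
    /// if another replica still holds it. The write itself must still be
    /// accepted by the API server for the takeover to count.
    pub fn try_acquire(&self, me: &str, now_secs: u64) -> Option<LeaseView> {
        let free = self.holder_identity.is_none() || self.expired(now_secs);
        if free || self.holder_identity.as_deref() == Some(me) {
            Some(LeaseView {
                holder_identity: Some(me.to_string()),
                renew_time_secs: now_secs,
                ..self.clone()
            })
        } else {
            None
        }
    }
}
```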

Payments

A turn can pay for what it fetches. When a tool hits 402 Payment Required, the harness proxies the charge to the control plane, which settles it and returns the body plus a signed receipt. Signing happens only on the trusted plane — the sandbox that runs the turn holds no key and never sees one. Payments are a foundation capability (polyc-payments), reached through the same proxy seam as any other trusted-side tool, so the orchestration core never special-cases a payment rail or a vendor.
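The harness side of that seam can be sketched with a trait the sandbox calls when it sees a 402. `PaymentProxy`, `HttpResponse`, `PaidResponse`, `Fetched`, and `fetch` are hypothetical names, not the polyc-payments API. The point of the sketch is structural: no signature on the harness side takes or returns a key.

```rust
/// Hypothetical, simplified response as the sandbox sees it.
pub struct HttpResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// What the trusted plane hands back after settling: the body plus a
/// receipt signed on the control plane. The harness only passes it along.
pub struct PaidResponse {
    pub body: Vec<u8>,
    pub signed_receipt: Vec<u8>,
}

/// Hypothetical proxy seam to the control plane. Nothing here carries a
/// key; settlement and signing happen on the other side of this trait.
pub trait PaymentProxy {
    fn settle(&self, url: &str, challenge: &HttpResponse) -> Result<PaidResponse, String>;
}

pub enum Fetched {
    Free(Vec<u8>),
    Paid(PaidResponse),
}

/// Harness-side fetch: a 402 is forwarded to the trusted plane instead
/// of being paid locally.
pub fn fetch(
    url: &str,
    get: impl Fn(&str) -> HttpResponse,
    proxy: &dyn PaymentProxy,
) -> Result<Fetched, String> {
    let resp = get(url);
    let status = resp.status;
    match status {
        402 => proxy.settle(url, &resp).map(Fetched::Paid),
        200..=299 => Ok(Fetched::Free(resp.body)),
        other => Err(format!("unexpected status {other}")),
    }
}
```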

Non-custodial by construction. The server never holds a balance. A linked wallet's funds stay in the caller's own on-chain wallet; the control plane holds only a spend- and call-scoped, time-bounded access key the caller authorized from their passkey. That delegation always expires (bounded by MAX_DELEGATION_LIFETIME_SECS); once it lapses, the next payment fails closed and prompts the caller to re-authorize with a fresh passkey approval.
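A fail-closed check on the delegation might look like this. `Delegation` and `PayError` are assumptions for illustration, as is the 24-hour value given to `MAX_DELEGATION_LIFETIME_SECS`. Only the constant's name comes from the description above.

```rust
/// Hypothetical cap; the real MAX_DELEGATION_LIFETIME_SECS value is not given here.
pub const MAX_DELEGATION_LIFETIME_SECS: u64 = 24 * 60 * 60;

/// Hypothetical shape of the spend- and call-scoped key the caller authorized.
pub struct Delegation {
    pub issued_at_secs: u64,
    pub requested_lifetime_secs: u64,
    pub spend_limit: u64, // in the smallest currency unit
    pub spent: u64,
    pub allowed_calls: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum PayError {
    ReauthorizeRequired,
    CallNotInScope,
    OverSpendLimit,
}

impl Delegation {
    /// The requested lifetime is clamped, so every delegation expires.
    pub fn expires_at(&self) -> u64 {
        let lifetime = self.requested_lifetime_secs.min(MAX_DELEGATION_LIFETIME_SECS);
        self.issued_at_secs.saturating_add(lifetime)
    }

    /// Fails closed: any failed check returns an error and no money moves.
    pub fn authorize(&self, call: &str, amount: u64, now_secs: u64) -> Result<(), PayError> {
        if now_secs >= self.expires_at() {
            return Err(PayError::ReauthorizeRequired);
        }
        if !self.allowed_calls.iter().any(|c| c.as_str() == call) {
            return Err(PayError::CallNotInScope);
        }
        match self.spent.checked_add(amount) {
            Some(total) if total <= self.spend_limit => Ok(()),
            _ => Err(PayError::OverSpendLimit),
        }
    }
}
```

Returning `ReauthorizeRequired` rather than a generic failure gives the edge what it needs to prompt for a fresh passkey approval.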

Two gates guard every spend. First, a per-request human approval pauses the turn before any money moves, using the same approval_request / approval_response pause the loop uses for any sensitive tool. Second, a settled payment writes a signed outbound_payment_receipt to the event log, binding the on-chain reference to the approved call, so every spend is replayable and independently auditable. Key custody sits behind a SecretStore seam: a per-wallet Kubernetes Secret in-cluster, a local directory in development, and never the event log. The Payments documentation covers the wallet-link flow and the self-service tools.
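The ordering of the two gates can be sketched as a function that refuses to settle without a matching approval and signs a receipt only after settlement. `ApprovalResponse`, `OutboundPaymentReceipt`, the `Signer` trait, and the payload layout are hypothetical. The real receipt format and signature scheme are not specified here.

```rust
/// Hypothetical shape of a recorded approval_response.
pub struct ApprovalResponse {
    pub call_id: String,
    pub approved: bool,
}

/// Hypothetical shape of an outbound_payment_receipt event.
#[derive(Debug)]
pub struct OutboundPaymentReceipt {
    pub call_id: String,
    pub onchain_ref: String,
    pub amount: u64,
    pub signature: Vec<u8>,
}

/// Hypothetical signing seam; the key stays behind it on the trusted plane.
pub trait Signer {
    fn sign(&self, msg: &[u8]) -> Vec<u8>;
}

/// Bytes the signature covers, binding the on-chain reference to the
/// approved call. Length prefixes keep the encoding unambiguous.
pub fn receipt_payload(call_id: &str, onchain_ref: &str, amount: u64) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [call_id.as_bytes(), onchain_ref.as_bytes()] {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field);
    }
    out.extend_from_slice(&amount.to_be_bytes());
    out
}

/// Gate 1: no matching, positive approval means `settle` is never called.
/// Gate 2: a successful settlement yields a signed receipt for the log.
pub fn settle_after_approval(
    call_id: &str,
    amount: u64,
    approval: Option<&ApprovalResponse>,
    settle: impl FnOnce(u64) -> Result<String, String>,
    signer: &dyn Signer,
) -> Result<OutboundPaymentReceipt, String> {
    match approval {
        Some(a) if a.approved && a.call_id == call_id => {}
        _ => return Err("no matching approval; no money moves".to_string()),
    }
    let onchain_ref = settle(amount)?;
    let signature = signer.sign(&receipt_payload(call_id, &onchain_ref, amount));
    Ok(OutboundPaymentReceipt { call_id: call_id.to_string(), onchain_ref, amount, signature })
}
```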

Persisted event kinds

Kind	Records	Signed?
`turn_start` / `turn_complete`	Atomic-batch boundaries (workflow-event semantics)	—
`user_msg` / `output_msg`	Wire-encoded transcript messages	—
`usage`	Per-turn input/output tokens (cost ledger)	—
`summary`	Anchored-iterative compaction of the older transcript	—
`approval_request` / `approval_response`	Human-in-the-loop pause and decision	response ✓
`handoff` / `handoff_return`	Transfer to another agent and return	✓
`outbound_payment_receipt`	Settled agent payment: on-chain reference bound to the approved call	✓
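Replay is a fold over these events. The sketch below covers only a subset of the kinds and makes three assumptions. It uses a hypothetical `Event` enum. It gives `summary` a `covers` count saying how many of the oldest messages it compacts. It discards any batch opened by `turn_start` without a matching `turn_complete`.

```rust
/// Hypothetical, simplified subset of the persisted event kinds.
#[derive(Debug)]
pub enum Event {
    TurnStart,
    TurnComplete,
    UserMsg(String),
    OutputMsg(String),
    Usage { input_tokens: u64, output_tokens: u64 },
    // `covers` is an assumed field: how many oldest messages were compacted.
    Summary { text: String, covers: usize },
}

#[derive(Debug, Default, PartialEq)]
pub struct ConversationState {
    pub summary: Option<String>,
    pub transcript: Vec<String>,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

impl ConversationState {
    fn apply(&mut self, e: &Event) {
        match e {
            Event::UserMsg(m) => self.transcript.push(format!("user: {m}")),
            Event::OutputMsg(m) => self.transcript.push(format!("agent: {m}")),
            Event::Usage { input_tokens, output_tokens } => {
                self.input_tokens += *input_tokens;
                self.output_tokens += *output_tokens;
            }
            // The summary replaces the oldest messages it compacts.
            Event::Summary { text, covers } => {
                self.summary = Some(text.clone());
                let n = (*covers).min(self.transcript.len());
                self.transcript.drain(..n);
            }
            Event::TurnStart | Event::TurnComplete => {}
        }
    }
}

/// Rebuilds state from one conversation's partition of the log, applying
/// each turn_start..turn_complete batch as a unit.
pub fn replay(log: &[Event]) -> ConversationState {
    let mut state = ConversationState::default();
    let mut pending: Vec<&Event> = Vec::new();
    let mut in_batch = false;
    for e in log {
        match e {
            Event::TurnStart => {
                pending.clear(); // drops any unterminated earlier batch
                in_batch = true;
            }
            Event::TurnComplete if in_batch => {
                for p in pending.drain(..) {
                    state.apply(p);
                }
                in_batch = false;
            }
            _ if in_batch => pending.push(e),
            _ => {} // outside any batch: skipped in this sketch
        }
    }
    state
}
```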

Observability

A turn crosses two planes, several pods, and at least one external provider — so the observability story is built around following a single request across all of them. Every binary wires up the same side server with the same surfaces; learn one pod and you have learned the fleet.

Surface	Question it answers
`/healthz` `/livez` `/readyz`	Is this pod alive and ready? The probes Kubernetes acts on, identical on every binary.
`/metrics`	What is the fleet doing in aggregate? A Prometheus scrape target on the same side server.
OTLP traces	What happened inside one turn? Spans from both planes, stitched into a single tree.
Logs	What did a specific binary say? `RUST_LOG` and `RUST_LOG_FORMAT` tune the tracing subscriber.

Tracing is what makes the split-plane design debuggable. The W3C traceparent header rides the control-plane-to-harness hop, so one trace tree follows a turn from the public API, through persistence and the reconciler, into the harness, and out to the LLM provider. Set OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME to turn the exporter on — there is no separate tracing wiring per binary.
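The header itself is small. Per the W3C Trace Context specification, a version-00 `traceparent` is `00-<trace-id>-<parent-id>-<flags>` in lowercase hex. On each hop, the caller keeps the trace id and flags and puts its own span id in the parent slot. An OpenTelemetry propagator normally handles this. The std-only `TraceParent` type below is a hypothetical illustration of the format, not the binaries' code.

```rust
/// Hypothetical parsed form of a version-00 W3C traceparent header.
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex chars, not all zero
    pub parent_id: String, // 16 lowercase hex chars, not all zero
    pub flags: u8,         // bit 0 = sampled
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

impl TraceParent {
    /// Parses a version-00 header; anything malformed yields None.
    pub fn parse(header: &str) -> Option<TraceParent> {
        let parts: Vec<&str> = header.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
            return None;
        }
        Some(TraceParent {
            trace_id: trace_id.to_string(),
            parent_id: parent_id.to_string(),
            flags: u8::from_str_radix(flags, 16).ok()?,
        })
    }

    /// Header for the next hop: same trace and flags, caller's span as parent.
    /// A zero span id is invalid in the spec, so it yields None.
    pub fn child(&self, span_id: u64) -> Option<String> {
        (span_id != 0).then(|| format!("00-{}-{:016x}-{:02x}", self.trace_id, span_id, self.flags))
    }
}
```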