Architecture

One control plane, one harness per conversation, one durable event log. Backends and tools swap without touching the orchestration core.

The two planes

The control plane runs as N pods. It owns the external API, the reconciler, the durable event log, replay and summarization, atomic-batch persistence, the approval RPC, and the read-only forensics UI.

The execution plane is one harness pod per conversation. It runs the run_turn loop, pure tools and MCP, the approval pause, and sub-agent handoff. It dials the LLM provider and external MCP servers directly.

A client calls AgentService on the control plane. The control plane persists events to the log in atomic batches and calls HarnessService on the harness over a traceparent-propagated hop. The harness, in turn, reaches the LLM provider and external MCP servers.

Replay and single-writer

A fresh replica picks up any conversation by replaying its partition. A coordination.k8s.io/v1 Lease ensures a single writer; a shared PVC ensures every replica sees the same log. The event log is a Commonware journal on that shared volume.
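To make the replay idea concrete, here is a minimal sketch of folding one conversation's partition into in-memory state. It is not the project's actual code: the `ConversationState` type, the dictionary event shape, and the field names `text`, `input_tokens`, and `output_tokens` are all hypothetical, and treating a `summary` event as a stand-in for the transcript before it is an assumption about how compaction interacts with replay.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    # Hypothetical in-memory view of one conversation.
    transcript: list = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0


def replay(events):
    """Fold a partition's events, in log order, into a fresh state."""
    state = ConversationState()
    for event in events:
        kind = event["kind"]
        if kind in ("user_msg", "output_msg"):
            state.transcript.append((kind, event["text"]))
        elif kind == "usage":
            state.input_tokens += event["input_tokens"]
            state.output_tokens += event["output_tokens"]
        elif kind == "summary":
            # Assumption: a summary replaces the transcript that precedes it.
            state.transcript = [("summary", event["text"])]
        # Turn boundaries, approvals, and handoffs are ignored in this sketch.
    return state


partition = [
    {"kind": "turn_start"},
    {"kind": "user_msg", "text": "hello"},
    {"kind": "output_msg", "text": "hi there"},
    {"kind": "usage", "input_tokens": 12, "output_tokens": 5},
    {"kind": "turn_complete"},
]

# Any replica replaying the same partition arrives at the same state.
assert replay(partition) == replay(list(partition))
```

Because the fold depends only on the log contents, it does not matter which replica runs it; the Lease only has to guarantee that one replica at a time appends.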

Persisted event kinds

| Kind | Records | Signed? |
| --- | --- | --- |
| turn_start / turn_complete | Atomic-batch boundaries (workflow-event semantics) | |
| user_msg / output_msg | Wire-encoded transcript messages | |
| usage | Per-turn input/output tokens (cost ledger) | |
| summary | Anchored-iterative compaction of the older transcript | |
| approval_request / approval_response | Human-in-the-loop pause and decision | response ✓ |
| handoff / handoff_return | Sub-agent transfer and return | |
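The role of turn_start and turn_complete as batch boundaries can be illustrated with a small grouping function. This is a sketch under stated assumptions rather than the real persistence code: the dictionary event shape and the `split_batches` helper are hypothetical, and the rule that every event must sit inside a turn_start / turn_complete pair is an assumption made for the example.

```python
# The ten event kinds listed above.
KNOWN_KINDS = frozenset({
    "turn_start", "turn_complete",
    "user_msg", "output_msg",
    "usage", "summary",
    "approval_request", "approval_response",
    "handoff", "handoff_return",
})


def split_batches(events):
    """Group events into batches delimited by turn_start / turn_complete.

    Raises ValueError on an unknown kind, a nested turn_start, an event
    outside any batch, or a batch that never reaches turn_complete.
    """
    batches = []
    current = None  # events of the batch currently open, if any
    for event in events:
        kind = event["kind"]
        if kind not in KNOWN_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        if kind == "turn_start":
            if current is not None:
                raise ValueError("turn_start inside an open batch")
            current = [event]
        elif current is None:
            raise ValueError(f"{kind} outside a batch")
        else:
            current.append(event)
            if kind == "turn_complete":
                batches.append(current)
                current = None
    if current is not None:
        raise ValueError("unterminated batch")
    return batches


log = [
    {"kind": "turn_start"},
    {"kind": "user_msg"},
    {"kind": "output_msg"},
    {"kind": "usage"},
    {"kind": "turn_complete"},
    {"kind": "turn_start"},
    {"kind": "approval_request"},
    {"kind": "approval_response"},
    {"kind": "turn_complete"},
]
batches = split_batches(log)
```

Rejecting an unterminated batch mirrors the atomicity claim: a reader should see a whole turn or none of it.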

Observability

Every binary exposes /healthz, /livez, /readyz, and /metrics on a side server. The OTLP exporter is configured through the OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME environment variables; the W3C traceparent header propagates across the control-plane → harness hop, so one trace tree spans both planes.
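As an illustration of what propagating traceparent across the hop involves, the sketch below parses an incoming header and derives the value to send downstream. The header layout (version `00`, a 32-hex-digit trace ID, a 16-hex-digit parent ID, and 2 hex digits of flags) comes from the W3C Trace Context specification. The function names are hypothetical, and a real service would normally leave this to its OpenTelemetry SDK's propagator rather than hand-roll it.

```python
import re
import secrets

# version "00", 32 hex trace-id, 16 hex parent-id, 2 hex trace-flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_id, flags) or raise ValueError."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        raise ValueError("malformed traceparent")
    trace_id, parent_id, flags = match.groups()
    # The specification treats all-zero IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id")
    return trace_id, parent_id, flags


def child_traceparent(header):
    """Build the header for the next hop: same trace, new span ID."""
    trace_id, _parent_id, flags = parse_traceparent(header)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex digits
    while span_id == "0" * 16:
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"


# Example header value taken from the W3C Trace Context specification.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
```

The trace ID is carried unchanged while the span ID is regenerated, which is what lets spans from the control plane and the harness attach to the same trace tree.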