Architecture
One control plane, one harness per conversation, one durable event log. Backends and tools swap without touching the orchestration core.
The two planes
The control plane runs as N pods. It owns the external API, the reconciler, the durable event log, replay and summarization, atomic-batch persistence, the approval RPC, and the read-only forensics UI.
The execution plane is one harness pod per conversation. It runs the run_turn loop, pure tools and MCP, the approval pause, and sub-agent handoff. It dials the LLM provider and external MCP servers directly.
A client calls AgentService on the control plane. The control plane persists events to the log in atomic batches and calls HarnessService on the harness over a traceparent-propagated hop. The harness in turn reaches the LLM provider and external MCP servers.
Replay and single-writer
A fresh replica can pick up any conversation by replaying that conversation's partition. A coordination.k8s.io/v1 Lease ensures there is a single writer, and a shared PersistentVolumeClaim (PVC) ensures every replica sees the same log. The event log is a Commonware journal on that shared volume.
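Replay can be pictured as a fold over one conversation's events. The sketch below is illustrative only: the `Event` and `ConversationState` types and the `replay` function are hypothetical names, and the log is modelled as a plain Python list rather than a Commonware journal. It also assumes that a batch counts only once its turn_complete event is present, which is one defensive reading of atomic-batch semantics and not a statement about the real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    conversation_id: str   # stands in for the partition key (assumption)
    kind: str              # e.g. "turn_start", "user_msg", "turn_complete"
    payload: str = ""


@dataclass
class ConversationState:
    transcript: list = field(default_factory=list)
    completed_turns: int = 0


def replay(log, conversation_id):
    """Rebuild one conversation's state from its events, in log order."""
    state = ConversationState()
    pending = None  # messages of a batch that has started but not completed
    for event in log:
        if event.conversation_id != conversation_id:
            continue  # event belongs to another conversation
        if event.kind == "turn_start":
            pending = []
        elif event.kind == "turn_complete":
            if pending is not None:
                # The batch is complete, so apply it as a unit.
                state.transcript.extend(pending)
                state.completed_turns += 1
            pending = None
        elif pending is not None and event.kind in ("user_msg", "output_msg"):
            pending.append((event.kind, event.payload))
    # A trailing batch with no turn_complete is never applied.
    return state
```

Because the function reads only the log, any replica that can see the shared volume would reconstruct the same state. Deciding which replica may append is a separate concern, handled by the Lease.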
Persisted event kinds
| Kind | Records | Signed? |
|---|---|---|
| turn_start / turn_complete | Atomic-batch boundaries (workflow-event semantics) | — |
| user_msg / output_msg | Wire-encoded transcript messages | — |
| usage | Per-turn input/output tokens (cost ledger) | — |
| summary | Anchored-iterative compaction of the older transcript | — |
| approval_request / approval_response | Human-in-the-loop pause and decision | response ✓ |
| handoff / handoff_return | Sub-agent transfer and return | ✓ |
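The table above can also be read as a lookup from event kind to whether a signature is expected. The following is a minimal sketch of that reading. The `PersistedEvent` type, its `signature` field, and `check_event` are hypothetical, and the check only tests that a signature is present; it does not verify one, since the signing scheme is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Event kinds from the table, mapped to whether a signature is expected.
SIGNED_KINDS = {
    "turn_start": False,
    "turn_complete": False,
    "user_msg": False,
    "output_msg": False,
    "usage": False,
    "summary": False,
    "approval_request": False,
    "approval_response": True,   # only the response half is signed
    "handoff": True,
    "handoff_return": True,
}


@dataclass(frozen=True)
class PersistedEvent:
    kind: str
    body: bytes
    signature: Optional[bytes] = None  # hypothetical field, for illustration


def check_event(event):
    """Reject unknown kinds and signed kinds that lack a signature."""
    if event.kind not in SIGNED_KINDS:
        raise ValueError(f"unknown event kind: {event.kind}")
    if SIGNED_KINDS[event.kind] and not event.signature:
        raise ValueError(f"{event.kind} must carry a signature")
```

For example, `check_event(PersistedEvent("approval_response", b"ok"))` raises `ValueError`, while the same call with a `signature` argument passes.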
Observability
Every binary exposes /healthz, /livez, /readyz, and /metrics on a side server. The OTLP exporter is enabled through the OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME environment variables. The W3C traceparent header propagates across the control-plane → harness hop, so one trace tree spans both planes.
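To make the propagation concrete, here is a small sketch of what a traceparent value carries across the hop. The header layout (version `00`, a 32-hex-digit trace ID, a 16-hex-digit parent ID, and 2 hex digits of flags) follows the W3C Trace Context format. The function names are hypothetical, and in practice an OpenTelemetry SDK would normally handle this, not hand-written code.

```python
import re
import secrets

# version "00" - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Split a version-00 traceparent into (trace_id, parent_id, flags)."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid under the W3C format.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("trace-id and parent-id must not be all zeros")
    return trace_id, parent_id, flags


def child_traceparent(incoming):
    """Build the header for an outgoing call: same trace, new span ID."""
    trace_id, _parent_id, flags = parse_traceparent(incoming)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex digits
    while span_id == "0" * 16:      # regenerate in the unlikely all-zero case
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```

The trace ID is unchanged from the incoming header to the outgoing one, which is what lets spans from both planes join a single trace tree.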