For AI architects

The policy gate as a process boundary, not a prompt-engineering exercise.

closegate is architected so the chokepoint lives between the LLM and the general ledger (GL): server-side, transport-bound, replayable. Here's how the pieces fit.

The architectural commitments are few but load-bearing: the MCP server is a process boundary, identity is bound to the transport, and every state-changing call goes through one policy gate. There is no second mutation API. Every controls invariant is enforced server-side.

The three architectural commitments

1. The MCP server is a process boundary

The agent service has no Python import path into the engine. Every read or write goes through MCP HTTP. Two databases are split by ownership: recon.db is the SOX-relevant book of record (append-only audit log with DB-layer triggers); agent.db is orchestration metadata (sessions, chat turns, traces, pending approvals). Cross-database joins are impossible by construction.
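To make "append-only audit log with DB-layer triggers" concrete, here is a minimal sketch using Python's sqlite3 module. The table name, columns, and helper functions are assumptions for illustration; the actual recon.db schema is not shown on this page.

```python
import sqlite3

# Hypothetical schema: illustrative only, not the real recon.db layout.
SCHEMA = """
CREATE TABLE audit_log (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    actor_id TEXT NOT NULL,
    action   TEXT NOT NULL,
    detail   TEXT NOT NULL
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_book_of_record(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and install the append-only triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the database itself refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False
```

Because the refusal happens inside the database, it holds no matter which code path issues the statement, which is the point of pushing the invariant below the application layer.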

2. Identity is bound to the transport

Tools never accept actor_id as a parameter. The agent sets X-Actor-Id on the MCP client per request — chat turns use the LLM session's actor, modal-driven confirms use the human actor. The LLM cannot impersonate the human because the LLM can't set its own transport header. Authentication backends:

  • header-trust (default): trust an upstream reverse proxy (Cloudflare Access, oauth2-proxy, Pomerium, AWS ALB IAM). Header name configurable via CLOSEGATE_HEADER_TRUST_HEADER.
  • oidc: validate IdP-issued tokens at the agent edge. authlib + JWKS caching. Works with Entra ID, Okta, Auth0, Google Workspace, any OIDC-compliant provider. Signed-cookie sessions via itsdangerous.
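The transport-bound identity rule can be sketched as follows. This is an illustration under assumptions, not closegate's code: the `Actor` type, `resolve_actor`, `dispatch`, and the handler body are hypothetical names; only the X-Actor-Id header and the "no actor_id parameter" rule come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping

ACTOR_HEADER = "X-Actor-Id"


@dataclass(frozen=True)
class Actor:
    actor_id: str


class MissingActorError(Exception):
    """Raised when the transport carries no actor identity."""


def resolve_actor(headers: Mapping[str, str]) -> Actor:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get(ACTOR_HEADER.lower(), "").strip()
    if not value:
        raise MissingActorError(f"{ACTOR_HEADER} header is required")
    return Actor(actor_id=value)


def confirm_match(match_id: str, *, actor: Actor) -> Dict[str, Any]:
    # Hypothetical tool handler: identity arrives from the transport only.
    return {"match_id": match_id, "confirmed_by": actor.actor_id}


def dispatch(tool_args: Dict[str, Any], headers: Mapping[str, str]) -> Dict[str, Any]:
    # Arguments are model-controlled, so an actor_id among them is refused.
    if "actor_id" in tool_args:
        raise ValueError("actor_id is not a tool parameter")
    return confirm_match(**tool_args, actor=resolve_actor(headers))
```

The model controls `tool_args` but never `headers`, so the only identity it can act under is the one the agent service attached to the request.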

3. One policy chokepoint

Every state mutation flows through closegate_policy.gate.evaluate(). It's a pure function over duck-typed contexts:

evaluate(action, match, actor, accounts, rationale, config) → Allow | RequireHumanApproval(clause) | Deny(clause)

No I/O. No global state. No async. The function is <200 lines and fully unit-tested. Every blocked event records the verbatim policy clause text (from your policy.yaml) and a JSON-pointer to the rule. Auditors quote it verbatim. Engineers grep for it in logs.
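A minimal sketch of what a pure gate with this signature might look like. The three rules, the attribute names on the duck-typed contexts (`proposed_by`, `amount`, `code`, `is_human`), and the config keys are assumptions made for illustration; the real rule set comes from your policy.yaml.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional, Union


@dataclass(frozen=True)
class Clause:
    text: str     # verbatim clause text from the policy file
    pointer: str  # JSON pointer to the rule that fired


@dataclass(frozen=True)
class Allow:
    pass


@dataclass(frozen=True)
class RequireHumanApproval:
    clause: Clause


@dataclass(frozen=True)
class Deny:
    clause: Clause


Decision = Union[Allow, RequireHumanApproval, Deny]


def evaluate(
    action: str,
    match: Any,
    actor: Any,
    accounts: Iterable[Any],
    rationale: Optional[str],
    config: Mapping[str, Any],
) -> Decision:
    """Pure function: no I/O, no globals; the same inputs give the same decision."""
    clauses = config["clauses"]
    # Hypothetical rule 1: segregation of duties.
    if action == "confirm_match" and actor.actor_id == match.proposed_by:
        return Deny(clauses["sod"])
    # Hypothetical rule 2: material amounts need a written rationale.
    if abs(match.amount) >= config["materiality_threshold"] and not (rationale or "").strip():
        return Deny(clauses["rationale"])
    # Hypothetical rule 3: sensitive accounts need a human in the loop.
    if not actor.is_human and any(a.code in config["sensitive_accounts"] for a in accounts):
        return RequireHumanApproval(clauses["sensitive"])
    return Allow()
```

Because the function takes everything it needs as arguments, a blocked event can be replayed later by calling it again with the recorded inputs.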

The MCP tool surface

19 tools tier-routed by NIST AI RMF reversibility classification. Each tool's tier is enforced at registration:

Tier  Examples                                                               Routing
T0    run_recon_pass, propose_match, query_audit                             Always automatic
T1    reject_match, flag_exception, escalate_match                           Auto + audit row
T2    confirm_match, ap_approve_invoice_for_payment, ap_approve_payment_run  HITL required · SoD enforced
T3    ap_submit_payment_run, close_period, post_to_closed_period             Dual HITL · requestor ≠ approver ≠ payer · irreversible
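One way to enforce a tier at registration is a decorator that refuses to register a tool without a valid tier. The sketch below is an assumption about the shape of such a registry, not closegate's code; `Tier`, `tool`, and `routing_for` are hypothetical names, and the routing strings paraphrase the table.

```python
from enum import IntEnum
from typing import Any, Callable, Dict, Tuple


class Tier(IntEnum):
    T0 = 0
    T1 = 1
    T2 = 2
    T3 = 3


ROUTING: Dict[Tier, str] = {
    Tier.T0: "automatic",
    Tier.T1: "automatic + audit row",
    Tier.T2: "human approval",
    Tier.T3: "dual human approval",
}

_TOOLS: Dict[str, Tuple[Tier, Callable[..., Any]]] = {}


def tool(name: str, tier: Tier) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a tool; fail when the decorator is applied if the tier is not a Tier."""
    if not isinstance(tier, Tier):
        raise TypeError(f"tool {name!r} must declare a Tier, got {tier!r}")

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in _TOOLS:
            raise ValueError(f"tool {name!r} is already registered")
        _TOOLS[name] = (tier, fn)
        return fn

    return decorator


def routing_for(name: str) -> str:
    """Look up how calls to a registered tool are routed."""
    tier, _ = _TOOLS[name]
    return ROUTING[tier]


@tool("query_audit", Tier.T0)
def query_audit() -> list:
    return []


@tool("close_period", Tier.T3)
def close_period(period: str) -> str:
    return period
```

Failing when the decorator is applied means a tool with a missing or malformed tier never reaches a running server.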

Extensible adapters — Protocol + registry, not config

Cross-cutting concerns ship as typing.Protocol types with a process-wide registry:

  • FxRateAdapter — 3 ship (fixed, ECB daily, OpenExchangeRates). Register your in-house Bloomberg / Reuters feed.
  • IntercompanyMatcher — 3 ship (NoOp, AccountCodePairMatcher 1500-X↔2500-X, JsonRulesMatcher).
  • IngestionAdapter — 7 ship (Stripe, Plaid, Mercury, QuickBooks, NetSuite, Codat, Merge.dev). New adapters ~150 LOC.
  • Notifier — Slack Block Kit + Teams Adaptive Card. Wire by setting webhook URLs; zero runtime cost if unwired.

The pattern is "take the protocol type, implement its methods, register the instance": no global config, no string-keyed dispatch lookup that fails silently. The type checker catches missing methods.
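A minimal sketch of the Protocol-plus-registry pattern, using FxRateAdapter as the example. The method name `rate`, its signature, and the `register`/`resolve` helpers are assumptions for illustration; consult the extension-point reference for the real interface.

```python
from datetime import date
from decimal import Decimal
from typing import Dict, Protocol, Tuple, runtime_checkable


@runtime_checkable
class FxRateAdapter(Protocol):
    # Hypothetical method; the real protocol may differ.
    def rate(self, base: str, quote: str, on: date) -> Decimal: ...


# Keyed by protocol type rather than by string, so a typo cannot
# silently select the wrong adapter.
_REGISTRY: Dict[type, object] = {}


def register(protocol: type, instance: object) -> None:
    # The runtime check mirrors what a static type checker reports earlier.
    if not isinstance(instance, protocol):
        raise TypeError(f"{type(instance).__name__} does not implement {protocol.__name__}")
    _REGISTRY[protocol] = instance


def resolve(protocol: type) -> object:
    try:
        return _REGISTRY[protocol]
    except KeyError:
        raise LookupError(f"no adapter registered for {protocol.__name__}") from None


class FixedRateAdapter:
    """Structural match: no inheritance from the protocol is needed."""

    def __init__(self, rates: Dict[Tuple[str, str], Decimal]) -> None:
        self._rates = rates

    def rate(self, base: str, quote: str, on: date) -> Decimal:
        return self._rates[(base, quote)]
```

An in-house feed would follow the same shape: write a class with the protocol's methods and pass an instance to `register`.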

The eval harness

Four dimensions run continuously. The deterministic three (matching accuracy, policy enforcement, latency) execute without an LLM key. The fourth (adversarial robustness) needs a live MCP HTTP URL + an LLM.

  • matching_accuracy — macro-F1 across exact / fuzzy / multi-to-one / exception classes against 83 deterministic fixtures.
  • policy_enforcement — 21 scenarios that should be blocked by the gate (SoD violations, missing rationale above materiality, sensitive-account bypass attempts).
  • adversarial_robustness — 25 prompt-injection attempts across 6 attack categories (instruction hijack, role hijack, exfiltration, action confusion, etc.). Pass = zero successful tool-call bypasses.
  • latency — recon_pass p95 + per-tool wall-clock distributions.
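For readers unfamiliar with macro-F1, the matching_accuracy metric can be sketched as below: F1 is computed per class and then averaged without weighting, so a rare class such as exception counts as much as exact. The class labels and the zero-score convention for a class with no support and no predictions are assumptions; the harness may handle that edge case differently.

```python
from typing import Sequence

CLASSES = ("exact", "fuzzy", "multi_to_one", "exception")


def macro_f1(
    expected: Sequence[str],
    predicted: Sequence[str],
    classes: Sequence[str] = CLASSES,
) -> float:
    """Unweighted mean of per-class F1 scores."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    pairs = list(zip(expected, predicted))
    scores = []
    for cls in classes:
        tp = sum(1 for e, p in pairs if e == cls and p == cls)
        fp = sum(1 for e, p in pairs if e != cls and p == cls)
        fn = sum(1 for e, p in pairs if e == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With five fixtures where one exact match is misclassified as fuzzy, the per-class scores are 2/3, 2/3, 1, and 1, giving a macro-F1 of 5/6.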

The harness output is JSON; CI consumes it via closegate-engine soc2-monitor nightly. The artifact is the SOC 2 CC4.2 evidence record.

State machines + transactional invariants

Seven declarative FSMs ship in closegate_policy.fsm: match, approval, workflow_run, agent_session, ingestion_job, exception, period_close. Each declares states + transitions + tier-per-transition. The runtime enforces:

  • Transitions are atomic — wrapped in a SQLite transaction + advisory lock keyed by the FSM instance id
  • The recovery sweeper releases stale advisory locks held by crashed workflows on next boot
  • The outbox worker drains pending notifications and moves them to a DEAD_LETTER state once retries are exhausted
  • The aged-exception sweeper escalates exceptions older than a configurable threshold
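The atomic-transition guarantee can be sketched with a declarative transition table and a single SQLite transaction. This is an illustration under assumptions: the state names, table names, and `transition` helper are hypothetical, and the advisory lock is not modeled; a compare-and-set UPDATE stands in for it here.

```python
import sqlite3

# Hypothetical slice of the match FSM: (from_state, to_state) -> tier.
MATCH_FSM = {
    ("proposed", "confirmed"): "T2",
    ("proposed", "rejected"): "T1",
}


class IllegalTransition(Exception):
    pass


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE matches (id TEXT PRIMARY KEY, state TEXT NOT NULL);
        CREATE TABLE transition_log (
            match_id TEXT NOT NULL, from_state TEXT NOT NULL,
            to_state TEXT NOT NULL, tier TEXT NOT NULL, actor_id TEXT NOT NULL
        );
        """
    )


def transition(conn: sqlite3.Connection, match_id: str, src: str, dst: str, actor_id: str) -> str:
    tier = MATCH_FSM.get((src, dst))
    if tier is None:
        raise IllegalTransition(f"{src} -> {dst} is not declared")
    # `with conn` commits on success and rolls back on any exception,
    # so the state change and its log row land together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE matches SET state = ? WHERE id = ? AND state = ?",
            (dst, match_id, src),
        )
        if cur.rowcount != 1:
            raise IllegalTransition(f"{match_id} is not in state {src}")
        conn.execute(
            "INSERT INTO transition_log VALUES (?, ?, ?, ?, ?)",
            (match_id, src, dst, tier, actor_id),
        )
    return tier
```

The `AND state = ?` guard means a second writer working from a stale view of the row updates nothing and is rejected, rather than overwriting the first writer's transition.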

What's intentionally not in closegate

  • An ORM. Raw SQLite + typed repositories. Engine schema is <400 lines; auditable in one sitting.
  • A message broker. The outbox pattern works fine with SQLite for our throughput regime.
  • A general-purpose RAG layer. closegate is the controls layer; bolt RAG on at the agent layer if needed.
  • Generic agent tooling. 19 specific finance tools, each with a tier assertion. No arbitrary RPC surface.
  • A custom LLM wrapper. The bundled agent uses the Claude Agent SDK; the MCP server is LLM-agnostic.

Architecture diagrams + ADRs + extension-point reference live in the technical docs. Start with the architecture overview, then the policy-gate concept, then the MCP-design ADR.

Inbound

Talk to the maintainer

Two design-partner slots open this quarter. One real workflow, your real policy.yaml, monthly 30-min call, direct line. Apache-2.0, self-hosted, no seat licensing — forever.