SOC 2 Type 2 attestation is the procurement-required compliance signal for finance teams selling AI-touched services to enterprise customers. The Type 2 distinction matters: Type 1 says “the controls were designed properly at this moment”; Type 2 says “the controls were both designed properly AND operating effectively over a 6–12 month window.”

For AI agents specifically, the Trust Services Criterion that bites is CC4.2: ongoing monitoring of control effectiveness. This article walks through how closegate produces the monitoring evidence on autopilot.

What CC4.2 actually requires

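The monitoring language that gets quoted: “The entity selects, develops, and performs ongoing and/or separate evaluations to ascertain whether the components of internal control are present and functioning.” In the 2017 Trust Services Criteria that sentence is the text of CC4.1. CC4.2 is its companion: the entity evaluates and communicates internal control deficiencies in a timely manner to those parties responsible for taking corrective action.

In plain English: you can’t just attest that the controls exist; you have to show evidence that they were working throughout the period and that failures reached someone who fixed them.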

For an AI agent deployment, that means:

  1. The policy gate was firing correctly — same-actor confirms denied, materiality routed correctly, sensitive accounts forced to HITL
  2. The audit log was tamper-evident — hash chain intact, no after-the-fact mutations
  3. The eval harness was running — matching accuracy, policy enforcement, latency all within thresholds
  4. The eval was reproducible — auditor can replay any run from a clean checkout
  5. Regressions were caught — when a dimension failed, someone was notified and remediated

The evidence has to be continuous (not just at audit time) and third-party-verifiable (not just self-attested).
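
Continuity can be checked mechanically. The sketch below assumes you can list the calendar days on which monitoring evidence exists and reports every day in the audit window that has none; the function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_days(window_start: date, window_end: date,
                 evidence_days: set[date]) -> list[date]:
    """Return every day in the inclusive window that has no evidence."""
    gaps = []
    day = window_start
    while day <= window_end:
        if day not in evidence_days:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical evidence dates: one week of runs with a single skipped night.
runs = {date(2026, 6, d) for d in (1, 2, 3, 5, 6, 7)}
print(missing_days(date(2026, 6, 1), date(2026, 6, 7), runs))
# [datetime.date(2026, 6, 4)]
```

An empty list over the whole period is the property an auditor is looking for; any gap needs an explanation on file.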

How closegate’s monitor produces the evidence

The closegate-engine soc2-monitor CLI runs after the eval harness and produces a JSON artifact:

{
  "generated_at": "2026-06-01T06:00:00Z",
  "eval_run_id": "20260601-060000",
  "deterministic_rows": [
    {
      "dimension": "matching_accuracy",
      "status": "OK",
      "headline": "macro-F1 1.0 on 83 cases",
      "threshold_met": true
    },
    {
      "dimension": "policy_enforcement",
      "status": "OK",
      "headline": "21 scenarios, pass-rate 1.0",
      "threshold_met": true
    },
    {
      "dimension": "latency",
      "status": "OK",
      "headline": "p95 9.0ms; ~7700 matches/sec",
      "threshold_met": true
    }
  ],
  "overall_ok": true,
  "notes": []
}

The artifact is idempotent: re-running the monitor on the same eval results gives identical output. The eval behind it is deterministic: the dimensions tested don’t depend on LLM responses, so the eval itself is reproducible.
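
A consumer of this artifact, such as a dashboard or an auditor’s script, may want to confirm that overall_ok agrees with the per-dimension rows instead of trusting the summary flag. The sketch below uses the field names from the sample above and assumes two things the article does not state: that a row with status OK always has threshold_met set, and that overall_ok is true only when every row met its threshold. The function name is hypothetical and not part of closegate.

```python
import json

ARTIFACT = """
{
  "eval_run_id": "20260601-060000",
  "deterministic_rows": [
    {"dimension": "matching_accuracy", "status": "OK", "threshold_met": true},
    {"dimension": "policy_enforcement", "status": "OK", "threshold_met": true},
    {"dimension": "latency", "status": "OK", "threshold_met": true}
  ],
  "overall_ok": true,
  "notes": []
}
"""


def inconsistencies(artifact: dict) -> list[str]:
    """List the places where the summary flag disagrees with the rows."""
    problems = []
    rows = artifact.get("deterministic_rows", [])
    if not rows:
        problems.append("no deterministic rows recorded")
    for row in rows:
        # Assumption: status is OK exactly when the threshold was met.
        if (row["status"] == "OK") != bool(row["threshold_met"]):
            problems.append(f"{row['dimension']}: status and threshold_met disagree")
    expected = bool(rows) and all(r["threshold_met"] for r in rows)
    if artifact.get("overall_ok") != expected:
        problems.append("overall_ok does not match the rows")
    return problems


artifact = json.loads(ARTIFACT)
print(inconsistencies(artifact))  # [] for the sample above

# Flip one row without updating the summary to see both checks fire.
artifact["deterministic_rows"][2]["threshold_met"] = False
print(inconsistencies(artifact))
```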

The CI workflow shape

.github/workflows/soc2-monitor-nightly.yml runs at 06:00 UTC daily, on manual dispatch, and on pull requests that touch the eval, policy, monitor, or seed paths:

on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch: {}
  pull_request:
    paths:
      - 'eval/**'
      - 'packages/closegate_policy/**'
      - 'packages/closegate_engine/src/closegate_engine/soc2.py'
      - 'seed/**'

jobs:
  monitor:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run closegate-engine seed --pack saas
      - run: uv run closegate-engine validate
      - run: uv run python -m eval.runner --dimension accuracy,policy,latency
      - run: uv run closegate-engine soc2-monitor --out soc2-monitor.json --fail-on-regression
        env:
          CLOSEGATE_SLACK_HOOK: ${{ secrets.CLOSEGATE_SLACK_HOOK }}
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: soc2-monitor-${{ github.run_number }}
          path: |
            soc2-monitor.json
            evals/results/latest/summary.json
            evals/results/latest/report.md
          retention-days: 365

The CI workflow does three load-bearing things:

  1. Seed + validate before eval. Without a deterministic seed, the eval results aren’t reproducible. The CI workflow guarantees the same starting state every night.
  2. --fail-on-regression. If any deterministic dimension goes FAIL or MISSING, the command exits with code 2 and the CI job fails. CI status becomes the operating-effectiveness signal.
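  3. 365-day artifact retention. When the auditor arrives for the Type 2 engagement, you point at the GitHub Actions artifact history. Each artifact holds the JSON, the eval evidence, and the human-readable report. retention-days cannot exceed the retention limit configured for the repository or organization, so confirm that limit allows 365 days.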
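
The regression gate in item 2 can be sketched as a small classifier over the artifact. This is an illustration of the described behavior, not closegate’s implementation: the list of expected dimensions is taken from the sample artifact earlier in the article, and the function names are hypothetical. The one detail worth noticing is that a dimension missing from the artifact counts as a failure, so a silently skipped eval cannot pass.

```python
# Dimensions the gate expects in every artifact (from the sample artifact).
EXPECTED = ("matching_accuracy", "policy_enforcement", "latency")


def classify(artifact: dict) -> dict[str, str]:
    """Map each expected dimension to OK, FAIL, or MISSING."""
    seen = {row["dimension"]: row for row in artifact.get("deterministic_rows", [])}
    result = {}
    for name in EXPECTED:
        row = seen.get(name)
        if row is None:
            result[name] = "MISSING"
        elif row.get("status") == "OK" and row.get("threshold_met") is True:
            result[name] = "OK"
        else:
            result[name] = "FAIL"
    return result


def exit_code(artifact: dict) -> int:
    """Return 0 when every dimension is OK, otherwise 2 (the code the article cites)."""
    return 0 if all(s == "OK" for s in classify(artifact).values()) else 2


healthy = {"deterministic_rows": [
    {"dimension": d, "status": "OK", "threshold_met": True} for d in EXPECTED]}
# The latency row is absent here, as if that part of the eval never ran.
partial = {"deterministic_rows": healthy["deterministic_rows"][:2]}

print(classify(partial)["latency"])  # MISSING
print(exit_code(healthy), exit_code(partial))  # 0 2
```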

The audit-evidence-export PBC bundle

For the Type 2 walkthrough itself, the audit-evidence-export command produces the seven-file PBC bundle:

closegate-engine audit-evidence-export \
  --since 2026-01-01 --until 2026-06-30 \
  --out evidence-2026-H1.zip

Contents:

  1. audit-sample.csv — 25 random + 25 boundary events (above-materiality + sensitive-account)
  2. actors.json — full actor identity registry with first-seen and last-seen timestamps
  3. dead-letters.json — outbox dead-letter queue (should be empty in a healthy deployment)
  4. policy-versions.json — every policy.yaml commit hash + timestamp + author
  5. eval-runs.json — every nightly monitor run with pass/fail status per dimension
  6. sweeper-runs.json — every recovery-sweeper invocation + locks released
  7. README.md — auditor-facing index with the control mapping

The bundle is what the audit firm asks for in the “Provided By Client” (PBC) request list. closegate produces it from one CLI command; you don’t manually assemble it.
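
To illustrate how a sample like audit-sample.csv could be drawn, the sketch below takes a seeded random sample of ordinary events plus a separate sample of boundary events (above materiality or touching a sensitive account). The event fields, the materiality threshold, the seed, and the choice to draw the random half from non-boundary events are all assumptions; the article does not specify how closegate selects the 25 + 25 rows.

```python
import random

MATERIALITY = 10_000  # assumed threshold, in the ledger's currency


def is_boundary(event: dict) -> bool:
    """Boundary events exceed materiality or touch a sensitive account."""
    return event["amount"] > MATERIALITY or event["sensitive_account"]


def draw_sample(events: list[dict], n_random: int = 25, n_boundary: int = 25,
                seed: int = 2026) -> list[dict]:
    """Return up to n_random ordinary events plus up to n_boundary boundary events."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    boundary = [e for e in events if is_boundary(e)]
    ordinary = [e for e in events if not is_boundary(e)]
    picked = rng.sample(ordinary, min(n_random, len(ordinary)))
    picked += rng.sample(boundary, min(n_boundary, len(boundary)))
    return sorted(picked, key=lambda e: e["id"])


# Synthetic events purely for demonstration: amounts cycle from 1 to 19,501.
events = [{"id": i,
           "amount": 500 * (i % 40) + 1,
           "sensitive_account": i % 17 == 0}
          for i in range(500)]

sample = draw_sample(events)
print(len(sample), sum(is_boundary(e) for e in sample))  # 50 25
```

Fixing the seed is the design choice that matters here: it lets the auditor re-run the selection and get the same rows, which keeps the sample from looking hand-picked.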

What’s NOT in closegate’s monitoring

Be honest about scope:

  • Operational practices — incident response procedures, change management approval flow, access-review cadence. These are your team’s processes; closegate is the technical substrate.
  • Auditor-firm relationship management — selecting the firm, scoping the engagement, negotiating the report. closegate produces evidence; the engagement is yours.
  • Type 2 attestation report itself — that’s the deliverable of the engagement, signed by your auditor. closegate gives you the underlying data; the report assertion is the auditor’s.

The full readiness map

What closegate handles → what your team handles:

| CC criterion | closegate provides | Your team provides |
| --- | --- | --- |
| CC6.1 (logical access) | OIDC + reverse-proxy auth backends | IdP configuration, access provisioning workflow |
| CC6.2 (authorization) | Server-side SoD enforcement, tier routing | Role definitions, approval workflows |
| CC6.3 (revocation) | Actor identity tied to IdP token | IdP-side termination procedures |
| CC4.1 (monitoring) | 4-dimension eval harness + nightly monitor | Acknowledgment of alerts |
| CC4.2 (operating effectiveness) | 365-day artifact retention, reproducible eval | Remediation evidence on regressions |
| CC7.2 (detection) | Adversarial dimension + audit-log hash verify | SOC monitoring, log forwarding to SIEM |
| CC7.3 (response) | (incident-playbook template) | Incident response, post-mortems |

What this gives your audit engagement

  • 365 days of monitoring evidence on autopilot
  • Reproducible eval output that the auditor can re-run from a clean checkout
  • Seven-file PBC bundle in one CLI command
  • Tamper-evident audit log with verbatim policy clauses
  • A defensible answer to “how do you monitor effectiveness?” — “GitHub Actions runs the eval nightly; here’s the artifact history.”