SOC 2 Type 2 attestation is the procurement-required compliance signal for finance teams selling AI-touched services to enterprise customers. The Type 2 distinction matters: Type 1 says “the controls were designed properly at this moment”; Type 2 says “the controls were both designed properly AND operating effectively over a 6–12 month window.”
For AI agents specifically, the Trust Services Criterion that bites is CC4.2: ongoing monitoring of control effectiveness. This article walks through how closegate produces the monitoring evidence on autopilot.
What CC4.2 actually requires
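The two monitoring criteria work as a pair. CC4.1 reads: “The entity selects, develops, and performs ongoing and/or separate evaluations to ascertain whether the components of internal control are present and functioning.” CC4.2 builds on it: the entity evaluates internal control deficiencies and communicates them in a timely manner to the parties responsible for corrective action.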
In plain English: you can’t just attest that the controls exist; you have to show evidence they were working throughout the period.
For an AI agent deployment, that means:
- The policy gate was firing correctly — same-actor confirms denied, materiality routed correctly, sensitive accounts forced to HITL
- The audit log was tamper-evident — hash chain intact, no after-the-fact mutations
- The eval harness was running — matching accuracy, policy enforcement, latency all within thresholds
- The eval was reproducible — auditor can replay any run from a clean checkout
- Regressions were caught — when a dimension failed, someone was notified and remediated
The evidence has to be continuous (not just at audit time) and third-party-verifiable (not just self-attested).
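To make the first bullet concrete, the policy gate can be pictured as a small pure function over a proposed action. This is a minimal sketch, not closegate's actual policy schema: the `Action` fields, the `Route` names, the 10,000 materiality threshold, and the account identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DENY = "deny"
    HITL = "hitl"  # human-in-the-loop review
    AUTO = "auto"


@dataclass(frozen=True)
class Action:
    preparer: str   # actor who proposed the entry
    confirmer: str  # actor attempting to confirm it
    account: str
    amount: float


# Hypothetical values; a real deployment would load these from its policy file.
MATERIALITY_THRESHOLD = 10_000.00
SENSITIVE_ACCOUNTS = frozenset({"1000-cash", "2100-payroll"})


def route(action: Action) -> Route:
    """Apply the three checks from the list above, strictest first."""
    if action.preparer == action.confirmer:
        return Route.DENY  # same-actor confirm violates segregation of duties
    if action.account in SENSITIVE_ACCOUNTS:
        return Route.HITL  # sensitive accounts always get a human
    if abs(action.amount) >= MATERIALITY_THRESHOLD:
        return Route.HITL  # material amounts get a human
    return Route.AUTO
```

Because the function has no side effects and no LLM call, an eval scenario can assert on its output directly, which is what makes a "policy enforcement pass-rate" measurable in the first place.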
How closegate’s monitor produces the evidence
The closegate-engine soc2-monitor CLI runs after the eval harness and produces a JSON artifact:
{
  "generated_at": "2026-06-01T06:00:00Z",
  "eval_run_id": "20260601-060000",
  "deterministic_rows": [
    {
      "dimension": "matching_accuracy",
      "status": "OK",
      "headline": "macro-F1 1.0 on 83 cases",
      "threshold_met": true
    },
    {
      "dimension": "policy_enforcement",
      "status": "OK",
      "headline": "21 scenarios, pass-rate 1.0",
      "threshold_met": true
    },
    {
      "dimension": "latency",
      "status": "OK",
      "headline": "p95 9.0ms; ~7700 matches/sec",
      "threshold_met": true
    }
  ],
  "overall_ok": true,
  "notes": []
}
The artifact is idempotent (re-running on the same eval gives identical output) and deterministic (the eval dimensions tested don’t depend on LLM responses, so the eval itself is reproducible).
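One way an auditor (or a CI step) might check the idempotency claim is to compare a digest of two artifacts produced from the same eval run. The sketch below is an illustration, not part of closegate: `artifact_digest` and `same_evidence` are hypothetical helpers, and the `volatile` parameter exists only in case a field such as `generated_at` turns out to be stamped from the wall clock rather than derived from the eval run.

```python
import hashlib
import json


def artifact_digest(artifact: dict, volatile: tuple[str, ...] = ()) -> str:
    """SHA-256 over a canonical JSON encoding, skipping top-level keys in `volatile`."""
    stable = {k: v for k, v in artifact.items() if k not in volatile}
    # sort_keys + fixed separators make the encoding independent of key order and spacing.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_evidence(a: dict, b: dict, volatile: tuple[str, ...] = ()) -> bool:
    """True when two artifacts carry identical content outside the volatile keys."""
    return artifact_digest(a, volatile) == artifact_digest(b, volatile)
```

Storing the digest next to each nightly artifact gives a cheap answer to "has this file changed since it was produced?" without diffing JSON by hand.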
The CI workflow shape
.github/workflows/soc2-monitor-nightly.yml runs at 06:00 UTC daily:
on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch: {}
  pull_request:
    paths:
      - 'eval/**'
      - 'packages/closegate_policy/**'
      - 'packages/closegate_engine/src/closegate_engine/soc2.py'
      - 'seed/**'
jobs:
  monitor:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run closegate-engine seed --pack saas
      - run: uv run closegate-engine validate
      - run: uv run python -m eval.runner --dimension accuracy,policy,latency
      - run: uv run closegate-engine soc2-monitor --out soc2-monitor.json --fail-on-regression
        env:
          CLOSEGATE_SLACK_HOOK: ${{ secrets.CLOSEGATE_SLACK_HOOK }}
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: soc2-monitor-${{ github.run_number }}
          path: |
            soc2-monitor.json
            evals/results/latest/summary.json
            evals/results/latest/report.md
          retention-days: 365
The CI workflow does three load-bearing things:
- Seed + validate before eval. Without a deterministic seed, the eval results aren’t reproducible. The CI workflow guarantees the same starting state every night.
- --fail-on-regression. If any deterministic dimension goes FAIL or MISSING, the CI job exits 2. CI status becomes the operating-effectiveness signal.
- 365-day artifact retention. Auditor walks in for the Type 2 engagement; you point at the GitHub Actions artifact history. Each artifact is the JSON + the eval evidence + the human-readable report. Note that retention-days is capped by the retention limit configured for the repository or organization, so that limit has to allow 365 days.
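The exit-code behavior described for --fail-on-regression can be sketched in a few lines. This is an illustration of the rule as stated above, not closegate's source: `regression_exit_code` is a hypothetical name, and the only inputs assumed are the `deterministic_rows`, `dimension`, and `status` fields shown in the artifact earlier.

```python
import sys

# Statuses the article says should fail the job.
REGRESSION_STATUSES = frozenset({"FAIL", "MISSING"})


def regression_exit_code(artifact: dict) -> int:
    """Return 2 if any deterministic row is FAIL or MISSING, otherwise 0."""
    rows = artifact.get("deterministic_rows", [])
    regressed = [row["dimension"] for row in rows if row.get("status") in REGRESSION_STATUSES]
    for name in regressed:
        # stderr keeps the failure reason visible in the CI log.
        print(f"regression: {name}", file=sys.stderr)
    return 2 if regressed else 0
```

A wrapper script would load soc2-monitor.json with `json.load` and pass the result to `sys.exit(regression_exit_code(...))`; a non-zero exit is what turns the nightly run red.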
The audit-evidence-export PBC bundle
For the Type 2 walkthrough itself, the audit-evidence-export command produces the seven-file PBC bundle:
closegate-engine audit-evidence-export \
--since 2026-01-01 --until 2026-06-30 \
--out evidence-2026-H1.zip
Contents:
- audit-sample.csv — 25 random + 25 boundary events (above-materiality + sensitive-account)
- actors.json — full actor identity registry with first-seen and last-seen timestamps
- dead-letters.json — outbox dead-letter queue (should be empty in a healthy deployment)
- policy-versions.json — every policy.yaml commit hash + timestamp + author
- eval-runs.json — every nightly monitor run with pass/fail status per dimension
- sweeper-runs.json — every recovery-sweeper invocation + locks released
- README.md — auditor-facing index with the control mapping
The bundle is what the audit firm asks for in the “Provided By Client” (PBC) request list. closegate produces it from one CLI command; you don’t manually assemble it.
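Before handing the zip to the audit firm, it is cheap to confirm that all seven files listed above are actually in it. The check below is a sketch using only the standard library; `missing_from_bundle` is a hypothetical helper, and it assumes nothing about the bundle's internal layout beyond the file names.

```python
import zipfile

# The seven file names from the contents list above.
EXPECTED_FILES = frozenset({
    "audit-sample.csv",
    "actors.json",
    "dead-letters.json",
    "policy-versions.json",
    "eval-runs.json",
    "sweeper-runs.json",
    "README.md",
})


def missing_from_bundle(bundle) -> set[str]:
    """Return expected file names absent from the zip (accepts a path or file-like object)."""
    with zipfile.ZipFile(bundle) as zf:
        # Compare on base names so a top-level folder inside the zip is tolerated.
        present = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return set(EXPECTED_FILES - present)
```

For example, `missing_from_bundle("evidence-2026-H1.zip")` should return an empty set for a complete export.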
What’s NOT in closegate’s monitoring
Be honest about scope:
- Operational practices — incident response procedures, change management approval flow, access-review cadence. These are your team’s processes; closegate is the technical substrate.
- Auditor-firm relationship management — selecting the firm, scoping the engagement, negotiating the report. closegate produces evidence; the engagement is yours.
- Type 2 attestation report itself — that’s the deliverable of the engagement, signed by your auditor. closegate gives you the underlying data; the report assertion is the auditor’s.
The full readiness map
What closegate handles → what your team handles:
| CC criterion | closegate provides | Your team provides |
|---|---|---|
| CC6.1 (logical access) | OIDC + reverse-proxy auth backends | IdP configuration, access provisioning workflow |
| CC6.2 (authorization) | Server-side SoD enforcement, tier routing | Role definitions, approval workflows |
| CC6.3 (revocation) | Actor identity tied to IdP token | IdP-side termination procedures |
| CC4.1 (monitoring) | 4-dimension eval harness + nightly monitor | Acknowledgment of alerts |
| CC4.2 (operating effectiveness) | 365-day artifact retention, reproducible eval | Remediation evidence on regressions |
| CC7.2 (detection) | Adversarial dimension + audit-log hash verify | SOC monitoring, log forwarding to SIEM |
| CC7.3 (response) | (incident-playbook template) | Incident response, post-mortems |
What this gives your audit engagement
- 365 days of monitoring evidence on autopilot
- Reproducible eval output that the auditor can re-run from a clean checkout
- Seven-file PBC bundle in one CLI command
- Tamper-evident audit log with verbatim policy clauses
- A defensible answer to “how do you monitor effectiveness?” — “GitHub Actions runs the eval nightly; here’s the artifact history.”