What's the difference between SOC 2 Type 1 and Type 2?

Type 1 is a point-in-time attestation: 'at the time of the audit, the controls were designed appropriately.' Type 2 is operating-effectiveness over a window (typically 6 or 12 months): 'the controls were both designed appropriately AND operating effectively throughout the period.' Type 2 is what enterprise buyers actually care about; Type 1 is the starter.

What does CC4.2 specifically require?

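The sentence usually quoted here, 'The entity selects, develops, and performs ongoing and/or separate evaluations to ascertain whether the components of internal control are present and functioning,' is the text of CC4.1 in the 2017 Trust Services Criteria. CC4.2 is its companion: the entity evaluates internal control deficiencies and communicates them in a timely manner to the parties responsible for corrective action. Translation: you need ongoing monitoring evidence that your controls were actually working throughout the attestation period, plus evidence that failures were surfaced and fixed. For AI agents, that's eval runs + audit-log integrity + actor-identity sanity.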

Can I use closegate's monitoring evidence in my own SOC 2 audit?

Yes, with caveats. closegate produces the JSON artifact + the audit-evidence-export PBC bundle; your audit engagement still needs to (a) validate that your deployment matches the controls closegate claims, and (b) attest that operational practices around the system are also effective (incident response, access reviews, etc.). closegate handles the technical evidence; your SOC 2 engagement handles the operational evidence.

Deep dive

SOC 2 Type 2 monitoring for AI agents in production

SOC 2 Type 2 requires ongoing operating-effectiveness evidence (CC4.2). Here's how to get AI-agent monitoring evidence on autopilot — reproducible JSON, 365-day CI retention.

Dipankar Sarkar June 20, 2026 4 min read

Tags: SOC 2 Type 2, monitoring, AI agents, compliance

SOC 2 Type 2 attestation is the procurement-required compliance signal for finance teams selling AI-touched services to enterprise customers. The Type 2 distinction matters: Type 1 says “the controls were designed properly at this moment”; Type 2 says “the controls were both designed properly AND operating effectively over a 6–12 month window.”

For AI agents specifically, the Trust Services Criterion that bites is CC4.2: ongoing monitoring of control effectiveness. This article walks through how closegate produces the monitoring evidence on autopilot.

What CC4.2 actually requires

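The exact language: “The entity selects, develops, and performs ongoing and/or separate evaluations to ascertain whether the components of internal control are present and functioning.” In the 2017 Trust Services Criteria that sentence is the text of CC4.1; CC4.2 is its companion, requiring that internal control deficiencies be evaluated and communicated in a timely manner to the parties responsible for corrective action. The two work as a pair: evaluate continuously, then surface and fix what the evaluation finds.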

In plain English: you can’t just attest that the controls exist; you have to show evidence they were working throughout the period.

For an AI agent deployment, that means:

The policy gate was firing correctly — same-actor confirms denied, materiality routed correctly, sensitive accounts forced to HITL
The audit log was tamper-evident — hash chain intact, no after-the-fact mutations
The eval harness was running — matching accuracy, policy enforcement, latency all within thresholds
The eval was reproducible — auditor can replay any run from a clean checkout
Regressions were caught — when a dimension failed, someone was notified and remediated
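
The first item in the list above can be pictured as a small decision function. This is only a sketch of the idea, not closegate's policy engine: the threshold, the account names, the `Confirmation` type, and the assumption that material amounts route to human review are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values; a real deployment would read these from its policy file.
MATERIALITY_THRESHOLD = 10_000.00
SENSITIVE_ACCOUNTS = frozenset({"1000-cash", "2100-payroll"})


@dataclass(frozen=True)
class Confirmation:
    proposed_by: str   # actor that proposed the match
    confirmed_by: str  # actor attempting to confirm it
    amount: float
    account: str


def decide(c: Confirmation) -> str:
    """Return 'deny', 'hitl', or 'allow' for a confirmation attempt."""
    # Segregation of duties: the proposer may not confirm their own work.
    if c.proposed_by == c.confirmed_by:
        return "deny"
    # Sensitive accounts always go to a human, whatever the amount.
    if c.account in SENSITIVE_ACCOUNTS:
        return "hitl"
    # Assumed routing: amounts at or above the threshold also need a human.
    if abs(c.amount) >= MATERIALITY_THRESHOLD:
        return "hitl"
    return "allow"
```

A policy-enforcement eval dimension would then amount to a table of scenarios like these, each asserting the expected decision.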

The evidence has to be continuous (not just at audit time) and third-party-verifiable (not just self-attested).
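
To make "tamper-evident" concrete, here is a minimal hash-chain sketch. It is a generic illustration of the technique rather than closegate's actual log format: the entry layout, the `GENESIS` value, and the function names are assumptions.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify(log: list[dict]) -> int | None:
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None


# Usage: build a short log, then mutate an entry after the fact.
log: list[dict] = []
append(log, {"actor": "alice", "action": "propose"})
append(log, {"actor": "bob", "action": "confirm"})
append(log, {"actor": "carol", "action": "approve"})
intact = verify(log)
log[1]["event"]["action"] = "reject"
first_broken = verify(log)
```

Because each hash covers the previous one, changing any past event breaks verification at that entry, which is what lets a third party check the log without trusting the operator.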

How closegate’s monitor produces the evidence

The closegate-engine soc2-monitor CLI runs after the eval harness and produces a JSON artifact:

{
  "generated_at": "2026-06-01T06:00:00Z",
  "eval_run_id": "20260601-060000",
  "deterministic_rows": [
    {
      "dimension": "matching_accuracy",
      "status": "OK",
      "headline": "macro-F1 1.0 on 83 cases",
      "threshold_met": true
    },
    {
      "dimension": "policy_enforcement",
      "status": "OK",
      "headline": "21 scenarios, pass-rate 1.0",
      "threshold_met": true
    },
    {
      "dimension": "latency",
      "status": "OK",
      "headline": "p95 9.0ms; ~7700 matches/sec",
      "threshold_met": true
    }
  ],
  "overall_ok": true,
  "notes": []
}

The artifact is idempotent (re-running on the same eval gives identical output) and deterministic (the eval dimensions tested don’t depend on LLM responses, so the eval itself is reproducible).
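
One way a reviewer might check that claim is to compare digests of two artifacts produced from the same eval run. The helper below is a sketch, not part of closegate; `artifact_digest` and its `ignore` parameter are hypothetical, and the option to skip `generated_at` is there only in case that field reflects wall-clock time in a given deployment.

```python
from __future__ import annotations

import hashlib
import json


def artifact_digest(artifact: dict, ignore: tuple[str, ...] = ()) -> str:
    """SHA-256 over a canonical JSON encoding, skipping top-level keys in `ignore`."""
    kept = {k: v for k, v in artifact.items() if k not in ignore}
    canonical = json.dumps(kept, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Usage: two artifacts that differ only in key order and timestamp.
run_a = {
    "generated_at": "2026-06-01T06:00:00Z",
    "eval_run_id": "20260601-060000",
    "overall_ok": True,
}
run_b = {
    "overall_ok": True,
    "eval_run_id": "20260601-060000",
    "generated_at": "2026-06-01T06:05:00Z",
}
same_content = artifact_digest(run_a, ignore=("generated_at",)) == artifact_digest(
    run_b, ignore=("generated_at",)
)
```

Sorting keys and fixing separators before hashing keeps the comparison independent of how the JSON happened to be serialized.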

The CI workflow shape

.github/workflows/soc2-monitor-nightly.yml runs at 06:00 UTC daily:

on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch: {}
  pull_request:
    paths:
      - 'eval/**'
      - 'packages/closegate_policy/**'
      - 'packages/closegate_engine/src/closegate_engine/soc2.py'
      - 'seed/**'

jobs:
  monitor:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run closegate-engine seed --pack saas
      - run: uv run closegate-engine validate
      - run: uv run python -m eval.runner --dimension accuracy,policy,latency
      - run: uv run closegate-engine soc2-monitor --out soc2-monitor.json --fail-on-regression
        env:
          CLOSEGATE_SLACK_HOOK: ${{ secrets.CLOSEGATE_SLACK_HOOK }}
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: soc2-monitor-${{ github.run_number }}
          path: |
            soc2-monitor.json
            evals/results/latest/summary.json
            evals/results/latest/report.md
          retention-days: 365

The CI workflow does three load-bearing things:

Seed + validate before eval. Without a deterministic seed, the eval results aren’t reproducible. The CI workflow guarantees the same starting state every night.
--fail-on-regression. If any deterministic dimension goes FAIL or MISSING, the CI job exits 2. CI status becomes the operating-effectiveness signal.
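365-day artifact retention. Auditor walks in for the Type 2 engagement; you point at the GitHub Actions artifact history. Each artifact is the JSON + the eval evidence + the human-readable report. Note that retention-days cannot exceed the retention limit configured for the repository or organization: GitHub's default is 90 days, public repositories are capped at 90, and private repositories can be raised to as much as 400.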
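
The --fail-on-regression behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the CLI's source: the `exit_code` function is hypothetical, the expected dimension names are taken from the sample artifact, and treating an absent row the same as MISSING is a guess.

```python
# Assumed: the three deterministic dimensions shown in the sample artifact.
EXPECTED = ("matching_accuracy", "policy_enforcement", "latency")


def exit_code(artifact: dict, expected=EXPECTED) -> int:
    """Return 2 if any expected dimension is FAIL or MISSING, else 0."""
    rows = artifact.get("deterministic_rows", [])
    status = {row["dimension"]: row["status"] for row in rows}
    for name in expected:
        # A dimension absent from the artifact is treated like MISSING.
        if status.get(name, "MISSING") in ("FAIL", "MISSING"):
            return 2
    return 0


# Usage with trimmed-down artifacts.
healthy = {
    "deterministic_rows": [
        {"dimension": "matching_accuracy", "status": "OK"},
        {"dimension": "policy_enforcement", "status": "OK"},
        {"dimension": "latency", "status": "OK"},
    ]
}
regressed = {
    "deterministic_rows": [
        {"dimension": "matching_accuracy", "status": "OK"},
        {"dimension": "policy_enforcement", "status": "FAIL"},
    ]
}
codes = (exit_code(healthy), exit_code(regressed))
```

A non-zero exit turns the nightly job red, so the CI history doubles as a record of when a control stopped operating effectively.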

The audit-evidence-export PBC bundle

For the Type 2 walkthrough itself, the audit-evidence-export produces the seven-file PBC bundle:

closegate-engine audit-evidence-export \
  --since 2026-01-01 --until 2026-06-30 \
  --out evidence-2026-H1.zip

Contents:

audit-sample.csv — 25 random + 25 boundary events (above-materiality + sensitive-account)
actors.json — full actor identity registry with first-seen and last-seen timestamps
dead-letters.json — outbox dead-letter queue (should be empty in a healthy deployment)
policy-versions.json — every policy.yaml commit hash + timestamp + author
eval-runs.json — every nightly monitor run with pass/fail status per dimension
sweeper-runs.json — every recovery-sweeper invocation + locks released
README.md — auditor-facing index with the control mapping

The bundle is what the audit firm asks for in the “Provided By Client” (PBC) request list. closegate produces it from one CLI command; you don’t manually assemble it.
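
Before handing the bundle over, a quick completeness check can catch a missing file. The sketch below uses only Python's standard zipfile module; `check_bundle` is a hypothetical helper, and it assumes the seven files sit at the top level of the archive, which the source does not confirm.

```python
from __future__ import annotations

import io
import zipfile

# File names taken from the contents list above.
EXPECTED_FILES = frozenset({
    "audit-sample.csv", "actors.json", "dead-letters.json",
    "policy-versions.json", "eval-runs.json", "sweeper-runs.json", "README.md",
})


def check_bundle(zip_source) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) file names for a PBC bundle.

    `zip_source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # Skip directory entries; assumed layout is flat.
        present = {name for name in zf.namelist() if not name.endswith("/")}
    return set(EXPECTED_FILES - present), present - EXPECTED_FILES


# Usage with an in-memory archive standing in for evidence-2026-H1.zip,
# deliberately built without dead-letters.json.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in sorted(EXPECTED_FILES - {"dead-letters.json"}):
        zf.writestr(name, "")
missing, unexpected = check_bundle(buf)
```

Passing a real path such as "evidence-2026-H1.zip" works the same way, since zipfile.ZipFile accepts either a path or a file-like object.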

What’s NOT in closegate’s monitoring

Be honest about scope:

Operational practices — incident response procedures, change management approval flow, access-review cadence. These are your team’s processes; closegate is the technical substrate.
Auditor-firm relationship management — selecting the firm, scoping the engagement, negotiating the report. closegate produces evidence; the engagement is yours.
Type 2 attestation report itself — that’s the deliverable of the engagement, signed by your auditor. closegate gives you the underlying data; the report assertion is the auditor’s.

The full readiness map

What closegate handles → what your team handles:

CC criterion	closegate provides	Your team provides
CC6.1 (logical access)	OIDC + reverse-proxy auth backends	IdP configuration, access provisioning workflow
CC6.2 (authorization)	Server-side SoD enforcement, tier routing	Role definitions, approval workflows
CC6.3 (revocation)	Actor identity tied to IdP token	IdP-side termination procedures
CC4.1 (monitoring)	4-dimension eval harness + nightly monitor	Acknowledgment of alerts
CC4.2 (operating effectiveness)	365-day artifact retention, reproducible eval	Remediation evidence on regressions
CC7.2 (detection)	Adversarial dimension + audit-log hash verify	SOC monitoring, log forwarding to SIEM
CC7.3 (response)	(incident-playbook template)	Incident response, post-mortems

What this gives your audit engagement

365 days of monitoring evidence on autopilot
Reproducible eval output that the auditor can re-run from a clean checkout
Seven-file PBC bundle in one CLI command
Tamper-evident audit log with verbatim policy clauses
A defensible answer to “how do you monitor effectiveness?” — “GitHub Actions runs the eval nightly; here’s the artifact history.”