A working playbook for evaluation, red-teaming, and governance that survives a real auditor walking the floor.
Most enterprise AI governance is documentation that nobody operates against. The principles document is downloaded once, the policy is approved, and the operating teams keep doing what they were doing.
What actually changes outcomes is operational practice — evaluation cadence, red-team protocol, change review, incident process — built into how the team runs.
An evaluation harness that runs on a schedule, with a named owner and a written process for handling regressions. Most enterprise practice falls well short of this.
A change-review process that distinguishes model upgrades from infrastructure changes from data changes, with different review thresholds for each.
A red-team protocol that runs at least quarterly, with a written report and named follow-ups.
An incident process that captures AI-specific failure modes — drift, prompt injection, retrieval failure — alongside the standard production incident categories.
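As a rough illustration of the first two practices, here is a minimal Python sketch of a regression check whose tolerance depends on the kind of change under review. Everything in it is an assumption made for illustration: the `ChangeType` categories mirror the three change kinds named above, while the threshold values, the metric names, and the `find_regressions` helper are hypothetical placeholders, not a prescribed implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ChangeType(Enum):
    MODEL_UPGRADE = "model_upgrade"
    INFRASTRUCTURE = "infrastructure"
    DATA = "data"


# Hypothetical thresholds: the largest drop in any metric that a change of
# each type may introduce before it needs review. Values are placeholders.
MAX_ALLOWED_DROP: Dict[ChangeType, float] = {
    ChangeType.MODEL_UPGRADE: 0.01,
    ChangeType.INFRASTRUCTURE: 0.0,
    ChangeType.DATA: 0.02,
}


@dataclass(frozen=True)
class Regression:
    metric: str
    baseline: float
    candidate: Optional[float]  # None means the metric was not reported
    owner: str                  # named person or rotation who must follow up


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    change_type: ChangeType,
    owner: str,
) -> List[Regression]:
    """Return every baseline metric that dropped by more than the allowed
    amount for this change type, or that is missing from the candidate run."""
    limit = MAX_ALLOWED_DROP[change_type]
    found: List[Regression] = []
    for metric in sorted(baseline):
        base_score = baseline[metric]
        new_score = candidate.get(metric)
        if new_score is None or base_score - new_score > limit:
            found.append(Regression(metric, base_score, new_score, owner))
    return found


if __name__ == "__main__":
    # Illustrative scores only; a real harness would load these from eval runs.
    baseline_scores = {"accuracy": 0.90, "groundedness": 0.80}
    candidate_scores = {"accuracy": 0.85, "groundedness": 0.80}
    for reg in find_regressions(
        baseline_scores, candidate_scores, ChangeType.MODEL_UPGRADE, "eval-owner"
    ):
        print(f"{reg.metric}: {reg.baseline} -> {reg.candidate} (owner: {reg.owner})")
```

Attaching the owner to each regression record is a deliberate choice in this sketch: it makes the follow-up traceable to a name rather than to a dashboard.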
Reproducibility. Given the same inputs, can the system produce the same outputs, and is that traceable? For systems that are deliberately stochastic, can the team state and justify a bound on the output variance?
Lineage. For a given output, can the team trace back to the prompt, retrieval set, model version, and configuration that produced it?
Change record. When the model was upgraded, who approved it, and what evaluation evidence supported the call?
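One way to make the lineage question answerable in a walk-through is to store a small record next to every output. The sketch below assumes a hypothetical `LineageRecord` with the four elements named above (prompt, retrieval set, model version, configuration) plus the output itself, and derives a stable fingerprint from them. The field names and the choice of SHA-256 over canonical JSON are illustrative assumptions, not a standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record stored alongside each model output."""
    prompt: str
    retrieval_ids: Tuple[str, ...]  # identifiers of the retrieved documents
    model_version: str
    config: Dict[str, Any]          # must contain only JSON-serializable values
    output: str

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that the same
        # record always hashes to the same value regardless of dict ordering.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = LineageRecord(
        prompt="Summarize the attached policy.",
        retrieval_ids=("doc-1", "doc-2"),
        model_version="model-v1",
        config={"temperature": 0.0, "max_tokens": 256},
        output="(model output here)",
    )
    print(record.fingerprint())
```

Two records with the same fingerprint were produced from the same prompt, retrieval set, model version, and configuration, which gives the team something concrete to show when asked to trace an output. For a deliberately stochastic system, identical inputs can still produce different outputs, so the fingerprint identifies the record rather than proving reproducibility.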
Most teams have the data. Few have it organized in a way that survives a one-hour walk-through with a regulator.
The cost is operational discipline, not tooling spend. The teams we work with that have working AI governance are not the ones that bought the most platforms. They are the ones that write the eval harness, run the rotation, and own the call.
That discipline is cheaper than the alternative. A regulated firm that has to pause an AI program in the middle of an audit pays orders of magnitude more than one that ran evals on a schedule.
Send a note. We will add you to a short list that receives a write-up whenever there is something substantive to share.