INSIGHT — STRATEGY

Generative AI in the enterprise: what actually moves

A field view from regulated-industry programs. Where the value sits, where the failure modes are, and what the next twelve months look like for organizations that have to ship.

SECTION / 01

The pattern across programs

Most enterprises we work with have moved past the demo phase. The conversations are about deployment, evaluation, and operating discipline, not about whether AI is real.

What survives deployment is rarely the most exciting part of the original brief. The wins compound around document-heavy workflows, evaluation tooling, and operator-grade interfaces. The losses tend to be projects where the model was the focus and the system around it was an afterthought.

SECTION / 02

Where the value sits

Document-heavy workflows are still where most measurable wins land: risk reviews, regulatory drafting, claims documentation, research synthesis. The reason is straightforward: the existing process is text in, text out, and the cost of getting it slightly wrong is bounded.

Operator augmentation is the second cluster. Investigators, analysts, dispatchers, customer-service agents. The system raises the floor, but a person still owns the call.

Evaluation tooling is the unsung win. Teams that build a real eval harness ship faster, fix faster, and survive auditor scrutiny. Teams that don't, don't.
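To make "a real eval harness" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular program's tooling: the names `EvalCase`, `run_eval`, and `regressions` are hypothetical, the system under test is assumed to be any callable that maps a prompt string to an output string, and the pass criterion (required terms appear in the output) is a deliberately crude stand-in for whatever check a real team would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One fixed test case: an input and the terms the output must contain."""
    case_id: str
    prompt: str
    required_terms: tuple[str, ...]


def passes(output: str, case: EvalCase) -> bool:
    # Crude check (an assumption for this sketch): every required term
    # appears in the output, compared case-insensitively.
    text = output.lower()
    return all(term.lower() in text for term in case.required_terms)


def run_eval(system: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    # Run every case through the system and record pass/fail per case id.
    return {case.case_id: passes(system(case.prompt), case) for case in cases}


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    # A regression is a case that passed in the baseline run and does not
    # pass in the current run (a case missing from the current run counts).
    return sorted(
        case_id
        for case_id, ok in baseline.items()
        if ok and not current.get(case_id, False)
    )


if __name__ == "__main__":
    cases = [
        EvalCase("claim-001", "Summarize the claim file.", ("policy number",)),
        EvalCase("claim-002", "List the open items.", ("open items",)),
    ]
    # Stub systems stand in for two versions of a deployed pipeline.
    old = run_eval(lambda p: "Policy number and open items are listed.", cases)
    new = run_eval(lambda p: "Policy number is listed.", cases)
    print(regressions(old, new))
```

The point of the design is the stored baseline: once per-case results are kept from run to run, a change that breaks a previously passing case shows up by case id, which is also the kind of record an auditor can follow.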

SECTION / 03

Where the failure modes show up

We see three recurring failure modes. First, treating retrieval as solved when retrieval quality is the bottleneck. Second, deferring evaluation until launch, by which point regressions are invisible because there is no baseline to compare against. Third, hand-off models that don't survive the operating team: the system technically works, but nobody can change it.

These are operational failures, not model failures. They show up regardless of vendor or model choice.
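The first failure mode is the easiest to put a number on. As a hedged sketch, assuming each test query has a hand-labeled set of relevant document ids, recall at k measures how much of that set the retriever actually returns in its top k results. The function names and the shape of the labeled data are assumptions made for this example.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: list[tuple[list[str], set[str]]], k: int
) -> float:
    """Average recall at k over (retrieved ids, relevant ids) pairs."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical labeled queries: ranked retriever output and the gold set.
    runs = [
        (["d1", "d7", "d3"], {"d1", "d3"}),
        (["d9", "d2", "d4"], {"d4", "d5"}),
    ]
    print(mean_recall_at_k(runs, k=3))
```

Tracking a figure like this before launch, rather than after, is what separates a retrieval problem from a model problem when answers start going wrong.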

SECTION / 04

The next twelve months

We expect three shifts. Agent systems mature beyond demoware in a handful of narrow domains. Evaluation and observability tooling becomes table stakes for regulated industries. Operator-grade UI emerges as a serious investment area, distinct from end-customer interfaces.

None of this requires new model capability. It requires the operational discipline to ship.

