The AI consulting playbook: how enterprise AI programmes actually ship

A four-phase method for AI consulting that ships and stays shipped. Ethnography, architecture, build, adoption — with the named decision points, citable metrics, and operating cadences that distinguish programmes which compound from those which revert.

Vaibhav · 08 May 2026 · 8 min read

TL;DR: Enterprise AI consulting that ships has four phases: ethnography (one week sitting next to the operator before any architecture is scoped), architecture (four to six weeks of decision-mapping that determines whether the system compounds), build (four to nine months of partner-led delivery on the architects' line), and adoption (six to twelve months past handover, instrumented by the percentage of decisions flowing through the new surface). Programmes that skip any phase, and most do, show predictable failure modes. The phases are sequential; the calendar is unforgiving.

AI consulting has, in the last three years, expanded into the largest professional-services category outside of audit. Almost everyone now sells it. Far fewer can show a multi-year programme that is still in production and still being used by the operators it was built for. The gap between the marketing surface and the operating reality is wide; this playbook is what closes it.

It is the method we use across every engagement, from a six-week strategy diagnostic to a multi-quarter transformation. Names of phases vary; the substance does not.

Phase 1 — Ethnography

Length: one week, full-time, no laptop open. Outcome: a list of workarounds, not a deck.

Every enterprise AI engagement begins with a senior operator from the consulting team sitting next to a senior operator from the client team for a full week. The brief is to understand the work the way the operator understands it — including the parts they would not put in a deck. The 4 a.m. shift that does things differently than the day team. The workarounds the team has built up over five years and stopped naming. The systems of record that disagree, and the spreadsheet that reconciles them.

The artefact produced at the end of the week is not a strategy. It is a list of workarounds, written in the operator's language, with the implicit business rules each one encodes. That list determines everything that follows. Architectures that ignore the workaround list ship and revert; architectures that absorb it ship and compound.

The most common reason this phase gets skipped is that it does not feel like consulting work. There is no model selected, no architecture sketched, no roadmap drafted. The operator is sometimes uncomfortable with someone watching them work. Both reactions are correct in tone and wrong in conclusion: the absence of consulting work is exactly the point, and the discomfort is exactly the signal that the consulting team is finally close enough to the work to do it well.

Phase 2 — Architecture

Length: four to six weeks. Outcome: decisions, not slides.

Architecture in AI consulting is decision-mapping, not technology selection. The technology choices are downstream consequences of three architectural decisions made early:

Where does the AI surface sit in the decision flow? Is it making the decision, surfacing the decision to an operator who confirms, or auditing a decision the operator already made? Each of the three has different latency, governance, and adoption properties; choosing wrong is irrecoverable later.
What is the schema contract with upstream systems of record? Which systems own which fields. How schema changes flow. How quickly. Who is in the change-review loop. This contract gets signed in week three of architecture, not month six of build.
What is the adoption metric, named and instrumented? Not model accuracy. The percentage of decisions flowing through the new surface, measured weekly, reported to the executive sponsor every Friday. The metric is fixed in week one of architecture; the instrumentation is built in week two; the first reading is taken from the existing operating cadence as a baseline.

Programmes that resolve these three decisions in architecture move smoothly through build. Programmes that defer them — and most do, because each is uncomfortable — discover them again at month four of build, after the architecture has hardened, and the cost of changing them has compounded.
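The adoption metric named above (the percentage of decisions flowing through the new surface, measured weekly) can be sketched in a few lines. This is a minimal illustration, not our production instrumentation: the `Decision` record and its field names are assumptions made for the example, and a real programme would read these rows from whatever decision log the client already keeps.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    """One operational decision. Field names are hypothetical."""
    made_on: date
    via_new_surface: bool  # True if the decision flowed through the AI surface


def weekly_adoption(decisions):
    """Return {(iso_year, iso_week): percent of decisions through the new surface}."""
    totals = defaultdict(int)
    through_surface = defaultdict(int)
    for d in decisions:
        iso = d.made_on.isocalendar()
        week = (iso[0], iso[1])  # ISO year and ISO week number
        totals[week] += 1
        if d.via_new_surface:
            through_surface[week] += 1
    return {w: 100.0 * through_surface[w] / totals[w] for w in sorted(totals)}


if __name__ == "__main__":
    sample = [
        Decision(date(2026, 5, 5), True),
        Decision(date(2026, 5, 6), False),
        Decision(date(2026, 5, 12), True),
    ]
    for week, pct in weekly_adoption(sample).items():
        print(week, f"{pct:.1f}%")
```

Running the same function over decisions made before launch, where `via_new_surface` is always false, gives the baseline reading the first week of instrumentation needs.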

Phase 3 — Build

Length: four to nine months. Outcome: a system in production, instrumented for adoption.

Build is where the architects' line earns its keep: the senior engineers who faced the same problem two clients ago, working in the deploy log every day rather than reviewing pull requests once a week. The unglamorous work of build is the work that determines whether the system survives the first three quarters in production:

Drift telemetry. Every model in production needs daily metrics on input distribution drift, output distribution drift, and prediction-versus-actual divergence. Without these, the model degrades silently, the operator notices before the data team does, trust collapses, and the engagement reverts.
Adoption instrumentation. Every decision the new surface processes is logged with the operator's context, the alternative the operator could have taken, and the latency of the operator's confirmation or override. This data is what the Friday metric is built from.
Operator-side debugging. When the operator does not trust an answer, they need a way to see why the model gave it. Not "explainability" in the marketing sense; just enough trace data that a senior operator can decide whether to override. This is a build cost, not a "future enhancement".
Rollback path. Every AI surface in build needs a feature flag and a one-click revert to the prior workflow. Not as a courtesy — as a load-bearing piece of the trust contract with the operator. We use this in the first six weeks of every deployment; programmes that have not built it discover the need at exactly the wrong moment.
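As one illustration of the drift telemetry item, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a current sample, plus a simple prediction-versus-actual divergence. PSI is one common choice for distribution drift, swapped in here for illustration; the text above does not prescribe a specific statistic. The function names, the equal-width binning, and the smoothing constant `eps` are all assumptions of this example.

```python
import math


def _bin_fractions(values, lo, hi, bins):
    """Fraction of values per equal-width bin on [lo, hi]; outliers clamp to edge bins."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(max(idx, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `baseline`."""
    lo, hi = min(baseline), max(baseline)
    expected = _bin_fractions(baseline, lo, hi, bins)
    actual = _bin_fractions(current, lo, hi, bins)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # smoothing so empty bins avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


def mean_abs_divergence(predicted, actual):
    """Mean absolute gap between predictions and the outcomes later observed."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

The same `psi` call can be run daily on a model input feature and on the model's output scores. Any alert threshold is a judgment call per programme; this sketch deliberately does not pick one.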

Build phases that produce the four artefacts above ship and compound. Build phases that defer any of them ship and degrade. The instinct to defer is strong — each item costs build time, and the upside is invisible until something breaks. We have learned to be unembarrassed about pricing them in.
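The adoption instrumentation and the rollback path can be sketched together: a single flag decides whether a decision goes to the AI surface or the prior workflow, and every decision is logged with the proposal, the operator's final call, and how long the operator took. This is a hedged sketch under stated assumptions, not a description of any specific deployment. `DecisionRouter`, its fields, and the log keys are hypothetical names, and in practice the flag would live in a shared configuration store rather than in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DecisionRouter:
    """Routes each decision to the AI surface or the prior workflow behind one flag."""
    ai_surface: Callable[[Any], Any]
    prior_workflow: Callable[[Any], Any]
    ai_enabled: bool = True
    log: list = field(default_factory=list)

    def revert(self):
        """Rollback: every later decision goes through the prior workflow."""
        self.ai_enabled = False

    def decide(self, case, operator_review):
        """Propose a decision, let the operator confirm or override, and log it."""
        if self.ai_enabled:
            path, proposal = "ai_surface", self.ai_surface(case)
        else:
            path, proposal = "prior_workflow", self.prior_workflow(case)
        started = time.monotonic()
        final = operator_review(case, proposal)
        self.log.append({
            "case": case,
            "path": path,
            "proposal": proposal,
            "final": final,
            "overridden": final != proposal,
            "operator_latency_s": time.monotonic() - started,
        })
        return final
```

A short usage example, with stand-in functions for the model and the operator:

```python
router = DecisionRouter(ai_surface=lambda c: "approve",
                        prior_workflow=lambda c: "manual review")
accept = lambda case, proposal: proposal  # operator confirms the proposal

router.decide("case-1", accept)   # goes through the AI surface
router.revert()                   # the revert described in the rollback item
router.decide("case-2", accept)   # now goes through the prior workflow
```

The log rows are the raw material for the weekly adoption figure: the share of rows whose `path` is the AI surface, with the override rate as a secondary signal.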

Phase 4 — Adoption

Length: six to twelve months past handover. Outcome: the in-house team shipping faster than the consulting team did.

Adoption is the phase most consulting models ignore. It is also the phase that determines whether the eighteen-month outcome is "still in production" or "quietly reverted". Three commitments distinguish the programmes that compound:

A named enablement lead on the client side. Not the executive sponsor. Someone whose primary job for six months is to absorb the operator questions, defend the surface internally, and push back on the consulting team when something is not working. If the client cannot name this person on day one of build, the programme is unfunded for adoption regardless of what the budget says.

Weekly adoption telemetry, walked. The Friday metric is not just emailed — a senior partner walks through it with the executive sponsor every two weeks for the first three months past launch, then monthly, then quarterly. The walking is what catches drift early. Most programmes that revert at month nine had visible warning signs at month four that nobody walked.

Quarterly reviews to month eighteen. A senior partner from the consulting team reads the operating cadence, the adoption telemetry, and the engineering velocity numbers every quarter for eighteen months. The output is a one-page note to the executive sponsor on what is drifting and what to do about it. This is the work that is hardest to underwrite at large-firm rates and easiest at boutique rates — it is the structural advantage of the boutique consulting model on AI specifically.

Where this method comes from

This playbook is drawn from twelve years of programmes shipped across the United States, the United Kingdom, Western Europe, and India — for enterprises in finance, health, retail operations, and AI-native software. Roughly 180 programmes total. Two hundred operators currently across three studios. The method has evolved; the four phases have not. They are the structure that survived contact with every shape of work we have run.

It is also a deliberately partial method. There are programmes for which a different shape is correct — a global rollout of an established system, a procurement-led engagement, a specific compliance migration. The four phases above are the method for genuine AI consulting work that has to ship and stay shipped. They are not a universal consulting framework, and we do not pretend they are.

FAQ — AI consulting

What does AI consulting actually involve?

Genuine AI consulting work covers four phases: a one-week ethnography to understand how decisions actually flow through the operating reality; four to six weeks of architecture that resolves the surface's position in the decision flow, the schema contract with upstream systems of record, and the named adoption metric; four to nine months of build with drift telemetry, adoption instrumentation, operator-side debugging, and a tested rollback path; and six to twelve months of adoption past handover with weekly metrics, quarterly reviews, and a named enablement lead on the client side. Programmes that skip any phase show predictable failure modes.

How long does an AI consulting engagement take?

A real one runs 12 to 24 months end to end. The first 8 to 12 weeks are diagnostic and architecture. Build runs 4 to 9 months. Adoption runs alongside build and continues for 6 to 12 months past launch. Engagements scoped at less than 8 weeks tend to be diagnostic-only; engagements scoped at less than 6 months tend to be feature builds rather than transformations.

What does enterprise AI consulting cost?

For mid-market enterprises in the US, UK, and Western Europe, fees for a senior-led AI programme run from $400k to $2.5M for the consulting work itself, plus the change cost — typically 30 to 50 percent of the consulting fee, covering enablement, executive air cover, and the in-house operator hours absorbed by adoption. Platform and model costs sit on top. Variance is driven by the depth of the embed and the number of operating workflows changed, not by the model technology.
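The arithmetic behind that range, as a small sketch: taking the quoted fee band and adding a change cost of 30 to 50 percent gives the combined consulting-plus-change figure, before platform and model costs. The function name and its parameters are hypothetical; integer percentages are used to keep the arithmetic exact.

```python
def programme_cost_range(fee_low, fee_high, change_pct_low=30, change_pct_high=50):
    """Return (low, high) for consulting fee plus change cost, in the fee's currency.

    Platform and model costs are excluded; they sit on top of this range.
    """
    low = fee_low * (100 + change_pct_low) // 100
    high = fee_high * (100 + change_pct_high) // 100
    return low, high


print(programme_cost_range(400_000, 2_500_000))  # prints (520000, 3750000)
```

On the figures quoted above, that puts the combined consulting and change cost between roughly $520k and $3.75M.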

How do I choose between an AI consulting firm and an in-house team?

Three signals favour bringing in a consulting firm: the work spans more than one operating workflow, the architecture decisions need someone who has seen them go wrong before, and the executive sponsor needs an external party to model the change cost honestly without the conflict of interest of an in-house team. Three signals favour staying in-house: the work is well-scoped within one team, the team has shipped two prior systems together, and the architectural decisions are reasonably reversible. Most enterprise AI work has at least one of the first three signals; that is why the consulting market exists at the scale it does.

Written by

Vaibhav

Studio operator
