OPERATIONS · 2026-05-25

What does an AI agents services company actually deliver?

A specific, non-marketing list of what a managed AI agents services company is on the hook to produce in 2026 — and what is too often missing from the scope of work.

By Logitelia editorial team · 11 min read

One of the cleanest filters when evaluating an AI agents services company is to ask, in writing, exactly what you will receive. The answers cluster into two groups. Vendors who answer with a thin list ("a custom AI agent built for you, with ongoing support") are selling something closer to a freelance project. Vendors who answer with a long list ("a running workflow, evaluation suite, observability dashboard, named operator, exportable IP, monthly review, on-call coverage, defined SLA") are selling a real managed service. The cost gap between the two answers should also be obvious — and if it is not, the vendor in the first group is overpriced.

This article unpacks the long answer. It is the companion to our pillar guide on how to choose an AI agents services company.

The workflow itself

Obvious but worth being specific. A workflow is a definition of inputs, a sequence of agent steps, the tools the agent can call, the outputs it produces, and the place those outputs go. It lives somewhere you can inspect — a directed graph in the vendor's platform, a YAML file, a set of LangGraph nodes — not "in the head of the engineer who built it."

What the deliverable should specify: every input source (with sample payloads), every external tool the agent calls (with auth method and rate limits), every output destination (with schema), the success criteria for a single run, and the failure handling for each foreseeable error type. If any of these are missing, you have not been delivered a workflow — you have been delivered a script that happened to work the first time.
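One way to make that checklist concrete is to ask for the workflow definition as structured data and check it for gaps. The sketch below is illustrative only: the names `WorkflowSpec`, `ToolSpec` and `missing_items` are hypothetical, not any vendor's format, and a real definition would carry far more detail per field.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """One external tool the agent may call (hypothetical shape)."""
    name: str
    auth_method: str           # e.g. "oauth2" or "api_key"
    rate_limit_per_min: int


@dataclass
class WorkflowSpec:
    """The five scope items named in the text, as inspectable data."""
    input_sources: dict[str, dict] = field(default_factory=dict)    # source -> sample payload
    tools: list[ToolSpec] = field(default_factory=list)
    output_schemas: dict[str, dict] = field(default_factory=dict)   # destination -> schema
    success_criteria: str = ""
    failure_handling: dict[str, str] = field(default_factory=dict)  # error type -> handling


def missing_items(spec: WorkflowSpec) -> list[str]:
    """Return the names of scope items that were left empty."""
    checks = {
        "input_sources": spec.input_sources,
        "tools": spec.tools,
        "output_schemas": spec.output_schemas,
        "success_criteria": spec.success_criteria,
        "failure_handling": spec.failure_handling,
    }
    return [name for name, value in checks.items() if not value]
```

A spec that lists inputs, tools, outputs and success criteria but no failure handling would come back from `missing_items` with `failure_handling` flagged, which is the "script that happened to work the first time" case.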

The evaluation suite

An eval suite is a set of test cases the agent must pass before any change ships to production. It is the difference between "we believe this works" and "we can show that it works." Without it, the vendor is making changes blind and so are you. See AI agent evaluations explained for the long version.

What the deliverable should specify: a versioned set of test cases (ideally 50–500 for a typical workflow), the ground-truth answers for each, the grading rubric (deterministic where possible, model-graded where not, human-graded for the small set that needs it), the pass-rate history over time, and the regression policy (what happens when a pass rate drops).
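As a rough illustration of how those pieces fit together, here is a minimal sketch of a deterministic grader, a pass-rate calculation, and a regression policy. Everything in it is an assumption made for the example: the names `EvalCase`, `exact_match`, `pass_rate` and `should_block_release` are hypothetical, the 2-point drop threshold is arbitrary, and model-graded or human-graded cases are left out.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    input_text: str
    expected: str   # ground-truth answer for this case


def exact_match(output: str, expected: str) -> bool:
    """Deterministic grader: case-insensitive, whitespace-trimmed equality."""
    return output.strip().lower() == expected.strip().lower()


def pass_rate(
    agent: Callable[[str], str],
    cases: Sequence[EvalCase],
    grader: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of cases whose agent output the grader accepts."""
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    passed = sum(grader(agent(case.input_text), case.expected) for case in cases)
    return passed / len(cases)


def should_block_release(history: list[float], new_rate: float, max_drop: float = 0.02) -> bool:
    """Assumed regression policy: block when the pass rate falls more than
    max_drop below the most recent recorded rate."""
    if not history:
        return False  # nothing to regress against yet
    return history[-1] - new_rate > max_drop
```

The point of writing the regression policy down as a function is that "what happens when a pass rate drops" stops being a judgment call made under deadline pressure.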

If the vendor cannot show you an eval report from another client (redacted), they do not have eval discipline. They will improvise it once you sign — possibly well, possibly not.

Observability and a portal you can look at

Every action the agent took, replayable. Not "available on request." A live activity feed with timestamps, inputs, outputs, tool calls, model used, latency, and cost per run. A search across runs by user, by status, by error type. The ability to click into a single run and see the full trace.
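To show what "searchable and replayable" implies at the data level, the sketch below models one run as a record holding the fields listed above, plus a filter over runs. The record shape and the names `RunRecord`, `ToolCall` and `search_runs` are assumptions for illustration; a real portal would sit on a trace store rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class ToolCall:
    tool: str
    arguments: dict
    result_summary: str


@dataclass
class RunRecord:
    run_id: str
    started_at: datetime
    user: str
    status: str                  # "ok" or "error"
    error_type: Optional[str]    # None when the run succeeded
    model: str
    latency_ms: int
    cost_usd: float
    inputs: dict
    outputs: dict
    tool_calls: list[ToolCall] = field(default_factory=list)


def search_runs(
    runs: Iterable[RunRecord],
    *,
    user: Optional[str] = None,
    status: Optional[str] = None,
    error_type: Optional[str] = None,
) -> list[RunRecord]:
    """Return runs matching every filter that was supplied."""
    matches = []
    for run in runs:
        if user is not None and run.user != user:
            continue
        if status is not None and run.status != status:
            continue
        if error_type is not None and run.error_type != error_type:
            continue
        matches.append(run)
    return matches
```

If a vendor cannot produce something equivalent to one `RunRecord` for an arbitrary run from last week, the activity is not replayable in any useful sense.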

This is the most undervalued part of the deliverable and the part that most distinguishes good vendors from passable ones. Without it, you do not know what the agent did yesterday and the vendor does not either — they just have a better story about it.

A named senior operator

Not "a team." A specific person with a name, a calendar, an email address, and the authority to make decisions about your workflow. The operator reviews the agent's activity daily, handles exceptions, owns the relationship with you, runs the monthly review, and is the human accountable for the workflow's outcome.

What the deliverable should specify: who the operator is, what their seniority is, how many other accounts they hold (cap should be 4–8 for a boutique), what coverage looks like when they are on holiday, and the notification window when they rotate off the account.

Integrations that survive maintenance

Real connectors to your CRM, help desk, accounting system, document store, communication tools. With credential rotation, error handling, retry logic, version pinning, and a written notification path when an upstream API changes.

The bar here is "this will still be working in 18 months." A Zapier Zap that worked on demo day will not clear that bar. Ask the vendor how they have handled an upstream API change in the past: a specific story, with a date, is the signal of operational maturity.
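Of the connector properties listed above, retry logic is the easiest to illustrate in a few lines. The following is a minimal sketch of retry with exponential backoff, not a complete connector: `TransientUpstreamError` and `call_with_retry` are hypothetical names, the delays are arbitrary, and credential rotation, version pinning and change notification are out of scope here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientUpstreamError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff.

    Delays double each time: base, 2x base, 4x base, and so on. Any other
    exception type is treated as permanent and propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientUpstreamError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the operator
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the backoff can be tested without waiting. Separating transient errors from permanent ones matters more than the exact delays: retrying a revoked credential four times only delays the alert.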

Exportable artefacts and IP assignment

Within 10 business days of termination: every prompt, every workflow graph, every eval, every integration code path, every log file. In a format that lets you rebuild on another vendor or in-house within 30 days. The contract should assign IP in this work product to you, not license it.

This is the deliverable buyers most often forget to ask for, and the one that determines whether you are a client or a hostage 18 months in. The bar to pass: you could replicate the workflow without the vendor's platform if you had to. If the workflow only runs because it lives inside the vendor's stack, you do not own the workflow.

Cadence: written reviews and live access

Daily review by the operator (internal, you do not see it but you should know it happens). Weekly written summary delivered to you with key metrics and any exceptions. A tuning session every two weeks where the operator walks you through changes proposed and shipped. Monthly business review tying the workflow's KPIs back to the business outcome it serves.

Anything less than this and the workflow drifts and you find out late.

An SLA you can actually enforce

Vendors love to publish "99.9% uptime" as their SLA. That is mostly meaningless for an AI agent workflow because the right uptime metric is at the workflow level, not the infrastructure level. The SLA you want covers: first-response time on exceptions (e.g. 4 hours during business hours), throughput floors (e.g. minimum 100 transactions/day during normal operation), and an escalation path with named individuals. It also needs remedies that are real: credit against the next invoice if the SLA is missed for two consecutive months, termination right if missed for three.
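The remedy ladder is simple enough to state as code, which is a useful test of whether a contract clause is unambiguous. This sketch assumes "missed for three" means three consecutive months and that months are recorded oldest first; `sla_remedy` and its return labels are hypothetical.

```python
def sla_remedy(monthly_met: list[bool]) -> str:
    """Return the remedy triggered by the most recent run of missed months.

    monthly_met is ordered oldest first; True means the SLA was met that month.
    Assumption: both thresholds count consecutive misses ending in the latest month.
    """
    streak = 0
    for met in reversed(monthly_met):
        if met:
            break
        streak += 1
    if streak >= 3:
        return "termination_right"
    if streak >= 2:
        return "invoice_credit"
    return "none"
```

Under these assumptions, a history of met, missed, missed triggers the invoice credit, and a third straight miss escalates to the termination right. If your contract language cannot be reduced to something this mechanical, expect an argument about it later.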

Security and compliance posture

SOC 2 Type II at maturity. For younger vendors: written security policy, annual penetration test report, named DPO or equivalent, signed DPA, named sub-processors, defined breach notification window (72 hours is standard). EU data residency if you are in the EU — see AI agents data residency in the EU and AI agents and GDPR compliance.

Documentation and runbooks

A real managed AI service delivers documentation that lets a successor team (yours, or another vendor) understand the workflow without inheriting tribal knowledge. At minimum: an architecture diagram showing the agent's inputs, decision points, tool calls, and outputs; a written description of every prompt and why it is structured the way it is; a runbook for the three to five most common operational issues with the workflow; and a changelog of every change shipped, with reasoning.

This is the deliverable buyers undervalue at signing and overvalue at exit. The right time to negotiate the documentation deliverable is now, before the vendor decides whether to invest in writing it.

A monthly business review document

Not a screenshot of the dashboard. A written narrative covering: what the workflow did this month against the KPIs, what changed in the agent's behaviour and why, what failed and what the operator did about it, what is proposed for next month, and an honest assessment of where the workflow is degrading or improving. 2–4 pages, plus an appendix with the numbers.

The quality of this document over time is the single best leading indicator of the relationship's health. When the monthly review starts looking like a template, the engagement is winding down — usually before either side admits it.

What is NOT included (and that's fine)

A real managed AI service does not include: building new workflows outside the agreed scope (that is a separate engagement), training your team to do the work themselves (that is consulting), being the internal sponsor for the project (that is your COO's job), or owning outcomes the agent does not influence (an SDR agent does not own pipeline if your sales team is broken).

Vendors who include all of these in scope are over-promising. Vendors who exclude all of them are under-scoping. The middle is where competent services companies live.

Where Logitelia fits

Logitelia's managed AI teams deliver each item in this list as the standard scope of work. If you want to read the full buyer's framework, the pillar is How to choose an AI agents services company in 2026. If you want to test the fit on a specific workflow, book an intro call and we will walk through the scope of work in 30 minutes.
