Agent demos look like the future. Production agents look like brittle plumbing with extra steps. A signed position on what to deploy now, what to wait on, and what to ignore.
The position
Most enterprise agent demos are theatre. They run on hand-picked tasks, well-shaped data, and forgiving evaluation criteria. The agent moves through a five-step task with apparent autonomy. The audience nods. The next slide promises that this pattern will replace a department within twelve months. The pattern does not replace the department. It replaces a contractor who used to handle one specific task that has now been redefined to fit the demo.
Production agents look different. They are simpler, slower, and more boring than the demos suggested. They do one named thing in a deterministic loop. They have explicit handoff points where humans review the work. They log everything they did and why they did it. They are evaluated continuously against a golden set of reference cases that the team has built from real failures. That is the point. Boring agents are the ones that survive contact with day-to-day operations. The fashion-forward agents that demoed beautifully are quietly retired within six months of going live, replaced by simpler architectures that the operations team can actually keep running.
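As an illustration of what "one named thing in a deterministic loop" can look like, here is a minimal Python sketch. Everything in it is hypothetical: the ticket-routing job, the names `route_ticket`, `AuditEntry`, and `RunResult`, the 0.8 review threshold, and the keyword classifier that stands in for a model call. It shows the shape (fixed steps, one bounded tool, an explicit handoff, a log of what was done and why) rather than any particular product's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditEntry:
    step: str    # which fixed step ran
    detail: str  # what it did and why


@dataclass
class RunResult:
    label: str
    needs_human_review: bool
    audit_log: list[AuditEntry] = field(default_factory=list)


def route_ticket(
    text: str,
    classify: Callable[[str], tuple[str, float]],
    review_threshold: float = 0.8,  # assumed value, tune per use case
) -> RunResult:
    """Run one named job (ticket routing) as a fixed sequence of steps."""
    log: list[AuditEntry] = []

    # Step 1: normalise the input. No model call, fully deterministic.
    cleaned = " ".join(text.split())
    log.append(AuditEntry("normalise", f"collapsed whitespace, {len(cleaned)} chars"))

    # Step 2: a single bounded tool call (the classifier is injected).
    label, confidence = classify(cleaned)
    log.append(AuditEntry("classify", f"label={label} confidence={confidence:.2f}"))

    # Step 3: explicit handoff point. Low confidence goes to a person.
    needs_review = confidence < review_threshold
    reason = "below threshold" if needs_review else "at or above threshold"
    log.append(AuditEntry("handoff", f"needs_review={needs_review} ({reason})"))

    return RunResult(label, needs_review, log)


def keyword_classifier(text: str) -> tuple[str, float]:
    """Stand-in for a model call, so the sketch runs offline."""
    if "refund" in text.lower():
        return ("billing", 0.95)
    return ("general", 0.50)
```

Calling `route_ticket("I want a refund", keyword_classifier)` returns a result with three audit entries, one per step, which is the kind of trail a reviewer can read without replaying the run.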
What is real today
Single-purpose agents with bounded tools are real. Drafting, summarizing, routing, classifying, and extracting are all jobs where a focused agent with three or four tools does excellent work, reliably, with audit trails the compliance team can read. The pattern is not glamorous. The economics are excellent.
Retrieval-supported drafting is real. An agent that composes an answer from a clean knowledge base, with citations the user can verify, produces output faster than a human writing from scratch and more accurately than the ungrounded alternative. The work that makes this real is upstream, in the knowledge base, not in the agent.
Workflow agents with explicit handoffs are real. These are multi-step processes in which each step is owned by a named agent, schema-typed messages flow between the agents, and a terminating agent produces the user-facing output. The architecture looks like a small distributed system. It runs like one. It is observable, debuggable, and operable in production.
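A rough sketch of that shape in Python, under stated assumptions: the three agents, the message types (`ExtractedRequest`, `RoutedRequest`, `UserReply`), and the 1000 routing cutoff are all invented for illustration. Each agent is a plain function here; in a real system each would wrap a model or tool call, but the typed handoffs and the per-step trace would look much the same.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtractedRequest:
    request_id: str
    amount: float


@dataclass(frozen=True)
class RoutedRequest:
    request_id: str
    queue: str


@dataclass(frozen=True)
class UserReply:
    request_id: str
    text: str


def extractor_agent(raw: dict[str, Any]) -> ExtractedRequest:
    # Fails loudly on a malformed payload instead of passing it downstream.
    return ExtractedRequest(request_id=str(raw["id"]), amount=float(raw["amount"]))


def router_agent(msg: ExtractedRequest) -> RoutedRequest:
    # The 1000 cutoff is an arbitrary illustrative rule.
    queue = "manual_review" if msg.amount > 1000 else "auto"
    return RoutedRequest(msg.request_id, queue)


def reply_agent(msg: RoutedRequest) -> UserReply:
    # Terminating agent: the only step that produces user-facing output.
    return UserReply(msg.request_id, f"Request {msg.request_id} sent to {msg.queue}.")


def run_workflow(raw: dict[str, Any]) -> tuple[UserReply, list[tuple[str, object]]]:
    """Run the three named steps in order and record each typed message."""
    trace: list[tuple[str, object]] = []
    extracted = extractor_agent(raw)
    trace.append(("extractor", extracted))
    routed = router_agent(extracted)
    trace.append(("router", routed))
    reply = reply_agent(routed)
    trace.append(("reply", reply))
    return reply, trace
```

Because every hop has a named owner and a typed message, a failure can be pinned to one step by reading the trace, which is what makes the workflow debuggable.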
What is theatre
Open-ended agents demoed on staged data are theatre. The demo shows the agent handling a forty-step task without intervention. The footnote is that the demo was run twenty times to capture the one good run. Production reality is the other nineteen runs, which the demo did not show.
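The arithmetic behind that footnote can be sketched in a few lines of Python. This assumes, as a simplification not made in the text above, that every step succeeds independently with the same probability; the function names are hypothetical. Under that assumption, one good run in twenty on a forty-step task corresponds to a per-step success rate of roughly 93 percent, and even 99 percent per step yields only about two clean runs in three.

```python
def per_step_success(end_to_end: float, steps: int) -> float:
    """Per-step success rate implied by an end-to-end rate, assuming independent steps."""
    return end_to_end ** (1 / steps)


def end_to_end_success(per_step: float, steps: int) -> float:
    """End-to-end success rate for a chain of independent steps."""
    return per_step ** steps


# One good run in twenty on a forty-step task.
implied = per_step_success(1 / 20, 40)

# A step that works 99 times in 100, chained forty times.
chained = end_to_end_success(0.99, 40)
```

Real steps are rarely independent, so these figures are a rough guide to how quickly per-step error compounds, not a prediction for any specific agent.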
Multi-agent crews with no named owner per role are theatre. The crew is presented as a collaboration of specialists. In production, the lack of role boundaries produces a trace that no one can debug after twenty hops. When something goes wrong, no one owns the failure.
Autonomous decision-making in regulated workflows without human override is theatre dressed as innovation. The agent decides. The customer is denied. The compliance team is then told that the agent's decision is final. This pattern does not survive its first incident, and the incident is usually expensive.
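One way to make the override requirement concrete is a deployment gate that refuses regulated agents with no named humans on the override path. The sketch below is hypothetical throughout: `AgentDeployment`, `can_deploy`, and the field names are assumptions for illustration, not part of any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDeployment:
    name: str
    regulated: bool
    # Named people who can reverse the agent's decision (hypothetical field).
    override_contacts: tuple[str, ...] = ()


def can_deploy(deployment: AgentDeployment) -> bool:
    """Allow a regulated agent only if at least one named human can override it."""
    if not deployment.regulated:
        return True
    # Blank entries do not count as a named human.
    named = [c for c in deployment.override_contacts if c.strip()]
    return len(named) > 0
```

A check like this belongs in the release pipeline, so the override path is verified before the first customer-facing decision rather than after the first incident.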
Agent demos that do not show the evaluation rubric are theatre. If the team cannot tell you how they measure whether the agent did well, the team is measuring whether the audience clapped.
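A visible rubric can be as plain as a set of golden cases and a pass rate. The following Python sketch assumes the simplest possible rubric, exact match against an expected label; the names `Golden` and `evaluate` and the report fields are hypothetical, and real rubrics often need graded or multi-criteria scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Golden:
    case_id: str
    input_text: str
    expected: str


def evaluate(agent: Callable[[str], str], goldens: list[Golden]) -> dict:
    """Score an agent against golden cases using exact match."""
    if not goldens:
        raise ValueError("need at least one golden case")
    # Keep the failing case ids so each miss can be inspected individually.
    failures = [g.case_id for g in goldens if agent(g.input_text) != g.expected]
    passed = len(goldens) - len(failures)
    return {
        "total": len(goldens),
        "passed": passed,
        "pass_rate": passed / len(goldens),
        "failures": failures,
    }
```

The list of failing case ids matters as much as the headline rate: it is what lets a team turn each production failure into a new golden case.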
What to do this quarter
Ship three bounded agents in production. Measure cycle time and override rate honestly. Refuse to deploy any agent in regulated workflows without a documented override path that includes named humans. Build the shared platform of identity, retrieval, and evaluation that lets the boring agents compound across use cases. Treat the agent demos coming in from vendors with the skepticism they have earned.
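Cycle time and override rate are cheap to compute once each run is logged. A minimal sketch, assuming a hypothetical `RunRecord` with start and finish timestamps in seconds and a flag for whether a human overrode the result:

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunRecord:
    started_s: float
    finished_s: float
    overridden: bool  # True if a human reviewer changed or rejected the output


def override_rate(runs: list[RunRecord]) -> float:
    """Fraction of runs in which a human overrode the agent."""
    if not runs:
        raise ValueError("no runs recorded")
    return sum(r.overridden for r in runs) / len(runs)


def median_cycle_time_s(runs: list[RunRecord]) -> float:
    """Median wall-clock seconds from start to finish."""
    if not runs:
        raise ValueError("no runs recorded")
    return median(r.finished_s - r.started_s for r in runs)
```

The median is used here rather than the mean so that a handful of stuck runs does not hide what a typical run costs; reporting both is reasonable if tail latency matters.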
Closing
The agents that matter in 2026 will not look like the 2024 demos that defined the category. They will look like quiet, well-governed software that does specific work well, runs without drama, and produces audit trails the operations team can read. That is the future worth building toward. The theatre will continue, and it will continue to disappoint the buyers who paid for the demo rather than the work.