Harness engineering is the discipline of designing the complete control system that wraps around an AI agent to make it reliable. The model reasons. The harness does everything else: context, constraints, feedback loops, validation, and iterative refinement. In early 2026, the formula Agent = Model + Harness became the defining insight of the agentic AI conversation, with OpenAI, Martin Fowler and Thoughtworks, LangChain, and an academic paper by Kim and Hwang all publishing frameworks within weeks of each other. The shared insight is that the model is not the bottleneck. The environment around the model determines whether it produces reliable work.
This article defines harness engineering, explains its three dimensions, and shows how they map to organisational governance for all AI agents that fill roles, not just those that write code. For the full picture, including how to build organisational context, connect through MCP, and see the difference in a concrete scenario, read the longer piece: Context Engineering as a Harness for AI Agents.
The term was crystallised by Mitchell Hashimoto (creator of Terraform) in early 2026. His principle: "Every time you discover an agent has made a mistake, you take the time to engineer a solution so that it can never make that mistake again." Within weeks, Ryan Lopopolo at OpenAI published a field report describing how a small engineering team used Codex agents to build a product with over one million lines of code and zero manually written source code. The discipline that made this possible was not better prompting. It was designing the harness: the scaffolding, feedback loops, and control systems that kept the codebase coherent.
Martin Fowler and Thoughtworks published a detailed commentary introducing a guides-and-sensors taxonomy. LangChain formalised the architecture with a comprehensive breakdown, noting that changing the harness while keeping the model the same can move an agent from average to top-tier performance.
The evolution is clear. Prompt engineering (2022 to 2024) focused on crafting the right instruction. Context engineering (2025) focused on building the system that delivers the right information surrounding that instruction. Harness engineering (2026) encompasses both and adds constraint enforcement, feedback loops, and iterative refinement. Each step widened the lens from the instruction, to the information environment, to the complete control system.
The academic formalisation came from Kim and Hwang in their SSRN paper, which organised harness engineering along three dimensions.
Context is the declarative and procedural knowledge that informs the agent. In a coding harness, this means project instructions, codebase maps, and AGENTS.md files. OpenAI found that giving agents "a map, not a 1,000-page instruction manual" was essential.
Constraint is the set of rules that govern agent output. In a coding harness, this means linters, test suites, type checkers, and architectural boundary rules. Constraints prevent bad patterns deterministically.
Convergence is the iterative process by which constraints are refined until the harness reaches what Kim and Hwang call "structural idempotence": a state where re-application produces no further structural change. In a coding harness, this means cleanup loops and CI/CD cycles that iterate until the system passes without changes.
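The three dimensions compose into a control loop. The following sketch is illustrative only: the function and parameter names are invented for this article and do not come from any of the cited frameworks. Context is what the agent sees, constraints are deterministic checks on its output, and convergence is the iteration that stops only when the checks pass.

```python
def run_harness(agent, task, context, constraints, max_rounds=5):
    """Iterate an agent's output through deterministic checks until it passes."""
    feedback = []
    for _ in range(max_rounds):
        # Context dimension: the task plus the knowledge that informs the agent.
        output = agent(task, context, feedback)
        # Constraint dimension: deterministic checks (linters, tests, boundary rules),
        # each returning an empty string on success or a violation message.
        feedback = [msg for check in constraints if (msg := check(output))]
        # Convergence dimension: stop when re-checking produces no violations.
        if not feedback:
            return output
    raise RuntimeError("Harness did not converge within the round limit")
```

The key design choice is that the constraints, not the agent, decide when the work is done: the loop exits only when deterministic checks report no violations.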
The frameworks published so far apply primarily to software engineering. But organisations are not just deploying coding agents. They are deploying agents that fill Customer Success roles, conduct research, prepare meetings, process governance proposals, and coordinate content pipelines. These agents do not produce code. They produce decisions, communications, updates, and actions within an organisational context.
The three-dimensional model transfers remarkably well. What changes is what fills each dimension.
The mapping is structural, not metaphorical. A type checker enforces type safety by preventing invalid operations at compile time. A domain definition prevents an agent from crossing an authority boundary at runtime. Both are structural constraints. A CI/CD pipeline iterates until the build passes. The governance process iterates until no one has a reasoned objection. Both are convergence mechanisms.
"Structural idempotence" in organisational terms describes a role definition so clear that re-running the governance process produces no further changes. That is a well-governed role.
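Structural idempotence has a precise, testable shape: a process has converged when applying it once more changes nothing. The sketch below is a generic illustration of that fixed-point idea, not code from the Kim and Hwang paper; `refine` stands in for any refinement pass, whether a cleanup loop, a CI cycle, or a governance round.

```python
def is_structurally_idempotent(refine, structure):
    """True if one more application of the refinement process changes nothing."""
    return refine(structure) == structure

def converge(refine, structure, max_steps=100):
    """Apply the refinement process repeatedly until it reaches a fixed point."""
    for _ in range(max_steps):
        refined = refine(structure)
        if refined == structure:
            return structure  # Idempotent: re-application is now a no-op.
        structure = refined
    raise RuntimeError("No fixed point reached within the step limit")
```

Read this way, a well-governed role is simply a fixed point of the governance process.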
For organisations, this means the harness your AI agents need already exists, if your organisation has the structural clarity to provide it. Role-based governance, the kind that practitioners of Holacracy, Sociocracy, and other self-organisation frameworks have been refining for decades, provides all three dimensions of the harness natively.
Platforms like Nestr that make this governance structure explicit, living, and accessible through MCP (Model Context Protocol) effectively serve as the harness for organisational AI agents. The context layer (nested purpose, skills, project state) informs the agent. The constraint layer (domains, policies, accountabilities) governs what the agent can do. The convergence layer (the consent-based governance process and its timestamped history) captures how the organisation has learned and adapted.
The critical advantage over a coding harness is that this harness is self-updating. When a policy changes through a governance meeting or async proposal, the agent reads the updated policy on its next activation. No one needs to update a system prompt. The harness keeps itself current because the governance structure and the context source are the same system.
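The self-updating property can be made concrete. In the sketch below, the agent assembles its context from the live governance system at every activation, so a policy adopted yesterday is in force today without anyone editing a prompt. `GovernanceClient` and its methods are invented for this illustration; they do not describe Nestr's actual API or the MCP tool names.

```python
class GovernanceClient:
    """Stand-in for an MCP connection to a live governance platform."""

    def __init__(self, store):
        self.store = store  # governance records, updated by meetings and proposals

    def role_context(self, role):
        record = self.store[role]
        return {
            "purpose": record["purpose"],    # context layer
            "domains": record["domains"],    # constraint layer
            "policies": record["policies"],  # constraint layer
        }

def activate_agent(client, role, task):
    # Read the *current* governance state at activation time,
    # rather than a snapshot baked into a static system prompt.
    ctx = client.role_context(role)
    return (
        f"You fill the role '{role}'. Purpose: {ctx['purpose']}. "
        f"You may only act within these domains: {', '.join(ctx['domains'])}. "
        f"Policies in force: {'; '.join(ctx['policies'])}. Task: {task}"
    )
```

Because the prompt is derived from governance records rather than stored alongside the agent, a governance change and a harness update are the same event.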
For the full depth on how organisational context works in practice, including a detailed customer success scenario, the nested context pattern, why orchestrator agents underperform role-level agents, and a practical guide to getting started, read Context Engineering as a Harness for AI Agents: Why Organisational Structure Is the Missing Layer.
Harness engineering is the discipline of designing the systems, constraints, and feedback loops that wrap around AI agents to make them reliable. The foundational formula, Agent = Model + Harness, was formalised in early 2026 through publications by OpenAI, LangChain, Martin Fowler, and an academic paper by Kim and Hwang.
Prompt engineering focuses on the instruction. Context engineering focuses on the information environment. Harness engineering encompasses both and adds constraint enforcement and iterative refinement. Each discipline widened the lens from instruction, to information, to the complete control system.
Harness engineering matters because the harness, not the model, determines whether an agent produces reliable work. The same model produces fundamentally different output with a task-level harness (a system prompt alone) than with an organisational harness (nested purpose, domains, policies, governance history). The full article demonstrates this with a concrete customer-support scenario.
No specific framework is required. Explicit roles, clear authority boundaries, living policies, structured governance, and shared organisational context can all be built without one. Holacracy and Sociocracy provide ready-made patterns, but the concepts stand on their own merit. The AI Agent Governance article covers the practical requirements.
Nestr makes governance structure explicit, living, and accessible through MCP. Roles, circles, purposes, accountabilities, domains, policies, skills, projects, and governance records are all structured and queryable by AI agents. When a governance change is adopted (in a meeting or asynchronously), the agent reads the updated context on its next activation. The MCP setup guide covers the full process.
The EU AI Act requires documented governance, continuous risk management, human oversight, and traceability for high-risk AI systems. An organisational harness generates compliance evidence as a natural byproduct of normal operation, the same way a coding harness generates traceability through test logs and CI/CD pipelines.
OpenAI, Harness Engineering: Leveraging Codex in an Agent-First World (February 2026)
Martin Fowler, Harness Engineering for Coding Agent Users (2026)