---
title: "The Process Model"
source: "https://docs.vertesiahq.com/processes/model"
markdown: "https://docs.vertesiahq.com/llms/processes/model.md"
---

# The Process Model

This page is about **how to think** about a process, not how to use the API. If you read only one page in this section before writing your first process, read this one.

## The thesis

A business process is a sequence of decisions and actions. Some of those decisions are deterministic ("if value > 50K, route to legal"). Some are probabilistic ("extract the parties from this contract"). Traditional tools force you to pick a side:

- A **workflow engine** forces deterministic code. LLM output is a string you parse at your own risk.
- A **chat agent** forces probabilistic reasoning. The same input gives different routes depending on mood.

The Vertesia process engine is designed to sit at the seam. The engine owns the deterministic parts — transitions, guards, routing, validation. The agents own the probabilistic parts — reading a document, judging risk, drafting output — *bounded by a schema the engine enforces*.

This is the whole point. Everything else — `result_schema`, `writes`, `_next_node`, guards, condition nodes — follows from this separation.

As the native format evolves, the control-flow vocabulary is settling into three distinct ideas:

- `condition` for choosing one path
- `branch` for fixed split/join fanout
- `foreach` for collection iteration

That split matters because BPMN structured parallelism maps to `branch`, while multi-instance fanout maps to `foreach`.

Persisted native definitions are also explicitly versioned with `format_version: 1`. That field is part of the process-definition contract, not incidental editor metadata.

## Three axes: control, state, behavior

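A process definition cleanly separates three runtime concerns: control flow, state, and behavior. The table also lists presentation, a fourth layer that affects readability but not execution: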

| Concern | Where it lives | Who owns it |
| --- | --- | --- |
| **Control flow** | `transitions`, `branches`, `guards` | The engine (deterministic) |
| **State** | `context` (typed by `context.schema`) | The engine, mutated through `node.writes` |
| **Behavior** | Node bodies — prompts, tools, interactions, tasks | Agents, interactions, humans, or deterministic code |
| **Presentation** | `metadata.phase`, `metadata.lane`, `metadata.order`, transition labels | Authors, for observability |

A node never decides unilaterally. A tool node fills a fixed value. An interaction returns a schema-shaped object. An agent returns a schema-shaped object and optionally picks from declared transitions. A human task writes declared fields and enables guarded transitions. In every case, the engine validates, applies writes, evaluates guards, and picks the next node.

If you catch yourself writing a prompt that says "then decide whether to do X or Y" — stop. Lift that decision into a transition guard, and make the node return the signal the guard needs.

Presentation metadata is not part of runtime correctness. The workflow can execute without it. It matters because real business processes get wide quickly: approval gates, revision loops, escalation paths, and finalization branches become hard to read as a raw graph. Set `metadata.phase`, `metadata.lane`, and `metadata.order` so the run UI can group the process into navigable stages. Add transition or branch `label` values when a guard deserves a business name.

## Writes scope is a contract

`node.writes` isn't a nit; it's *the* contract between a node and the rest of the process. A node that emits a non-empty context update must declare the fields it intends to write. Missing `writes` is treated as "this node writes nothing," not "this node may write anything."

For an agent node, the engine:

1. Builds a `result_schema` from `node.writes` intersected with `context.schema.properties`.
2. Hands that schema to the child conversation so the LLM's structured output is constrained to those fields (and only those fields).
3. Validates the returned object against both the schema and the writes list before applying anything.

Consequences:

- **Agent impact is bounded.** A misbehaving agent can't write `total_value` on a node that only declares `legal_decision`. The engine refuses.
- **Routing is trustworthy.** Guards can key off `total_value` without worrying whether some earlier node silently clobbered it.
- **Debugging is crisp.** The per-node context diff in the run inspector shows exactly what each node changed. Nothing else.

Keep writes as tight as possible. Three named fields are better than a single blob.
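The three engine steps for an agent node can be sketched roughly as follows. This is a minimal illustration, not the engine's actual implementation: `build_result_schema` and `validate_update` are hypothetical names, and full JSON Schema validation of the returned object is left out so the writes check stays visible.

```python
from typing import Any


def build_result_schema(context_schema: dict[str, Any], writes: list[str]) -> dict[str, Any]:
    """Keep only the context properties that the node declared in `writes`."""
    properties = context_schema.get("properties", {})
    allowed = {name: properties[name] for name in writes if name in properties}
    return {
        "type": "object",
        "properties": allowed,
        # Anything outside the declared fields is rejected by the schema itself.
        "additionalProperties": False,
    }


def validate_update(update: dict[str, Any], writes: list[str]) -> None:
    """Reject any field the node did not declare (assumed engine behavior)."""
    undeclared = sorted(set(update) - set(writes))
    if undeclared:
        raise ValueError(f"undeclared writes: {undeclared}")


# Hypothetical context schema with two fields.
context_schema = {
    "type": "object",
    "properties": {
        "total_value": {"type": "number"},
        "legal_decision": {"type": "string"},
    },
}

# A node that declares only `legal_decision` gets a schema with only that field.
schema = build_result_schema(context_schema, ["legal_decision"])
```

With this shape, an update containing `total_value` from a node that declares only `legal_decision` fails `validate_update` before anything touches the context.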

## Why the engine owns routing

Agents are non-deterministic by design. For a chat, that's a feature. For a business process that decides whether a contract is reviewed by a human, it's a liability.

So routing lives in the definition:

- **`auto`** transitions with JSON Logic guards — the engine picks based on context.
- **`agent`** transitions with a `_next_node` enum in the result schema — the agent picks, but only from the declared set, and only with a value the schema accepts.
- **`user`** transitions — a human signal drives the move, through the Task Inbox or the Advance button.
- **`condition`** nodes — pure routing, no behavior, required `default: true` fallback.

What you *don't* do is let an agent call a `transition_to` tool mid-thought. That tool exists (see below) but only in supervised mode, and only for the top-level orchestrator.
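To make guard-driven routing concrete, here is a small sketch of how `auto` transitions could be resolved against context. It assumes a hypothetical transition shape (`to`, `guard`, `default`) and implements only a tiny subset of JSON Logic (`var` in its string form, `>`, `==`, `and`); the real definition format and evaluator may differ.

```python
from typing import Any


def evaluate(rule: Any, data: dict[str, Any]) -> Any:
    """Evaluate a tiny subset of JSON Logic: var (string form), >, ==, and."""
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    op, args = next(iter(rule.items()))
    if op == "var":
        return data.get(args)
    values = [evaluate(arg, data) for arg in args]
    if op == ">":
        return values[0] > values[1]
    if op == "==":
        return values[0] == values[1]
    if op == "and":
        return all(values)
    raise ValueError(f"unsupported operator: {op}")


def pick_transition(transitions: list[dict[str, Any]], context: dict[str, Any]) -> str:
    """Return the first target whose guard passes, else the default target."""
    for transition in transitions:
        guard = transition.get("guard")
        if guard is not None and evaluate(guard, context):
            return transition["to"]
    for transition in transitions:
        if transition.get("default"):
            return transition["to"]
    raise LookupError("no guard matched and no default transition is declared")


# Hypothetical transitions for the "value > 50K goes to legal" rule.
transitions = [
    {"to": "legal_review", "guard": {">": [{"var": "total_value"}, 50000]}},
    {"to": "auto_approve", "default": True},
]

high = pick_transition(transitions, {"total_value": 75000})
low = pick_transition(transitions, {"total_value": 1000})
```

The routing decision depends only on the guard and the context values, so the same context always produces the same route.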

## Durability and human time

Because a process runs as a Temporal workflow, every checkpoint is a resumable point. Concretely:

- An agent crash retries the node. The run doesn't restart.
- A human task can wait days. The workflow parks, the cluster can redeploy, and the signal still fires when the answer arrives.
- A worker redeploy mid-run resumes from the last checkpoint. Context is preserved.

This is load-bearing for real processes. Contract review, compliance checks, fund operations, content pipelines — all have steps that genuinely take time or block on people. The engine absorbs that naturally.

## Versioning

Process definitions are versioned. Each `create_process_definition` or edit bumps `version`. When a run starts, the engine snapshots the current definition into the run (`process_definition_snapshot`). The run walks that snapshot; editing the published definition afterward does **not** affect in-flight runs.

This makes publishing a revision safe: there is no ambiguity about which runs will see the new version. New runs start on the new definition; in-flight runs continue on the snapshot they started with.
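The snapshot behavior can be illustrated with a toy in-memory model. Everything here except the `version` and `process_definition_snapshot` field names is an assumption made for illustration, including the `nodes` list and the `start_run` and `publish_edit` helpers.

```python
import copy
from typing import Any

# Hypothetical in-memory store of published definitions.
definitions: dict[str, dict[str, Any]] = {
    "contract_review": {"version": 1, "nodes": ["extract", "review"]},
}


def start_run(name: str) -> dict[str, Any]:
    """Copy the current definition into the run so later edits cannot reach it."""
    return {"process_definition_snapshot": copy.deepcopy(definitions[name])}


def publish_edit(name: str, nodes: list[str]) -> None:
    """Publish a revision; each edit bumps `version`."""
    current = definitions[name]
    definitions[name] = {"version": current["version"] + 1, "nodes": nodes}


old_run = start_run("contract_review")
publish_edit("contract_review", ["extract", "review", "archive"])
new_run = start_run("contract_review")
```

Here `old_run` keeps walking version 1 with two nodes, while `new_run` starts on version 2 with three.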

## Two execution modes

Every run has a `run_type`: **programmatic** or **supervised**. These are the two ways a process can be driven.

### Programmatic (default)

The engine walks the definition node by node, applying writes and picking transitions exactly as above. No outer LLM is in the loop. This is what every run starts as unless you explicitly pick supervised.

Use programmatic for the large majority of cases, where the process flow *is* the logic. It is predictable, auditable, and cheap.

### Supervised

Supervised runs add a top-level `ProcessSupervisor`: a long-lived child conversation workflow that starts with the process and receives structured process events as the run advances. It sees the current node, recent history, available transitions / branches, current context, and any failure metadata.

The supervisor can respond with commands:

- **`continue_process`** — let the deterministic process keep going.
- **`set_context`** — propose a context repair. The process validates the merged context against the process schema and blocks all `_` fields.
- **`transition_to`** — move the process through a declared exit from the current node.
- **`skip_node`** — treat the current node as skipped and move forward when the node is explicitly skippable.
- **`retry_node`** — re-enter the current or requested node.
- **`fail_process`** — fail the process with a supervisor-provided reason.

These commands are exposed to the supervisor as process-control tools that the worker agents never get. The workflow converts those tool calls into commands and applies them only after validation.

The supervisor acts as the top-level orchestrator: it is given the same observability a human would have and can steer when things go sideways. This is the path for:

- Running a process with a human-in-the-loop chat — "review this step before continuing."
- Handling ambiguous inputs where the deterministic flow isn't enough.
- Meta-reasoning over many process instances (e.g. batch runs where the orchestrator decides priority).

The supervised orchestrator is effectively an overlay. The engine still owns state mutation and routing — `set_context` must satisfy the process context schema, internal `_` fields cannot be written, and `transition_to` must follow a declared transition or branch from the current node. `skip_node` is accepted only when the current node has `skippable: true`.
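The default checks on supervisor commands can be sketched as a single validation function. The command and node shapes used here (`type`, `values`, `target`, `exits`) are hypothetical, and validation of the merged context against the process schema is omitted; only the three rules stated above are modeled.

```python
from typing import Any


def check_command(command: dict[str, Any], node: dict[str, Any]) -> None:
    """Raise PermissionError if a supervisor command breaks a default rule."""
    kind = command["type"]
    if kind == "set_context":
        # Internal fields (prefixed with "_") can never be written.
        internal = sorted(key for key in command["values"] if key.startswith("_"))
        if internal:
            raise PermissionError(f"internal fields are read-only: {internal}")
    elif kind == "transition_to":
        # Only declared exits from the current node are allowed.
        if command["target"] not in node.get("exits", []):
            raise PermissionError(f"undeclared exit: {command['target']}")
    elif kind == "skip_node":
        # Skipping requires the node to opt in with `skippable: true`.
        if not node.get("skippable", False):
            raise PermissionError("current node is not skippable")


# Hypothetical current node with two declared exits.
node = {"id": "legal_review", "exits": ["approve", "reject"], "skippable": False}

# A declared exit passes silently.
check_command({"type": "transition_to", "target": "approve"}, node)
```

A `skip_node` command against this node, or a `set_context` that touches `_next_node`, raises before any state changes.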

If you need break-glass behavior that goes beyond these defaults, opt in explicitly with supervisor policy metadata:

```json
{
    "metadata": {
        "supervisor": {
            "allow_transition_override": true,
            "allow_skip": true
        }
    }
}
```

The same `metadata.supervisor` policy can also be placed on an individual node. Node-level policy is preferable when only one step needs a controlled override.

A run's `config.user_message` carries an optional message from the user to the orchestrator — "prioritize speed over thoroughness on this batch."

The worker agents inside nodes are **always** in programmatic posture: constrained by `result_schema`, bounded by `writes`. They don't know whether they're running under a supervised or programmatic outer loop. That invariant is intentional: agents at the edge stay simple and auditable regardless of what's driving the overall run.

## When not to reach for a process

A process is the wrong tool when:

- The whole task is a single LLM call. Use an interaction.
- The flow is straight-line and needs no state. Use a workflow.
- The user wants an open-ended chat and there's no fixed sequence. Use an agent.
- The sequence is long but purely deterministic (ETL, file processing). Use a workflow.

Reach for a process when you have **branching**, **state that accumulates**, **human gates**, and at least one step where agentic reasoning pays off. That's the sweet spot.

## Summary — how to think about it

1. **Separate**: control flow, state, behavior. The definition encodes all three explicitly.
2. **Bound agents with `writes` + `result_schema`.** The engine only accepts what the schema allows.
3. **Route deterministically.** Guards on context, never on prose.
4. **Treat durability as free.** Let processes wait for humans, retry on failures, span days.
5. **Version confidently.** Snapshots mean publishing is safe.
6. **Pick the mode.** Programmatic by default, supervised when you want an LLM driving the outer loop.

The rest of this section — [node types](/processes/node-types), [agent nodes](/processes/agent-nodes), [authoring](/processes/authoring), [observability](/processes/observability), [task inbox](/processes/task-inbox), [tutorial](/processes/tutorial-contract-review) — is mechanics. Come back to this page when something feels off; usually what's off is one of the principles above.