Overview

Orchestration uses a central “conductor” to invoke tasks, enforce their order, and track state. Choreography lets services react to events independently, with no central coordinator. Use orchestration when steps, deadlines, and approvals matter; use choreography for loosely coupled signals that need minimal coordination. Model the human and system steps in BPMN 2.0, with decisions in DMN and case work in CMMN.

When to orchestrate (vs. choreography)

Pick orchestration when…

There are approvals, SLAs, deadlines, or human reviews
Steps must run in strict order with compensation rules
Executives need a single place to see status and evidence

Pick choreography when…

Independent services react to events with weak ordering
Loose coupling matters more than central control
Transient failures can be handled locally by each service

Hybrid

Central orchestration for the critical path; event choreography for peripheral updates (notifications, analytics).
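A minimal in-process sketch of the hybrid split, assuming a hypothetical EventBus class in place of a real broker: run_critical_path (also hypothetical) calls each critical-path step in strict order and keeps the results in one place, then publishes a single completion event that peripheral subscribers handle on their own.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Hypothetical in-process stand-in for a broker topic (not a real library API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Choreography: each subscriber reacts independently; the publisher
        # neither knows nor waits for who is listening.
        for handler in self._subscribers[topic]:
            handler(event)

def run_critical_path(order_id: str, steps: list[Callable[[str], str]], bus: EventBus) -> list[str]:
    # Orchestration: the conductor calls each step in strict order and
    # keeps the resulting state in one place.
    history = [step(order_id) for step in steps]
    # Peripheral updates are signalled as an event rather than orchestrated.
    bus.publish("order.completed", {"order_id": order_id, "history": history})
    return history

if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("order.completed", lambda e: print("notify customer about", e["order_id"]))
    bus.subscribe("order.completed", lambda e: print("analytics: steps =", len(e["history"])))
    steps = [lambda o: f"validated {o}", lambda o: f"approved {o}", lambda o: f"fulfilled {o}"]
    print(run_critical_path("A-1", steps, bus))
```

In this in-process sketch a failing subscriber would raise into the orchestrator; a real broker decouples the two, which is the point of moving peripheral work to choreography.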

Core components

Orchestrator

Workflow engine/state machine; correlation IDs and timers
Compensations, retries, timeouts, and escalation hooks
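A rough sketch of the orchestrator's core, assuming a hypothetical approval workflow with made-up states: an explicit transition table enforces order, a correlation ID identifies the instance, and a timer hook escalates work past its deadline. A production engine would persist this state and fire timers itself.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Allowed transitions for a hypothetical approval workflow.
TRANSITIONS = {
    "submitted": {"in_review"},
    "in_review": {"approved", "rejected", "escalated"},
    "escalated": {"approved", "rejected"},
}

@dataclass
class WorkflowInstance:
    deadline: datetime
    state: str = "submitted"
    # Correlation ID ties every log line, message, and trace for this instance together.
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    history: list = field(default_factory=list)

    def transition(self, new_state: str, now: datetime) -> None:
        # The orchestrator, not the workers, decides which move is legal next.
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append((now, self.state, new_state))
        self.state = new_state

    def on_timer(self, now: datetime) -> bool:
        # Timer hook: escalate work still in review after its deadline.
        if self.state == "in_review" and now > self.deadline:
            self.transition("escalated", now)
            return True
        return False

if __name__ == "__main__":
    start = datetime.now(timezone.utc)
    wf = WorkflowInstance(deadline=start + timedelta(hours=4))
    wf.transition("in_review", start)
    print(wf.correlation_id, wf.on_timer(start + timedelta(hours=5)), wf.state)
```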

Workers

Stateless, idempotent handlers; safe to retry
Access via allowlisted tools/APIs; role-scoped credentials

Queues & storage

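At-least-once delivery (e.g., AMQP brokers, Kafka) with dead-letter queues (DLQs)
Idempotency keys; achieve effectively-once outcomes by design (at-least-once delivery plus deduplication) rather than relying on broker guarantees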
Audit-grade state store; immutable logs

Patterns (saga, retries, idempotency)

Saga & compensation

Split long transactions; define compensating steps for partial failure
Prefer logical (business-level) undo over distributed XA / two-phase-commit transactions
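A sketch of saga execution with compensation, assuming each step is a pair of hypothetical callables (action, compensate): steps run in order, and when one fails, the steps that already succeeded are undone in reverse order by their business-level compensations.

```python
from typing import Callable

class SagaStep:
    def __init__(self, name: str, action: Callable[[], None],
                 compensate: Callable[[], None]) -> None:
        self.name = name
        self.action = action          # forward business operation
        self.compensate = compensate  # logical undo, e.g. refund or release a hold

def run_saga(steps: list[SagaStep]) -> tuple[bool, list[str]]:
    """Run steps in order; on the first failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    log: list[str] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            log.append(f"failed:{step.name}")
            for done in reversed(completed):
                done.compensate()
                log.append(f"compensated:{done.name}")
            return False, log
        completed.append(step)
        log.append(f"done:{step.name}")
    return True, log
```

Compensations should themselves be idempotent and retried, since the process can crash midway through undoing.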

Retries & backoff

Exponential backoff with jitter; cap retries, then send to the DLQ
Timeouts and circuit breakers for unstable dependencies
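A sketch of the retry and breaker rules, assuming exceptions signal failure and a plain list stands in for the DLQ. The backoff uses the "full jitter" variant (a random delay between zero and the capped exponential value), one of several common jitter schemes; the circuit breaker is a deliberately simple single-threaded version whose threshold and reset interval are illustrative.

```python
import random
import time
from typing import Callable, Optional

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    # Full jitter: pick uniformly between 0 and the capped exponential delay.
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(task: Callable[[], object], max_attempts: int, dlq: list,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Retries exhausted: park the failure in the DLQ instead of retrying forever.
                dlq.append({"error": repr(exc), "attempts": max_attempts})
                return None
            sleep(backoff_delay(attempt))
    return None

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_after`
    seconds it lets a trial call through (half-open)."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, task: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None              # half-open: allow one trial call
        try:
            result = task()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open on repeated failure
            raise
        self.failures = 0                      # success closes the circuit
        return result
```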

Idempotency

Use correlation/idempotency keys; detect duplicates and skip repeated work
Design handlers to be repeat-safe (cf. idempotent methods, RFC 9110 §9.2.2)
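A minimal idempotent-consumer sketch, assuming the producer attaches an idempotency key to each message and a dict stands in for a durable key store. In production the key check and the side effect must commit atomically (for example, a unique constraint on the key in the same transaction); otherwise two concurrent deliveries can both pass the check.

```python
from typing import Callable

class IdempotentHandler:
    """Wraps a side-effecting handler so repeated deliveries of the same key do no extra work."""

    def __init__(self, handler: Callable[[dict], object]) -> None:
        self.handler = handler
        # Stand-in for a durable store keyed by idempotency key.
        self.results: dict[str, object] = {}

    def handle(self, idempotency_key: str, message: dict) -> object:
        if idempotency_key in self.results:
            # Duplicate delivery (at-least-once): return the recorded result, skip the work.
            return self.results[idempotency_key]
        result = self.handler(message)
        self.results[idempotency_key] = result
        return result
```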

References

Saga pattern — microservices.io
AMQP — OASIS
Kafka — kafka.apache.org
RFC 9110 (HTTP idempotency) — rfc-editor.org

Routing, queues & SLAs

Work routing

Queues by priority, skill, region; FIFO within each priority class
Assignment: round-robin, load, or skill-based
Limit work in progress (WIP) to protect lead time; by Little’s Law, average lead time = average WIP / throughput
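A sketch of priority- and skill-based routing plus the Little's Law arithmetic, assuming one heap per skill (a hypothetical Router class): a global arrival counter breaks ties so items stay FIFO within a priority class, and an agent pulls the most urgent item across the skills they hold.

```python
import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

@dataclass(order=True)
class WorkItem:
    priority: int                         # lower number = more urgent class
    seq: int                              # global arrival order; FIFO within a class
    item_id: str = field(compare=False)

class Router:
    """One priority heap per skill; agents pull the most urgent item they are skilled for."""

    def __init__(self) -> None:
        self._queues: dict[str, list[WorkItem]] = defaultdict(list)
        self._arrivals = itertools.count()

    def enqueue(self, item_id: str, priority: int, skill: str) -> None:
        item = WorkItem(priority, next(self._arrivals), item_id)
        heapq.heappush(self._queues[skill], item)

    def next_for(self, agent_skills: set[str]) -> Optional[WorkItem]:
        best_skill = None
        for skill in agent_skills:
            queue = self._queues.get(skill)
            if queue and (best_skill is None or queue[0] < self._queues[best_skill][0]):
                best_skill = skill
        if best_skill is None:
            return None
        return heapq.heappop(self._queues[best_skill])

def lead_time_days(wip: float, throughput_per_day: float) -> float:
    # Little's Law: average WIP = throughput x average lead time,
    # so capping WIP caps lead time for a given throughput.
    return wip / throughput_per_day
```

For example, 120 items in progress at a steady throughput of 30 per day implies an average lead time of 4 days; halving WIP at the same throughput halves it.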

SLAs

Define an SLA per step, with timers, escalations, and visible aging
Auto-reassign stuck work; notify owners
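A sketch of a periodic SLA sweep, assuming hypothetical per-step targets and an 80% warning threshold: tasks nearing their limit trigger an owner notification, and breached tasks are unassigned so routing can hand them to someone else.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical per-step SLA targets.
STEP_SLA = {"review": timedelta(hours=8), "approve": timedelta(hours=24)}

@dataclass
class Task:
    task_id: str
    step: str
    owner: Optional[str]
    started_at: datetime

def sla_sweep(tasks: list[Task], now: datetime, warn_ratio: float = 0.8) -> dict[str, list[str]]:
    """Run periodically: warn owners as work ages toward its SLA, and
    return breached work to the shared queue so it can be reassigned."""
    actions: dict[str, list[str]] = {"notify": [], "reassign": []}
    for task in tasks:
        age = now - task.started_at
        limit = STEP_SLA[task.step]
        if age > limit:
            task.owner = None            # unassign; routing will pick a new owner
            actions["reassign"].append(task.task_id)
        elif age >= limit * warn_ratio:
            actions["notify"].append(task.task_id)
    return actions
```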

Metrics

Lead time, queue time, throughput, first-pass yield
Backlog aging; reassignments; breach counts
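A sketch of the flow metrics over a batch of completed items, assuming each record carries created, started, and finished times in hours plus rework and breach flags (a hypothetical record shape). First-pass yield here means the share of items finished without rework.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CompletedItem:
    created: float    # hours since an arbitrary epoch
    started: float    # when someone actually began work
    finished: float
    reworked: bool    # sent back at least once
    breached: bool    # missed its SLA

def flow_metrics(items: list[CompletedItem], window_hours: float) -> dict[str, float]:
    return {
        "lead_time_h": mean(i.finished - i.created for i in items),
        "queue_time_h": mean(i.started - i.created for i in items),  # waiting before work began
        "throughput_per_h": len(items) / window_hours,
        "first_pass_yield": sum(not i.reworked for i in items) / len(items),
        "breach_count": sum(i.breached for i in items),
    }
```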

Human-in-the-loop (HITL) design & thresholds

Thresholds

Confidence × impact grid: auto-approve, review, block
Dual control (four-eyes principle) for high-risk steps
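A sketch of the threshold grid and a four-eyes check, assuming three impact levels and confidence scores in [0, 1]. The threshold numbers are placeholders to be calibrated against measured error rates; DualControl is a hypothetical helper in which the maker can never approve their own work.

```python
from dataclasses import dataclass, field

# Illustrative thresholds per impact level: (auto-approve at or above, block below).
# Real values should come from measured error rates, not from this sketch.
THRESHOLDS = {
    "low": (0.80, 0.30),
    "medium": (0.95, 0.50),
    "high": (float("inf"), 0.70),   # high impact is never auto-approved
}

def route_decision(confidence: float, impact: str) -> str:
    auto_at, block_below = THRESHOLDS[impact]
    if confidence >= auto_at:
        return "auto_approve"
    if confidence < block_below:
        return "block"
    return "review"

@dataclass
class DualControl:
    """Four-eyes check: the maker cannot approve, and release needs
    `required` distinct approvers other than the maker."""
    maker: str
    required: int = 1
    approvers: set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        if user == self.maker:
            raise PermissionError("maker cannot approve their own work")
        self.approvers.add(user)

    def released(self) -> bool:
        return len(self.approvers) >= self.required
```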

Reviewer UX

Show sources, diffs, and suggested actions
One-click edits; capture rationale; next-best steps

Workforce management

Size queues and staff shifts to meet SLA windows
Sampling of “auto” decisions for quality
Feedback loops to improve rules/models
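One way to sample auto-approved decisions for QA, assuming each decision has a stable ID: hashing the ID with a salt gives a reproducible, roughly uniform selection, so auditors can re-derive exactly which items were sampled. The salt value is a hypothetical example; changing it draws a fresh sample.

```python
import hashlib

def sampled_for_qa(decision_id: str, rate: float, salt: str = "qa-sample-v1") -> bool:
    """Deterministically select roughly `rate` of decisions for human QA.
    The same ID and salt always give the same answer, so the sample is auditable."""
    digest = hashlib.sha256(f"{salt}:{decision_id}".encode()).digest()
    # Keep 53 bits so the division is exact and the result lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate
```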

Evidence, audit & controls

Logging

Who did what, when, to which record (user, timestamp, object)
Immutable/tamper-evident logs; retention by policy
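A sketch of a tamper-evident log as a hash chain, assuming SHA-256 over a canonical JSON form of each entry: every entry stores the previous entry's hash, so editing or deleting a past entry breaks verification. This makes tampering detectable, not impossible; anyone able to rewrite the whole chain can recompute it, so the latest hash is usually anchored somewhere the writer cannot modify.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(entry: dict) -> str:
    # Hash a canonical JSON form of every field except the hash itself.
    payload = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Append-only, hash-chained log of who did what, when, to which record."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, user: str, action: str, obj: str, ts: Optional[str] = None) -> dict:
        entry = {
            "user": user,
            "action": action,
            "object": obj,
            "timestamp": ts or datetime.now(timezone.utc).isoformat(),
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = _digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(entry):
                return False
            prev = entry["hash"]
        return True
```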

Controls

Segregation of duties; thresholds and approvals
Change control for workflows, bots, and integrations

References

ISO/IEC 27001 — iso.org
NIST SP 800-53 — nist.gov

Observability & SLOs

Tracing & metrics

Distributed traces across orchestrator, workers, and queues
SLIs: success rate, latency, error types, retries, DLQ size

SLOs

Set targets for latency and success rate; use error budgets to govern the pace of change
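A sketch of the error-budget arithmetic, assuming a success-rate SLO over a fixed window: a 99.9% target over 1,000,000 runs tolerates about 1,000 failures, and 250 failures consume a quarter of that budget. The freeze rule is one illustrative policy, not a fixed standard.

```python
def error_budget(slo_target: float, total: int, failed: int) -> dict:
    """Error budget for one SLO window, e.g. slo_target=0.999 for 99.9% success."""
    allowed = (1 - slo_target) * total          # failures the SLO tolerates
    remaining = allowed - failed
    return {
        "allowed_failures": allowed,
        "remaining": remaining,
        "consumed": failed / allowed if allowed > 0 else float("inf"),
        # One common policy: pause risky changes once the budget is spent.
        "freeze_changes": remaining <= 0,
    }

if __name__ == "__main__":
    print(error_budget(0.999, 1_000_000, 250))
```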

References

OpenTelemetry — opentelemetry.io
Google SRE: SLOs — sre.google

90-day starter

Days 0–30: Model & scope

Draft BPMN L2/L3; list approvals and SLAs
Pick orchestrated vs. choreographed segments
Define compensations and idempotency keys

Days 31–60: Build & guard

Implement retries, backoff, DLQ; add correlation IDs
Add HITL thresholds and reviewer UX
Wire tracing; set SLOs; create runbooks

Days 61–90: Pilot & prove

Pilot one corridor; track lead time, breach rate, rework
Fix hotspots; publish deltas; plan scale-out

References

OMG BPMN 2.0.2 — omg.org
OMG DMN / CMMN — omg.org
Saga pattern — microservices.io
AMQP / Kafka — OASIS · kafka.apache.org
RFC 9110 (HTTP idempotency) — rfc-editor.org
OpenTelemetry — opentelemetry.io
Google SRE / SLOs — sre.google
ISO/IEC 27001 · NIST SP 800-53 — iso.org · nist.gov

Coordinate the flow. Route the edge cases. Keep evidence tight.

If you want an orchestration checklist (saga, retries, idempotency, HITL, SLOs), ask for a copy.
