Process Mining Basics

Process mining turns event data into an objective view of how work flows. It discovers actual paths, checks conformance to the target model, and enhances processes with facts about time, rework, and handoffs.

Overview

Event data in business systems records who did what and when. Process mining reads these logs to reconstruct the flow, compare it to the intended model, and quantify delays, rework, and variants. It complements modeling notations like BPMN by showing the real path, not only the designed one.

Event logs & formats

Minimum fields

  • Case ID (process instance, e.g., order, ticket, claim)
  • Activity (event class)
  • Timestamp (with time zone or UTC)

Helpful attributes

  • Resource/role, lifecycle transition (start/complete), cost, channel, product, region
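
A minimal sketch of such a log as a pandas table; the column names (case_id, activity, timestamp, resource) and the order data are illustrative and map to whatever your source system calls them:

  import pandas as pd

  # Three required columns plus one helpful attribute (resource).
  log = pd.DataFrame([
      {"case_id": "O-1001", "activity": "Create Order",  "timestamp": "2024-03-01T09:15:00Z", "resource": "alice"},
      {"case_id": "O-1001", "activity": "Approve Order", "timestamp": "2024-03-01T11:40:00Z", "resource": "bob"},
      {"case_id": "O-1001", "activity": "Ship Order",    "timestamp": "2024-03-02T08:05:00Z", "resource": "carol"},
      {"case_id": "O-1002", "activity": "Create Order",  "timestamp": "2024-03-01T10:02:00Z", "resource": "alice"},
      {"case_id": "O-1002", "activity": "Cancel Order",  "timestamp": "2024-03-01T10:30:00Z", "resource": "alice"},
  ])
  log["timestamp"] = pd.to_datetime(log["timestamp"], utc=True)  # keep timestamps in UTC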

Standards

  • XES (IEEE 1849-2016) — classic log format for discovery and conformance
  • OCEL (Object-Centric Event Log) — supports events linked to several objects at once (orders, items, invoices)

Three use cases

Discovery

Build a model from data. See the common path and the long tail of variants.
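
A minimal sketch of variant counting with pandas, continuing from the log table above: sort events per case, join activities into a variant string, and count cases per variant.

  variants = (
      log.sort_values(["case_id", "timestamp"])
         .groupby("case_id")["activity"]
         .agg(" -> ".join)            # one path string per case
         .value_counts()              # cases per variant
  )
  print(variants.head(10))            # the common paths
  print((variants == 1).sum())        # long tail: variants seen only once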

Conformance

Compare data to the target model. Quantify fitness and highlight violations and missing steps.

Enhancement

Annotate the model with times, queues, rework, resources, and cost to find delay and waste.

Preparing data

Extraction patterns

  • Identify the case (order, ticket, claim). Join tables that hold status changes.
  • Build events from lifecycle changes (created, assigned, completed).
  • Keep UTC timestamps; store time zone and daylight rules if needed.
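
A sketch of the second pattern, turning a status-history extract into events; the table and column names (ticket_id, new_status, changed_at, changed_by) are hypothetical:

  import pandas as pd

  # Hypothetical extract: one row per status change on a ticket.
  history = pd.DataFrame([
      {"ticket_id": "T-9", "new_status": "created",   "changed_at": "2024-03-01T09:00:00Z", "changed_by": "system"},
      {"ticket_id": "T-9", "new_status": "assigned",  "changed_at": "2024-03-01T09:20:00Z", "changed_by": "dina"},
      {"ticket_id": "T-9", "new_status": "completed", "changed_at": "2024-03-01T15:45:00Z", "changed_by": "dina"},
  ])

  # Each status change becomes one event in the log.
  events = pd.DataFrame({
      "case_id":   history["ticket_id"],
      "activity":  history["new_status"],
      "timestamp": pd.to_datetime(history["changed_at"], utc=True),
      "resource":  history["changed_by"],
  }).sort_values(["case_id", "timestamp"])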

Data hygiene

  • Deduplicate events; sort by timestamp; handle ties and same-second events.
  • Normalize activity names; document filters and exclusions.
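
A sketch of these steps with pandas on the events table above; the name_map, the tie-breaking column, and the TEST- filter are placeholders for your own rules:

  # Deduplicate exact repeats; sort with the activity name as an arbitrary tie-breaker
  # for same-second events.
  events = events.drop_duplicates(subset=["case_id", "activity", "timestamp"])
  events = events.sort_values(["case_id", "timestamp", "activity"])

  # Normalize activity names so spelling variants count as one step.
  name_map = {"approve order": "Approve Order", "approve_order": "Approve Order"}
  cleaned = events["activity"].str.strip().str.lower()
  events["activity"] = cleaned.map(name_map).fillna(events["activity"])

  # Document every exclusion, e.g. dropping test cases.
  events = events[~events["case_id"].str.startswith("TEST-")]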

Multi-object processes

Use OCEL when a case spans objects (order ↔ items ↔ invoice). Avoid flattening that loses relations.
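
An illustrative sketch of the idea in plain Python (not the official OCEL schema): each event points at every object it touches, so the order, item, and invoice relations survive.

  event = {
      "id": "e-501",
      "activity": "Create Invoice",
      "timestamp": "2024-03-02T08:10:00Z",
      "objects": {                      # one event, three object types
          "order":   ["O-1001"],
          "item":    ["I-77", "I-78"],
          "invoice": ["INV-9"],
      },
  }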

Algorithms & tools

Discovery (examples)

  • Inductive Miner — robust to noise; produces sound models
  • Heuristics Miner — frequency-based; handles noise with thresholds
  • Alpha Miner — classic; good for teaching, less robust in practice
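
Discovery itself is best left to a library, but the directly-follows counts these miners start from are easy to compute; a simplified sketch with pandas on the events table above (not the Inductive Miner itself):

  ordered = events.sort_values(["case_id", "timestamp"]).copy()
  ordered["next_activity"] = ordered.groupby("case_id")["activity"].shift(-1)

  # Count how often activity B directly follows activity A within the same case.
  dfg = (
      ordered.dropna(subset=["next_activity"])
             .groupby(["activity", "next_activity"])
             .size()
             .sort_values(ascending=False)
  )
  print(dfg.head(20))   # the most frequent directly-follows edges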

Conformance

  • Token-based replay (fast)
  • Alignments (optimal matching of log to model)
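
A simplified conformance sketch: rather than full token replay or alignments, it checks each directly-follows step against a whitelist read off the target model by hand (the allowed set below is an assumption for an order flow):

  allowed = {
      ("Create Order", "Approve Order"),
      ("Approve Order", "Ship Order"),
      ("Create Order", "Cancel Order"),
  }

  ordered = events.sort_values(["case_id", "timestamp"]).copy()
  ordered["next_activity"] = ordered.groupby("case_id")["activity"].shift(-1)
  pairs = ordered.dropna(subset=["next_activity"])

  observed = list(zip(pairs["activity"], pairs["next_activity"]))
  violations = [p for p in observed if p not in allowed]
  print(f"steps checked: {len(observed)}, violating steps: {len(violations)}")
  if violations:
      print(pd.Series(violations).value_counts().head())   # most common violations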

Quality, drift & privacy

Data quality

  • Completeness, accuracy, validity, timeliness (CAVT)
  • Missing timestamps, inconsistent IDs, activity naming drift

Concept drift

Processes change over time. Split logs into windows; compare models; detect shifts in variants and cycle time.
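
A sketch of window-based comparison with pandas; the quarterly windows and the two statistics compared (median cycle time, top-variant share) are illustrative choices:

  start = events.groupby("case_id")["timestamp"].min()
  end   = events.groupby("case_id")["timestamp"].max()
  cases = pd.DataFrame({
      "window": start.dt.tz_localize(None).dt.to_period("Q"),   # quarter in which the case started
      "cycle_time_days": (end - start).dt.total_seconds() / 86400,
      "variant": events.sort_values(["case_id", "timestamp"])
                       .groupby("case_id")["activity"].agg(" -> ".join),
  })

  by_window = cases.groupby("window").agg(
      cases=("variant", "size"),
      median_cycle_days=("cycle_time_days", "median"),
      top_variant_share=("variant", lambda v: v.value_counts(normalize=True).iloc[0]),
  )
  print(by_window)   # a jump between windows hints at drift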

Privacy & ethics

Mask personal data; pseudonymize IDs; restrict resource views when needed. Keep access logs and retention rules.
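
A sketch of pseudonymizing case and resource IDs with a keyed hash; the key handling here is a placeholder, not a full security design:

  import hashlib
  import hmac

  SECRET_KEY = b"store-and-rotate-this-outside-the-log"   # placeholder

  def pseudonymize(value: str) -> str:
      # Keyed hash: stable within one analysis, not reversible without the key.
      return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]

  events["case_id"]  = events["case_id"].map(pseudonymize)
  events["resource"] = events["resource"].map(pseudonymize)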

Fitness, precision & other metrics

Model quality

  • Fitness — how much of the log the model can replay
  • Precision — the degree to which the model allows only behavior seen in the log
  • Simplicity — model complexity (prefer smaller, sound models)
  • Generalization — avoids overfitting to the sample log
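
For fitness, a common token-replay formula combines produced, consumed, missing, and remaining token counts; a small sketch with an invented example case:

  def token_replay_fitness(produced: int, consumed: int, missing: int, remaining: int) -> float:
      # fitness = 1/2 * (1 - missing/consumed) + 1/2 * (1 - remaining/produced)
      return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

  # Example: 40 tokens produced and consumed, 2 missing, 3 remaining -> fitness = 0.9375
  print(token_replay_fitness(produced=40, consumed=40, missing=2, remaining=3))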

Operational metrics

  • Lead time and wait time by path/variant
  • Rework rate; return loops
  • Handoffs by role; social network density
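
A sketch of these three metrics with pandas on the events table above; the handoff figure here is a simple resource-change count, a stand-in for a fuller social-network analysis:

  by_case = events.sort_values(["case_id", "timestamp"]).groupby("case_id")

  # Lead time per case: first to last event, in days.
  lead_time_days = (by_case["timestamp"].max() - by_case["timestamp"].min()).dt.total_seconds() / 86400

  # Rework rate: share of cases that execute any activity more than once.
  rework_rate = by_case["activity"].apply(lambda a: a.duplicated().any()).mean()

  # Handoffs: steps where the resource changes between consecutive events of a case.
  ordered = events.sort_values(["case_id", "timestamp"]).copy()
  ordered["prev_resource"] = ordered.groupby("case_id")["resource"].shift()
  changed = (ordered["resource"] != ordered["prev_resource"]) & ordered["prev_resource"].notna()
  handoffs_per_case = changed.groupby(ordered["case_id"]).sum()

  print(lead_time_days.describe(), rework_rate, handoffs_per_case.mean())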

Typical applications

Finance

Order-to-Cash, Procure-to-Pay: maverick buying, price variance, three-way match issues.

IT service

Incident-to-Resolution: ping-pong handoffs, SLA breaches, backlog aging patterns.

Healthcare / public

Referral-to-Treatment or Permitting: queue hotspots, rework loops, missing documents.

90-day starter

Days 0–30: Data

  • Pick one flow. Extract case ID, activity, timestamp, resource.
  • Clean names; deduplicate; store UTC; document filters.

Days 31–60: Discovery & conformance

  • Run Inductive/Heuristics Miner; list top variants.
  • Check conformance; log violations with counts and impact.

Days 61–90: Action

  • Target one bottleneck or loop; implement a small change.
  • Re-measure lead time and rework; publish the deltas.

Turn event data into a clear path for change.

If you want a log spec (XES/OCEL) and a discovery checklist, ask for a copy.
