Overview
Event data in business systems records who did what and when. Process mining reads these logs to reconstruct the flow, compare it to the intended model, and quantify delay, rework, and variants. It complements mapping methods like BPMN by showing the real path, not only the designed path.
Event logs & formats
Minimum fields
- Case ID (process instance, e.g., order, ticket, claim)
- Activity (event class)
- Timestamp (with time zone or UTC)
Helpful attributes
- Resource/role, lifecycle transition (start/complete), cost, channel, product, region
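A minimal sketch of one event record that carries the three minimum fields plus two helpful attributes, with events grouped into time-ordered traces per case. The field names (`case_id`, `activity`, `timestamp`, `resource`, `lifecycle`) and the sample rows are illustrative assumptions, not a prescribed schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                     # process instance, e.g. an order number
    activity: str                    # event class
    timestamp: datetime              # timezone-aware, normalized to UTC
    resource: Optional[str] = None   # helpful attribute: who performed it
    lifecycle: str = "complete"      # helpful attribute: start/complete


def to_traces(events):
    """Group events by case and order each case's events by timestamp."""
    traces = defaultdict(list)
    for e in events:
        traces[e.case_id].append(e)
    return {cid: sorted(evts, key=lambda e: e.timestamp) for cid, evts in traces.items()}


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical sample rows, deliberately out of order
events = [
    Event("O-1", "Create order", utc(2024, 3, 1, 9, 0), "alice"),
    Event("O-1", "Ship", utc(2024, 3, 2, 14, 0), "bob"),
    Event("O-1", "Approve", utc(2024, 3, 1, 11, 30), "carol"),
    Event("O-2", "Create order", utc(2024, 3, 1, 10, 0), "alice"),
]

for case_id, trace in to_traces(events).items():
    print(case_id, [e.activity for e in trace])
```

Sorting happens per case, so a trace reflects the order of events in time rather than their order in the extract.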
Standards
- XES (IEEE 1849-2016) — classic log format for discovery and conformance
- OCEL (Object-Centric Event Log) — events can link to many objects of several types (orders, items, invoices) instead of a single case
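To show the shape of an XES log, here is a sketch that writes traces with the standard `concept:name`, `time:timestamp` and `org:resource` attributes. The extension URIs are the ones commonly declared in XES files. This example omits globals, classifiers and the lifecycle extension, so check the standard for the full schema. Libraries such as PM4Py can read and write XES directly.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

XES_NS = "http://www.xes-standard.org/"
EXTENSIONS = [  # (name, prefix, uri) of the extensions used below
    ("Concept", "concept", "http://www.xes-standard.org/concept.xesext"),
    ("Time", "time", "http://www.xes-standard.org/time.xesext"),
    ("Organizational", "org", "http://www.xes-standard.org/org.xesext"),
]


def to_xes(traces):
    """Serialize {case_id: [(activity, utc_datetime, resource), ...]} as an XES string."""
    log = ET.Element("log", {"xes.version": "1.0", "xmlns": XES_NS})
    for name, prefix, uri in EXTENSIONS:
        ET.SubElement(log, "extension", {"name": name, "prefix": prefix, "uri": uri})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for activity, ts, resource in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts.isoformat()})
            ET.SubElement(event, "string", {"key": "org:resource", "value": resource})
    return ET.tostring(log, encoding="unicode")


# Hypothetical one-case log
xes = to_xes({
    "O-1": [
        ("Create order", datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc), "alice"),
        ("Approve", datetime(2024, 3, 1, 11, 30, tzinfo=timezone.utc), "carol"),
    ]
})
print(xes)
```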
Primary links
- XES (IEEE 1849-2016): ieeexplore.ieee.org
- OCEL: ocel-standard.org
- Process mining intro: processmining.org
Three use cases
Discovery
Build a model from data. See the common path and the long tail of variants.
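Discovery usually starts from two summaries of the log: variant frequencies, which separate the common path from the long tail, and the directly-follows relation that many miners build on. A stdlib-only sketch follows. It assumes each case is already a time-ordered list of activity names, and the sample traces are made up.

```python
from collections import Counter


def variants(traces):
    """Count how many cases follow each exact activity sequence."""
    return Counter(tuple(t) for t in traces.values())


def directly_follows(traces):
    """Count (a, b) pairs where b immediately follows a within a case."""
    dfg = Counter()
    for t in traces.values():
        dfg.update(zip(t, t[1:]))
    return dfg


traces = {
    "c1": ["Register", "Check", "Approve", "Pay"],
    "c2": ["Register", "Check", "Approve", "Pay"],
    "c3": ["Register", "Check", "Reject"],
    "c4": ["Register", "Check", "Check", "Approve", "Pay"],  # rework loop
}

for seq, n in variants(traces).most_common():
    print(n, " -> ".join(seq))
print(directly_follows(traces)[("Check", "Approve")])
```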
Conformance
Compare data to the target model. Quantify fitness and highlight violations and missing steps.
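A much simpler stand-in for model-based conformance, often useful as a first pass, checks each case against explicit rules. This sketch assumes two hypothetical rules: a set of mandatory activities and one precedence rule (`Approve` before `Pay`). It is not token replay or alignments.

```python
def check_case(trace, mandatory, precedence):
    """Return the rule violations for one time-ordered trace."""
    issues = [f"missing {a}" for a in mandatory if a not in trace]
    for before, after in precedence:
        # compares first occurrences only; a simplifying assumption
        if after in trace and (before not in trace or trace.index(before) > trace.index(after)):
            issues.append(f"{after} without prior {before}")
    return issues


MANDATORY = ["Register", "Approve"]   # hypothetical target-model rules
PRECEDENCE = [("Approve", "Pay")]

traces = {
    "c1": ["Register", "Approve", "Pay"],
    "c2": ["Register", "Pay"],
    "c3": ["Register", "Pay", "Approve"],
}

report = {cid: check_case(t, MANDATORY, PRECEDENCE) for cid, t in traces.items()}
conforming = sum(1 for issues in report.values() if not issues)
print(report)
print(f"{conforming}/{len(traces)} cases conform")
```

Counting violations per rule across all cases gives the "violations and missing steps" view.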
Enhancement
Annotate the model with times, queues, rework, resources, and cost to find delay and waste.
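One basic enhancement annotates each directly-follows edge with the mean time between the two events, which points to where delay accumulates. A sketch, assuming all timestamps are already in UTC and only complete events exist. Under that assumption the figure is elapsed time between steps, not pure waiting time, because separating wait from work needs start events.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def edge_times(traces):
    """Mean hours between consecutive events, per (from, to) activity pair."""
    hours = defaultdict(list)
    for events in traces.values():
        for (a, t_a), (b, t_b) in zip(events, events[1:]):
            hours[(a, b)].append((t_b - t_a).total_seconds() / 3600)
    return {edge: mean(h) for edge, h in hours.items()}


ts = datetime.fromisoformat  # timestamps assumed to be UTC
traces = {
    "c1": [("Register", ts("2024-03-01T09:00")), ("Check", ts("2024-03-01T13:00")),
           ("Approve", ts("2024-03-02T09:00"))],
    "c2": [("Register", ts("2024-03-01T10:00")), ("Check", ts("2024-03-01T12:00")),
           ("Approve", ts("2024-03-04T12:00"))],
}

# slowest edges first
for edge, h in sorted(edge_times(traces).items(), key=lambda kv: -kv[1]):
    print(edge, round(h, 1))
```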
Preparing data
Extraction patterns
- Identify the case (order, ticket, claim). Join tables that hold status changes.
- Build events from lifecycle changes (created, assigned, completed).
- Keep UTC timestamps; store time zone and daylight rules if needed.
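A sketch of the extraction pattern: status-change rows become events, and local timestamps are converted to UTC. It assumes the source system records a UTC offset with each change. The column names and the status-to-activity map are hypothetical. The rows cross a daylight-saving switch (31 March 2024 in Central Europe), so converting to UTC gives the true 40-minute gap between the first two events. Subtracting the naive local times would give 1 h 40 min.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source status codes to activity names
STATUS_TO_ACTIVITY = {"NEW": "Created", "ASSIGNED": "Assigned", "DONE": "Completed"}

# Hypothetical rows, e.g. a ticket table joined with its status-history table
rows = [
    {"ticket_id": "T-7", "new_status": "NEW", "changed_at": "2024-03-31T01:30:00+01:00", "user": "svc"},
    {"ticket_id": "T-7", "new_status": "ASSIGNED", "changed_at": "2024-03-31T03:10:00+02:00", "user": "dana"},
    {"ticket_id": "T-7", "new_status": "DONE", "changed_at": "2024-03-31T09:00:00+02:00", "user": "dana"},
]


def to_event(row):
    """Map one status change to (case_id, activity, utc_timestamp, resource)."""
    local = datetime.fromisoformat(row["changed_at"])  # offset recorded by the source
    return (
        row["ticket_id"],
        STATUS_TO_ACTIVITY[row["new_status"]],
        local.astimezone(timezone.utc),                 # store UTC
        row["user"],
    )


events = [to_event(r) for r in rows]
for e in events:
    print(e[1], e[2].isoformat())
print(events[1][2] - events[0][2])
```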
Data hygiene
- Deduplicate events; sort by timestamp; break ties between same-second events with a stable secondary key (e.g., source sequence number).
- Normalize activity names; document filters and exclusions.
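The hygiene steps above can be sketched as one cleaning pass. The alias map, the tuple layout and the use of a source sequence number as tie-breaker are assumptions made for illustration. In practice the alias map is itself part of the documented filters.

```python
import re
from datetime import datetime

ALIASES = {"invoice approved": "Approve invoice"}  # hypothetical, documented naming map


def normalize(name):
    """Collapse whitespace and letter case, then apply the alias map."""
    key = re.sub(r"\s+", " ", name).strip().lower()
    return ALIASES.get(key, key.capitalize())


def clean(rows):
    """Normalize names, drop exact duplicates, sort by (case, time, source sequence)."""
    seen, out = set(), []
    for case_id, activity, ts, seq in rows:
        event = (case_id, normalize(activity), datetime.fromisoformat(ts), seq)
        if event not in seen:
            seen.add(event)
            out.append(event)
    # the source sequence number breaks ties between same-second events
    return sorted(out, key=lambda e: (e[0], e[2], e[3]))


raw = [
    # (case_id, activity, utc_timestamp, source_seq): a hypothetical extract
    ("C1", "Invoice  approved", "2024-05-02T10:00:00+00:00", 3),
    ("C1", "create invoice", "2024-05-02T09:00:00+00:00", 1),
    ("C1", "Check invoice", "2024-05-02T10:00:00+00:00", 2),
    ("C1", "check  Invoice", "2024-05-02T10:00:00+00:00", 2),  # duplicate after normalizing
]
cleaned = clean(raw)
print([e[1] for e in cleaned])
```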
Multi-object processes
Use OCEL when a case spans objects (order ↔ items ↔ invoice). Avoid flattening that loses relations.
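Flattening fails in a specific way, shown in the sketch below. The event structure is a simplified illustration and not the OCEL schema. Flattening on items copies the order-level event into every item trace, an effect often called convergence. Flattening on orders drops the item picks entirely.

```python
from collections import Counter

# Hypothetical object-centric events: each event references objects of several types
events = [
    {"id": "e1", "activity": "Place order", "objects": {"order": ["o1"], "item": ["i1", "i2", "i3"]}},
    {"id": "e2", "activity": "Pick item", "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "Pick item", "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "Create invoice", "objects": {"order": ["o1"], "invoice": ["v1"]}},
]


def flatten(events, object_type):
    """Classic case-based log: one trace per object of the chosen type."""
    traces = {}
    for e in events:
        for obj in e["objects"].get(object_type, []):
            traces.setdefault(obj, []).append(e["id"])
    return traces


by_item = flatten(events, "item")
by_order = flatten(events, "order")
copies = Counter(eid for trace in by_item.values() for eid in trace)
print(by_item)      # e1 appears in three item traces
print(by_order)     # the picks e2 and e3 are missing
print(copies["e1"])
```

Counting "Place order" in the item view would report three orders where there was one.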
Algorithms & tools
Discovery (examples)
- Inductive Miner — guarantees sound models; the infrequent variant (IMf) filters noise
- Heuristics Miner — frequency-based; handles noise with thresholds
- Alpha Miner — classic; good for teaching, less robust in practice
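Alpha Miner derives its model from a footprint of ordering relations between activities. This sketch computes only that footprint (causality, parallel, choice) and does not build the Petri net. For real logs, the Inductive or Heuristics Miner implementations in ProM or PM4Py are the practical choice.

```python
from itertools import product


def footprint(traces):
    """Alpha-style footprint: ordering relation for every pair of activities."""
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    acts = sorted({a for t in traces for a in t})
    table = {}
    for a, b in product(acts, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            table[a, b] = "->"   # causality: a is followed by b, never the reverse
        elif ba and not ab:
            table[a, b] = "<-"   # reverse causality
        elif ab and ba:
            table[a, b] = "||"   # both orders seen: parallel
        else:
            table[a, b] = "#"    # never adjacent: choice (or unrelated)
    return acts, table


acts, fp = footprint([["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]])
print(fp["a", "b"], fp["b", "c"], fp["b", "e"])  # -> || #
```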
Conformance
- Token-based replay (fast)
- Alignments (optimal matching of log to model)
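A minimal sketch of token-based replay on a hand-written Petri net. It counts produced (p), consumed (c), missing (m) and remaining (r) tokens and combines them with the usual fitness formula, 0.5(1 - m/c) + 0.5(1 - r/p). It assumes one visible transition per activity label, with no silent or duplicate transitions. Production tools handle those cases, and alignments are not shown here.

```python
def token_replay(trace, net, start, end):
    """Replay one trace on a labelled Petri net; return (p, c, m, r).

    net maps each activity label to (input places, output places).
    """
    marking = {start: 1}
    p, c, m = 1, 0, 0                  # the initial token counts as produced
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking.get(place, 0) == 0:
                m += 1                 # missing token: add it artificially
                marking[place] = 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] = marking.get(place, 0) + 1
            p += 1
    if marking.get(end, 0) == 0:       # consume the final token from the end place
        m += 1
        marking[end] = 1
    marking[end] -= 1
    c += 1
    r = sum(marking.values())          # tokens left behind
    return p, c, m, r


def fitness(p, c, m, r):
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


# Hypothetical sequential model: start -> a -> p1 -> b -> p2 -> c -> end
NET = {"a": (["start"], ["p1"]), "b": (["p1"], ["p2"]), "c": (["p2"], ["end"])}

print(fitness(*token_replay(["a", "b", "c"], NET, "start", "end")))          # 1.0
print(round(fitness(*token_replay(["a", "c"], NET, "start", "end")), 3))     # 0.667
```

Skipping `b` leaves one token stranded in `p1` and forces one missing token in `p2`, so both halves of the formula drop.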
Ecosystem
- ProM (research plugins): promtools.org
- PM4Py (Python): pm4py.fit.fraunhofer.de
- Apromore (open-source core): apromore.org
Quality, drift & privacy
Data quality
- Completeness, accuracy, validity, timeliness (CAVT)
- Missing timestamps, inconsistent IDs, activity naming drift
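The three issues above can be screened with simple checks before any mining. In this sketch, the canonical case-ID pattern and the rule that names differing only in letter case indicate naming drift are both hypothetical, chosen for illustration.

```python
import re
from collections import Counter

rows = [
    # (case_id, activity, timestamp or None): a hypothetical extract
    ("ORD-001", "Create order", "2024-01-05T08:00:00+00:00"),
    ("ord-1", "Ship order", "2024-01-06T12:00:00+00:00"),
    ("ORD-002", "Create order", None),
    ("ORD-002", "Create Order", "2024-01-07T09:00:00+00:00"),
]

ID_PATTERN = re.compile(r"ORD-\d{3}")  # assumed canonical case-ID format


def quality_report(rows):
    missing_ts = sum(1 for _, _, ts in rows if ts is None)
    bad_ids = sorted({cid for cid, _, _ in rows if not ID_PATTERN.fullmatch(cid)})
    # naming drift: distinct spellings that collapse to the same lower-case name
    spellings = Counter(a.lower() for a in {a for _, a, _ in rows})
    drift = sorted(name for name, n in spellings.items() if n > 1)
    return {"missing_timestamps": missing_ts, "bad_case_ids": bad_ids, "naming_drift": drift}


print(quality_report(rows))
```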
Concept drift
Processes change over time. Split logs into windows; compare models; detect shifts in variants and cycle time.
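A sketch of the windowing idea: group cases by start month, compare the variant mix of two windows with total variation distance, and raise a flag above a threshold. The monthly window, the 0.2 threshold and the sample data are assumptions. Dedicated drift detection adds statistical tests and also compares cycle times.

```python
from collections import Counter, defaultdict
from datetime import date


def month(d):
    return (d.year, d.month)


def window_profiles(cases, window):
    """Variant counts per time window; cases is a list of (start_date, variant)."""
    profiles = defaultdict(Counter)
    for start, variant in cases:
        profiles[window(start)][variant] += 1
    return profiles


def total_variation(p, q):
    """Distance between two variant distributions: 0 = same mix, 1 = no overlap."""
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in set(p) | set(q))


A = ("Register", "Approve", "Pay")
B = ("Register", "Check", "Approve", "Pay")
cases = (
    [(date(2024, 1, d), A) for d in range(1, 9)]      # January: 8 x A
    + [(date(2024, 1, d), B) for d in range(9, 11)]   # January: 2 x B
    + [(date(2024, 2, d), A) for d in range(1, 4)]    # February: 3 x A
    + [(date(2024, 2, d), B) for d in range(4, 11)]   # February: 7 x B
)

prof = window_profiles(cases, month)
shift = total_variation(prof[(2024, 1)], prof[(2024, 2)])
print(round(shift, 2))  # 0.5
if shift > 0.2:         # hypothetical alert threshold
    print("variant mix changed between windows")
```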
Privacy & ethics
Mask personal data; pseudonymize IDs; restrict resource views when needed. Keep access logs and retention rules.
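One way to pseudonymize IDs is a keyed hash (HMAC-SHA256). It keeps pseudonyms stable, so the same person or case still links across events, and unlike a plain hash it cannot be reversed by hashing a list of known names unless the attacker also has the key. The key handling and the 12-character truncation are illustrative assumptions. Longer tokens reduce collision risk on large logs.

```python
import hashlib
import hmac

# Hypothetical key; load it from a secret store and never store it with the log
SECRET_KEY = b"replace-with-a-key-from-your-secret-store"


def pseudonymize(value, key=SECRET_KEY):
    """Stable keyed pseudonym: same input and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


events = [("C-1", "Approve", "maria.lopez"), ("C-1", "Pay", "maria.lopez"), ("C-2", "Approve", "j.chen")]
masked = [(pseudonymize(c), a, pseudonymize(r)) for c, a, r in events]
print(masked)
```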
Fitness, precision & other metrics
Model quality
- Fitness — how much of the log the model can replay
- Precision — how little behavior the model allows beyond what the log shows (low precision = underfitting)
- Simplicity — how small and readable the model is (prefer smaller, sound models)
- Generalization — how well the model allows plausible behavior not yet seen in the sample log (avoids overfitting)
Operational metrics
- Lead time and wait time by path/variant
- Rework rate; return loops
- Handoffs by role; social network density
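A sketch of three operational metrics per case. Lead time runs from first to last event. A case counts as rework when any activity repeats, and a handoff is any change of resource between consecutive events. These definitions are simplifying assumptions, and wait time is not computed because it needs start events.

```python
from datetime import datetime


def case_metrics(events):
    """events: time-ordered (activity, timestamp, resource) tuples for one case."""
    activities = [a for a, _, _ in events]
    lead_time_h = (events[-1][1] - events[0][1]).total_seconds() / 3600
    rework = len(activities) != len(set(activities))   # any activity repeated
    handoffs = sum(1 for x, y in zip(events, events[1:]) if x[2] != y[2])
    return lead_time_h, rework, handoffs


ts = datetime.fromisoformat  # timestamps assumed to be UTC
log = {  # hypothetical incident log; resources stand in for roles
    "INC-1": [("Open", ts("2024-06-03T08:00"), "desk"), ("Fix", ts("2024-06-03T10:00"), "net"),
              ("Close", ts("2024-06-03T11:00"), "desk")],
    "INC-2": [("Open", ts("2024-06-03T09:00"), "desk"), ("Fix", ts("2024-06-03T12:00"), "net"),
              ("Fix", ts("2024-06-04T09:00"), "app"), ("Close", ts("2024-06-04T10:00"), "desk")],
}

metrics = {cid: case_metrics(evts) for cid, evts in log.items()}
rework_rate = sum(1 for _, rw, _ in metrics.values() if rw) / len(metrics)
print(metrics)
print(f"rework rate: {rework_rate:.0%}")
```

Grouping the same per-case figures by variant gives lead time by path, as listed above.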
Typical applications
Finance
Order-to-Cash, Procure-to-Pay: maverick buying, price variance, three-way match issues.
IT service
Incident-to-Resolution: ping-pong handoffs, SLA breaches, backlog aging patterns.
Healthcare / public
Referral-to-Treatment or Permitting: queue hotspots, rework loops, missing documents.
90-day starter
Days 0–30: Data
- Pick one flow. Extract case ID, activity, timestamp, resource.
- Clean names; deduplicate; store UTC; document filters.
Days 31–60: Discovery & conformance
- Run Inductive/Heuristics Miner; list top variants.
- Check conformance; log violations with counts and impact.
Days 61–90: Action
- Target one bottleneck or loop; implement a small change.
- Re-measure lead time and rework; publish the deltas.
References
- Process mining portal — processmining.org
- van der Aalst, “Process Mining: Data Science in Action” — Springer
- IEEE XES standard (1849-2016) — ieeexplore.ieee.org
- OCEL multi-object logs — ocel-standard.org
- ProM tools — promtools.org
- PM4Py (Python) — pm4py.fit.fraunhofer.de
- Apromore — apromore.org
Turn event data into a clear path for change.
If you want a log spec (XES/OCEL) and a discovery checklist, ask for a copy.