Overview
Before selecting a workflow, integration, RPA, or AI approach, verify that inputs and outputs are defined, sources are trustworthy, and evidence is stored where the work occurs. Use one logic path for both daily operations and month-end reporting, with no parallel spreadsheets.
Readiness criteria
Clarity
- Operational definitions for inputs/outputs
- Stable names and units (ISO 8601 dates; RFC 3339 timestamps)
- Versioned schemas; change policy
Trust
- Authoritative sources named; owners assigned
- Data quality checks (completeness, accuracy, validity, timeliness)
- Lineage to system of record
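The four quality dimensions above can be measured with a few lines of profiling code. The following is a minimal sketch, not a prescribed implementation: the field names (`id`, `amount`, `currency`, `updated_at`), the sample rows, and the `system_of_record` lookup are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def cavt_report(rows, required, reference, valid_currencies, max_age, now):
    """Return the share of rows (0.0 to 1.0) that pass each CAVT check."""
    if not rows:
        raise ValueError("no rows to profile")
    total = len(rows)
    # Completeness: every required field is present and non-empty.
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in rows)
    # Accuracy: the amount matches the value held in the system of record.
    accurate = sum(
        r.get("id") in reference and r.get("amount") == reference[r["id"]]
        for r in rows
    )
    # Validity: the currency belongs to the agreed reference list.
    valid = sum(r.get("currency") in valid_currencies for r in rows)
    # Timeliness: the row was updated within the allowed age.
    timely = sum(now - r["updated_at"] <= max_age for r in rows)
    return {
        "completeness": complete / total,
        "accuracy": accurate / total,
        "validity": valid / total,
        "timeliness": timely / total,
    }

# Hypothetical sample data.
NOW = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": "A1", "amount": 100, "currency": "EUR", "updated_at": NOW - timedelta(hours=1)},
    {"id": "A2", "amount": 250, "currency": "???", "updated_at": NOW - timedelta(days=3)},
    {"id": "A3", "amount": None, "currency": "USD", "updated_at": NOW - timedelta(hours=2)},
    {"id": "A4", "amount": 75, "currency": "USD", "updated_at": NOW - timedelta(hours=5)},
]
system_of_record = {"A1": 100, "A2": 250, "A3": 40, "A4": 80}

report = cavt_report(
    rows, ["id", "amount", "currency"], system_of_record,
    {"EUR", "USD"}, timedelta(days=1), NOW,
)
print(report)
```

Reporting a ratio per dimension, rather than a single pass/fail flag, makes it easier to see which dimension blocks a given flow.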
Control
- Access control (RBAC); segregation of duties (SoD) for changes
- Evidence stored at the producing step
- Logging and retention rules
Data contracts & schemas
Define contracts
- JSON Schema for payloads (json-schema.org)
- OpenAPI for REST (openapis.org)
- Enumerations, ranges, requiredness, defaults
Metadata
- Business glossary + technical dictionary
- Metadata registration (ISO/IEC 11179 concept) — iso.org
Good practice
- Reject invalid payloads at the edge (fail fast)
- Version schemas; deprecate with dates
- Automated contract tests in CI
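To make "reject invalid payloads at the edge" concrete, here is a standard-library-only sketch of a contract check covering requiredness, enumerations, ranges, and defaults. It is a hand-rolled stand-in for illustration, not a JSON Schema validator; the contract layout, the field names, and `ContractError` are assumptions.

```python
# Hypothetical contract, version 1: one rule set per field.
CONTRACT_V1 = {
    "invoice_id": {"types": (str,), "required": True},
    "amount": {"types": (int, float), "required": True, "min": 0},
    "currency": {"types": (str,), "required": True, "enum": {"EUR", "USD"}},
    "priority": {"types": (str,), "required": False,
                 "enum": {"low", "normal", "high"}, "default": "normal"},
}

class ContractError(ValueError):
    """Raised when a payload violates the contract."""

def validate(payload, contract=CONTRACT_V1):
    """Return the payload with defaults applied, or raise ContractError."""
    errors, clean = [], {}
    for name, rule in contract.items():
        if name not in payload:
            if rule["required"]:
                errors.append(f"{name}: missing required field")
            elif "default" in rule:
                clean[name] = rule["default"]
            continue
        value = payload[name]
        if not isinstance(value, rule["types"]):
            errors.append(f"{name}: wrong type {type(value).__name__}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} not in allowed values")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: {value!r} below minimum {rule['min']}")
        clean[name] = value
    unknown = sorted(set(payload) - set(contract))
    if unknown:
        errors.append(f"unknown fields: {unknown}")
    if errors:
        raise ContractError("; ".join(errors))  # fail fast, before any processing
    return clean

print(validate({"invoice_id": "INV-1", "amount": 120.5, "currency": "EUR"}))
```

The same sample payloads can double as contract tests in CI: one set that must pass and one set that must be rejected.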
Master & reference data
What to stabilize
- Customers, products, locations, vendors
- Reference lists: currencies, units, tax codes
- Identifiers and merge/survivorship rules
IDs & keys
- Global IDs where possible (e.g., GS1 keys) — gs1.org
- Natural vs surrogate keys documented
Governance
- Data owners/stewards; change SLA
- Golden records; match/merge strategies
- Audit of master data changes
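A survivorship rule decides which source value wins when duplicate records are merged into a golden record. The sketch below assumes one possible rule (source priority first, most recent update as the tie-breaker); the source names, the priorities, and the record layout are hypothetical.

```python
from datetime import date

# Hypothetical source ranking: a lower number wins.
SOURCE_PRIORITY = {"erp": 0, "crm": 1, "webform": 2}

def build_golden_record(records):
    """Pick, per field, the first non-empty value by source priority, then recency."""
    ranked = sorted(
        records,
        key=lambda r: (SOURCE_PRIORITY[r["source"]], -r["updated"].toordinal()),
    )
    golden = {}
    for field in sorted({f for r in records for f in r["values"]}):
        for rec in ranked:
            value = rec["values"].get(field)
            if value not in (None, ""):
                # Keep the winning source next to the value for audit purposes.
                golden[field] = {"value": value, "source": rec["source"]}
                break
    return golden

matched = [
    {"source": "crm", "updated": date(2024, 3, 1),
     "values": {"name": "Acme GmbH", "email": "ap@acme.example", "phone": ""}},
    {"source": "erp", "updated": date(2023, 11, 5),
     "values": {"name": "ACME GmbH", "email": None, "vat_id": "DE123456789"}},
    {"source": "webform", "updated": date(2024, 4, 2),
     "values": {"phone": "+49 30 1234567"}},
]

for field, winner in build_golden_record(matched).items():
    print(field, winner)
```

Storing the winning source with each value is one way to keep master data changes auditable.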
Event data & logs
Minimum fields
- Case ID (process instance)
- Activity name
- Timestamp (UTC or with zone)
- Actor/resource (optional but useful)
Formats
- XES (IEEE 1849) — ieeexplore.ieee.org
- CloudEvents for system events — cloudevents.io
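As an illustration of how the minimum fields might be carried in a CloudEvents-style JSON envelope, the sketch below uses the four attributes CloudEvents 1.0 requires (`specversion`, `id`, `source`, `type`) plus the optional `subject`, `time`, and `datacontenttype`. Mapping the case ID to `subject` and the activity name to `type` is an assumption, as are the source path and the type prefix.

```python
import json
import uuid
from datetime import datetime, timezone

def to_cloudevent(case_id, activity, actor=None, occurred_at=None):
    """Wrap one process event in a CloudEvents 1.0 style envelope (plain dict)."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    stamp = occurred_at.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),                    # unique per event
        "source": "/example/order-service",         # hypothetical producer
        "type": f"com.example.process.{activity}",  # activity name
        "subject": case_id,                         # case ID (process instance)
        "time": stamp,                              # RFC 3339, UTC
        "datacontenttype": "application/json",
        "data": {"actor": actor},                   # optional actor/resource
    }

event = to_cloudevent(
    "PO-1001", "invoice.approved", actor="user:42",
    occurred_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```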
Uses
- Process mining (discovery/conformance)
- SLI/SLO and queue health
- Exception root-cause analysis
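With only case ID, activity name, and timestamp, a log already supports basic process discovery. The sketch below counts directly-follows pairs and computes lead time per case; the event records are invented sample data, and a dedicated process mining tool would do considerably more.

```python
from collections import Counter, defaultdict
from datetime import datetime

def discover(events):
    """Return (directly-follows counts, lead time per case) from a flat event log."""
    by_case = defaultdict(list)
    for e in events:
        by_case[e["case_id"]].append(e)
    pairs, lead_times = Counter(), {}
    for case_id, trace in by_case.items():
        trace.sort(key=lambda e: e["timestamp"])  # order each case by time
        for a, b in zip(trace, trace[1:]):
            pairs[(a["activity"], b["activity"])] += 1
        lead_times[case_id] = trace[-1]["timestamp"] - trace[0]["timestamp"]
    return pairs, lead_times

def ts(text):
    return datetime.fromisoformat(text)

# Hypothetical log: (case ID, activity, timestamp with UTC offset).
log = [
    {"case_id": "C1", "activity": "receive", "timestamp": ts("2024-05-01T09:00:00+00:00")},
    {"case_id": "C2", "activity": "receive", "timestamp": ts("2024-05-01T10:00:00+00:00")},
    {"case_id": "C1", "activity": "approve", "timestamp": ts("2024-05-01T09:30:00+00:00")},
    {"case_id": "C2", "activity": "reject", "timestamp": ts("2024-05-01T10:20:00+00:00")},
    {"case_id": "C1", "activity": "pay", "timestamp": ts("2024-05-01T11:00:00+00:00")},
]

pairs, lead_times = discover(log)
print(dict(pairs))
print({case: str(delta) for case, delta in lead_times.items()})
```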
IDs, time & idempotency
Identity
- Stable primary keys; correlation IDs across systems
- Idempotency keys for writes/retries
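An idempotency key lets a write be retried without creating a duplicate. The sketch below is one possible approach, assuming the key is derived from the operation name and a canonical form of the payload; `PaymentWriter` and its in-memory store are hypothetical stand-ins for a real service and database.

```python
import hashlib
import json

def idempotency_key(operation, payload):
    """Derive a stable key: the same operation and payload give the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{operation}:{canonical}".encode("utf-8")).hexdigest()

class PaymentWriter:
    """Hypothetical write endpoint that remembers results by idempotency key."""

    def __init__(self):
        self._results = {}
        self.writes = 0

    def create_payment(self, key, payload):
        if key in self._results:
            return self._results[key]  # retry: return the original result
        self.writes += 1
        result = {"payment_id": f"PAY-{self.writes:04d}", **payload}
        self._results[key] = result
        return result

writer = PaymentWriter()
payload = {"invoice": "INV-1", "amount": 100}
key = idempotency_key("create_payment", payload)
first = writer.create_payment(key, payload)
retry = writer.create_payment(key, payload)  # e.g. after a client timeout
print(first == retry, writer.writes)
```

Deriving the key from the payload treats two identical requests as one, so the payload should include a business reference such as the invoice number. A common alternative is for the caller to generate a random key once per intended operation and reuse it on every retry.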
Time
- ISO 8601 / RFC 3339 timestamps; store UTC
- Record start/complete/paused states
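The two points above can be sketched as follows: normalize offset-bearing timestamps to UTC before storing them, and use the recorded state transitions to separate active time from paused time. The state names `start`, `paused`, and `complete` come from the list above; `resume` is an added assumption needed to close a pause.

```python
from datetime import datetime, timezone

def to_utc_rfc3339(text):
    """Parse an ISO 8601 timestamp with an offset and return it as RFC 3339 UTC."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        raise ValueError("timestamp has no UTC offset; refusing to guess")
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")

def active_seconds(transitions):
    """Sum the time between start/resume and the next paused/complete transition."""
    total, running_since = 0.0, None
    for state, at in transitions:
        if state in ("start", "resume") and running_since is None:
            running_since = at
        elif state in ("paused", "complete") and running_since is not None:
            total += (at - running_since).total_seconds()
            running_since = None
    return total

print(to_utc_rfc3339("2024-03-31T01:30:00+02:00"))

def at(hhmm):
    return datetime.fromisoformat(f"2024-05-01T{hhmm}:00+00:00")

steps = [("start", at("09:00")), ("paused", at("09:20")),
         ("resume", at("10:00")), ("complete", at("10:10"))]
print(active_seconds(steps) / 60, "active minutes")
```

Rejecting timestamps without an offset is a deliberate choice in this sketch: guessing a zone silently is a common source of off-by-hours errors in month-end figures.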
Error taxonomy
- Retryable vs non-retryable
- Business vs technical error classes
- Dead-letter rules and alerts
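One way to encode this taxonomy is as exception classes that drive retry and dead-letter decisions. The sketch below assumes three classes and a fixed retry budget; the class names, the `process` function, and the list used as a dead-letter queue are hypothetical.

```python
class TransientError(Exception):
    """Technical and retryable, e.g. a timeout."""

class PermanentError(Exception):
    """Technical and non-retryable, e.g. a malformed message."""

class BusinessError(Exception):
    """Business rule violation; retrying cannot help."""

def process(message, handler, dead_letters, max_attempts=3):
    """Run handler; retry transient failures, dead-letter everything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except TransientError:
            continue  # a real system would back off before the next attempt
        except (PermanentError, BusinessError) as exc:
            dead_letters.append({"message": message,
                                 "error_class": type(exc).__name__,
                                 "attempts": attempt})
            return None
    # Retry budget exhausted: park the message for review and alerting.
    dead_letters.append({"message": message,
                         "error_class": "TransientError",
                         "attempts": max_attempts})
    return None

calls = {"n": 0}

def flaky(msg):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return f"posted {msg}"

def over_limit(msg):
    raise BusinessError("credit limit exceeded")

dlq = []
print(process("INV-1", flaky, dlq))       # succeeds on the third attempt
print(process("INV-2", over_limit, dlq))  # dead-lettered without retrying
print(dlq)
```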
APIs & integration
Contracts & auth
- OpenAPI/GraphQL; input validation
- OAuth 2.0 for authorization; OIDC for identity — RFC 6749 · openid.net
Events & queues
- AMQP/Kafka for at-least-once delivery — OASIS · kafka.apache.org
- Idempotent consumers; replay with retention
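At-least-once delivery means a consumer can see the same event more than once, both from redelivery and from a deliberate replay of the retained log. A minimal sketch of an idempotent consumer follows, assuming every event carries a unique `id`; `LedgerConsumer` and the in-memory log are hypothetical and do not use any broker API.

```python
class LedgerConsumer:
    """Applies each event at most once, keyed by event id."""

    def __init__(self):
        self.processed_ids = set()
        self.balance = 0

    def handle(self, event):
        if event["id"] in self.processed_ids:
            return False  # duplicate delivery or replay: skip the side effect
        self.balance += event["amount"]
        self.processed_ids.add(event["id"])
        return True

# Hypothetical retained log; evt-2 was delivered twice.
retained_log = [
    {"id": "evt-1", "amount": 100},
    {"id": "evt-2", "amount": -30},
    {"id": "evt-2", "amount": -30},
]

consumer = LedgerConsumer()
for event in retained_log:
    consumer.handle(event)
for event in retained_log:  # full replay from the start of retention
    consumer.handle(event)
print(consumer.balance)
```

In a real system the set of processed IDs and the side effect would need to be committed together (for example in one database transaction); otherwise a crash between the two steps reintroduces duplicates.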
RPA fallback
Use UI automation only when APIs are absent and screens are stable. Prefer API contracts for durability.
Data quality (CAVT) & profiling
Catalog & lineage
Catalog
- Business glossary + technical metadata
- Schemas, owners, retention, quality rules
Lineage
- End-to-end data flow visibility
- OpenLineage compatible where possible — openlineage.io
Why it matters
Faster impact analysis, cleaner audits, fewer surprises in change windows.
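Impact analysis over lineage reduces to a graph traversal: given a dataset that is about to change, list everything downstream of it. The sketch below uses a plain adjacency map with invented dataset names; it does not use the OpenLineage API, which defines its own event model.

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets that read from it directly.
LINEAGE = {
    "erp.invoices": ["staging.invoices"],
    "crm.customers": ["mart.ar_aging"],
    "staging.invoices": ["mart.ar_aging", "mart.cash_forecast"],
    "mart.ar_aging": ["report.month_end_close"],
    "mart.cash_forecast": [],
    "report.month_end_close": [],
}

def downstream(dataset, edges):
    """Return every dataset reachable from `dataset`, breadth-first."""
    seen, queue = set(), deque([dataset])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(downstream("erp.invoices", LINEAGE))
print(downstream("crm.customers", LINEAGE))
```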
Privacy & security
Testing & monitoring
Pre-prod
- Contract tests (schemas, enums, ranges)
- Golden datasets and replay tests
Prod
- SLIs/SLOs for freshness, completeness, error rate
- Dead-letter queues, retries, alerting
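The production indicators above can be computed per load and compared against targets. This is a sketch under assumed numbers: the row counts, the thresholds, and the function names are illustrative, not recommended values.

```python
from datetime import datetime, timezone

def compute_slis(expected_rows, received_rows, failed_rows, last_loaded_at, now):
    """Compute freshness, completeness, and error-rate indicators for one load."""
    return {
        "freshness_minutes": (now - last_loaded_at).total_seconds() / 60,
        "completeness": received_rows / expected_rows,
        "error_rate": failed_rows / received_rows,
    }

def breached_slos(slis, slos):
    """Return a message for every indicator that misses its objective."""
    breaches = []
    for name, (kind, threshold) in slos.items():
        value = slis[name]
        ok = value <= threshold if kind == "max" else value >= threshold
        if not ok:
            breaches.append(f"{name}={value:.4f} violates {kind} {threshold}")
    return breaches

# Hypothetical objectives.
SLOS = {
    "freshness_minutes": ("max", 60),
    "completeness": ("min", 0.99),
    "error_rate": ("max", 0.01),
}

slis = compute_slis(
    expected_rows=1000, received_rows=995, failed_rows=20,
    last_loaded_at=datetime(2024, 5, 1, 11, 15, tzinfo=timezone.utc),
    now=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
for line in breached_slos(slis, SLOS):
    print("ALERT:", line)
```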
Change windows
- Version bumps coordinated; rollback plans
- Deprecations with sunset dates and dashboards
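Coordinating version bumps is easier when the size of a schema change is classified mechanically. The sketch below assumes a simple convention: removing a field, adding a required field, or making an optional field required is breaking ("major"); adding an optional field or relaxing a requirement is "minor". Both the convention and the schema representation (field name mapped to a required flag) are assumptions.

```python
def classify_change(old, new):
    """Classify a schema change; schemas map field name -> required (bool)."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    shared = old.keys() & new.keys()
    tightened = {f for f in shared if new[f] and not old[f]}  # optional -> required
    relaxed = {f for f in shared if old[f] and not new[f]}    # required -> optional
    if removed or tightened or any(new[f] for f in added):
        return "major"  # breaking: coordinate the bump and prepare a rollback
    if added or relaxed:
        return "minor"
    return "patch"

v1 = {"invoice_id": True, "amount": True}
print(classify_change(v1, {**v1, "note": False}))     # optional field added
print(classify_change(v1, {**v1, "currency": True}))  # required field added
print(classify_change(v1, {"invoice_id": True}))      # field removed
```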
90-day starter
Days 0–30
- Pick one flow; define input/output contracts (JSON Schema)
- Name owners; catalog fields and sources
- Baseline CAVT checks; fix blocking issues
Days 31–60
- Publish API contracts (OpenAPI) with auth
- Stand up lineage + dashboards; alert on breaks
- Add idempotency keys and error taxonomy
Days 61–90
- Pilot the automation; track lead time, first-pass yield (FPY), and exception rates
- Harden retention, access reviews, and rollback
- Publish deltas; plan scale-out
References
- JSON Schema — json-schema.org
- OpenAPI Initiative — openapis.org
- ISO/IEC 11179 (metadata registries, concept) — iso.org
- GS1 Identification Keys — gs1.org
- IEEE XES (process event logs) — ieeexplore.ieee.org
- CloudEvents — cloudevents.io
- ISO 8000 (data quality concepts) — iso.org
- NIST e-Handbook (statistics) — nist.gov
- OpenLineage — openlineage.io
- ISO/IEC 27001 — iso.org
- NIST SP 800-53 — nist.gov
- GDPR (EU) — EUR-Lex
Prove the data. Then automate.