Data Readiness for Automation

Automation succeeds when data is stable, rules are explicit, and systems exchange facts through clear contracts. Prove the data first—then automate the stable parts.

Overview

Before selecting a workflow, integration, RPA, or AI approach, verify that inputs and outputs are defined, sources are trustworthy, and evidence lives where the work occurs. Use one logic path for both daily operations and month-end reporting, with no parallel spreadsheets.

Readiness criteria

Clarity

  • Operational definitions for inputs/outputs
  • Stable names and units (ISO 8601 dates; RFC 3339 timestamps)
  • Versioned schemas; change policy

Trust

  • Authoritative sources named; owners assigned
  • Data quality checks (completeness, accuracy, validity, timeliness)
  • Lineage to system of record

Control

  • Role-based access control (RBAC); separation of duties (SoD) for changes
  • Evidence stored at the producing step
  • Logging and retention rules

Data contracts & schemas

Define contracts

Metadata

  • Business glossary + technical dictionary
  • Metadata registration (ISO/IEC 11179 concept) — iso.org

Good practice

  • Reject invalid payloads at the edge (fail fast)
  • Version schemas; deprecate with dates
  • Automated contract tests in CI
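
As an illustration of rejecting invalid payloads at the edge, the following is a minimal sketch in plain Python. The invoice contract, its field names, the currency list, and the ContractViolation class are all hypothetical; a real deployment would more likely express the contract in JSON Schema and use a schema validator rather than hand-written checks.

```python
from datetime import datetime

# Hypothetical contract, version 1: field -> (accepted types, required?)
INVOICE_CONTRACT_V1 = {
    "invoice_id": (str, True),
    "amount": ((int, float), True),
    "currency": (str, True),
    "issued_at": (str, True),   # RFC 3339 timestamp with offset
    "note": (str, False),
}
ALLOWED_CURRENCIES = {"EUR", "USD", "GBP"}  # illustrative reference list


class ContractViolation(ValueError):
    """Raised when a payload does not satisfy the contract."""


def validate_invoice(payload: dict) -> dict:
    """Return the payload if it is valid; raise ContractViolation otherwise."""
    errors = []
    for field, (ftype, required) in INVOICE_CONTRACT_V1.items():
        if field not in payload:
            if required:
                errors.append(f"missing field: {field}")
            continue
        if not isinstance(payload[field], ftype):
            errors.append(f"wrong type for {field}")
    unknown = set(payload) - set(INVOICE_CONTRACT_V1)
    if unknown:
        errors.append(f"unknown fields: {sorted(unknown)}")

    # Value checks only run once the structure is known to be sound.
    if not errors:
        if payload["currency"] not in ALLOWED_CURRENCIES:
            errors.append("currency not in reference list")
        if payload["amount"] < 0:
            errors.append("amount must be non-negative")
        try:
            parsed = datetime.fromisoformat(payload["issued_at"].replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                errors.append("issued_at has no UTC offset")
        except ValueError:
            errors.append("issued_at is not a valid timestamp")

    if errors:
        raise ContractViolation("; ".join(errors))
    return payload
```

Calling validate_invoice as the first step of a handler means a malformed payload never reaches downstream logic, and the same function can be reused as a contract test in CI.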

Master & reference data

What to stabilize

  • Customers, products, locations, vendors
  • Reference lists: currencies, units, tax codes
  • Identifiers and merge/survivorship rules

IDs & keys

  • Global IDs where possible (e.g., GS1 keys) — gs1.org
  • Natural vs surrogate keys documented

Governance

  • Data owners/stewards; change SLA
  • Golden records; match/merge strategies
  • Audit of master data changes
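
One possible survivorship rule, sketched below, is "the most recently updated non-empty value wins". The record layout and the updated_on field are assumptions made for illustration; real match/merge strategies usually also weigh source trust and field-level rules.

```python
from datetime import date


def build_golden_record(records: list[dict]) -> dict:
    """Merge duplicate records into one golden record.

    For each field, keep the non-empty value from the most recently
    updated source record (hypothetical survivorship rule).
    """
    golden = {}
    # Process oldest first so newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated_on"]):
        for field, value in rec.items():
            if value not in (None, ""):
                golden[field] = value
    return golden


duplicates = [
    {"customer_id": "C1", "email": "a@old.example", "phone": "555-0100",
     "updated_on": date(2023, 1, 1)},
    {"customer_id": "C1", "email": "a@new.example", "phone": "",
     "updated_on": date(2024, 1, 1)},
]
golden = build_golden_record(duplicates)
```

Here the newer email replaces the older one, while the phone number survives from the older record because the newer record left it empty.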

Event data & logs

Minimum fields

  • Case ID (process instance)
  • Activity name
  • Timestamp (UTC or with zone)
  • Actor/resource (optional but useful)

Formats

Uses

  • Process mining (discovery/conformance)
  • Service level indicators and objectives (SLIs/SLOs) and queue health
  • Exception root-cause analysis
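
Given only a case ID, an activity name, and a timestamp per event, it is already possible to compute lead time per case and count which activity directly follows which, the basic input for process discovery. The sketch below uses made-up purchase-order events; the tuple layout is an assumption, not a standard log format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, activity, timestamp with offset)
events = [
    ("PO-1", "Create", "2024-03-01T09:00:00+00:00"),
    ("PO-1", "Approve", "2024-03-01T11:30:00+00:00"),
    ("PO-1", "Pay", "2024-03-02T09:00:00+00:00"),
    ("PO-2", "Create", "2024-03-01T10:00:00+00:00"),
    ("PO-2", "Reject", "2024-03-01T10:45:00+00:00"),
]


def summarize(events):
    """Return (lead time per case, directly-follows counts)."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    lead_times = {}
    follows = Counter()
    for case_id, steps in by_case.items():
        steps.sort()  # order each case by timestamp
        lead_times[case_id] = steps[-1][0] - steps[0][0]
        for (_, current), (_, nxt) in zip(steps, steps[1:]):
            follows[(current, nxt)] += 1
    return lead_times, follows


lead_times, follows = summarize(events)
```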

IDs, time & idempotency

Identity

  • Stable primary keys; correlation IDs across systems
  • Idempotency keys for writes/retries
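
To show why idempotency keys make retries safe, here is a minimal in-memory sketch. PaymentStore and create_payment are hypothetical names; a real service would persist the key-to-result mapping durably and expire old keys.

```python
class PaymentStore:
    """In-memory stand-in for a service that accepts idempotent writes."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first write
        self.writes = 0     # number of writes actually performed

    def create_payment(self, idempotency_key: str, amount: float) -> dict:
        # A repeated key returns the stored result instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.writes += 1
        result = {"payment_id": f"PAY-{self.writes}", "amount": amount}
        self._results[idempotency_key] = result
        return result


store = PaymentStore()
first = store.create_payment("order-42-attempt", 25.0)
retry = store.create_payment("order-42-attempt", 25.0)  # e.g. after a timeout
```

The retry returns the same payment as the first call, and only one write is recorded.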

Time

  • ISO 8601 / RFC 3339 timestamps; store UTC
  • Record start/complete/paused states
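
A small sketch of the "store UTC" rule: parse a timestamp that carries an offset, refuse one that does not, and normalize to UTC. The function name to_utc is illustrative. Note that datetime.fromisoformat accepts only a subset of RFC 3339 on Python versions before 3.11, which is why the trailing "Z" is rewritten first.

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> str:
    """Normalize a timestamp with an explicit offset to UTC ("Z" suffix)."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so reject it.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
```

For example, to_utc("2024-03-01T09:00:00+02:00") yields "2024-03-01T07:00:00Z", while a bare "2024-03-01T09:00:00" raises ValueError.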

Error taxonomy

  • Retryable vs non-retryable
  • Business vs technical error classes
  • Dead-letter rules and alerts
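
The taxonomy above can drive retry behavior directly. In this sketch, the two exception classes and process_with_retries are hypothetical names; retryable errors are attempted again up to a limit, non-retryable errors stop immediately, and anything that still fails goes to a dead-letter list. Backoff between attempts and alerting are left out for brevity.

```python
class RetryableError(Exception):
    """Transient failure, e.g. a timeout; safe to try again."""


class NonRetryableError(Exception):
    """Permanent failure, e.g. an invalid tax code; retrying cannot help."""


def process_with_retries(message, handler, max_attempts=3, dead_letters=None):
    """Run handler(message); dead-letter the message if it cannot succeed."""
    if dead_letters is None:
        dead_letters = []
    last_error = None
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except RetryableError as exc:
            last_error = exc       # try again on the next loop iteration
        except NonRetryableError as exc:
            last_error = exc
            break                  # no point in retrying
    dead_letters.append(
        {"message": message, "reason": str(last_error), "attempts": attempt}
    )
    return None
```

Recording the reason and attempt count with each dead-lettered message gives exception root-cause analysis something concrete to work from.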

APIs & integration

Contracts & auth

  • OpenAPI/GraphQL; input validation
  • OAuth 2.0 / OIDC for identity — RFC 6749 · openid.net

Events & queues

  • AMQP/Kafka for at-least-once delivery — OASIS · kafka.apache.org
  • Idempotent consumers; replay with retention

RPA fallback

Use UI automation only when APIs are absent and screens are stable. Prefer API contracts for durability.

Data quality (CAVT) & profiling

Dimensions

  • Completeness, Accuracy, Validity, Timeliness (CAVT)
  • Consistency and Uniqueness as supporting checks

Profiling

  • Nulls, ranges, patterns, referential integrity
  • Drift detection on key distributions
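
A basic profile of a single field can be computed with a few lines of Python. The sketch below measures completeness, pattern validity, and uniqueness; the profile function, the sample rows, and the three-letter currency pattern are assumptions for illustration only.

```python
import re


def profile(rows, field, pattern=None):
    """Return simple quality measures for one field across rows."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if v not in (None, "")]
    report = {
        # Share of rows where the field is populated.
        "completeness": len(present) / len(values) if values else 0.0,
        # True when no populated value repeats.
        "unique": len(set(present)) == len(present),
    }
    if pattern is not None:
        rx = re.compile(pattern)
        valid = [v for v in present if rx.fullmatch(str(v))]
        # Share of populated values that match the expected pattern.
        report["validity"] = len(valid) / len(present) if present else 0.0
    return report


rows = [
    {"id": 1, "currency": "EUR"},
    {"id": 2, "currency": "USD"},
    {"id": 3, "currency": ""},
    {"id": 4, "currency": "eur"},
]
report = profile(rows, "currency", pattern=r"[A-Z]{3}")
```

In this sample, three of four rows are populated and two of those three match the pattern, so a baseline check would flag both the empty value and the lowercase code.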

Standards

  • ISO 8000 (data quality concepts) — iso.org
  • NIST statistical methods — nist.gov

Catalog & lineage

Catalog

  • Business glossary + technical metadata
  • Schemas, owners, retention, quality rules

Lineage

  • End-to-end data flow visibility
  • OpenLineage compatible where possible — openlineage.io

Why it matters

Faster impact analysis, cleaner audits, fewer surprises in change windows.

Privacy & security

Access & retention

  • Least privilege; role-based access; periodic reviews
  • Retention by policy and law (e.g., GDPR) — EUR-Lex

Protection

  • Encrypt in transit/at rest; mask PII where possible
  • Log reads/writes; immutable audit trails
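
Two common ways to limit PII exposure are display masking and keyed pseudonymization. The sketch below shows one possible form of each using only the standard library; mask_email and pseudonymize are illustrative names, and key management for the secret is out of scope here.

```python
import hashlib
import hmac


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"  # not a recognizable address, so hide it entirely
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace a value with a keyed hash (HMAC-SHA256, hex encoded).

    The same value and secret always give the same token, so records
    can still be joined without exposing the original value.
    """
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

For example, mask_email("alice@example.com") returns "a****@example.com".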

Frameworks

Testing & monitoring

Pre-prod

  • Contract tests (schemas, enums, ranges)
  • Golden datasets and replay tests

Prod

  • SLIs/SLOs for freshness, completeness, error rate
  • Dead-letter queues, retries, alerting
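
A freshness SLI can be as simple as the share of feeds updated within an agreed window. The sketch below assumes a mapping from feed name to last update time in UTC; the feed names, the one-hour window, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_sli(last_updates: dict, now: datetime, max_age: timedelta) -> float:
    """Share of feeds whose last update is no older than max_age."""
    if not last_updates:
        return 1.0
    fresh = sum(1 for ts in last_updates.values() if now - ts <= max_age)
    return fresh / len(last_updates)


def stale_feeds(last_updates: dict, now: datetime, max_age: timedelta) -> list:
    """Names of feeds that breach the freshness window, for alerting."""
    return sorted(name for name, ts in last_updates.items() if now - ts > max_age)


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
feeds = {
    "orders": datetime(2024, 3, 1, 11, 50, tzinfo=timezone.utc),
    "invoices": datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc),
}
sli = freshness_sli(feeds, now, timedelta(hours=1))
stale = stale_feeds(feeds, now, timedelta(hours=1))
```

Comparing the SLI against an SLO target (for instance, 0.99 over a rolling window) turns the measurement into an alertable condition.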

Change windows

  • Version bumps coordinated; rollback plans
  • Deprecations with sunset dates and dashboards

90-day starter

Days 0–30

  • Pick one flow; define input/output contracts (JSON Schema)
  • Name owners; catalog fields and sources
  • Baseline CAVT checks; fix blocking issues

Days 31–60

  • Publish API contracts (OpenAPI) with auth
  • Stand up lineage + dashboards; alert on breaks
  • Add idempotency keys and error taxonomy

Days 61–90

  • Pilot the automation; track lead time, first-pass yield (FPY), and exception rates
  • Harden retention, access reviews, and rollback
  • Publish deltas; plan scale-out

Prove the data. Then automate.

If you want a data-readiness scorecard (contracts, quality, lineage, privacy), ask for a copy.
