Migration validation is where projects stall. The conversion is done, the code compiles, the jobs run. But how do you prove that the output matches the legacy system? How do you demonstrate, with auditable evidence, that 47 million rows of customer transactions migrated from a SAS environment produce identical results when processed by the new PySpark pipeline writing to Iceberg tables?
Traditional validation approaches — parallel runs, sample comparisons, row-count checks — are slow, incomplete, and generate mountains of spreadsheets that satisfy no one. Apache Iceberg's snapshot-based time travel changes the validation equation fundamentally. Every write creates an immutable snapshot. Every snapshot is queryable. Every change is auditable. And if something goes wrong, rollback is instantaneous.
The Validation Challenge
Migration validation has been an unsolved problem for decades. The core difficulty is straightforward: you need to prove equivalence between two fundamentally different systems. The legacy system (SAS, Informatica, DataStage, COBOL) produces output using one set of processing semantics, type systems, and rounding behaviors. The target system (PySpark on Iceberg) uses a completely different stack. Proving that these two systems produce identical results requires comparing output at scale, and doing it repeatedly as migration batches progress.
Traditional approaches each have significant limitations:
- Parallel run: Run both legacy and modern pipelines on the same input, compare outputs. Requires maintaining the legacy system throughout migration, doubling compute costs, and building custom comparison tooling. Typically takes 4-8 weeks per major pipeline.
- Sample comparison: Select a random sample of records and compare manually. Statistically unsound for catching systematic errors (type conversion bugs, rounding differences, timezone handling). Misses edge cases that appear only in specific data distributions.
- Row count and checksum: Compare aggregate metrics between source and target. Catches data loss but misses value-level discrepancies. A table can have the correct row count with every single value wrong.
- Manual spot checks: Subject-matter experts (SMEs) review specific records they know well. Subjective, non-reproducible, and does not scale. Useful for confidence but not for audit evidence.
The deeper problem is temporal. Migration is not a single event. It unfolds over weeks or months. Data is migrated in batches. Pipelines are converted incrementally. Source data continues to change. Without the ability to pin down exactly what data existed at a specific point in time, validation becomes a moving-target problem that never fully resolves.
Iceberg Snapshots: Built-in Version Control for Data
Apache Iceberg implements version control at the table level. Every write operation — whether an INSERT, UPDATE, DELETE, or MERGE — creates a new immutable snapshot. Each snapshot captures the complete state of the table at that moment: every data file, every manifest, every partition spec, every schema version. Snapshots are identified by a unique 64-bit snapshot ID and are timestamped to millisecond precision.
The snapshot metadata is lightweight. It does not duplicate data files. Instead, each snapshot points to a set of manifest files that in turn reference the actual data files on object storage. When a new write adds files, the new snapshot shares all unchanged manifests with the previous snapshot and adds only the new manifests for the new files. This structural sharing means that maintaining thousands of snapshots adds minimal storage overhead.
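The structural-sharing idea can be illustrated with a toy model in plain Python. This is not the Iceberg library and the names (`Snapshot`, `commit`) are invented for illustration; it only shows how a new snapshot reuses the previous snapshot's manifest references and adds entries for new files, rather than copying data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen mirrors snapshot immutability
class Snapshot:
    snapshot_id: int
    manifests: tuple  # manifest file names; shared across snapshots

def commit(previous: Snapshot, new_manifests: list, next_id: int) -> Snapshot:
    """Create a new snapshot that shares all of the previous snapshot's
    manifests and appends only the manifests for newly written files."""
    return Snapshot(next_id, previous.manifests + tuple(new_manifests))

s1 = Snapshot(1, ("m-001",))
s2 = commit(s1, ["m-002"], 2)

# s1 is untouched; s2 references m-001 by name rather than copying it
assert s1.manifests == ("m-001",)
assert s2.manifests == ("m-001", "m-002")
```

Because only references accumulate, keeping thousands of snapshots costs metadata, not duplicated data.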
Crucially, snapshots are immutable. Once created, a snapshot can never be modified. The data files it references cannot be overwritten or deleted (until the snapshot itself is expired through explicit garbage collection). This immutability is the foundation of time travel: you can always go back to any snapshot and see exactly the data that existed at that point in time.
```sql
-- List all snapshots for a table
SELECT snapshot_id, committed_at, operation, summary
FROM catalog.analytics.customer_txn.snapshots;
```
A typical migration table might accumulate snapshots like this:
| Snapshot ID | Timestamp | Operation | Summary |
|---|---|---|---|
| 738291046 | 2026-03-15 08:00:00 | append | Batch 1: 12M rows initial load |
| 738291102 | 2026-03-16 08:00:00 | append | Batch 2: 8M rows incremental |
| 738291198 | 2026-03-17 08:00:00 | overwrite | Batch 2 reprocess: fixed date conversion |
| 738291256 | 2026-03-18 08:00:00 | append | Batch 3: 11M rows incremental |
Each of these snapshots is permanently queryable. You can see the table as it was after the initial load, after the failed batch 2, after the corrected batch 2, and after batch 3. This history is invaluable for migration validation because it gives you precise, immutable reference points to compare against.
MigryX: Idiomatic Code, Not Line-by-Line Translation
The difference between MigryX and manual migration is not just speed — it is code quality. MigryX generates idiomatic, platform-optimized code that leverages native features of your target platform. A SAS DATA step does not become a clunky row-by-row loop — it becomes a clean, vectorized DataFrame operation. A PROC SQL query does not become a literal translation — it becomes an optimized query that takes advantage of your platform’s pushdown capabilities.
Migration Validation with Time Travel
Time travel transforms migration validation from a manual, error-prone process into a systematic, repeatable, and auditable one. The fundamental technique is straightforward: query the Iceberg table at a specific point in time and compare the results against the legacy system output for the same point in time.
Point-in-Time Comparison
After each migration batch, the legacy system output is known and fixed. The Iceberg table now has a snapshot representing the state after that batch. You can query both and compare:
```sql
-- Query the Iceberg table as it was right after Batch 1
SELECT *
FROM catalog.analytics.customer_txn
FOR SYSTEM_TIME AS OF TIMESTAMP '2026-03-15 08:30:00';

-- Or use a specific snapshot ID for exact precision
SELECT *
FROM catalog.analytics.customer_txn
FOR SYSTEM_VERSION AS OF 738291046;
```
This query returns exactly the rows that existed at that snapshot. Not the current state of the table with four batches loaded. Not an approximation. The exact data from that specific snapshot. This precision is what makes validation tractable.
Aggregate Validation
Time travel enables running validation queries that compare pre-migration and post-migration snapshots side by side — MigryX generates these validation suites automatically. Row counts, aggregate checksums, distinct value counts, and date ranges are all compared between the Iceberg snapshot and the legacy baseline to confirm equivalence.
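The aggregate checks described above can be sketched in plain Python over lists of row dicts. This is a hand-written illustration with made-up data, not MigryX's generated validation SQL; the point is that both sides are reduced to the same order-independent profile (row count, deterministic checksum, distinct keys, value sums) before comparison.

```python
import hashlib

def aggregate_profile(rows, key_cols, value_col):
    """Reduce a result set to comparable aggregate metrics:
    row count, an order-independent checksum, distinct key count,
    and a sum over a numeric column."""
    checksum = hashlib.sha256()
    # Sort by key so the checksum is independent of row order
    for row in sorted(rows, key=lambda r: tuple(r[c] for c in key_cols)):
        checksum.update(repr(sorted(row.items())).encode())
    return {
        "row_count": len(rows),
        "checksum": checksum.hexdigest(),
        "distinct_keys": len({tuple(r[c] for c in key_cols) for r in rows}),
        "value_sum": sum(r[value_col] for r in rows),
    }

legacy = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}]
migrated = [{"id": 2, "amt": 5.5}, {"id": 1, "amt": 10.0}]  # same data, new order

# Identical data in a different order produces an identical profile
assert aggregate_profile(legacy, ["id"], "amt") == aggregate_profile(migrated, ["id"], "amt")
```

In practice these profiles would be computed by SQL against the legacy baseline and a pinned Iceberg snapshot, but the comparison logic is the same.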
Value-Level Diff
When aggregates match, value-level comparison confirms that individual records are identical between legacy and migrated outputs. Time travel provides a stable reference point for joining legacy validation data against a specific Iceberg snapshot, surfacing mismatches at the row and column level.
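A minimal sketch of such a value-level diff, assuming both outputs share a primary key and fit in memory (real migrations would do this as a distributed join against the pinned snapshot):

```python
def value_level_diff(legacy_rows, migrated_rows, key):
    """Join two result sets on a primary key and report mismatches
    at the row level (missing rows) and column level (differing values)."""
    legacy_by_key = {r[key]: r for r in legacy_rows}
    migrated_by_key = {r[key]: r for r in migrated_rows}
    diffs = []
    for k in legacy_by_key.keys() | migrated_by_key.keys():
        left, right = legacy_by_key.get(k), migrated_by_key.get(k)
        if left is None or right is None:
            diffs.append((k, "missing_row", left is None))
            continue
        for col in left:
            if left[col] != right.get(col):
                diffs.append((k, col, (left[col], right.get(col))))
    return diffs

legacy = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}]
migrated = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.6}]  # one bad value
diffs = value_level_diff(legacy, migrated, "id")
assert diffs == [(2, "amt", (5.5, 5.6))]  # row 2, column amt, legacy vs migrated
```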
Cross-Batch Consistency
As migration batches accumulate, cross-batch consistency checks verify that earlier batches were not corrupted by later operations. By comparing the current state of batch 1 rows against the original batch 1 snapshot, you can confirm data immutability across the entire migration timeline.
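The cross-batch check reduces to a simple comparison once both row sets are restricted to the earlier batch's keys. A sketch with invented data (in practice the "original" side would come from the batch-1 snapshot via time travel):

```python
def batch_rows_unchanged(original_batch_rows, current_rows, batch_keys, key="id"):
    """Verify that an earlier batch's rows are byte-for-byte identical
    in the current table state: restrict both sides to the batch's keys
    and compare exactly."""
    original = {r[key]: r for r in original_batch_rows if r[key] in batch_keys}
    current = {r[key]: r for r in current_rows if r[key] in batch_keys}
    return original == current

batch1 = [{"id": 1, "amt": 10.0}]
current_state = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}]  # batch 2 appended

assert batch_rows_unchanged(batch1, current_state, {1})          # batch 1 intact
assert not batch_rows_unchanged(batch1, [{"id": 1, "amt": 9.0}], {1})  # corruption detected
```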
Platform-Specific Optimization by MigryX
MigryX maintains deep knowledge of every target platform’s strengths and best practices. When converting to Snowflake, it leverages Snowpark and native SQL functions. When targeting Databricks, it uses PySpark DataFrame operations optimized for distributed execution. When generating dbt models, it follows dbt best practices for modularity and testability. This platform awareness is what makes MigryX output production-ready from day one.
Rollback as a Safety Net
In traditional data platforms, rollback after a failed migration batch is a nightmare. You might need to restore from a backup (which could take hours for a large table), or replay all previous batches from scratch, or manually identify and delete the bad data. Each option is slow, risky, and error-prone.
Iceberg makes rollback instantaneous. Because every snapshot is immutable and retained, rolling back to a previous state is a metadata-only operation that simply changes which snapshot is current. No data files are deleted. No restore operations run. The table immediately reflects the state of the target snapshot.
```sql
-- Roll back to the state before the failed batch
CALL catalog.system.rollback_to_snapshot(
  'analytics.customer_txn',
  738291102  -- snapshot before the failed operation
);
```
This operation completes in milliseconds regardless of table size. A petabyte table rolls back as quickly as a megabyte table because the operation only updates a metadata pointer. The data files from the failed batch remain on storage (they will be cleaned up by snapshot expiration later) but are immediately invisible to all readers.
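The "metadata pointer" behavior can be captured in a toy Python model. This is not Iceberg's implementation; it only demonstrates why rollback is O(1): the snapshot log is append-only, and rollback moves the current pointer without touching any stored data.

```python
class TableMetadata:
    """Toy model of an Iceberg-style table: immutable snapshots plus a
    'current snapshot' pointer. Rollback only moves the pointer."""
    def __init__(self):
        self.snapshots = {}            # snapshot_id -> frozen row set
        self.current_snapshot_id = None

    def commit(self, snapshot_id, rows):
        self.snapshots[snapshot_id] = frozenset(rows)  # immutable once written
        self.current_snapshot_id = snapshot_id

    def rollback_to_snapshot(self, snapshot_id):
        # Metadata-only: no data deleted, no restore run, O(1) in table size
        assert snapshot_id in self.snapshots
        self.current_snapshot_id = snapshot_id

    def read(self):
        return self.snapshots[self.current_snapshot_id]

t = TableMetadata()
t.commit(738291102, [("row", 1)])                 # good batch
t.commit(738291198, [("row", 1), ("bad", 99)])    # failed batch
t.rollback_to_snapshot(738291102)

assert t.read() == frozenset([("row", 1)])  # readers see the pre-failure state
assert 738291198 in t.snapshots             # failed batch's data remains until expiration
```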
Rollback Scenarios in Migration
Rollback is not just for catastrophic failures. In the context of migration, it serves as a safety net for a wide range of scenarios:
- Type conversion error discovered: Batch 3 had a date conversion bug that truncated timestamps. Roll back to the pre-batch-3 snapshot, fix the conversion logic, and re-run.
- Data quality issue in source extract: The source system extract included test records that should have been filtered. Roll back, fix the extraction filter, and re-run.
- Performance regression: The new partition strategy introduced by batch 5 degraded query performance. Roll back and try a different approach.
- Business logic discrepancy: Validation reveals that the migrated code calculates a derived field differently than legacy. Roll back while the logic is corrected.
In each case, rollback takes milliseconds. There is no "restore from backup" step, no "re-run all previous batches" recovery process. The safety net is always there, always instant, and always reliable.
Rollback vs. Restore: A Comparison
| Aspect | Traditional Backup/Restore | Iceberg Rollback |
|---|---|---|
| Time to recover | Minutes to hours (depends on table size) | Milliseconds (metadata only) |
| Data at risk during recovery | Yes (restore may fail or be partial) | No (original snapshot is immutable) |
| Requires pre-planned backups | Yes (must schedule and monitor) | No (every write creates a snapshot automatically) |
| Storage overhead | Full copy of data per backup | Shared data files, metadata-only overhead |
| Granularity | Backup schedule (typically daily) | Every write operation |
Audit Trail for Compliance
Regulated industries — financial services, healthcare, insurance, government — face strict requirements around data lineage, change tracking, and auditability. Regulations like SOX (Sarbanes-Oxley), GDPR (General Data Protection Regulation), and BCBS 239 (Basel Committee on Banking Supervision) require organizations to demonstrate exactly how data was transformed, when changes occurred, and who or what initiated those changes.
During a migration, these requirements intensify. Regulators want to know that data migrated from a legacy system to a new platform is complete, accurate, and traceable. They want evidence, not assertions. And they want that evidence to be reproducible: if an auditor asks to see the state of a table on a specific date six months ago, you need to produce it.
Iceberg's Built-in Audit Capabilities
Iceberg's snapshot history provides exactly this capability. Every change to a table is recorded with a timestamp, an operation type, and summary metadata. The snapshot log is immutable: it cannot be retroactively modified. And the data referenced by each snapshot is preserved until explicitly expired.
```sql
-- Full history of changes to the table
SELECT
  h.snapshot_id,
  h.made_current_at AS change_timestamp,
  s.operation,
  s.summary['added-records'] AS records_added,
  s.summary['deleted-records'] AS records_deleted,
  s.summary['added-files-size'] AS bytes_added
FROM catalog.analytics.customer_txn.history h
JOIN catalog.analytics.customer_txn.snapshots s
  ON h.snapshot_id = s.snapshot_id
ORDER BY h.made_current_at;
```
SOX Compliance
SOX requires that financial data transformations are documented, auditable, and reproducible. Iceberg's snapshot history provides a complete log of every data change. Combined with the ability to query any historical snapshot, auditors can verify the state of financial data at any point in the reporting period. No custom audit logging required. No separate data versioning system. The table format itself is the audit trail.
GDPR Right to Erasure
GDPR requires the ability to delete personal data upon request. In Iceberg, delete operations create new snapshots just like any other write. The deletion is recorded in the snapshot history with a clear timestamp and record count. Older snapshots still reference the deleted data (which is required for the audit trail), but those snapshots can be expired on a defined schedule to achieve actual erasure while maintaining the audit log of the deletion event.
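The erasure pattern can be sketched as follows. This is a simplified illustration with invented data, not Iceberg's expiration logic: expiring old snapshots removes the data they reference (actual erasure), while a separate, data-free history log keeps the record that a deletion occurred.

```python
# Snapshot 101 still contains 'alice'; snapshot 102 records her deletion
snapshots = {
    101: {"data": ["alice", "bob"], "committed_at": 1},
    102: {"data": ["bob"], "committed_at": 2},
}
# Immutable audit log: (committed_at, snapshot_id, operation) -- no personal data
history_log = [(1, 101, "append"), (2, 102, "delete")]

def expire_snapshots(snapshots, older_than):
    """Drop snapshots committed before the cutoff, making their data
    unreachable; the history log is stored separately and survives."""
    return {sid: s for sid, s in snapshots.items()
            if s["committed_at"] >= older_than}

snapshots = expire_snapshots(snapshots, older_than=2)

assert 101 not in snapshots               # 'alice' is no longer reachable
assert (2, 102, "delete") in history_log  # the deletion event stays auditable
```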
BCBS 239 Data Lineage
BCBS 239 requires financial institutions to demonstrate end-to-end data lineage for risk reporting. Iceberg's metadata — combined with column-level lineage from the migration conversion — provides a traceable chain from source data in the legacy system through transformation logic to the final Iceberg table. Each snapshot in the chain is independently verifiable, creating an unbroken audit trail from legacy to lakehouse.
MigryX + Iceberg Validation
MigryX generates validation queries that compare legacy output against Iceberg snapshots — row counts, aggregate checksums, and sample data diffs — producing audit-ready evidence for migration sign-off.
The combination of Iceberg's immutable snapshots with MigryX's automated validation query generation creates a migration validation framework that is systematic, reproducible, and audit-ready from day one. Every batch is validated against a pinned snapshot. Every discrepancy is traceable. Every rollback is recorded. And the entire history is queryable by anyone, at any time, using standard SQL. This is not validation as an afterthought. It is validation built into the architecture of the migration itself.
Why MigryX Delivers Superior Migration Results
The challenges described throughout this article are exactly what MigryX was built to solve. Here is how MigryX transforms this process:
- Production-ready output: MigryX generates code that passes code review and runs in production — not prototype-quality output that needs weeks of cleanup.
- Platform optimization: Converted code leverages target platform-specific features for maximum performance and cost efficiency.
- 25+ source technologies: Whether migrating from SAS, Informatica, DataStage, SSIS, or any of 25+ legacy technologies, MigryX handles it.
- Automated documentation: Every conversion decision is documented with before/after code mappings and transformation rationale.
MigryX combines precision AST parsing with Merlin AI to deliver 99% accurate, production-ready migration — turning what used to be a multi-year manual effort into a streamlined, validated process. See it in action.
Ready for audit-ready migration validation?
See how MigryX generates Iceberg time-travel validation queries and rollback scripts that turn migration confidence from hope into evidence.
Schedule a Demo