🧊 Apache Iceberg Migration Platform

Migrate Everything to Apache Iceberg.

MigryX converts SAS, Talend, Alteryx, IBM DataStage, Informatica, Oracle ODI, SSIS, Teradata, and SQL dialects to Apache Iceberg — the open table format that runs on Spark, Trino, Flink, or Dremio — with schema evolution, hidden partitioning, time travel, and full column-level lineage.

10+
Legacy Sources
All migrated to Iceberg
95%+
Parser Accuracy
Up to 99% with optional AI augmentation
Open
Table Format
No vendor lock-in
Column-Level
Lineage
Full STTM with partition metadata

Iceberg Targets

What MigryX produces on Apache Iceberg

Every migration generates production-ready Iceberg artifacts — leveraging PySpark, Trino SQL, Flink streaming, schema evolution, hidden partitioning, time travel, and catalog integration.

🧊

Iceberg Tables

ACID-compliant tables with hidden partitioning, metadata-driven pruning, and snapshot isolation — the open table format for modern data lakehouses.
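
A minimal sketch of the format in practice, assuming a Spark session with an Iceberg catalog named lakehouse and an illustrative orders schema (not MigryX output):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # session with the Iceberg catalog "lakehouse" configured

    # Hidden partitioning: data is laid out by day, but queries simply filter on order_ts.
    spark.sql("""
        CREATE TABLE lakehouse.sales.orders (
            order_id  BIGINT,
            customer  STRING,
            amount    DECIMAL(18, 2),
            order_ts  TIMESTAMP
        )
        USING iceberg
        PARTITIONED BY (days(order_ts))
    """)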

Spark + Iceberg

PySpark reads and writes Iceberg tables natively through Iceberg catalog integration — partition pruning and predicate pushdown keep scans efficient.
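
A minimal PySpark read/write sketch against the illustrative table above (catalog and table names assumed):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # Iceberg catalog "lakehouse" registered in the Spark config

    # The filter and column selection are pushed down into the Iceberg table scan.
    recent = (
        spark.table("lakehouse.sales.orders")
             .where("order_ts >= date '2024-01-01'")
             .select("order_id", "customer", "amount")
    )

    # Write the result back as an Iceberg table via the DataFrameWriterV2 API.
    recent.writeTo("lakehouse.sales.orders_recent").using("iceberg").createOrReplace()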

🔍

Trino + Iceberg

High-performance SQL analytics on Iceberg tables with predicate pushdown — federated queries across Iceberg catalogs with sub-second latency.
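
For example, the same Iceberg table (mounted in Trino as catalog iceberg, schema sales) can be queried with the Trino Python client; connection details are placeholders:

    import trino

    conn = trino.dbapi.connect(
        host="trino.example.com", port=8080,
        user="analyst", catalog="iceberg", schema="sales",
    )
    cur = conn.cursor()

    # Trino pushes the predicate into the Iceberg scan and prunes files before reading.
    cur.execute("""
        SELECT customer, sum(amount) AS total_amount
        FROM orders
        WHERE order_ts >= DATE '2024-01-01'
        GROUP BY customer
    """)
    for customer, total_amount in cur.fetchall():
        print(customer, total_amount)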

🌊

Flink + Iceberg

Streaming writes and CDC to Iceberg tables via Flink Iceberg connector — real-time data ingestion with exactly-once semantics and automatic compaction.
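
A sketch of the streaming-write pattern with PyFlink SQL; the catalog properties and the kafka_orders source table are assumptions that depend on your deployment:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Register an Iceberg catalog backed by a Hive Metastore (URIs are placeholders).
    t_env.execute_sql("""
        CREATE CATALOG lakehouse WITH (
            'type' = 'iceberg',
            'catalog-type' = 'hive',
            'uri' = 'thrift://metastore.example.com:9083',
            'warehouse' = 's3://warehouse-bucket/lakehouse'
        )
    """)

    # Continuously insert from a streaming source table into the Iceberg table.
    t_env.execute_sql("""
        INSERT INTO lakehouse.sales.orders
        SELECT order_id, customer, amount, order_ts FROM kafka_orders
    """)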

🔄

Schema Evolution

Add, rename, drop, or reorder columns without full table rewrite — Iceberg tracks schema changes in metadata, ensuring backward and forward compatibility.
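
For instance, these are metadata-only operations on the illustrative table above; existing data files are never rewritten:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each statement updates Iceberg table metadata only.
    spark.sql("ALTER TABLE lakehouse.sales.orders ADD COLUMNS (discount DECIMAL(18, 2))")
    spark.sql("ALTER TABLE lakehouse.sales.orders RENAME COLUMN customer TO customer_name")
    spark.sql("ALTER TABLE lakehouse.sales.orders DROP COLUMN discount")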

Time Travel

Point-in-time queries via snapshot isolation, rollback to any previous state — audit, debug, and reproduce historical results with zero data duplication.
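
A quick sketch of point-in-time reads on the same table (the snapshot id is a placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Query the table as of a wall-clock timestamp (Spark 3.3+ time-travel SQL).
    spark.sql("""
        SELECT count(*) FROM lakehouse.sales.orders TIMESTAMP AS OF '2024-06-01 00:00:00'
    """).show()

    # Or pin a specific snapshot through the DataFrame reader.
    df = spark.read.option("snapshot-id", 1234567890123456789).table("lakehouse.sales.orders")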

📊

Partition Evolution

Change partition strategy (daily to hourly, add new partition columns) without data rewrite — Iceberg handles mixed partition layouts transparently.
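
For example, moving the illustrative table from daily to hourly partitioning with the Iceberg Spark SQL extensions enabled (order_ts_day is the default field name generated by days(order_ts); adjust to your table):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Only newly written data uses the hourly spec; existing daily files remain valid.
    spark.sql("""
        ALTER TABLE lakehouse.sales.orders
        REPLACE PARTITION FIELD order_ts_day WITH hours(order_ts)
    """)

    # Partition fields can also be added without rewriting history.
    spark.sql("ALTER TABLE lakehouse.sales.orders ADD PARTITION FIELD bucket(16, customer)")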

🗂️

Catalog Integration

Hive Metastore, AWS Glue, Nessie, Polaris, Unity Catalog, and REST catalog — register Iceberg tables in any catalog for unified governance and discovery.
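
A typical Spark configuration sketch for registering an Iceberg REST catalog; names and URIs are placeholders, and Hive, Glue, or Nessie follow the same pattern with a different catalog type (the iceberg-spark-runtime jar must be on the classpath):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-catalog-example")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lakehouse.type", "rest")
        .config("spark.sql.catalog.lakehouse.uri", "https://catalog.example.com/api")
        .getOrCreate()
    )

    # Tables in the catalog are addressable as lakehouse.<namespace>.<table>.
    spark.sql("SHOW NAMESPACES IN lakehouse").show()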

Migration Sources

Every legacy source — migrated to Iceberg.

Purpose-built parsers for each source platform. Not generic scanners. Every conversion produces explainable, auditable, Iceberg-native code — PySpark, Trino SQL, or Flink streaming inserts.

SAS

SAS to Iceberg

Base · Macros · PROC SQL · SAS/IML

Automate SAS Base, Macro, PROC SQL, and IML conversion to PySpark with Iceberg table writes. DATA step logic, FORMAT/INFORMAT handling, PROC SORT/MEANS/FREQ translated to Spark SQL on Iceberg.

Iceberg Tables · PySpark · Trino SQL · Schema Evolution
⚙️

Talend to Iceberg

Studio · Open Studio · tMap · Cloud

Parse Talend project exports (ZIP/Git), .item artifacts, tMap joins, metadata, contexts, and connections — converted to PySpark Iceberg jobs with full component-level lineage and catalog registration.

PySpark · Iceberg Tables · Flink CDC
📈

Alteryx to Iceberg

Designer · Workflows · Macros · Apps

Convert Alteryx Designer workflows (.yxmd/.yxwz), macros, and apps to PySpark with Iceberg catalog writes — tool-by-tool translation with full lineage preservation and partition strategy recommendations.

Iceberg Tables · PySpark · Schema Evolution
IBM DS

DataStage to Iceberg

Parallel · Server · DataStage X

Migrate IBM DataStage parallel and server jobs, sequences, shared containers, and XML definitions to PySpark with Iceberg table writes — transformer logic translated to Spark SQL with hidden partitioning.

PySpark · Iceberg Tables · Flink CDC
INFA

Informatica to Iceberg

PowerCenter · IDMC · IICS

Migrate Informatica PowerCenter (.xml exports) and IDMC/IICS mappings — sources, targets, transformations, and workflows — to PySpark with Iceberg writes and catalog lineage registration.

Iceberg Tables · Trino SQL · PySpark
ODI

Oracle ODI to Iceberg

Repository export · KMs · Packages

Parse Oracle ODI repository exports — mappings, interfaces, knowledge modules, packages, and load plans — converted to PySpark Iceberg writes with full column-level lineage in Iceberg REST catalog.

Iceberg Tables · PySpark · Schema Evolution
SSIS

SSIS to Iceberg

.dtsx · .ispac · Data Flow · Scripts

Parse SSIS .dtsx packages and .ispac archives — data flow, control flow, SSIS expressions, C#/VB.NET script tasks — to PySpark Iceberg pipelines with Flink CDC for streaming ingestion patterns.

PySpark · Flink CDC · Iceberg Tables
BTEQ

Teradata to Iceberg

BTEQ · FastLoad · QUALIFY · Macros

Migrate Teradata BTEQ, FastLoad, MultiLoad, and Teradata SQL — QUALIFY rewriting, BTEQ command translation, and PRIMARY INDEX converted to Iceberg hidden partition strategies and sort orders.

Trino SQL · Iceberg Tables · PySpark
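
As an illustrative sketch of the QUALIFY rewrite (hypothetical tables and columns, not generated MigryX code), a Teradata QUALIFY filter becomes a windowed subquery with an outer filter:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Teradata source:
    #   SELECT account_id, balance
    #   FROM accounts
    #   QUALIFY ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY load_ts DESC) = 1;
    #
    # Equivalent Spark/Trino SQL against the Iceberg table:
    spark.sql("""
        SELECT account_id, balance
        FROM (
            SELECT account_id, balance,
                   ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY load_ts DESC) AS rn
            FROM lakehouse.finance.accounts
        )
        WHERE rn = 1
    """)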
ORA

Oracle PL/SQL to Iceberg

Procedures · Packages · Triggers

Migrate Oracle PL/SQL procedures, packages, and triggers with 2000+ function mappings, CONNECT BY to recursive CTE rewriting, BULK COLLECT to PySpark batching, writing to Iceberg tables with full lineage.

PySpark · Trino SQL · Iceberg Tables
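
A sketch of the CONNECT BY rewrite on a hypothetical employee hierarchy, expressed here as Trino-style recursive-CTE SQL text (catalog and table names assumed):

    # Oracle source:
    #   SELECT employee_id, manager_id, LEVEL
    #   FROM employees
    #   START WITH manager_id IS NULL
    #   CONNECT BY PRIOR employee_id = manager_id;
    #
    # Equivalent recursive CTE over the Iceberg table:
    converted_sql = """
        WITH RECURSIVE org AS (
            SELECT employee_id, manager_id, 1 AS lvl
            FROM iceberg.hr.employees
            WHERE manager_id IS NULL
            UNION ALL
            SELECT e.employee_id, e.manager_id, org.lvl + 1
            FROM iceberg.hr.employees e
            JOIN org ON e.manager_id = org.employee_id
        )
        SELECT employee_id, manager_id, lvl FROM org
    """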
SQL

SQL Dialects to Iceberg

15+ Dialects · 500+ Function Maps

Transpile SQL from Oracle, T-SQL, Teradata, DB2, Netezza, Greenplum, Hive HQL, and Vertica to Trino SQL on Iceberg tables — 500+ function mappings, window function normalization, and schema evolution support.

Trino SQL · Iceberg Tables · Schema Evolution
DFX

SAS DataFlux to Iceberg

dfPower Studio · DMS · DQ Schemes

Migrate SAS DataFlux dfPower Studio jobs and DQ schemes — standardize/parse/match/validate patterns — to PySpark UDFs writing Iceberg tables with data quality constraints and anomaly detection.

PySpark · Iceberg Tables · Flink CDC
🔍

MigryX Compass

Discovery · Lineage · Iceberg Catalog

Before you migrate, map your estate. Compass extracts column-level lineage, STTM, and dependency graphs from any source — and publishes them directly into the Iceberg REST catalog for governance.

Iceberg Tables · STTM · Lineage Graphs

How It Works

From legacy codebase to Apache Iceberg in five steps

The same proven methodology applies to every source — SAS, Talend, Alteryx, DataStage, Informatica, or ODI — all landing natively on Apache Iceberg.

1

Ingest

Upload source artifacts — SAS scripts, Talend exports, DataStage XML, .dtsx packages — into MigryX for parsing.

2

Parse & Analyze

Custom parsers build complete ASTs, expand macros, resolve dependencies, and produce column-level lineage — with Iceberg-readiness scoring.

3

Convert

Convert to PySpark with Iceberg catalog writes, Trino SQL views, or Flink streaming inserts — with auto-generated documentation and partition strategy recommendations.

4

Validate

Row-level and aggregate data matching between legacy and Iceberg outputs — using Spark-native comparison queries for audit-ready sign-off.
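
A minimal sketch of the kind of Spark-native comparison involved; the legacy extract path and Iceberg table name are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    legacy = spark.read.parquet("/staging/legacy_extract/orders")  # baseline exported from the legacy system
    migrated = spark.table("lakehouse.sales.orders")               # Iceberg output of the converted job

    # Row-level parity: rows present on one side but missing on the other (schemas must align).
    assert legacy.exceptAll(migrated).count() == 0
    assert migrated.exceptAll(legacy).count() == 0

    # Aggregate parity: control totals per business key.
    legacy.groupBy("customer").sum("amount").show()
    migrated.groupBy("customer").sum("amount").show()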

5

Govern

Publish lineage, STTM, and data contracts to Iceberg REST catalog. Merlin AI surfaces risk and recommends partition strategies, sort orders, and compaction policies.

Platform Capabilities

Built for Open Lakehouse Architecture

Every MigryX migration leverages the full Apache Iceberg ecosystem — PySpark compute, Trino analytics, Flink streaming, schema evolution, hidden partitioning, time travel, and multi-catalog integration.

⚙️

Custom-Built Parsers

Purpose-built for each source language — SAS macro expansion, DataStage XML, Talend .item files, SSIS .dtsx — full fidelity, no approximation, deterministic output.

🧊

Open Table Format Output

Legacy ETL logic converted to Iceberg table writes via PySpark — no vendor lock-in. Output runs on any engine that supports Iceberg: Spark, Trino, Flink, Dremio, or Databricks.

🔄

Multi-Engine Support

Output runs on Spark, Trino, Flink, Dremio, Snowflake, or Databricks — the same Iceberg tables are queryable from any engine with no data movement or format conversion required.

📐

Catalog & Lineage Registration

Source-to-target column mappings registered in Iceberg REST catalog, AWS Glue, or Nessie — full STTM export with partition metadata, schema versions, and snapshot history for compliance.

🤖

Merlin AI for Partition Optimization

AI analyzes data patterns to recommend hidden partitioning, sort orders, and compaction policies. Automatic partition evolution suggestions based on query patterns and data volume growth.

🔒

On-Premise & Air-Gapped

Full deployment behind your firewall. Source code and lineage never leave your network. Iceberg catalog integration works with on-prem Hive Metastore or Nessie. SOX, GDPR, BCBS 239 ready.

Measurable Results

Quantifiable Value — On Apache Iceberg

Organizations using MigryX to land on Apache Iceberg accelerate delivery, eliminate manual rewrite cost, and unlock multi-engine lakehouse performance from day one.

85%
Faster Delivery

Automated lineage extraction and parser-driven analysis eliminate months of manual discovery and rewrite.

70%
Risk Reduction

Complete dependency visibility prevents production incidents and migration-related data defects.

60%
Lower Costs

Automated conversion, accelerated time-to-value, and eliminated rework deliver 60%+ cost savings.

95%+
Parser Accuracy

Deterministic custom parsers deliver 95%+ accuracy out of the box. Optional AI augmentation pushes accuracy up to 99%.

Why MigryX

Custom parsers vs. generic Iceberg migration tooling

Generic ETL scanners approximate lineage. MigryX parses it exactly — every macro, every column, every dialect — then lands it natively on Apache Iceberg with full PySpark and multi-engine support.

Capability | MigryX | Generic Tools
Custom parser per source (SAS, Talend, DataStage, etc.) | ✓ | ✗
100% column-level lineage to Iceberg catalog | ✓ | ~
Native PySpark + Iceberg output generation | ✓ | ✗
Multi-engine support (Spark, Trino, Flink, Dremio) | ✓ | ✗
SAS macro expansion & full dialect support | ✓ | ✗
Parser-driven partition strategy optimization | ✓ | ✗
On-premise / air-gapped deployment | ✓ | ✗
Row-level data validation & parity proof | ✓ | ✗
STTM export & Iceberg REST catalog registration | ✓ | ~
Schema evolution & partition evolution support | ✓ | ✗
Hidden partitioning recommendation engine | ✓ | ✗

✓ Full support   ~ Partial / approximate   ✗ Not supported

Ready to land on Apache Iceberg?

Schedule a technical deep-dive on your specific source — SAS, Talend, Alteryx, DataStage, Informatica, or ODI. We'll show you parsed lineage and Iceberg output generated from your own code.