Google Cloud Targets

What MigryX produces on Google Cloud

Every migration generates production-ready Google Cloud artifacts — leveraging BigQuery's serverless architecture, Dataform SQL pipelines, Dataproc Spark, Cloud Composer orchestration, Vertex AI, and Dataplex governance.

🔵

BigQuery SQL

Legacy SQL and ETL logic translated to BigQuery Standard SQL — partitioned & clustered tables, materialized views, authorized views, and row-level security policies.
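As a rough illustration of what a partitioned and clustered target looks like, the sketch below renders a BigQuery Standard SQL `CREATE TABLE` statement from a column list. The helper `render_partitioned_table_ddl` and the table and column names are hypothetical, chosen for illustration only; this is not MigryX output.

```python
from typing import Sequence, Tuple


def render_partitioned_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    partition_expr: str,
    cluster_by: Sequence[str],
) -> str:
    """Render a BigQuery CREATE TABLE statement (illustrative helper only)."""
    if not 0 < len(cluster_by) <= 4:
        # BigQuery allows at most four clustering columns per table.
        raise ValueError("cluster_by must name between 1 and 4 columns")
    column_sql = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE `{table}` (\n{column_sql}\n)\n"
        f"PARTITION BY {partition_expr}\n"
        f"CLUSTER BY {', '.join(cluster_by)}"
    )


# Hypothetical table, used purely for illustration.
ddl = render_partitioned_table_ddl(
    table="analytics.orders",
    columns=[
        ("order_id", "INT64"),
        ("customer_id", "INT64"),
        ("order_ts", "TIMESTAMP"),
    ],
    partition_expr="DATE(order_ts)",
    cluster_by=["customer_id"],
)
print(ddl)
```

Partitioning on the date of a timestamp column and clustering on a frequent filter column is a common starting point; the right keys depend on the workload's query patterns.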

📋

Dataform

ETL pipelines converted to Dataform SQL workflows — SQLX definitions, table declarations, incremental models, assertions, and Git-backed CI/CD deployment via Dataform repositories.
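For readers unfamiliar with SQLX, the sketch below renders a small incremental model as text. It assumes a single unique key and a single watermark column; the function name and the source and column names are hypothetical, and a real converted model would carry the full translated SELECT rather than `SELECT *`.

```python
def render_incremental_sqlx(source_ref: str, unique_key: str, watermark_column: str) -> str:
    """Render a Dataform SQLX incremental model as text (illustrative only)."""
    lines = [
        "config {",
        '  type: "incremental",',
        f'  uniqueKey: ["{unique_key}"]',
        "}",
        "",
        # ${ref(...)} resolves another Dataform action by name.
        f'SELECT * FROM ${{ref("{source_ref}")}}',
        # On incremental runs, only pick up rows newer than what is already loaded.
        "${when(incremental(),",
        f"  `WHERE {watermark_column} > "
        f"(SELECT MAX({watermark_column}) FROM ${{self()}})`)}}",
    ]
    return "\n".join(lines)


# Hypothetical source table and columns.
sqlx = render_incremental_sqlx("orders_raw", "order_id", "updated_at")
print(sqlx)
```

The `when(incremental(), ...)` guard keeps the first full load unfiltered and restricts later runs to new rows, which is the usual shape of a converted delta-load job.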

⚡

Dataproc PySpark

Heavy-lifting ETL converted to Dataproc PySpark jobs — serverless or cluster-based execution on Google Cloud, with BigQuery Spark connector for direct table read/write.

🌊

Cloud Dataflow

Streaming and complex batch pipelines migrated to Apache Beam / Dataflow — fully managed, auto-scaling, with native BigQuery sinks and Pub/Sub source integration.

🎼

Cloud Composer (Airflow)

Legacy scheduler and ETL orchestration replatformed to Cloud Composer: DAG-based scheduling, BigQuery operator tasks (such as BigQueryInsertJobOperator), dependency management, and SLA monitoring.

🤖

BigQuery ML & Vertex AI

SAS analytical and scoring models converted to BigQuery ML (CREATE MODEL SQL) and Vertex AI — AutoML, feature engineering, model registry, and online/batch prediction endpoints.

🏔️

BigLake & Iceberg

Legacy data lake assets migrated to BigLake external tables or BigQuery-managed Apache Iceberg tables — open format on GCS with BigQuery query performance and fine-grained access control.

🗂️

Dataplex Governance

Column-level lineage, STTM mappings, data quality rules, and business glossary terms published to Google Dataplex — automated data catalog tagging and policy propagation.

🔗

Cloud Data Fusion

Legacy ETL visual workflows converted to Cloud Data Fusion pipelines — CDAP-based visual DAGs with BigQuery sinks, Wrangler transforms, and reusable plugin configurations for batch and real-time ingestion.

Migration Sources

Every legacy source — migrated to BigQuery.

Purpose-built parsers for each source platform. Not generic scanners. Every conversion produces explainable, auditable, BigQuery-native code — Standard SQL, Dataform SQLX, or Dataproc PySpark.

SAS

SAS to BigQuery

Base · Macros · PROC SQL · SAS/IML

Automate SAS Base, Macro, PROC SQL, and IML conversion to BigQuery Standard SQL and Dataproc PySpark. DATA step logic, FORMAT/INFORMAT, PROC SORT/MEANS/FREQ — with PROC MODEL landing in BigQuery ML or Vertex AI AutoML.

BigQuery SQL · Dataproc · BigQuery ML · Vertex AI

SAS → BigQuery →

⚙️

Talend to BigQuery

Studio · Open Studio · tMap · Cloud

Parse Talend project exports (ZIP/Git), .item artifacts, tMap joins, metadata, contexts, and connections — converted to Dataform SQLX pipelines and Cloud Composer DAGs with full component-level lineage in Dataplex.

Dataform · Composer · Dataplex

Talend → BigQuery →

📈

Alteryx to BigQuery

Designer · Workflows · Macros · Apps

Convert Alteryx Designer workflows (.yxmd/.yxwz), macros, and apps to BigQuery SQL and Dataproc PySpark — tool-by-tool translation with full lineage, and UDFs registered in BigQuery for reuse.

BigQuery SQL · Dataproc · BigQuery UDFs

Alteryx → BigQuery →

IBM DS

DataStage to BigQuery

Parallel · Server · DataStage X

Migrate IBM DataStage parallel and server jobs, sequences, shared containers, and XML definitions to Dataproc PySpark and Cloud Dataflow — transformer logic translated to BigQuery-native patterns with Dataplex lineage.

Dataproc · Cloud Dataflow · Dataplex

DataStage → BigQuery →

INFA

Informatica to BigQuery

PowerCenter · IDMC · IICS

Migrate Informatica PowerCenter (.xml exports) and IDMC/IICS mappings — sources, targets, transformations, and workflows — to Dataform SQLX models and Cloud Composer orchestration with Dataplex catalog registration.

Dataform · Composer DAGs · Dataplex

Informatica → BigQuery →

ODI

Oracle ODI to BigQuery

Repository export · KMs · Packages

Parse Oracle ODI repository exports — mappings, interfaces, knowledge modules, packages, and load plans — converted to Dataform SQLX incremental models and BigQuery partitioned tables with column-level lineage.

Dataform · BigQuery SQL · Dataplex

Oracle ODI → BigQuery →

SSIS

SSIS to BigQuery

.dtsx · .ispac · Data Flow · Scripts

Parse SSIS .dtsx packages and .ispac archives (data flow, control flow, SSIS expressions, and C#/VB.NET script tasks) and convert them to Cloud Composer Airflow DAGs with BigQuery operator tasks (such as BigQueryInsertJobOperator) and Dataflow streaming jobs.

Composer · Cloud Dataflow · BigQuery SQL

SSIS → BigQuery →

BTEQ

Teradata to BigQuery

BTEQ · FastLoad · QUALIFY · Macros

Migrate Teradata BTEQ, FastLoad, MultiLoad, and Teradata SQL — QUALIFY → BigQuery window functions, BTEQ command translation, PRIMARY INDEX → partition/cluster key advisory, and 500+ function remappings.

BigQuery SQL · Dataform · Partitioning

Teradata → BigQuery →

ORA

Oracle PL/SQL to BigQuery

Procedures · Packages · Triggers

Migrate Oracle PL/SQL procedures, packages, and triggers with 2000+ function mappings, CONNECT BY → recursive CTE rewriting, BULK COLLECT → BigQuery ARRAY_AGG, and full package dependency resolution.
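To show the general shape of a CONNECT BY rewrite, the sketch below renders a simple `START WITH ... CONNECT BY PRIOR parent = child` hierarchy as a BigQuery recursive CTE. It is a simplified illustration under stated assumptions: one join condition, no `LEVEL`, `NOCYCLE`, or `ORDER SIBLINGS BY`, and no guarantee of Oracle's depth-first row order. The function and all table and column names are hypothetical.

```python
from typing import Sequence


def connect_by_to_recursive_cte(
    table: str,
    columns: Sequence[str],
    start_with: str,
    prior_column: str,
    child_column: str,
    cte_name: str = "hierarchy",
) -> str:
    """Render START WITH <cond> CONNECT BY PRIOR prior_column = child_column
    as a WITH RECURSIVE query (illustrative only)."""
    anchor_cols = ", ".join(columns)
    child_cols = ", ".join(f"c.{col}" for col in columns)
    return (
        f"WITH RECURSIVE {cte_name} AS (\n"
        # Anchor member: the root rows selected by START WITH.
        f"  SELECT {anchor_cols} FROM {table} WHERE {start_with}\n"
        f"  UNION ALL\n"
        # Recursive member: children whose child_column points at a row already found.
        f"  SELECT {child_cols} FROM {table} AS c\n"
        f"  JOIN {cte_name} AS p ON c.{child_column} = p.{prior_column}\n"
        f")\n"
        f"SELECT {anchor_cols} FROM {cte_name}"
    )


# Oracle original (hypothetical):
#   SELECT employee_id, manager_id FROM employees
#   START WITH manager_id IS NULL
#   CONNECT BY PRIOR employee_id = manager_id
sql = connect_by_to_recursive_cte(
    table="employees",
    columns=["employee_id", "manager_id"],
    start_with="manager_id IS NULL",
    prior_column="employee_id",
    child_column="manager_id",
)
print(sql)
```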

BigQuery SQL · BigQuery UDFs · Stored Procs

Oracle → BigQuery →

SQL

SQL Dialects to BigQuery

15+ Dialects · 500+ Function Maps

Transpile SQL from Oracle, T-SQL, Teradata, DB2, Netezza, Greenplum, Hive HQL, and Vertica to BigQuery Standard SQL — 500+ function mappings, ARRAY/STRUCT nested data handling, and partitioning advisories.

BigQuery SQL · ARRAY/STRUCT · Dataform

Any SQL → BigQuery →

DFX

SAS DataFlux to BigQuery

dfPower Studio · DMS · DQ Schemes

Migrate SAS DataFlux dfPower Studio jobs and DQ schemes — standardize/parse/match/validate patterns — to BigQuery SQL data quality routines and Dataplex data quality rules with Vertex AI anomaly detection.

BigQuery SQL · Dataplex DQ · Vertex AI

DataFlux → BigQuery →

🔍

MigryX Compass

Discovery · Lineage · Dataplex Catalog

Before you migrate, map your estate. Compass extracts column-level lineage, STTM, and dependency graphs from any source — and publishes them directly to Google Dataplex for catalog, tags, and data quality governance.

Dataplex · STTM · Lineage Graphs

Explore MigryX Compass →

How It Works

From legacy codebase to BigQuery in five steps

The same proven methodology applies to every source — SAS, Talend, Alteryx, DataStage, Informatica, or ODI — all landing natively on Google Cloud.

Ingest

Upload source artifacts — SAS scripts, Talend exports, DataStage XML, .dtsx packages — into MigryX for parsing.

→

Parse & Analyze

Custom parsers build complete ASTs, expand macros, resolve dependencies, and produce column-level lineage — with BigQuery readiness and partition/cluster key scoring.
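Dependency resolution can be pictured as ordering artifacts so that each one is converted after everything it relies on. The sketch below uses Python's standard `graphlib` for that ordering; the file names and the shape of the dependency map are hypothetical, and this is not a description of MigryX internals.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each artifact lists the artifacts it depends on.
dependencies: Dict[str, Set[str]] = {
    "macros/dates.sas": set(),
    "load_customers.sas": {"macros/dates.sas"},
    "load_orders.sas": {"macros/dates.sas"},
    "report_sales.sas": {"load_customers.sas", "load_orders.sas"},
}


def conversion_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return artifacts in an order where dependencies always come first."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # A cycle means two artifacts include each other and need manual review.
        raise ValueError(f"circular dependency detected: {exc.args[1]}") from exc


order = conversion_order(dependencies)
print(order)
```

Surfacing cycles as explicit errors, rather than silently picking an order, keeps the result auditable.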

→

Convert

Parser-driven conversion to BigQuery Standard SQL, Dataform SQLX, Dataproc PySpark, or Cloud Composer DAGs — with BigQuery best-practice partitioning, clustering, and materialized view patterns.

→

Validate

Row-level and aggregate data matching between legacy and BigQuery outputs — Dataplex data quality scans for audit-ready parity evidence and go-live sign-off.
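A minimal sketch of the row-level and aggregate checks described here, assuming both outputs fit in memory as lists of tuples. The function names and sample rows are hypothetical; a production comparison would run inside the warehouse and handle floating-point tolerance, which this sketch avoids by using integers.

```python
import hashlib
from collections import Counter
from typing import Dict, Sequence, Tuple

Row = Tuple[object, ...]


def row_fingerprint(row: Row) -> str:
    """Hash a row's values so rows can be matched regardless of their order."""
    canonical = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(legacy: Sequence[Row], migrated: Sequence[Row], numeric_col: int) -> Dict[str, object]:
    """Compare row counts, one column total, and row-level content."""
    legacy_hashes = Counter(row_fingerprint(r) for r in legacy)
    migrated_hashes = Counter(row_fingerprint(r) for r in migrated)
    return {
        "row_count_match": len(legacy) == len(migrated),
        "sum_match": sum(r[numeric_col] for r in legacy)
        == sum(r[numeric_col] for r in migrated),
        # Counter subtraction keeps only rows that appear more often on one side.
        "rows_only_in_legacy": sum((legacy_hashes - migrated_hashes).values()),
        "rows_only_in_migrated": sum((migrated_hashes - legacy_hashes).values()),
    }


# Hypothetical outputs: same rows in a different order, with one amount changed.
legacy_rows = [(1, "EMEA", 100), (2, "APAC", 250), (3, "AMER", 75)]
migrated_rows = [(2, "APAC", 250), (1, "EMEA", 100), (3, "AMER", 70)]
report = parity_report(legacy_rows, migrated_rows, numeric_col=2)
print(report)
```

In this sample the row counts agree while the column total and one row differ, which is why a count check alone is not sufficient parity evidence.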

→

Govern

Publish lineage, STTM, and data contracts to Dataplex. Merlin AI recommends partition strategies, slot reservations, and BigQuery ML vs. Vertex AI routing per workload.

Platform Capabilities

Built for Google Cloud's Data & AI Architecture

Every MigryX migration is engineered for the full Google Cloud stack — BigQuery serverless analytics, Dataform SQL pipelines, Dataproc Spark, Dataplex governance, and Vertex AI for analytical models.

⚙️

Custom-Built Parsers

Purpose-built for each source language — SAS macro expansion, DataStage XML, Talend .item files, SSIS .dtsx — full fidelity, deterministic output, no approximation.

🔵

BigQuery-Native SQL

Legacy SQL translated to BigQuery Standard SQL — partitioned tables (time, range), clustered tables, materialized views, authorized views, row-level security, and INFORMATION_SCHEMA metadata queries.

📋

Dataform Pipelines

ETL workflows converted to Dataform SQLX — incremental table models, table declarations, data assertions, tags, and Git-backed CI/CD via Dataform repositories and workflow configurations.

📐

Dataplex Lineage & Catalog

Source-to-target column mappings and STTM tables published to Google Dataplex — automated Data Catalog tags, policy tags for column-level security, data quality scans, and Lineage API integration.

🤖

Merlin AI & BigQuery ML

AI analyzes parsed metadata to recommend partition keys, clustering columns, and slot reservation sizing. SAS PROC MODEL and scoring logic land in BigQuery ML or Vertex AI AutoML automatically.

🔒

On-Premise & Air-Gapped

Full deployment behind your firewall. Source code and lineage never leave your network. Dataform Git integration for dev → staging → production promotion. SOX, GDPR, BCBS 239 ready.

Cloud Data Fusion

From Legacy ETL to Cloud Data Fusion Pipelines — Automatically

MigryX converts visual ETL from Alteryx, Talend, DataStage, Informatica, SSIS, and ODI into Cloud Data Fusion pipelines, choosing SQL pushdown, idiomatic CDAP plugins, or a mix of both depending on transformation complexity.

SQL-First Approach

When legacy ETL logic is fundamentally SQL — SELECT, JOIN, GROUP BY, window functions, MERGE — MigryX generates Data Fusion pipelines that push SQL directly to BigQuery. The pipeline becomes a thin orchestration layer while BigQuery's serverless engine handles the heavy compute.

→ BigQuery Pushdown plugin executes SQL natively in BigQuery
→ Zero data movement for SQL-only transformations
→ Leverages BigQuery partitioning, clustering, and slot reservations
→ Ideal for: SAS PROC SQL, Teradata BTEQ, Oracle PL/SQL, SQL dialects

Idiomatic CDAP Approach

When legacy ETL involves visual dataflow logic — multi-branch routing, row-level transformations, conditional splits, custom functions — MigryX generates native CDAP plugins that map 1:1 to legacy transformation semantics inside Data Fusion's visual pipeline designer.

→ Wrangler directives for row-level parsing, cleansing, and type coercion
→ Joiner, GroupBy, and Deduplicate plugins for set operations
→ JavaScript and Python transform plugins for custom logic
→ Ideal for: Alteryx workflows, Talend tMap, DataStage parallel jobs, SSIS

🔀

Source-Aware Conversion

Each legacy source maps differently. Alteryx multi-output tools become Data Fusion Splitter plugins. Talend tMap lookup joins become Joiner stages. DataStage sequential files become GCS source connectors. MigryX chooses the right Data Fusion pattern for each source construct.

📋

Pipeline JSON Generation

MigryX produces Data Fusion pipeline JSON (CDAP artifact spec) ready for import — sources, transforms, sinks, connections, and error handling pre-configured. No manual assembly. Deploy directly via the Data Fusion REST API or Studio UI.
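For orientation, the sketch below assembles a small source, transform, sink pipeline in the general shape of a CDAP batch pipeline spec: a list of stages plus a list of connections. The stage names, bucket path, artifact version, and plugin properties are assumptions for illustration and should be checked against the plugin documentation for your Data Fusion version; this is not the JSON MigryX emits.

```python
import json
from typing import Dict, List, Tuple


def stage(name: str, plugin_name: str, plugin_type: str, properties: Dict[str, str]) -> dict:
    """Build one pipeline stage entry."""
    return {
        "name": name,
        "plugin": {"name": plugin_name, "type": plugin_type, "properties": properties},
    }


def build_pipeline(name: str, stages: List[dict], edges: List[Tuple[str, str]]) -> dict:
    """Assemble a pipeline spec and check that every edge names a known stage."""
    known = {s["name"] for s in stages}
    for src, dst in edges:
        if src not in known or dst not in known:
            raise ValueError(f"connection {src} -> {dst} references an unknown stage")
    return {
        "name": name,
        # Placeholder version (assumption); use the version of your instance.
        "artifact": {"name": "cdap-data-pipeline", "version": "6.0.0", "scope": "SYSTEM"},
        "config": {
            "stages": stages,
            "connections": [{"from": src, "to": dst} for src, dst in edges],
        },
    }


# Hypothetical three-stage pipeline: GCS file -> Wrangler cleanup -> BigQuery table.
pipeline = build_pipeline(
    name="orders_daily_load",
    stages=[
        stage("read_orders", "GCSFile", "batchsource",
              {"referenceName": "orders_csv", "path": "gs://example-bucket/orders/", "format": "csv"}),
        stage("clean_orders", "Wrangler", "transform",
              {"directives": "drop :legacy_flag"}),
        stage("write_orders", "BigQueryTable", "batchsink",
              {"referenceName": "orders_bq", "dataset": "analytics", "table": "orders"}),
    ],
    edges=[("read_orders", "clean_orders"), ("clean_orders", "write_orders")],
)
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```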

⚡

Hybrid SQL + CDAP Pipelines

Real-world ETL mixes SQL and procedural logic. MigryX generates hybrid pipelines that push SQL to BigQuery for set operations and use CDAP plugins for row-level transforms — combining the best of both approaches in a single Data Fusion pipeline.

🔄

Batch & Real-Time Modes

Legacy batch ETL converts to Data Fusion batch pipelines. Event-driven or near-real-time sources convert to Data Fusion real-time pipelines with Pub/Sub sources and streaming BigQuery sinks — preserving the original processing semantics.

📐

Lineage to Dataplex

Every generated Data Fusion pipeline includes lineage metadata. Column-level source-to-target mappings are published to Dataplex Data Catalog automatically — ensuring governance continuity from legacy ETL to Data Fusion without manual documentation.

🧪

Validation & Parity Testing

MigryX validates Data Fusion pipeline outputs against legacy system outputs — row counts, aggregate comparisons, and column-level data matching. Evidence-based parity reports prove the migration is correct before cutover.

Which approach does MigryX choose?

MigryX's parser analyzes each legacy workflow and classifies every transformation as SQL-expressible or plugin-required. Pure SQL flows get BigQuery pushdown pipelines. Visual dataflow logic gets CDAP plugins. Mixed workloads get hybrid pipelines. The decision is automatic, auditable, and overridable.
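The routing rule described above can be sketched as a small classifier. The operation labels, the set of SQL-expressible operations, and the `Strategy` names are hypothetical stand-ins; the real classification works on parsed workflow metadata rather than a list of strings.

```python
from enum import Enum
from typing import Optional, Sequence


class Strategy(Enum):
    PUSHDOWN = "bigquery-pushdown"
    CDAP = "cdap-plugins"
    HYBRID = "hybrid"


# Hypothetical set of operation kinds treated as SQL-expressible.
SQL_EXPRESSIBLE = {"select", "filter", "join", "group_by", "window", "merge"}


def choose_strategy(operations: Sequence[str], override: Optional[Strategy] = None) -> Strategy:
    """Pick a pipeline strategy for a workflow; an explicit override always wins."""
    if override is not None:
        return override
    if not operations:
        raise ValueError("workflow has no operations to classify")
    sql_ops = [op for op in operations if op in SQL_EXPRESSIBLE]
    if len(sql_ops) == len(operations):
        return Strategy.PUSHDOWN  # every step can run as SQL in BigQuery
    if not sql_ops:
        return Strategy.CDAP  # nothing is SQL-expressible
    return Strategy.HYBRID  # mixed workload


print(choose_strategy(["select", "join", "group_by"]))
print(choose_strategy(["conditional_split", "custom_function"]))
print(choose_strategy(["join", "conditional_split"]))
```

Keeping the override as an explicit argument mirrors the "automatic, auditable, and overridable" behavior: the default is derived from the workflow, and any manual choice is visible at the call site.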

Read the Deep Dive →

Measurable Results

Quantifiable Value — On Google Cloud

Organizations using MigryX to land on BigQuery accelerate delivery, eliminate manual rewrite cost, and unlock BigQuery's serverless performance and Vertex AI capabilities from day one.

85%

Faster Delivery

Automated lineage extraction and parser-driven analysis eliminate months of manual discovery and rewrite work.

70%

Risk Reduction

Complete dependency visibility prevents production incidents and migration-related data defects on BigQuery.

60%

Lower Costs

Automated conversion, accelerated time-to-value, and eliminated rework deliver 60%+ project cost savings.

95%+

Parser Accuracy

Deterministic custom parsers deliver 95%+ accuracy out of the box. Optional AI augmentation pushes accuracy up to 99%.

Why MigryX

Custom parsers vs. generic BigQuery migration tooling

Generic ETL scanners approximate lineage. MigryX parses it exactly — every macro, every column, every dialect — then lands it natively on BigQuery with Dataform, Dataproc, and Dataplex support.

Capability	MigryX	Generic Tools
Custom parser per source (SAS, Talend, DataStage, etc.)	✓	✗
100% column-level lineage to Dataplex catalog	✓	~
Native BigQuery SQL with partitioning & clustering	✓	~
Dataform SQLX pipeline generation	✓	✗
Dataproc PySpark & Cloud Dataflow output	✓	✗
Cloud Composer DAG generation	✓	✗
BigQuery ML / Vertex AI analytical model migration	✓	✗
SAS macro expansion & full dialect support	✓	✗
Dataplex data quality rules & policy tag generation	✓	✗
On-premise / air-gapped deployment	✓	✗
Row-level data validation & parity proof	✓	✗
Cloud Data Fusion pipeline generation (SQL + CDAP)	✓	✗
BigQuery slot reservation & partition key recommendations	✓	✗
Alteryx .yxmd workflow XML parsing & conversion	✓	✗
IBM DataStage .dsx / parallel job XML parsing	✓	✗
Informatica PowerCenter XML + IDMC/IICS mapping parsing	✓	~
Oracle ODI Knowledge Module (IKM/LKM/CKM) translation	✓	✗
SSIS .dtsx package parsing (data flow + control flow)	✓	~
Talend .item artifact & tMap conversion	✓	✗
Teradata BTEQ command translation + 500+ SQL function maps	✓	~
Multi-target output (BigQuery + Snowflake + Databricks)	✓	✗
Deterministic AST-based parsing (not regex or AI-only)	✓	✗
Parser-driven risk analysis & BigQuery optimization	✓	✗
STTM export & Dataplex catalog registration	✓	~

✓ Full support · ~ Partial / approximate · ✗ Not supported

Ready to land on BigQuery?

Migrate Everything
to BigQuery.