Data Migration
The problem
Your company is migrating from a legacy CRM to a new platform. The migration team has run the import scripts, and now you need to answer a simple question: did every customer record arrive correctly?
This is harder than it sounds. The legacy system and new platform use
different schemas, different column names, and different conventions. The
new system assigns its own timestamps. Some customers exist in multiple
regions, so a single customer_id is not enough to uniquely identify a
row — you need a composite key.
You cannot just diff the files line by line. Row order differs, columns are renamed, and timestamps will never match. You need a tool that understands structure.
The data exports
Before cutover, you export the full customer table from the legacy CRM. After import, you export the same data from the new platform.
legacy_customers.csv (source)
customer_id,region,first_name,last_name,email,plan,balance,created_at
1001,US,Alice,Smith,alice@example.com,pro,120.00,2023-06-15T09:30:00Z
1001,EU,Alice,Smith,alice.eu@example.com,basic,45.00,2023-06-15T09:30:00Z
1002,US,Bob,Jones,bob@corp.io,enterprise,950.00,2023-07-01T10:00:00Z
1003,US,Carol,Lee,carol@example.com,pro,0.00,2023-08-20T10:30:00Z
1004,US,Dan,Wu,dan@corp.io,basic,75.50,2023-09-10T11:00:00Z
new_customers.csv (target)
cust_id,region,name,email,tier,current_balance,migrated_at
1001,US,Alice Smith,alice@example.com,pro,120.00,2025-03-01T08:00:00Z
1001,EU,Alice Smith,alice.eu@example.com,basic,45.00,2025-03-01T08:00:00Z
1002,US,Bob Jones,bob@corp.io,enterprise,950.00,2025-03-01T08:05:00Z
1004,US,Dan Wu,dan@corp.io,basic,75.50,2025-03-01T08:20:00Z
1005,US,Eve Park,eve@newco.com,trial,0.00,2025-03-01T08:25:00Z
Differences you would expect to find:
- Carol (1003/US) is missing from the new platform
- Eve (1005/US) is unexpected — not in the legacy export
- Column names differ — customer_id→cust_id, plan→tier, balance→current_balance
- Names are split in legacy (first_name, last_name) but combined in the new platform (name)
- Timestamps are irrelevant — created_at vs migrated_at will always differ
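To see what "understanding structure" means in practice, here is a minimal hand-rolled sketch of the key-matching step — plain Python with the sample data inlined, not Reconlify's actual implementation. It indexes both exports by the composite key and reports rows missing on either side:

```python
import csv
import io

# Sample data from the exports above, trimmed to the key-relevant columns.
LEGACY = """customer_id,region,first_name,last_name
1001,US,Alice,Smith
1001,EU,Alice,Smith
1002,US,Bob,Jones
1003,US,Carol,Lee
1004,US,Dan,Wu
"""

NEW = """cust_id,region,name
1001,US,Alice Smith
1001,EU,Alice Smith
1002,US,Bob Jones
1004,US,Dan Wu
1005,US,Eve Park
"""

def index_by_key(text, id_col):
    # Composite key: (id, region). A single-column key would collide
    # on Alice's two regional records.
    return {(r[id_col], r["region"]): r for r in csv.DictReader(io.StringIO(text))}

legacy = index_by_key(LEGACY, "customer_id")
target = index_by_key(NEW, "cust_id")

missing_in_target = sorted(legacy.keys() - target.keys())  # lost in migration
missing_in_source = sorted(target.keys() - legacy.keys())  # unexpected rows

print(missing_in_target)  # [('1003', 'US')]  -> Carol
print(missing_in_source)  # [('1005', 'US')]  -> Eve
```

A line-by-line diff would flag every row here because of ordering and renamed columns; keying on (id, region) surfaces only the two genuine discrepancies.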
The validation config
type: tabular
source: legacy_customers.csv
target: new_customers.csv
keys:
  - customer_id
  - region
column_mapping:
  customer_id: cust_id
  full_name: name
  plan: tier
  balance: current_balance
ignore_columns:
  - created_at
  - first_name
  - last_name
normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]
    - op: trim
Why this config works
Composite keys — customer_id + region uniquely identifies each row.
Alice has two records (US and EU), so a single-column key would create
duplicates. Reconlify requires unique keys and will raise an error if duplicates are found.
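You can check key uniqueness yourself before running the tool. A small sketch (plain Python, with the composite keys from the legacy export) that also shows why a single-column key fails:

```python
from collections import Counter

# Composite keys from the legacy export.
rows = [
    ("1001", "US"), ("1001", "EU"), ("1002", "US"),
    ("1003", "US"), ("1004", "US"),
]

counts = Counter(rows)
duplicates = [key for key, n in counts.items() if n > 1]
assert not duplicates, f"non-unique composite keys: {duplicates}"

# With customer_id alone, Alice's two regional records collide:
single_col = Counter(cid for cid, _ in rows)
print([cid for cid, n in single_col.items() if n > 1])  # ['1001']
```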
Column mapping — translates between legacy and new-platform column names.
The config uses logical (source-side) names everywhere: keys says
customer_id, not cust_id.
Normalization — generates a full_name column on the source side by
concatenating first_name and last_name. This derived field is compared
against the target's name column. Without this, Reconlify would have no
way to compare split source names against the combined target field.
ignore_columns — excludes created_at (timestamps always differ between
exports), first_name, and last_name (replaced by the generated
full_name).
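The source-side pipeline can be pictured as an ordinary function chain. Below is a sketch of what concat-then-trim does to a legacy row — plain Python, not Reconlify's engine, and the resolve-column-name-or-literal behavior of `concat` is an assumption about how args are interpreted:

```python
def concat(row, parts):
    # Assumed semantics: each part resolves to a column value when the
    # name exists in the row, otherwise it is treated as a literal (" ").
    return "".join(row.get(p, p) for p in parts)

def normalize_full_name(row):
    # op: concat, args: [first_name, " ", last_name] -- then op: trim
    return concat(row, ["first_name", " ", "last_name"]).strip()

row = {"first_name": "Alice", "last_name": "Smith"}
print(normalize_full_name(row))  # "Alice Smith" -- compared against target's name
```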
Running the validation
reconlify run config.yaml
Reconlify prints a summary and writes the full report to report.json. The
exit code is 1, meaning differences were found.
To save the report to a specific location:
reconlify run config.yaml --out migration-validation/round1.json
Interpreting the results
Open report.json. The summary tells you the migration status at a glance:
{
"source_rows": 5,
"target_rows": 5,
"missing_in_target": 1,
"missing_in_source": 1,
"rows_with_mismatches": 0,
"mismatched_cells": 0
}
What each number means for your migration
missing_in_target: 1 — one legacy record did not make it to the new
platform. The samples.missing_in_target section identifies it as Carol
(1003/US). This is the most critical finding — it means data was lost.
Investigate whether the import script failed, whether a filter excluded her,
or whether her record was rejected by validation rules.
missing_in_source: 1 — one record exists in the new platform that was not in the legacy export. The samples identify Eve (1005/US). This could be test data, a record created directly in the new system, or a duplicate from a previous import run. Either way, it needs review.
rows_with_mismatches: 0 — every record that exists on both sides has identical values (after column mapping and normalization). Alice's two records, Bob, and Dan all match.
The migration checklist
For a clean migration, you want:
| Metric | Expected | This run |
|---|---|---|
| missing_in_target | 0 | 1 |
| missing_in_source | 0 | 1 |
| rows_with_mismatches | 0 | 0 |
This migration is not yet clean. Fix the Carol import issue, investigate the Eve record, re-export, and run the validation again.
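The checklist is easy to automate against the report's summary block. A sketch in plain Python, with the summary values copied from the run above inlined as a dict:

```python
summary = {
    "source_rows": 5,
    "target_rows": 5,
    "missing_in_target": 1,
    "missing_in_source": 1,
    "rows_with_mismatches": 0,
    "mismatched_cells": 0,
}

# The three checklist metrics must all be zero for a clean migration.
CHECKS = ("missing_in_target", "missing_in_source", "rows_with_mismatches")
failures = {k: summary[k] for k in CHECKS if summary[k] != 0}

clean = not failures
print("clean" if clean else f"not clean: {failures}")
```

In a CI job you would load the real report with `json.load(open("report.json"))` instead of the inline dict and fail the build when `clean` is false.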
The validation workflow
Migration validation is rarely one-and-done. The typical workflow is:
- Export both systems after import
- Run Reconlify with your validation config
- Review the report — focus on missing rows first, then mismatches
- Fix import issues or data quality problems
- Re-export and re-run until the report shows zero differences
Save each report with a timestamped filename so you have an audit trail of each validation round.
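One way to generate those timestamped filenames — a plain-Python sketch; the directory name matches the --out example above, and the naming scheme itself is just a suggestion:

```python
from datetime import datetime, timezone

def report_path(round_dir="migration-validation"):
    # e.g. migration-validation/2025-03-01T080000Z.json
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return f"{round_dir}/{stamp}.json"

path = report_path()
print(path)  # pass this to: reconlify run config.yaml --out <path>
```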
Common migration validation scenarios
- Pre/post cutover — compare the last legacy export against the first new-system export
- Phased migration — run periodic comparisons to track progress as batches are migrated
- ETL pipeline regression — confirm that pipeline changes did not introduce data loss
- SQL refactor validation — verify that rewritten queries produce the same output
- Environment promotion — confirm staging and production datasets match after deployment
Related docs
- Column Mapping — how logical names and target-side resolution work
- Normalization and Rules — all available transform operations for source-side pipelines
- Report Format — full reference for the JSON report structure