User Guide
Understanding Reconlify
Reconlify compares two files and tells you exactly how they differ. It is
designed for cases where a raw diff is not enough — when you need to match
rows by key, tolerate small numeric rounding, normalize formatting, or map
columns between systems that use different schemas.
What Reconlify solves
Standard diff tools compare files line by line. They do not understand structure. If a target file has the same rows in a different order, diff flags every line. If two systems use different column names for the same data, diff cannot pair them. If amounts differ by a fraction of a cent due to rounding, diff treats that the same as a completely wrong value.
Reconlify handles all of this:
- Key-based matching — rows are paired by business key, not by position. Row order does not matter.
- Column mapping — source and target can use different column names. Reconlify translates between them.
- Numeric tolerance — small rounding differences pass without being flagged.
- String rules and normalization — formatting noise (whitespace, casing, date formats, split fields) is cleaned up before comparison.
- Structured JSON report — every run produces a machine-readable report with counts, metadata, and sample differences.
All processing happens locally. No data leaves your machine.
Comparison modes
Tabular mode compares CSV and TSV files. Rows are matched by one or more key columns and classified as matching, mismatched, missing in target, or missing in source. This is the mode for structured data: migrations, financial reconciliations, report validation.
Text mode compares plain text files. Lines are compared by position
(line_by_line) or as unordered sets (unordered_lines). Regex rules can
normalize timestamps, request IDs, and other variable content before
comparison. This is the mode for logs, CLI output, and generated text.
Row classification
In tabular mode, every row falls into one of four categories:
| Category | Meaning |
|---|---|
| Match | Key exists on both sides, all compared values are equal |
| Mismatch | Key exists on both sides, one or more values differ |
| Missing in target | Key exists in source but not in target |
| Missing in source | Key exists in target but not in source |
The report counts each category and provides the differences.
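The four categories follow directly from pairing rows by key. As a rough sketch (this is illustrative, not Reconlify's actual implementation), the classification could look like:

```python
# Illustrative four-way row classification: rows are paired by key,
# then compared value by value. Not Reconlify's internals.
def classify(source_rows, target_rows, key):
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    result = {"match": [], "mismatch": [],
              "missing_in_target": [], "missing_in_source": []}
    for k, row in src.items():
        if k not in tgt:
            result["missing_in_target"].append(k)   # key only in source
        elif row == tgt[k]:
            result["match"].append(k)               # all values equal
        else:
            result["mismatch"].append(k)            # some value differs
    result["missing_in_source"] = [k for k in tgt if k not in src]
    return result

source = [{"order_id": "1", "amount": "10"}, {"order_id": "2", "amount": "20"}]
target = [{"order_id": "1", "amount": "10"}, {"order_id": "3", "amount": "30"}]
print(classify(source, target, "order_id"))
```

Because matching is by key, reordering either file changes nothing in the result.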
Writing your first config
If you have not written a config before, start with the Quick Start — it walks through installation, example data, and your first run.
This section covers the config structure and the decisions you need to make for each comparison.
Config structure
Every config is a YAML file with a fixed structure:
```yaml
type: tabular        # or "text"
source: source.csv   # path to source file
target: target.csv   # path to target file
keys:                # columns that uniquely identify a row
  - order_id
```

These four fields are the minimum for a tabular comparison. Everything else is optional and adds precision.
Choosing keys
Keys determine how rows are matched. Pick columns that uniquely identify a record on both sides.
Single key — use when one column is unique:
```yaml
keys:
  - order_id
```

Composite key — use when no single column is unique. For example, a customer who exists in multiple regions:

```yaml
keys:
  - customer_id
  - region
```

Reconlify requires keys to be unique within each file. If duplicates exist, the run fails with an error identifying the duplicate key values.
Adding column mapping
When source and target use different column names, add column_mapping to
tell Reconlify which columns correspond:
```yaml
column_mapping:
  order_id: transaction_id
  amount: total_amount
```

The left side is the source column name (the logical name). The right side is the target column name. Every other config section uses logical names.
See Column Mapping for a full walkthrough with examples.
Adding tolerance
When numeric values may differ slightly due to rounding:
```yaml
tolerance:
  amount: 0.05
```

Values within the threshold are treated as equal. See Financial Reconciliation for a worked example showing how tolerance separates rounding noise from real discrepancies.
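In effect, a tolerance turns equality into a threshold check. A minimal sketch, assuming absolute-difference semantics (the comparison details are an assumption here, not a statement about Reconlify's internals):

```python
# Tolerance as a threshold check (assumed absolute-difference semantics):
# values whose difference is within the threshold count as equal.
def within_tolerance(a, b, tol):
    return abs(a - b) <= tol

print(within_tolerance(100.00, 100.04, 0.05))  # rounding noise passes
print(within_tolerance(100.00, 100.10, 0.05))  # a real discrepancy is flagged
```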
Adding string rules
When string values need cleanup before comparison:
```yaml
string_rules:
  counterparty:
    - trim
    - case_insensitive
  reference_id:
    - regex_extract:
        pattern: "REF-(\\d+)"
        group: 1
```

Each column can have its own set of rules. See Normalization and Rules for examples of each rule type.
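To build intuition for what these rules do to a value before comparison, here is a plain-Python approximation of the three rules used above (the rule semantics are assumptions based on their names, not Reconlify's actual code):

```python
import re

# Illustrative versions of trim, case_insensitive, and regex_extract:
# each rule transforms the value before the comparison happens.
def apply_rules(value, rules):
    for rule in rules:
        if rule == "trim":
            value = value.strip()
        elif rule == "case_insensitive":
            value = value.lower()
        elif isinstance(rule, dict) and "regex_extract" in rule:
            spec = rule["regex_extract"]
            m = re.search(spec["pattern"], value)
            if m:
                value = m.group(spec["group"])
    return value

print(apply_rules("  ACME Corp ", ["trim", "case_insensitive"]))
print(apply_rules("Invoice REF-00123/A",
                  [{"regex_extract": {"pattern": r"REF-(\d+)", "group": 1}}]))
```

Rules apply in order, so trimming before case-folding (or extracting before trimming) can matter.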
Adding normalization
When the source and target represent the same data in structurally different ways — for example, split name fields vs a combined name field:
```yaml
normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]
    - op: trim
```

This creates a derived column on the source side before comparison. See Data Migration for a full example using normalization in a migration validation workflow.
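The pipeline above amounts to the following per-row transformation (a sketch under the assumption that string literals in `args` are inserted verbatim and column names are looked up in the row):

```python
# Assumed semantics of the concat + trim pipeline: join the columns
# with the literal separator, then strip surrounding whitespace.
def derive_full_name(row):
    value = row["first_name"] + " " + row["last_name"]
    return value.strip()

print(derive_full_name({"first_name": "Ada", "last_name": "Lovelace"}))
```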
Controlling which columns are compared
By default, Reconlify compares all columns present in both files (after mapping). You can narrow or exclude columns:
```yaml
compare:
  include_columns:   # compare only these columns
    - amount
    - status
  ignore_columns:    # skip these columns entirely
    - created_at
    - updated_at
```

Use include_columns when you only care about specific fields. Use ignore_columns when most columns matter but a few (like timestamps) should be skipped.
Text mode config
Text mode uses a different set of options:
```yaml
type: text
source: app_before.log
target: app_after.log
mode: line_by_line
normalize:
  trim_lines: true
  collapse_whitespace: true
  replace_regex:
    - pattern: "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
      replace: "<TS>"
  drop_lines_regex:
    - "^DEBUG"
```

See Log Comparison for a complete walkthrough of both line_by_line and unordered_lines modes.
For the full list of config options, see the YAML Config Reference.
Running comparisons
Basic run
```shell
reconlify run config.yaml
```

Reconlify prints a summary to the terminal and writes a detailed JSON report to report.json.
Custom output path
```shell
reconlify run config.yaml --out results/round1.json
```

Exit codes
| Code | Meaning |
|---|---|
| 0 | No differences found |
| 1 | Differences found |
| 2 | Error (bad config, missing file, etc.) |
Exit code 1 is not an error. It means the comparison completed successfully and detected differences. This makes Reconlify easy to integrate into CI/CD pipelines — a non-zero exit code signals that the check did not pass.
CI/CD integration
Because the same config and input files always produce the same output, Reconlify fits naturally into automated workflows:
```shell
reconlify run config.yaml --out report.json
if [ $? -eq 1 ]; then
  echo "Differences detected — see report.json"
  exit 1
fi
```

Save reports with timestamped filenames to maintain an audit trail across runs.
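One way to generate such filenames, sketched in Python (the `--out` flag accepts any path; this naming scheme is only a suggestion, not something Reconlify prescribes):

```python
from datetime import datetime

# Build a timestamped report path so successive runs never overwrite
# each other, e.g. reports/report-20250301-104500.json
def report_path(prefix="reports/report"):
    return f"{prefix}-{datetime.now():%Y%m%d-%H%M%S}.json"

print(report_path())
```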
Reading reports
The report is a JSON file with a consistent structure across all comparison types. For the full field reference, see the Report Format Reference.
Summary
The top-level summary section gives you the overall picture:
```json
{
  "source_rows": 6,
  "target_rows": 5,
  "missing_in_target": 1,
  "missing_in_source": 0,
  "rows_with_mismatches": 1,
  "mismatched_cells": 2
}
```

Start here. If all counts are zero, the datasets match. If not, the numbers tell you what kind of differences to investigate.
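Because the report is plain JSON, this check is easy to automate. A hypothetical consumer, using the field names from the example above:

```python
import json

# Hypothetical report consumer: the datasets match when every
# difference counter in the summary is zero.
def datasets_match(summary):
    checked = ["missing_in_target", "missing_in_source",
               "rows_with_mismatches", "mismatched_cells"]
    return all(summary[field] == 0 for field in checked)

summary = json.loads("""{
  "source_rows": 6, "target_rows": 5,
  "missing_in_target": 1, "missing_in_source": 0,
  "rows_with_mismatches": 1, "mismatched_cells": 2
}""")
print(datasets_match(summary))
```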
Details
The details section records what was compared and how:
- keys — which columns were used for matching
- column_mapping — which columns were renamed
- filters_applied — which rows were excluded before comparison
- column_stats — per-column mismatch counts
column_stats is especially useful for large datasets. If 500 rows have
mismatches but column_stats shows all of them are in the amount column,
you know exactly where to focus.
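For example, ranking columns by mismatch count makes the hot spot obvious (the shape of column_stats here is assumed from the description above):

```python
# Hypothetical column_stats payload: column name -> mismatch count.
column_stats = {"amount": 500, "status": 3, "created_at": 0}

# Sort columns so the most-mismatched one comes first.
worst_first = sorted(column_stats.items(), key=lambda kv: kv[1], reverse=True)
print(worst_first[0])
```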
Samples
The samples section provides example rows for each category of difference:
- mismatches — rows that exist on both sides but have different values, showing source and target values per column
- missing_in_target — keys that exist in source but not in target
- missing_in_source — keys that exist in target but not in source
The report includes all detected differences — not a sampled subset. Reconlify applies no sampling limits when generating reports, so for very large datasets the report file itself can be large. Consumers such as Reconlify Desktop may paginate or filter the displayed results, but the CLI always emits the full evidence set. This behavior may change in a future release.
Understanding text mode report samples
Text mode reports use two different sample structures depending on the comparison mode.
samples contains individual lines where the source and target differ.
This is the primary output for line_by_line mode. Each entry shows the
line number, the raw content from both files, and the processed content after
normalization and regex replacements:
```json
{
  "line_number_source": 9,
  "line_number_target": 9,
  "raw_source": "2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)",
  "raw_target": "2026-03-09 14:22:13 [WARN] Response sent: 404 Not Found (52ms)",
  "processed_source": "<TS> [INFO] Response sent: 200 OK (<DUR>)",
  "processed_target": "<TS> [WARN] Response sent: 404 Not Found (<DUR>)"
}
```

The raw_* fields show what is in the original files. The processed_* fields show what Reconlify actually compared — after timestamps were replaced with <TS> and durations with <DUR>. The difference is in the processed values: [INFO] Response sent: 200 OK vs [WARN] Response sent: 404 Not Found.
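The raw-to-processed step is ordinary regex replacement. A sketch of the transformation for this sample (the duration pattern is an assumption; only the timestamp pattern appears in the config example earlier):

```python
import re

# Replace variable content with stable placeholders before comparing:
# timestamps become <TS>, millisecond durations become <DUR>.
def normalize(line):
    line = re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<TS>", line)
    line = re.sub(r"\(\d+ms\)", "(<DUR>)", line)
    return line

raw = "2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)"
print(normalize(raw))
```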
samples_agg is an aggregated view that shows how many times each
distinct line appears in the source and target. This is the primary output
for unordered_lines mode, where line order does not matter and the same
line may appear many times.
```json
{
  "line": "ERROR connection failed",
  "source_count": 5,
  "target_count": 3,
  "source_line_numbers": [12, 45, 78, 110, 143],
  "target_line_numbers": [8, 51, 99]
}
```

This means the line "ERROR connection failed" appears five times in the
source log and three times in the target log. The difference of two
occurrences is reported as a mismatch. The source_line_numbers and
target_line_numbers fields point to the exact lines in the original files
where each occurrence was found.
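Conceptually, this aggregation is a multiset comparison. A minimal sketch of the counting logic (illustrative only, not Reconlify's implementation):

```python
from collections import Counter

# Count each distinct line on both sides; lines whose counts differ
# are the unordered-mode mismatches.
source_lines = ["ERROR connection failed"] * 5 + ["startup complete"]
target_lines = ["ERROR connection failed"] * 3 + ["startup complete"]

src, tgt = Counter(source_lines), Counter(target_lines)
diffs = {line: (src[line], tgt[line])
         for line in src.keys() | tgt.keys()
         if src[line] != tgt[line]}
print(diffs)
```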
In line_by_line mode, samples_agg is empty. In unordered_lines mode,
samples is empty. Each mode uses only its corresponding sample structure.
See Log Comparison for complete examples of both modes, and the Report Format Reference for the full field reference.
Debugging differences
When the report shows unexpected results, work through these steps.
Too many mismatches
If most rows are flagged as mismatches, the comparison rules are probably too strict for the data.
Formatting noise. Check if mismatches are caused by whitespace, casing, or date format differences. Add the appropriate string rules or normalization:
- Extra spaces → `trim` or `compare.trim_whitespace`
- Different casing → `case_insensitive`
- Different date formats → `date_format` normalization
- Different column names → `column_mapping`
Rounding differences. If numeric columns show small differences (0.01,
0.001), add tolerance for those columns.
Check column_stats. If mismatches are concentrated in one or two columns,
the fix is usually a targeted rule on those columns rather than a broad
change.
Missing rows you expected to match
If rows appear as "missing in target" when they should match:
Check your keys. The most common cause is a key mismatch. If the source
key is "ORD-2025-100" and the target key is "100", the rows will not pair.
Use regex_extract on the key column or verify your key column choice.
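The idea behind the regex fix, as a plain sketch (the pattern is a hypothetical one for the "ORD-2025-100" example, chosen so both sides reduce to the bare numeric key):

```python
import re

# Reduce both key variants to the trailing digit run, so
# "ORD-2025-100" and "100" pair up on the same key value.
def extract_key(raw):
    m = re.search(r"(\d+)$", raw)
    return m.group(1) if m else raw

print(extract_key("ORD-2025-100") == extract_key("100"))
```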
Check column mapping. If the key column has a different name in the target,
it must be listed in column_mapping.
Check for duplicates. If a key appears more than once in either file, Reconlify rejects the run with an error. Investigate why duplicates exist — you may need a composite key.
Zero mismatches but data looks wrong
If the report shows zero differences but you know the data differs:
Check ignore_columns. A column listed in ignore_columns is excluded
from comparison entirely. Verify you are not accidentally ignoring a column
that matters.
Check tolerance values. A tolerance that is too wide will accept differences you intended to catch. Start tight (0.01) and widen only for known rounding behavior.
Check include_columns. If you specified include_columns, only those
columns are compared. Other columns are silently skipped.
Config errors
Exit code 2 means the config or input files have a problem. Common causes:
- Missing file — the source or target path does not exist or is misspelled
- Missing key column — a column listed in `keys` does not exist in the file (check for typos and verify `column_mapping` if the target uses a different name)
- Duplicate keys — a key value appears more than once in a file (add another column to `keys` for a composite key, or clean the data)
- Invalid YAML — indentation or syntax errors in the config file
The error message identifies the specific issue. Fix it and re-run.
Next steps
- Quick Start — installation and first run
- Column Mapping — comparing files with different column names
- Normalization and Rules — string rules and source-side pipelines
- Data Migration — migration validation workflow
- Financial Reconciliation — tolerance and string rules for financial data
- Log Comparison — text mode for logs and CLI output
- YAML Config Reference — full configuration options
- Report Format Reference — full report structure