User Guide
Understanding Reconlify
Reconlify compares two files and tells you exactly how they differ. It is
designed for cases where a raw diff is not enough — when you need to match
rows by key, tolerate small numeric rounding, normalize formatting, or map
columns between systems that use different schemas.
What Reconlify solves
Standard diff tools compare files line by line. They do not understand structure. If a target file has the same rows in a different order, diff flags every line. If two systems use different column names for the same data, diff cannot pair them. If amounts differ by a fraction of a cent due to rounding, diff treats that the same as a completely wrong value.
Reconlify handles all of this:
- Key-based matching — rows are paired by business key, not by position. Row order does not matter.
- Column mapping — source and target can use different column names. Reconlify translates between them.
- Numeric tolerance — small rounding differences pass without being flagged.
- String rules and normalization — formatting noise (whitespace, casing, date formats, split fields) is cleaned up before comparison.
- Structured JSON report — every run produces a machine-readable report with counts, metadata, and sample differences.
All processing happens locally. No data leaves your machine.
Comparison modes
Tabular mode compares CSV and TSV files. Rows are matched by one or more key columns and classified as matching, mismatched, missing in target, or missing in source. This is the mode for structured data: migrations, financial reconciliations, report validation.
Text mode compares plain text files. Lines are compared by position
(line_by_line) or as unordered sets (unordered_lines). Regex rules can
normalize timestamps, request IDs, and other variable content before
comparison. This is the mode for logs, CLI output, and generated text.
Row classification
In tabular mode, every row falls into one of four categories:
| Category | Meaning |
|---|---|
| Match | Key exists on both sides, all compared values are equal |
| Mismatch | Key exists on both sides, one or more values differ |
| Missing in target | Key exists in source but not in target |
| Missing in source | Key exists in target but not in source |
The report counts each category and provides the differences.
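The four categories follow directly from pairing rows by key. As a rough sketch (this is illustrative, not Reconlify's actual implementation), the classification could look like:

```python
# Illustrative four-way row classification: rows are paired by key,
# then compared value by value. Not Reconlify's internals.
def classify(source_rows, target_rows, key):
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    result = {"match": [], "mismatch": [],
              "missing_in_target": [], "missing_in_source": []}
    for k, row in src.items():
        if k not in tgt:
            result["missing_in_target"].append(k)   # key only in source
        elif row == tgt[k]:
            result["match"].append(k)               # all values equal
        else:
            result["mismatch"].append(k)            # some value differs
    result["missing_in_source"] = [k for k in tgt if k not in src]
    return result

source = [{"order_id": "1", "amount": "10"}, {"order_id": "2", "amount": "20"}]
target = [{"order_id": "1", "amount": "10"}, {"order_id": "3", "amount": "30"}]
print(classify(source, target, "order_id"))
```

Because matching is by key, reordering either file changes nothing in the result.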
Writing your first config
If you have not written a config before, start with the Quick Start — it walks through installation, example data, and your first run.
This section covers the config structure and the decisions you need to make for each comparison.
Config structure
Every config is a YAML file with a fixed structure:
```yaml
type: tabular        # or "text"
source: source.csv   # path to source file
target: target.csv   # path to target file
keys:                # columns that uniquely identify a row
  - order_id
```

These four fields are the minimum for a tabular comparison. Everything else is optional and adds precision.
Choosing keys
Keys determine how rows are matched. Pick columns that uniquely identify a record on both sides.
Single key — use when one column is unique:
```yaml
keys:
  - order_id
```

Composite key — use when no single column is unique. For example, a customer who exists in multiple regions:

```yaml
keys:
  - customer_id
  - region
```

Reconlify requires keys to be unique within each file. If duplicates exist, the run fails with an error identifying the duplicate key values.
Adding column mapping
When source and target use different column names, add column_mapping to
tell Reconlify which columns correspond:
```yaml
column_mapping:
  order_id: transaction_id
  amount: total_amount
```

The left side is the source column name (the logical name). The right side is the target column name. Every other config section uses logical names.
See Column Mapping for a full walkthrough with examples.
Adding tolerance
When numeric values may differ slightly due to rounding:
```yaml
tolerance:
  amount: 0.05
```

Values within the threshold are treated as equal. See Financial Reconciliation for a worked example showing how tolerance separates rounding noise from real discrepancies.
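In effect, a tolerance turns equality into a threshold check. A minimal sketch, assuming absolute-difference semantics (the comparison details are an assumption here, not a statement about Reconlify's internals):

```python
# Tolerance as a threshold check (assumed absolute-difference semantics):
# values whose difference is within the threshold count as equal.
def within_tolerance(a, b, tol):
    return abs(a - b) <= tol

print(within_tolerance(100.00, 100.04, 0.05))  # rounding noise passes
print(within_tolerance(100.00, 100.10, 0.05))  # a real discrepancy is flagged
```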
Adding string rules
When string values need cleanup before comparison:
```yaml
string_rules:
  counterparty:
    - trim
    - case_insensitive
  reference_id:
    - regex_extract:
        pattern: "REF-(\\d+)"
        group: 1
```

Each column can have its own set of rules. See Normalization and Rules for examples of each rule type.
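To build intuition for what these rules do to a value before comparison, here is a plain-Python approximation of the three rules used above (the rule semantics are assumptions based on their names, not Reconlify's actual code):

```python
import re

# Illustrative versions of trim, case_insensitive, and regex_extract:
# each rule transforms the value before the comparison happens.
def apply_rules(value, rules):
    for rule in rules:
        if rule == "trim":
            value = value.strip()
        elif rule == "case_insensitive":
            value = value.lower()
        elif isinstance(rule, dict) and "regex_extract" in rule:
            spec = rule["regex_extract"]
            m = re.search(spec["pattern"], value)
            if m:
                value = m.group(spec["group"])
    return value

print(apply_rules("  ACME Corp ", ["trim", "case_insensitive"]))
print(apply_rules("Invoice REF-00123/A",
                  [{"regex_extract": {"pattern": r"REF-(\d+)", "group": 1}}]))
```

Rules apply in order, so trimming before case-folding (or extracting before trimming) can matter.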
Adding normalization
When the source and target represent the same data in structurally different ways — for example, split name fields vs a combined name field:
```yaml
normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]
    - op: trim
```

This creates a derived column on the source side before comparison. See Data Migration for a full example using normalization in a migration validation workflow.
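The pipeline above amounts to the following per-row transformation (a sketch under the assumption that string literals in `args` are inserted verbatim and column names are looked up in the row):

```python
# Assumed semantics of the concat + trim pipeline: join the columns
# with the literal separator, then strip surrounding whitespace.
def derive_full_name(row):
    value = row["first_name"] + " " + row["last_name"]
    return value.strip()

print(derive_full_name({"first_name": "Ada", "last_name": "Lovelace"}))
```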
Controlling which columns are compared
By default, Reconlify compares all columns present in both files (after mapping). You can narrow or exclude columns:
```yaml
compare:
  include_columns:   # compare only these columns
    - amount
    - status
  ignore_columns:    # skip these columns entirely
    - created_at
    - updated_at
```

Use include_columns when you only care about specific fields. Use ignore_columns when most columns matter but a few (like timestamps) should be skipped.
Text mode config
Text mode uses a different set of options:
```yaml
type: text
source: app_before.log
target: app_after.log
mode: line_by_line
normalize:
  trim_lines: true
  collapse_whitespace: true
  replace_regex:
    - pattern: "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
      replace: "<TS>"
  drop_lines_regex:
    - "^DEBUG"
```

See Log Comparison for a complete walkthrough of both line_by_line and unordered_lines modes.
For the full list of config options, see the YAML Config Reference.
Running comparisons
Basic run
```shell
reconlify run config.yaml
```

Reconlify prints a summary to the terminal and writes a detailed JSON report to report.json.
Custom output path
```shell
reconlify run config.yaml --out results/round1.json
```

Exit codes
| Code | Meaning |
|---|---|
| 0 | No differences found |
| 1 | Differences found |
| 2 | Error (bad config, missing file, etc.) |
Exit code 1 is not an error. It means the comparison completed successfully and detected differences. This makes Reconlify easy to integrate into CI/CD pipelines — a non-zero exit code signals that the check did not pass.
CI/CD integration
Because the same config and input files always produce the same output, Reconlify fits naturally into automated workflows:
```shell
reconlify run config.yaml --out report.json
if [ $? -eq 1 ]; then
  echo "Differences detected — see report.json"
  exit 1
fi
```

Save reports with timestamped filenames to maintain an audit trail across runs.
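One way to generate such filenames, sketched in Python (the `--out` flag accepts any path; this naming scheme is only a suggestion, not something Reconlify prescribes):

```python
from datetime import datetime

# Build a timestamped report path so successive runs never overwrite
# each other, e.g. reports/report-20250301-104500.json
def report_path(prefix="reports/report"):
    return f"{prefix}-{datetime.now():%Y%m%d-%H%M%S}.json"

print(report_path())
```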
Reading reports
The report is a JSON file with a consistent structure across all comparison types. For the full field reference, see the Report Format Reference.
Summary
The top-level summary section gives you the overall picture:
```json
{
  "source_rows": 6,
  "target_rows": 5,
  "missing_in_target": 1,
  "missing_in_source": 0,
  "rows_with_mismatches": 1,
  "mismatched_cells": 2
}
```

Start here. If all counts are zero, the datasets match. If not, the numbers tell you what kind of differences to investigate.
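Because the report is plain JSON, this check is easy to automate. A hypothetical consumer, using the field names from the example above:

```python
import json

# Hypothetical report consumer: the datasets match when every
# difference counter in the summary is zero.
def datasets_match(summary):
    checked = ["missing_in_target", "missing_in_source",
               "rows_with_mismatches", "mismatched_cells"]
    return all(summary[field] == 0 for field in checked)

summary = json.loads("""{
  "source_rows": 6, "target_rows": 5,
  "missing_in_target": 1, "missing_in_source": 0,
  "rows_with_mismatches": 1, "mismatched_cells": 2
}""")
print(datasets_match(summary))
```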
Details
The details section records what was compared and how:
- keys — which columns were used for matching
- column_mapping — which columns were renamed
- filters_applied — which rows were excluded before comparison
- column_stats — per-column mismatch counts
column_stats is especially useful for large datasets. If 500 rows have
mismatches but column_stats shows all of them are in the amount column,
you know exactly where to focus.
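For example, ranking columns by mismatch count makes the hot spot obvious (the shape of column_stats here is assumed from the description above):

```python
# Hypothetical column_stats payload: column name -> mismatch count.
column_stats = {"amount": 500, "status": 3, "created_at": 0}

# Sort columns so the most-mismatched one comes first.
worst_first = sorted(column_stats.items(), key=lambda kv: kv[1], reverse=True)
print(worst_first[0])
```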
Samples
The samples section provides example rows for each category of difference:
- mismatches — rows that exist on both sides but have different values, showing source and target values per column
- missing_in_target — keys that exist in source but not in target
- missing_in_source — keys that exist in target but not in source
The report includes all detected differences — not a sampled subset. Reconlify applies no sampling limits when generating reports, so for very large datasets the report file itself can be large. Consumers such as Reconlify Desktop may paginate or filter the displayed results, but the CLI always emits the full evidence set. This behavior may change in a future release.
Understanding text mode report samples
Text mode reports use two different sample structures depending on the comparison mode.
samples contains individual lines where the source and target differ.
This is the primary output for line_by_line mode. Each entry shows the
line number, the raw content from both files, and the processed content after
normalization and regex replacements:
```json
{
  "line_number_source": 9,
  "line_number_target": 9,
  "raw_source": "2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)",
  "raw_target": "2026-03-09 14:22:13 [WARN] Response sent: 404 Not Found (52ms)",
  "processed_source": "<TS> [INFO] Response sent: 200 OK (<DUR>)",
  "processed_target": "<TS> [WARN] Response sent: 404 Not Found (<DUR>)"
}
```

The raw_* fields show what is in the original files. The processed_* fields show what Reconlify actually compared — after timestamps were replaced with <TS> and durations with <DUR>. The difference is in the processed values: [INFO] Response sent: 200 OK vs [WARN] Response sent: 404 Not Found.
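The raw-to-processed step is ordinary regex replacement. A sketch of the transformation for this sample (the duration pattern is an assumption; only the timestamp pattern appears in the config example earlier):

```python
import re

# Replace variable content with stable placeholders before comparing:
# timestamps become <TS>, millisecond durations become <DUR>.
def normalize(line):
    line = re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<TS>", line)
    line = re.sub(r"\(\d+ms\)", "(<DUR>)", line)
    return line

raw = "2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)"
print(normalize(raw))
```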
samples_agg is an aggregated view that shows how many times each
distinct line appears in the source and target. This is the primary output
for unordered_lines mode, where line order does not matter and the same
line may appear many times.
```json
{
  "line": "ERROR connection failed",
  "source_count": 5,
  "target_count": 3,
  "source_line_numbers": [12, 45, 78, 110, 143],
  "target_line_numbers": [8, 51, 99]
}
```

This means the line "ERROR connection failed" appears five times in the
source log and three times in the target log. The difference of two
occurrences is reported as a mismatch. The source_line_numbers and
target_line_numbers fields point to the exact lines in the original files
where each occurrence was found.
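Conceptually, this aggregation is a multiset comparison. A minimal sketch of the counting logic (illustrative only, not Reconlify's implementation):

```python
from collections import Counter

# Count each distinct line on both sides; lines whose counts differ
# are the unordered-mode mismatches.
source_lines = ["ERROR connection failed"] * 5 + ["startup complete"]
target_lines = ["ERROR connection failed"] * 3 + ["startup complete"]

src, tgt = Counter(source_lines), Counter(target_lines)
diffs = {line: (src[line], tgt[line])
         for line in src.keys() | tgt.keys()
         if src[line] != tgt[line]}
print(diffs)
```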
In line_by_line mode, samples_agg is empty. In unordered_lines mode,
samples is empty. Each mode uses only its corresponding sample structure.
See Log Comparison for complete examples of both modes, and the Report Format Reference for the full field reference.
Debugging differences
When the report shows unexpected results, work through these steps.
Too many mismatches
If most rows are flagged as mismatches, the comparison rules are probably too strict for the data.
Formatting noise. Check if mismatches are caused by whitespace, casing, or date format differences. Add the appropriate string rules or normalization:
- Extra spaces → `trim` or `compare.trim_whitespace`
- Different casing → `case_insensitive`
- Different date formats → `date_format` normalization
- Different column names → `column_mapping`
Rounding differences. If numeric columns show small differences (0.01,
0.001), add tolerance for those columns.
Check column_stats. If mismatches are concentrated in one or two columns,
the fix is usually a targeted rule on those columns rather than a broad
change.
Missing rows you expected to match
If rows appear as "missing in target" when they should match:
Check your keys. The most common cause is a key mismatch. If the source
key is "ORD-2025-100" and the target key is "100", the rows will not pair.
Use regex_extract on the key column or verify your key column choice.
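The idea behind the regex fix, as a plain sketch (the pattern is a hypothetical one for the "ORD-2025-100" example, chosen so both sides reduce to the bare numeric key):

```python
import re

# Reduce both key variants to the trailing digit run, so
# "ORD-2025-100" and "100" pair up on the same key value.
def extract_key(raw):
    m = re.search(r"(\d+)$", raw)
    return m.group(1) if m else raw

print(extract_key("ORD-2025-100") == extract_key("100"))
```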
Check column mapping. If the key column has a different name in the target,
it must be listed in column_mapping.
Check for duplicates. If a key appears more than once in either file, Reconlify rejects the run with an error. Investigate why duplicates exist — you may need a composite key.
Zero mismatches but data looks wrong
If the report shows zero differences but you know the data differs:
Check ignore_columns. A column listed in ignore_columns is excluded
from comparison entirely. Verify you are not accidentally ignoring a column
that matters.
Check tolerance values. A tolerance that is too wide will accept differences you intended to catch. Start tight (0.01) and widen only for known rounding behavior.
Check include_columns. If you specified include_columns, only those
columns are compared. Other columns are silently skipped.
Config errors
Exit code 2 means the config or input files have a problem. Common causes:
- Missing file — the source or target path does not exist or is misspelled
- Missing key column — a column listed in `keys` does not exist in the file (check for typos and verify `column_mapping` if the target uses a different name)
- Duplicate keys — a key value appears more than once in a file (add another column to `keys` for a composite key, or clean the data)
- Invalid YAML — indentation or syntax errors in the config file
The error message identifies the specific issue. Fix it and re-run.
Next steps
- Quick Start — installation and first run
- Column Mapping — comparing files with different column names
- Normalization and Rules — string rules and source-side pipelines
- Data Migration — migration validation workflow
- Financial Reconciliation — tolerance and string rules for financial data
- Log Comparison — text mode for logs and CLI output
- YAML Config Reference — full configuration options
- Report Format Reference — full report structure