YAML Configuration
Every Reconlify comparison is driven by a YAML config file. This page documents every field with a description, a minimal example, and common mistakes.
Minimal config
The smallest valid tabular config:
```yaml
type: tabular
source: source.csv
target: target.csv
keys:
  - id
```

This compares two CSV files, matches rows by id, and checks all remaining
columns for differences.
type
The comparison mode.
```yaml
type: tabular
```

| Value | Description |
|---|---|
| `tabular` | CSV/TSV comparison with key-based row matching |
| `text` | Plain text comparison, line by line or unordered |
Common mistakes:
- Omitting `type` entirely. It is required — Reconlify does not infer the mode from the file extension.
- Using `type: csv`. The correct value is `tabular`.
source / target
Paths to the files being compared. Resolved relative to the working directory
where you run `reconlify run`.
```yaml
source: exports/bank_statement.csv
target: exports/erp_transactions.csv
```

Common mistakes:
- Using absolute paths that work on your machine but break in CI. Prefer relative paths from the project root.
- Swapping source and target. The report labels "missing in target" and "missing in source" depend on which file is which. Convention: the authoritative or expected file is the source.
keys
One or more columns that uniquely identify a row. Required for tabular mode.
Single key:
```yaml
keys:
  - order_id
```

Composite key — use when no single column is unique:

```yaml
keys:
  - customer_id
  - region
```

Rows with the same key values are paired and compared. Rows that exist in only one file are reported as missing.
Common mistakes:
- Choosing a column that is not unique. If the same key appears twice in either file, Reconlify returns an error. Add more columns to form a composite key.
- Forgetting to map the key column. If the source key is `order_id` but the target calls it `id`, you need `column_mapping` — otherwise Reconlify cannot find the key in the target.
- Using a column with null values as a key. Nulls cannot uniquely identify rows and will cause matching errors.
column_mapping
Translates between source and target column names. The left side is the source column name (the logical name used everywhere else in the config). The right side is the target column name.
```yaml
column_mapping:
  order_id: transaction_id
  amount: total_amount
```

Unmapped columns are looked up by their source name in the target file. You only need to map columns whose names differ.
Common mistakes:
- Writing the mapping backwards. The source name goes on the left: `source_col: target_col`. Writing `target_col: source_col` will fail because neither name matches the actual file headers.
- Referencing target column names in other config sections. After defining a mapping, use the logical (source-side) name everywhere — in `keys`, `tolerance`, `string_rules`, `filters`, and `ignore_columns`.
- Mapping a column that has the same name in both files. This is harmless but unnecessary.
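Putting these together, a sketch (the column names are illustrative) that maps a renamed key and then refers to it by its source-side name everywhere else:

```yaml
keys:
  - order_id                 # logical name, even though the target calls it transaction_id

column_mapping:
  order_id: transaction_id   # source name on the left, target name on the right
  amount: total_amount

tolerance:
  amount: 0.01               # source-side name here too, not total_amount
```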
See Column Mapping for a full walkthrough.
compare
Global settings that control how all columns are compared.
```yaml
compare:
  trim_whitespace: true
  case_insensitive: false
  normalize_nulls: ["", "NULL", "null", "N/A"]
```

| Field | Default | Description |
|---|---|---|
| `trim_whitespace` | `true` | Strip leading and trailing spaces from all values |
| `case_insensitive` | `false` | Ignore upper/lower case across all columns |
| `normalize_nulls` | `[]` | Treat these string values as equivalent to null |
Common mistakes:
- Setting `case_insensitive: true` globally when only one column needs it. This makes every column case-insensitive, including codes and IDs where casing may matter. Use `string_rules` for per-column control.
- Assuming `trim_whitespace` is off by default. It is on — values are trimmed unless you explicitly set it to `false`.
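For example, to keep comparison case-sensitive overall but relax casing for a single column (the column name here is illustrative), combine the global setting with a per-column rule:

```yaml
compare:
  case_insensitive: false   # global default: casing matters

string_rules:
  status:
    - case_insensitive      # relax casing only for this column
```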
include_columns
Compare only the listed columns. All other non-key columns are ignored.
```yaml
compare:
  include_columns:
    - amount
    - status
```

Common mistakes:
- Listing key columns in `include_columns`. Keys are always used for matching — they do not need to be included here.
- Using `include_columns` and `exclude_columns` together. While technically valid, this is confusing. Pick one approach.
exclude_columns
Compare all common columns except the listed ones.
```yaml
compare:
  exclude_columns:
    - debug_field
    - internal_notes
```

Common mistakes:
- Excluding a key column. Keys are used for matching, not for value comparison — they are already excluded from the diff by default.
ignore_columns
A top-level shorthand for excluding columns from comparison. Functionally
equivalent to `compare.exclude_columns` but defined outside the `compare`
block.
```yaml
ignore_columns:
  - created_at
  - updated_at
  - internal_id
```

Use this for volatile fields like timestamps or system-generated IDs that always differ between exports but are not meaningful.
Common mistakes:
- Ignoring a column you actually need to compare. If a report shows zero mismatches but the data looks wrong, check whether the column is listed here.
- Forgetting to ignore the original columns after generating a normalized replacement. If you use `normalization` to create `full_name` from `first_name` and `last_name`, add both originals to `ignore_columns` — otherwise they are compared too.
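As a concrete sketch of that last point (column names are illustrative), pair the generated column with `ignore_columns` entries for its inputs:

```yaml
normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]

ignore_columns:
  - first_name   # compared only via full_name
  - last_name
```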
tolerance
Allow small numeric differences without flagging them as mismatches. Specify a threshold per column.
```yaml
tolerance:
  amount: 0.01
  balance: 0.05
```

Values within the threshold are treated as equal. If both values are numeric, Reconlify compares the absolute difference against the tolerance. If either value is non-numeric, it falls back to exact string comparison.
Common mistakes:
- Setting tolerance too wide. A tolerance of `1.00` on a financial amount column will hide real discrepancies. Start tight (`0.01`) and widen only for known rounding behavior.
- Applying tolerance to non-numeric columns. Tolerance only works on numeric values — it has no effect on strings.
- Using tolerance as a substitute for rounding normalization. Tolerance checks `|source - target| <= threshold`. If you need to round values to a specific precision before comparing, use the `round` normalization operation instead.
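To make the distinction concrete, a sketch of both approaches on a hypothetical `amount` column:

```yaml
# Tolerance: 10.001 vs 10.009 pass (difference 0.008 <= 0.01), 10.00 vs 10.02 fail
tolerance:
  amount: 0.01

# Rounding: the source value is rounded to 2 decimal places before comparison
normalization:
  amount:
    - op: round
      args: [amount, 2]
```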
See Financial Reconciliation for a worked example.
string_rules
Per-column transformations applied before comparing string values. Rules are applied in the order listed.
```yaml
string_rules:
  customer_name:
    - trim
    - case_insensitive
  order_ref:
    - regex_extract:
        pattern: "ORD-\\d{4}-(\\d+)"
        group: 1
  product_label:
    - contains
```

Available rules
trim — strip leading and trailing whitespace.
```yaml
string_rules:
  vendor_name:
    - trim
```

case_insensitive — ignore upper/lower case differences.
```yaml
string_rules:
  status:
    - case_insensitive
```

regex_extract — extract a regex capture group before comparing. Requires
`pattern` (a regex with at least one capture group) and optionally `group`
(default: 1).
```yaml
string_rules:
  reference_id:
    - regex_extract:
        pattern: "REF-(\\d+)"
        group: 1
```

contains — match if either value contains the other as a substring, instead of requiring exact equality.
```yaml
string_rules:
  description:
    - contains
```

Common mistakes:
- Applying `case_insensitive` as a string rule when you want it globally. If every column should be case-insensitive, use `compare.case_insensitive` instead.
- Forgetting to double-escape backslashes in regex patterns. YAML requires `\\d`, not `\d`. A single backslash is interpreted as a YAML escape sequence.
- Using `contains` on key columns. This makes matching very loose — a value of `"A"` would match `"BA"`. Use `regex_extract` for precise key normalization.
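For instance, instead of `contains` on a key, a `regex_extract` rule can normalize the key precisely (the pattern and column name are illustrative):

```yaml
keys:
  - order_ref

string_rules:
  order_ref:
    - regex_extract:
        pattern: "ORD-(\\d+)"   # compare only the numeric part of the reference
        group: 1
```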
See Normalization and Rules for worked examples of each rule type.
filters
Remove rows from comparison before matching. Excluded rows are tracked in the report for audit purposes.
exclude_keys
Remove specific rows by their exact key values:
```yaml
filters:
  exclude_keys:
    - { order_id: "TEST-001" }
    - { order_id: "STAGING-999" }
```

Each entry must include all key columns. For composite keys:

```yaml
filters:
  exclude_keys:
    - { customer_id: "1001", region: "TEST" }
```

Matching rows are removed from both source and target.
Common mistakes:
- Omitting a key column in a composite key entry. If your keys are `[customer_id, region]`, each exclude entry must specify both. An entry with only `customer_id` will not match anything.
- Using `exclude_keys` for broad filtering. If you need to exclude many rows by a condition (e.g., all cancelled orders), use `row_filters` instead.
row_filters
Remove rows based on column conditions:
```yaml
filters:
  row_filters:
    apply_to: both
    mode: exclude
    rules:
      - column: status
        op: equals
        value: "cancelled"
```

apply_to — which sides are filtered:
| Value | Description |
|---|---|
| `both` (default) | Filter source and target |
| `source` | Filter only source rows |
| `target` | Filter only target rows |
mode — the filter logic:
| Value | Description |
|---|---|
| `exclude` (default) | Remove rows matching all rules |
| `include` | Keep only rows matching all rules |
Supported filter operators
| Operator | Required field | Description |
|---|---|---|
| `equals` | `value` | Column equals the value |
| `not_equals` | `value` | Column does not equal the value |
| `in` | `values` (list) | Column value is in the list |
| `contains` | `value` | Column contains the substring |
| `regex` | `pattern` | Column matches the regex pattern |
| `is_null` | — | Column is null or empty |
| `not_null` | — | Column is not null |
Example with multiple rules:
```yaml
filters:
  row_filters:
    apply_to: source
    mode: include
    rules:
      - column: status
        op: in
        values: ["active", "pending"]
      - column: amount
        op: not_null
```

This keeps only source rows where status is "active" or "pending" and
amount is not null. All other source rows are excluded.
Common mistakes:
- Confusing `exclude` and `include` modes. With `mode: exclude`, matching rows are removed. With `mode: include`, matching rows are kept and everything else is removed.
- Using `apply_to: source` when you want `both`. If cancelled orders exist on both sides and you only filter the source, the target's cancelled rows will appear as "missing in source".
- Forgetting that rules are combined with AND logic. All rules must match for a row to be affected. If you need OR logic, use multiple filter blocks.
Filters are applied in order: exclude_keys first, then row_filters. Both
run before duplicate-key validation and comparison.
normalization
Create computed columns on the source side before comparison. Each entry is a named pipeline where steps run in sequence.
```yaml
normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]
    - op: trim
```

The first step receives its inputs from `args` (column names or string/numeric
literals). Each subsequent step operates on the result of the previous one.
Supported operations
| Operation | Args (first step) | Description |
|---|---|---|
| `concat` | `col1, literal, col2, ...` | String concatenation |
| `upper` | `col` | Convert to uppercase |
| `lower` | `col` | Convert to lowercase |
| `trim` | `col` | Strip whitespace |
| `substr` | `col, start [, length]` | Extract substring |
| `round` | `col [, precision]` | Round numeric value |
| `add` | `col1, col2` | Add two numeric values |
| `sub` | `col1, col2` | Subtract |
| `mul` | `col1, col2` | Multiply |
| `div` | `col1, col2` | Divide |
| `coalesce` | `col1, col2, ...` | First non-null value |
| `date_format` | `col, from_fmt, to_fmt` | Parse and reformat a date |
| `map` | `col, val, repl, ...` | Map specific values to replacements |
Map example — convert short codes to labels:
```yaml
normalization:
  status:
    - op: map
      args: [status_code, "A", "ACTIVE", "I", "INACTIVE", "S", "SUSPENDED"]
```

Date format example — align date representations:

```yaml
normalization:
  event_date:
    - op: date_format
      args: [event_date, "%Y-%m-%d", "%d/%m/%Y"]
```

Arithmetic example — compute a derived value:

```yaml
normalization:
  total:
    - op: mul
      args: [quantity, unit_price]
    - op: round
```

Common mistakes:
- Referencing a generated column from another generated column. All `args` must refer to original source columns or literals. Normalization entries are independent — they cannot chain across pipelines.
- Forgetting to ignore the original columns. If you generate `full_name` from `first_name` and `last_name`, add both to `ignore_columns` — otherwise they are compared against the target (where they may not exist).
- Using normalization when `string_rules` would suffice. If you only need trimming or case normalization on an existing column, `string_rules` is simpler. Use normalization when you need to combine columns, remap values, or change data types.
See Normalization and Rules for worked examples and Data Migration for normalization in a migration workflow.
csv
Configure how CSV files are parsed.
```yaml
csv:
  delimiter: "\t"
  header: true
  encoding: utf-8
```

| Field | Default | Description |
|---|---|---|
| `delimiter` | `","` | Field delimiter character |
| `header` | `true` | Whether the first row is a header |
| `encoding` | `utf-8` | File encoding (only UTF-8 is supported) |
Common mistakes:
- Forgetting to set `delimiter: "\t"` for TSV files. The default is comma — a TSV file parsed with a comma delimiter produces a single column per row.
- Setting `header: false` without providing column names. If your file has no header row, Reconlify generates default column names (`col_0`, `col_1`, etc.), which must be used in `keys` and other config sections.
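A sketch of a headerless file using the generated names (which column holds what is illustrative here):

```yaml
csv:
  header: false

keys:
  - col_0          # first column acts as the key

tolerance:
  col_2: 0.01      # third column is assumed to be a numeric amount
```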
output
Control what the report includes.
```yaml
output:
  include_row_samples: true
  include_column_stats: true
```

| Field | Default | Description |
|---|---|---|
| `include_row_samples` | `true` | Include sample rows showing differences |
| `include_column_stats` | `true` | Include per-column mismatch counts |
Common mistakes:
- Setting `include_row_samples: false` and then wondering why the report has no sample data. The summary counts are still present, but individual row examples are omitted.
Set `include_row_samples: false` for summary-only reports in automated
pipelines where you only need pass/fail status.
Text mode
Text mode compares plain text files. It uses a different set of config fields than tabular mode.
Minimal text config
```yaml
type: text
source: expected.log
target: actual.log
```

mode
The comparison strategy for text files.
```yaml
mode: line_by_line
```

| Value | Description |
|---|---|
| `line_by_line` (default) | Compare lines by position — line 1 to line 1, line 2 to line 2 |
| `unordered_lines` | Compare the set of lines regardless of order — count occurrences of each distinct line |
Common mistakes:
- Using `line_by_line` for logs from parallel workers. If the same lines appear but in different order, every line flags as a difference. Switch to `unordered_lines`.
- Using `unordered_lines` when order matters. If a log should produce events in a specific sequence, `unordered_lines` will miss ordering regressions.
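For example, a minimal text config for parallel-worker logs where content matters but line order does not:

```yaml
type: text
source: expected.log
target: actual.log
mode: unordered_lines   # same lines in any order compare equal
```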
See Log Comparison for examples of both modes.
normalize
Normalization options clean up lines before comparison. Applied in a fixed order: normalize newlines, trim, collapse whitespace, case conversion, then blank line removal.
```yaml
normalize:
  normalize_newlines: true
  trim_lines: true
  collapse_whitespace: true
  case_insensitive: false
  ignore_blank_lines: false
```

| Field | Default | Description |
|---|---|---|
| `normalize_newlines` | `true` | Convert CRLF to LF |
| `trim_lines` | `false` | Strip leading/trailing whitespace per line |
| `collapse_whitespace` | `false` | Replace consecutive spaces with a single space |
| `case_insensitive` | `false` | Convert all lines to lowercase before comparing |
| `ignore_blank_lines` | `false` | Drop empty lines after other normalization |
Common mistakes:
- Assuming `trim_lines` is on by default. Unlike tabular mode's `trim_whitespace`, text mode's `trim_lines` defaults to `false`.
- Enabling `collapse_whitespace` for indentation-sensitive files. This replaces all runs of whitespace with a single space, which destroys indentation structure.
replace_regex
Substitute matching patterns before comparison. Rules are applied sequentially — the output of one rule is the input to the next.
```yaml
replace_regex:
  - pattern: "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
    replace: "<TS>"
  - pattern: "req-[a-z0-9]+"
    replace: "req-<ID>"
  - pattern: "\\d+ms"
    replace: "<DUR>"
```

Use this to normalize timestamps, UUIDs, request IDs, durations, and other values that change between runs.
Common mistakes:
- Forgetting to double-escape backslashes. `\\d` in YAML becomes the regex `\d`. A single backslash (`\d`) is a YAML escape sequence and will not match digits.
- Writing an overly broad pattern. A pattern like `\\d+` replaces every number in every line, including meaningful values like error codes or counts.
- Ordering rules incorrectly. If rule A replaces timestamps and rule B replaces a pattern that includes timestamps, put rule A first. Rules apply sequentially.
drop_lines_regex
Remove entire lines that match any pattern. Matching is checked after
normalize and replace_regex have been applied.
```yaml
drop_lines_regex:
  - "^DEBUG"
  - "^\\s*$"
  - "^#"
```

Common mistakes:
- Dropping lines that contain useful data. A pattern like `"error"` would drop any line containing "error" — including lines you want to compare. Use anchored patterns (`^DEBUG`) to be precise.
- Expecting drop to apply before `replace_regex`. The order is: normalize, then `replace_regex`, then `drop_lines_regex`. A line is dropped based on its content after replacements have been applied.