YAML Configuration

Every Reconlify comparison is driven by a YAML config file. This page documents every field with a description, a minimal example, and common mistakes.

Minimal config

The smallest valid tabular config:

type: tabular
source: source.csv
target: target.csv
keys:
  - id

This compares two CSV files, matches rows by id, and checks all remaining columns for differences.


type

The comparison mode.

type: tabular
Value Description
tabular CSV/TSV comparison with key-based row matching
text Plain text comparison, line by line or unordered

Common mistakes:

  • Omitting type entirely. It is required — Reconlify does not infer the mode from the file extension.
  • Using type: csv. The correct value is tabular.

source / target

Paths to the files being compared. Resolved relative to the working directory where you run reconlify run.

source: exports/bank_statement.csv
target: exports/erp_transactions.csv

Common mistakes:

  • Using absolute paths that work on your machine but break in CI. Prefer relative paths from the project root.
  • Swapping source and target. The report labels "missing in target" and "missing in source" depend on which file is which. Convention: the authoritative or expected file is the source.

keys

One or more columns that uniquely identify a row. Required for tabular mode.

Single key:

keys:
  - order_id

Composite key — use when no single column is unique:

keys:
  - customer_id
  - region

Rows with the same key values are paired and compared. Rows that exist in only one file are reported as missing.

Common mistakes:

  • Choosing a column that is not unique. If the same key appears twice in either file, Reconlify returns an error. Add more columns to form a composite key.
  • Forgetting to map the key column. If the source key is order_id but the target calls it id, you need column_mapping — otherwise Reconlify cannot find the key in the target.
  • Using a column with null values as a key. Nulls cannot uniquely identify rows and will cause matching errors.
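Putting the last two points together — a sketch (with hypothetical column names) of a key whose name differs between the files:

keys:
  - order_id

column_mapping:
  order_id: id

Here order_id is the source column and id is the target's name for it; keys references the source-side name.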

column_mapping

Translates between source and target column names. The left side is the source column name (the logical name used everywhere else in the config). The right side is the target column name.

column_mapping:
  order_id: transaction_id
  amount: total_amount

Unmapped columns are looked up by their source name in the target file. You only need to map columns whose names differ.

Common mistakes:

  • Writing the mapping backwards. The source name goes on the left: source_col: target_col. Writing target_col: source_col will fail because neither name matches the actual file headers.
  • Referencing target column names in other config sections. After defining a mapping, use the logical (source-side) name everywhere — in keys, tolerance, string_rules, filters, and ignore_columns.
  • Mapping a column that has the same name in both files. This is harmless but unnecessary.

See Column Mapping for a full walkthrough.

compare

Global settings that control how all columns are compared.

compare:
  trim_whitespace: true
  case_insensitive: false
  normalize_nulls: ["", "NULL", "null", "N/A"]
Field Default Description
trim_whitespace true Strip leading and trailing spaces from all values
case_insensitive false Ignore upper/lower case across all columns
normalize_nulls [] Treat these string values as equivalent to null

Common mistakes:

  • Setting case_insensitive: true globally when only one column needs it. This makes every column case-insensitive, including codes and IDs where casing may matter. Use string_rules for per-column control.
  • Assuming trim_whitespace is off by default. It is on — values are trimmed unless you explicitly set it to false.
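A sketch combining the two points above: keep the global settings strict and relax casing only for the column that needs it (status is an illustrative column name):

compare:
  trim_whitespace: true
  case_insensitive: false

string_rules:
  status:
    - case_insensitive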

include_columns

Compare only the listed columns. All other non-key columns are ignored.

compare:
  include_columns:
    - amount
    - status

Common mistakes:

  • Listing key columns in include_columns. Keys are always used for matching — they do not need to be included here.
  • Using include_columns and exclude_columns together. While technically valid, this is confusing. Pick one approach.

exclude_columns

Compare all common columns except the listed ones.

compare:
  exclude_columns:
    - debug_field
    - internal_notes

Common mistakes:

  • Excluding a key column. Keys are used for matching, not for value comparison — they are already excluded from the diff by default.

ignore_columns

A top-level shorthand for excluding columns from comparison. Functionally equivalent to compare.exclude_columns but defined outside the compare block.

ignore_columns:
  - created_at
  - updated_at
  - internal_id

Use this for volatile fields like timestamps or system-generated IDs that always differ between exports but are not meaningful.

Common mistakes:

  • Ignoring a column you actually need to compare. If a report shows zero mismatches but the data looks wrong, check whether the column is listed here.
  • Forgetting to ignore the original columns after generating a normalized replacement. If you use normalization to create full_name from first_name and last_name, add both originals to ignore_columns — otherwise they are compared too.
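For example, a sketch of the normalization scenario from the second point (column names are illustrative):

normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]

ignore_columns:
  - first_name
  - last_name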

tolerance

Allow small numeric differences without flagging them as mismatches. Specify a threshold per column.

tolerance:
  amount: 0.01
  balance: 0.05

Values within the threshold are treated as equal. If both values are numeric, Reconlify compares the absolute difference against the tolerance. If either value is non-numeric, it falls back to exact string comparison.

Common mistakes:

  • Setting tolerance too wide. A tolerance of 1.00 on a financial amount column will hide real discrepancies. Start tight (0.01) and widen only for known rounding behavior.
  • Applying tolerance to non-numeric columns. Tolerance only works on numeric values — it has no effect on strings.
  • Using tolerance as a substitute for rounding normalization. Tolerance checks |source - target| <= threshold. If you need to round values to a specific precision before comparing, use the round normalization operation instead.
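A sketch contrasting the two approaches (amount is a hypothetical column). With tolerance, 10.004 vs 10.006 passes because the absolute difference (0.002) is within 0.01; with round, the source-side value is rounded to the given precision before it is compared:

# Treat values as equal when |source - target| <= 0.01
tolerance:
  amount: 0.01

# Alternatively: round the source-side value to 2 decimals before comparing
normalization:
  amount:
    - op: round
      args: [amount, 2]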

See Financial Reconciliation for a worked example.

string_rules

Per-column transformations applied before comparing string values. Rules are applied in the order listed.

string_rules:
  customer_name:
    - trim
    - case_insensitive
  order_ref:
    - regex_extract:
        pattern: "ORD-\\d{4}-(\\d+)"
        group: 1
  product_label:
    - contains

Available rules

trim — strip leading and trailing whitespace.

string_rules:
  vendor_name:
    - trim

case_insensitive — ignore upper/lower case differences.

string_rules:
  status:
    - case_insensitive

regex_extract — extract a regex capture group before comparing. Requires pattern (a regex with at least one capture group) and optionally group (default: 1).

string_rules:
  reference_id:
    - regex_extract:
        pattern: "REF-(\\d+)"
        group: 1

contains — match if either value contains the other as a substring, instead of requiring exact equality.

string_rules:
  description:
    - contains

Common mistakes:

  • Applying case_insensitive as a string rule when you want it globally. If every column should be case-insensitive, use compare.case_insensitive instead.
  • Forgetting to double-escape backslashes in regex patterns. In double-quoted YAML strings, write \\d, not \d — a single backslash starts a YAML escape sequence. (Single-quoted strings pass backslashes through literally.)
  • Using contains on key columns. This makes matching very loose — a value of "A" would match "BA". Use regex_extract for precise key normalization.

See Normalization and Rules for worked examples of each rule type.

filters

Remove rows from comparison before matching. Excluded rows are tracked in the report for audit purposes.

exclude_keys

Remove specific rows by their exact key values:

filters:
  exclude_keys:
    - { order_id: "TEST-001" }
    - { order_id: "STAGING-999" }

Each entry must include all key columns. For composite keys:

filters:
  exclude_keys:
    - { customer_id: "1001", region: "TEST" }

Matching rows are removed from both source and target.

Common mistakes:

  • Omitting a key column in a composite key entry. If your keys are [customer_id, region], each exclude entry must specify both. An entry with only customer_id will not match anything.
  • Using exclude_keys for broad filtering. If you need to exclude many rows by a condition (e.g., all cancelled orders), use row_filters instead.

row_filters

Remove rows based on column conditions:

filters:
  row_filters:
    apply_to: both
    mode: exclude
    rules:
      - column: status
        op: equals
        value: "cancelled"

apply_to — which sides are filtered:

Value Description
both (default) Filter source and target
source Filter only source rows
target Filter only target rows

mode — the filter logic:

Value Description
exclude (default) Remove rows matching all rules
include Keep only rows matching all rules

Supported filter operators

Operator Required field Description
equals value Column equals the value
not_equals value Column does not equal the value
in values (list) Column value is in the list
contains value Column contains the substring
regex pattern Column matches the regex pattern
is_null (none) Column is null or empty
not_null (none) Column is not null

Example with multiple rules:

filters:
  row_filters:
    apply_to: source
    mode: include
    rules:
      - column: status
        op: in
        values: ["active", "pending"]
      - column: amount
        op: not_null

This keeps only source rows where status is "active" or "pending" and amount is not null. All other source rows are excluded.

Common mistakes:

  • Confusing exclude and include modes. With mode: exclude, matching rows are removed. With mode: include, matching rows are kept and everything else is removed.
  • Using apply_to: source when you want both. If cancelled orders exist on both sides and you only filter the source, the target's cancelled rows will appear as "missing in source".
  • Forgetting that rules are combined with AND logic. All rules must match for a row to be affected. If you need OR logic, use multiple filter blocks.

Filters are applied in order: exclude_keys first, then row_filters. Both run before duplicate-key validation and comparison.
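A sketch combining both filter types (values are illustrative); exclude_keys runs first, then row_filters:

filters:
  exclude_keys:
    - { order_id: "TEST-001" }
  row_filters:
    apply_to: both
    mode: exclude
    rules:
      - column: status
        op: equals
        value: "cancelled"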

normalization

Create computed columns on the source side before comparison. Each entry is a named pipeline where steps run in sequence.

normalization:
  full_name:
    - op: concat
      args: [first_name, " ", last_name]
    - op: trim

The first step receives its inputs from args (column names or string/numeric literals). Each subsequent step operates on the result of the previous one.

Supported operations

Operation Args (first step) Description
concat col1, literal, col2, ... String concatenation
upper col Convert to uppercase
lower col Convert to lowercase
trim col Strip whitespace
substr col, start [, length] Extract substring
round col [, precision] Round numeric value
add col1, col2 Add two numeric values
sub col1, col2 Subtract
mul col1, col2 Multiply
div col1, col2 Divide
coalesce col1, col2, ... First non-null value
date_format col, from_fmt, to_fmt Parse and reformat a date
map col, val, repl, ... Map specific values to replacements

Map example — convert short codes to labels:

normalization:
  status:
    - op: map
      args: [status_code, "A", "ACTIVE", "I", "INACTIVE", "S", "SUSPENDED"]

Date format example — align date representations:

normalization:
  event_date:
    - op: date_format
      args: [event_date, "%Y-%m-%d", "%d/%m/%Y"]

Arithmetic example — compute a derived value:

normalization:
  total:
    - op: mul
      args: [quantity, unit_price]
    - op: round
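Coalesce example — a sketch that falls back to a secondary column when the primary is null (column names are hypothetical):

normalization:
  contact_email:
    - op: coalesce
      args: [email, backup_email]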

Common mistakes:

  • Referencing a generated column from another generated column. All args must refer to original source columns or literals. Normalization entries are independent — they cannot chain across pipelines.
  • Forgetting to ignore the original columns. If you generate full_name from first_name and last_name, add both to ignore_columns — otherwise they are compared against the target (where they may not exist).
  • Using normalization when string_rules would suffice. If you only need trimming or case normalization on an existing column, string_rules is simpler. Use normalization when you need to combine columns, remap values, or change data types.

See Normalization and Rules for worked examples and Data Migration for normalization in a migration workflow.

csv

Configure how CSV files are parsed.

csv:
  delimiter: "\t"
  header: true
  encoding: utf-8
Field Default Description
delimiter "," Field delimiter character
header true Whether the first row is a header
encoding utf-8 File encoding (only UTF-8 is supported)

Common mistakes:

  • Forgetting to set delimiter: "\t" for TSV files. The default is comma — a TSV file parsed with a comma delimiter produces a single column per row.
  • Setting header: false without providing column names. If your file has no header row, Reconlify generates default column names (col_0, col_1, etc.), which must be used in keys and other config sections.
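A sketch of the headerless case from the second point — the generated names (col_0, col_1, ...) are what keys must reference:

csv:
  header: false

keys:
  - col_0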

output

Control what the report includes.

output:
  include_row_samples: true
  include_column_stats: true
Field Default Description
include_row_samples true Include sample rows showing differences
include_column_stats true Include per-column mismatch counts

Common mistakes:

  • Setting include_row_samples: false and then wondering why the report has no sample data. The summary counts are still present, but individual row examples are omitted.

Set include_row_samples: false for summary-only reports in automated pipelines where you only need pass/fail status.


Text mode

Text mode compares plain text files. It uses a different set of config fields than tabular mode.

Minimal text config

type: text
source: expected.log
target: actual.log

mode

The comparison strategy for text files.

mode: line_by_line
Value Description
line_by_line (default) Compare lines by position — line 1 to line 1, line 2 to line 2
unordered_lines Compare the set of lines regardless of order — count occurrences of each distinct line

Common mistakes:

  • Using line_by_line for logs from parallel workers. If the same lines appear but in different order, every line flags as a difference. Switch to unordered_lines.
  • Using unordered_lines when order matters. If a log should produce events in a specific sequence, unordered_lines will miss ordering regressions.
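For the parallel-worker case in the first point, a minimal sketch:

type: text
source: expected.log
target: actual.log
mode: unordered_lines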

See Log Comparison for examples of both modes.

normalize

Normalization options clean up lines before comparison. Applied in a fixed order: normalize newlines, trim, collapse whitespace, case conversion, then blank line removal.

normalize:
  normalize_newlines: true
  trim_lines: true
  collapse_whitespace: true
  case_insensitive: false
  ignore_blank_lines: false
Field Default Description
normalize_newlines true Convert CRLF to LF
trim_lines false Strip leading/trailing whitespace per line
collapse_whitespace false Replace consecutive spaces with a single space
case_insensitive false Convert all lines to lowercase before comparing
ignore_blank_lines false Drop empty lines after other normalization

Common mistakes:

  • Assuming trim_lines is on by default. Unlike tabular mode's trim_whitespace, text mode's trim_lines defaults to false.
  • Enabling collapse_whitespace for indentation-sensitive files. This replaces all runs of whitespace with a single space, which destroys indentation structure.

replace_regex

Substitute matching patterns before comparison. Rules are applied sequentially — the output of one rule is the input to the next.

replace_regex:
  - pattern: "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
    replace: "<TS>"
  - pattern: "req-[a-z0-9]+"
    replace: "req-<ID>"
  - pattern: "\\d+ms"
    replace: "<DUR>"

Use this to normalize timestamps, UUIDs, request IDs, durations, and other values that change between runs.

Common mistakes:

  • Forgetting to double-escape backslashes. In a double-quoted YAML string, \\d becomes the regex \d; a single backslash (\d) starts a YAML escape sequence and will not match digits.
  • Writing an overly broad pattern. A pattern like \\d+ replaces every number in every line, including meaningful values like error codes or counts.
  • Ordering rules incorrectly. If rule A replaces timestamps and rule B replaces a pattern that includes timestamps, put rule A first. Rules apply sequentially.
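A sketch of the ordering point (patterns are illustrative): the timestamp rule runs first, so the second rule's pattern can match the <TS> placeholder rather than raw timestamps:

replace_regex:
  - pattern: "\\d{2}:\\d{2}:\\d{2}"
    replace: "<TS>"
  - pattern: "<TS> worker-\\d+"
    replace: "<TS> worker-<N>"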

drop_lines_regex

Remove entire lines that match any pattern. Matching is checked after normalize and replace_regex have been applied.

drop_lines_regex:
  - "^DEBUG"
  - "^\\s*$"
  - "^#"

Common mistakes:

  • Dropping lines that contain useful data. A pattern like "error" would drop any line containing "error" — including lines you want to compare. Use anchored patterns (^DEBUG) to be precise.
  • Expecting drop to apply before replace_regex. The order is: normalize, then replace_regex, then drop_lines_regex. A line is dropped based on its content after replacements have been applied.