Log Comparison

Why standard diff tools fail for logs

Application logs change on every run. Timestamps are different. Request IDs are unique. Durations fluctuate with load. Debug lines appear or disappear based on configuration.

A standard diff flags all of this. On a 500-line log, you might see 400 "differences" — almost all caused by timestamps and IDs — while the one real regression (a warning that replaced an info message) is buried in the noise.

Reconlify's text mode solves this. You define rules to normalize away predictable noise — timestamps become <TS>, request IDs become <ID>, debug lines are dropped — and Reconlify reports only the lines that are genuinely different.

The scenario

You are deploying a new version of a backend service. Before and after the deploy, you capture the application log from a standard test run. You need to confirm that the new version produces the same behavior — same sequence of operations, same outcomes — despite naturally different timestamps and request IDs.

app_before.log (source)

2026-03-01 10:00:01 [INFO] Service started on port 8080
2026-03-01 10:00:01 [INFO] Config loaded: workers=4, timeout=30s
2026-03-01 10:00:02 [INFO] Connected to database in 45ms
DEBUG health_check: ok
2026-03-01 10:00:03 [INFO] Handling request req-a1b2c3
2026-03-01 10:00:03 [INFO] Query executed in 22ms, 15 rows
2026-03-01 10:00:03 [INFO] Response sent: 200 OK (120ms)
2026-03-01 10:00:04 [INFO] Handling request req-d4e5f6
2026-03-01 10:00:04 [INFO] Query executed in 18ms, 15 rows
2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)
DEBUG health_check: ok
2026-03-01 10:00:05 [INFO] Shutdown complete, 2 requests served

app_after.log (target)

2026-03-09 14:22:10 [INFO] Service started on port 8080
2026-03-09 14:22:10 [INFO] Config loaded: workers=4, timeout=30s
2026-03-09 14:22:11 [INFO] Connected to database in 38ms
DEBUG health_check: ok
2026-03-09 14:22:12 [INFO] Handling request req-x7y8z9
2026-03-09 14:22:12 [INFO] Query executed in 30ms, 15 rows
2026-03-09 14:22:12 [INFO] Response sent: 200 OK (140ms)
2026-03-09 14:22:13 [INFO] Handling request req-m0n1o2
2026-03-09 14:22:13 [INFO] Query executed in 25ms, 0 rows
2026-03-09 14:22:13 [WARN] Response sent: 404 Not Found (52ms)
DEBUG health_check: ok
2026-03-09 14:22:14 [INFO] Shutdown complete, 2 requests served

A raw diff would flag nearly every line. But the meaningful differences are:

  • The second query returned 0 rows instead of 15
  • The second response was a 404 instead of a 200

Everything else — timestamps, request IDs, durations — is expected variance.

Line-by-line comparison

Use line_by_line mode when both logs should produce the same sequence of operations in the same order.

type: text
source: app_before.log
target: app_after.log
mode: line_by_line
 
normalize:
  trim_lines: true
  collapse_whitespace: true
 
replace_regex:
  - pattern: "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
    replace: "<TS>"
  - pattern: "req-[a-z0-9]+"
    replace: "req-<ID>"
  - pattern: "\\d+ms"
    replace: "<DUR>"
  - pattern: "\\d+ rows"
    replace: "<N> rows"
 
drop_lines_regex:
  - "^DEBUG"

What each rule does

normalize — trim_lines strips leading/trailing whitespace; collapse_whitespace reduces irregular spacing to a single space. Together these handle minor formatting inconsistencies.

replace_regex — four rules normalize predictable noise:

Pattern                                Replaces    Why
\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}    <TS>        Timestamps differ every run
req-[a-z0-9]+                          req-<ID>    Request IDs are unique per request
\d+ms                                  <DUR>       Durations vary with load
\d+ rows                               <N> rows    Row counts may vary with test data

drop_lines_regex — removes lines starting with DEBUG. These are diagnostic output that is not relevant to behavior validation.
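The processing these rules describe can be sketched in a few lines of plain Python. This is a simplified illustration of the pipeline, not Reconlify's actual implementation; the function name `process` is invented for the example:

```python
import re

# Replacement rules from the config, applied in order.
REPLACEMENTS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "<TS>"),
    (re.compile(r"req-[a-z0-9]+"), "req-<ID>"),
    (re.compile(r"\d+ms"), "<DUR>"),
    (re.compile(r"\d+ rows"), "<N> rows"),
]
DROP = re.compile(r"^DEBUG")

def process(lines):
    """Normalize lines: trim, collapse whitespace, replace noise, drop DEBUG."""
    out = []
    for line in lines:
        line = " ".join(line.split())      # trim_lines + collapse_whitespace
        if DROP.search(line):              # drop_lines_regex
            continue
        for pattern, replacement in REPLACEMENTS:
            line = pattern.sub(replacement, line)
        out.append(line)
    return out

print(process(["2026-03-01 10:00:03 [INFO] Handling request req-a1b2c3"]))
# -> ['<TS> [INFO] Handling request req-<ID>']
```

Note that dropped lines are removed before replacement rules run, so a DEBUG line never reaches the comparison at all.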

What survives normalization

After processing, most lines match. For example:

<TS> [INFO] Handling request req-<ID>
<TS> [INFO] Query executed in <DUR>, <N> rows
<TS> [INFO] Response sent: 200 OK (<DUR>)

But the two outcome lines of the second request warrant a closer look:

Line  Source (processed)                               Target (processed)
8     <TS> [INFO] Query executed in <DUR>, <N> rows    <TS> [INFO] Query executed in <DUR>, <N> rows
9     <TS> [INFO] Response sent: 200 OK (<DUR>)        <TS> [WARN] Response sent: 404 Not Found (<DUR>)

Line 8 matches because the row count was normalized to <N> rows. Line 9 is a real difference — the response code changed from 200 to 404. This is the regression you need to investigate.
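Once both sides are normalized, line_by_line comparison reduces to pairing processed lines by position. A sketch of that idea, using the second request's processed lines:

```python
from itertools import zip_longest

# Processed lines for the second request, after normalization.
source = [
    "<TS> [INFO] Handling request req-<ID>",
    "<TS> [INFO] Query executed in <DUR>, <N> rows",
    "<TS> [INFO] Response sent: 200 OK (<DUR>)",
]
target = [
    "<TS> [INFO] Handling request req-<ID>",
    "<TS> [INFO] Query executed in <DUR>, <N> rows",
    "<TS> [WARN] Response sent: 404 Not Found (<DUR>)",
]

# Pair by position; zip_longest pads with None if one side is shorter.
diffs = [
    (i, s, t)
    for i, (s, t) in enumerate(zip_longest(source, target), start=1)
    if s != t
]
print(diffs)
# Only the third pair differs: 200 OK vs 404 Not Found.
```

Only one real difference survives normalization, which is exactly the signal-over-noise behavior the rules were written to produce.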

Reading the report

The report's samples array shows each differing line with both raw and processed content:

{
  "line_number_source": 10,
  "line_number_target": 10,
  "raw_source": "2026-03-01 10:00:04 [INFO] Response sent: 200 OK (95ms)",
  "raw_target": "2026-03-09 14:22:13 [WARN] Response sent: 404 Not Found (52ms)",
  "processed_source": "<TS> [INFO] Response sent: 200 OK (<DUR>)",
  "processed_target": "<TS> [WARN] Response sent: 404 Not Found (<DUR>)"
}

The raw_* fields show original log content. The processed_* fields show the normalized values that were compared. The line_number_* fields point to the exact lines in the original files.

The details section also includes dropped_samples (which DEBUG lines were removed) and replacement_samples (which regex rules fired on which lines) — useful for verifying your normalization rules work as expected.
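If you consume the report programmatically, the sample entries are easy to filter and summarize. In this sketch, the field names come from the sample shown above, but the top-level report layout (a samples array at the root) is an assumption for illustration:

```python
import json

# A minimal report fragment; only the sample fields are taken from the
# report format shown above, the surrounding structure is assumed.
report = json.loads("""
{
  "samples": [
    {
      "line_number_source": 10,
      "line_number_target": 10,
      "processed_source": "<TS> [INFO] Response sent: 200 OK (<DUR>)",
      "processed_target": "<TS> [WARN] Response sent: 404 Not Found (<DUR>)"
    }
  ]
}
""")

# One summary line per differing sample, pointing back at the source file.
summaries = [
    f"line {s['line_number_source']}: {s['processed_source']} -> {s['processed_target']}"
    for s in report["samples"]
]
print(summaries[0])
```

A summary like this is often enough to paste into a deploy ticket: it names the exact source line and shows what changed after noise was stripped.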

Unordered comparison

Some systems produce log lines in a non-deterministic order. Worker pools, async handlers, and parallel pipelines all emit lines whose order depends on scheduling. Use unordered_lines mode when the same lines should appear but their sequence is not guaranteed.

Example: worker pool output

expected_workers.log (source)

[worker-1] Processing batch A
[worker-2] Processing batch B
[worker-1] Batch A complete: 100 items
[worker-2] Batch B complete: 200 items
[worker-1] Processing batch C
[worker-1] Batch C complete: 150 items

actual_workers.log (target)

[worker-2] Processing batch B
[worker-1] Processing batch A
[worker-2] Batch B complete: 200 items
[worker-1] Batch A complete: 100 items
[worker-1] Processing batch C
[worker-1] Batch C complete: 150 items

The lines are identical but reordered. A line-by-line diff would flag every line. Unordered mode handles this correctly:

type: text
source: expected_workers.log
target: actual_workers.log
mode: unordered_lines
 
normalize:
  trim_lines: true

Reconlify counts how many times each distinct line appears on each side. All six lines appear exactly once in both files, so the comparison reports zero differences.
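The counting behind unordered mode is a multiset comparison. In Python terms (a sketch of the idea, not the tool's code), the worker-pool example above reduces to:

```python
from collections import Counter

# Count occurrences of each distinct line on each side.
source = Counter([
    "[worker-1] Processing batch A",
    "[worker-2] Processing batch B",
    "[worker-1] Batch A complete: 100 items",
    "[worker-2] Batch B complete: 200 items",
    "[worker-1] Processing batch C",
    "[worker-1] Batch C complete: 150 items",
])
target = Counter([
    "[worker-2] Processing batch B",
    "[worker-1] Processing batch A",
    "[worker-2] Batch B complete: 200 items",
    "[worker-1] Batch A complete: 100 items",
    "[worker-1] Processing batch C",
    "[worker-1] Batch C complete: 150 items",
])

print(source == target)  # True: same lines, same counts, order ignored
```

Because only per-line counts are compared, any permutation of the same lines yields an identical result.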

When unordered mode finds real differences

If the target had a missing or extra line, the report would show it. The unordered_stats section provides a summary:

{
  "source_only_lines": 1,
  "target_only_lines": 0,
  "distinct_mismatched_lines": 1
}

  • source_only_lines — excess line occurrences in source (lines that disappeared)
  • target_only_lines — excess line occurrences in target (lines that appeared)
  • distinct_mismatched_lines — how many unique line contents have different counts

The samples_agg array shows each mismatched line with occurrence counts and original line numbers:

{
  "line": "[worker-1] Batch C complete: 150 items",
  "source_count": 1,
  "target_count": 0,
  "source_line_numbers": [6],
  "target_line_numbers": []
}

Entries are sorted by largest count difference, so the biggest discrepancies appear first.
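The three unordered_stats figures fall out of the per-line counters directly. A sketch under the same multiset view (field names follow the report shown above), using the missing "Batch C complete" line as the example:

```python
from collections import Counter

# Per-line occurrence counts for each side (tiny example: the target
# is missing one line that the source has).
source = Counter([
    "[worker-1] Processing batch C",
    "[worker-1] Batch C complete: 150 items",
])
target = Counter([
    "[worker-1] Processing batch C",
])

source_only = source - target          # excess occurrences in source
target_only = target - source          # excess occurrences in target
mismatched = set(source_only) | set(target_only)

stats = {
    "source_only_lines": sum(source_only.values()),
    "target_only_lines": sum(target_only.values()),
    "distinct_mismatched_lines": len(mismatched),
}
print(stats)
# {'source_only_lines': 1, 'target_only_lines': 0, 'distinct_mismatched_lines': 1}

# Order mismatched lines by largest count difference, as samples_agg does.
ordered = sorted(mismatched, key=lambda line: -abs(source[line] - target[line]))
```

Counter subtraction keeps only positive differences, which is why source-only and target-only excesses are computed separately rather than from a single signed delta.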

Choosing between modes

Situation                                            Mode
Sequential log — same operations in same order       line_by_line
Parallel workers — same lines, unpredictable order   unordered_lines
CLI output — deterministic sequence                  line_by_line
Event-driven logs — events may interleave            unordered_lines

When in doubt, start with line_by_line. Switch to unordered_lines if you see false positives caused by reordering.

Typical use cases

  • Deployment validation — confirm a new version produces the same behavior as the previous one
  • Regression testing — check CLI or application output against saved baselines
  • CI artifact checks — validate generated text files in pipelines
  • Environment comparison — compare logs from staging vs production
  • Config change validation — verify that a config change did not alter application behavior