The problem with log noise

CI/CD pipelines generate enormous log volumes. A single failing test can produce 50,000 lines of output. Stack traces, dependency warnings, Docker pull progress bars, timestamp prefixes — most of it is noise.

Log mining — extracting meaningful patterns from this noise — is foundational to everything else we do in Testhide. The root-cause classifier needs clean input. The flakiness predictor needs stable feature vectors. The failure retriever needs normalized templates to compute similarity.

Get log mining wrong, and every downstream model suffers.

What Drain3 does

Drain3 is an open-source Python implementation of Drain, an online log template extraction algorithm. Given a stream of log lines, it builds a parse tree and assigns each line to a "log cluster": a template with the variable fields replaced by wildcards. For example:

"502 BadGateway for /api/v1/charge during retry 3 of 3"
"502 BadGateway for /api/v1/payment during retry 1 of 3"
"502 BadGateway for /api/v2/refund during retry 2 of 3"

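From these three lines alone, Drain3 would keep the final 3 as a literal, since it never varies. Once lines with a different retry limit show up in the stream, the template generalizes to: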

"502 BadGateway for <*> during retry <*> of <*>"

This is exactly what you want. The template is stable across different requests. You can compute frequency, alert on spikes, link to past incidents — all the good things.
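To make the wildcard step concrete, here is a toy sketch of how a template can emerge by comparing lines token by token. It is a simplified illustration, not Drain3's actual implementation (which also routes lines through a parse tree and applies a similarity threshold), and `merge_into_template` is a hypothetical helper.

```python
from typing import List

WILDCARD = "<*>"

def merge_into_template(template: List[str], tokens: List[str]) -> List[str]:
    """Keep tokens that agree; replace positions that differ with a wildcard."""
    if len(template) != len(tokens):
        raise ValueError("this toy version only merges lines with equal token counts")
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]

lines = [
    "502 BadGateway for /api/v1/charge during retry 3 of 3",
    "502 BadGateway for /api/v1/payment during retry 1 of 3",
    "502 BadGateway for /api/v2/refund during retry 2 of 3",
]

# Start from the first line and fold every later line into the template.
template = lines[0].split()
for line in lines[1:]:
    template = merge_into_template(template, line.split())

print(" ".join(template))
```

A position only becomes a wildcard once the stream shows it varying, so on just these three lines the trailing 3 stays literal.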

Our hand-rolled regex approach

Before we integrated Drain3, we had a set of hand-rolled regex patterns that we called "master patterns." These were written by engineers who knew our specific log format intimately:

PATTERNS = [
    (r'HTTP \d{3} for .*', 'HTTP_STATUS'),
    (r'retry attempt \d+ of \d+', 'RETRY'),
    (r'Connection refused to .*:\d+', 'CONN_REFUSED'),
    # ... 47 more patterns
]

For our specific CI infrastructure, these patterns were extremely precise. They'd been tuned over years. They handled edge cases that a generic algorithm would miss.
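How the patterns were applied is not shown above. A minimal sketch, assuming a first-match-wins lookup over the three patterns that are listed (the other 47 stay elided, and `classify` is a hypothetical name), might look like this:

```python
import re
from typing import Optional

# The three patterns shown above; the remaining ones are not reproduced here.
PATTERNS = [
    (r'HTTP \d{3} for .*', 'HTTP_STATUS'),
    (r'retry attempt \d+ of \d+', 'RETRY'),
    (r'Connection refused to .*:\d+', 'CONN_REFUSED'),
]

# Compile once so each log line only pays for matching.
COMPILED = [(re.compile(pattern), label) for pattern, label in PATTERNS]

def classify(line: str) -> Optional[str]:
    """Return the label of the first pattern that matches, or None."""
    for regex, label in COMPILED:
        if regex.search(line):
            return label
    return None

print(classify("HTTP 502 for /api/v1/charge"))
print(classify("Connection refused to db.internal:5432"))
print(classify("502 BadGateway for /api/v1/charge during retry 3 of 3"))
```

The last call returns None: the line has no "HTTP " prefix and says "retry 3 of 3" rather than "retry attempt 3 of 3", so none of the three patterns fire.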

The problem with each approach alone

Drain3 alone: Generic. Great at clustering unknown log formats. Bad at our specific domain where we know the structure. Tends to over-split clusters for logs with high cardinality in specific fields (e.g., UUIDs, commit hashes, file paths).

Regex alone: Brittle. Miss one pattern, miss the cluster. Doesn't handle novel log formats from new dependencies. Requires manual updates as the system evolves.
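One way high-cardinality fields can fragment clusters: similarity-based matching counts the tokens two lines share, and every UUID, hash, or path that differs pulls that score down. The sketch below uses a hypothetical `token_similarity` helper, invented log lines, and an assumed threshold of 0.6 chosen for illustration (not a value taken from our Drain3 configuration).

```python
def token_similarity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length token sequences agree."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        return 0.0
    same = sum(x == y for x, y in zip(tokens_a, tokens_b))
    return same / len(tokens_a)

# Two lines produced by the same log statement (made-up examples).
line_a = "job 3f2b8c1e-9a4d-4e2b-8f6a-1c2d3e4f5a6b built a1b2c3d into /tmp/build/1842/out.tar"
line_b = "job 7d9e0f12-3b4c-4d5e-9f60-a1b2c3d4e5f6 built 9f8e7d6 into /tmp/build/1843/out.tar"

ASSUMED_THRESHOLD = 0.6  # illustrative only

score = token_similarity(line_a, line_b)
print(score, "same cluster" if score >= ASSUMED_THRESHOLD else "split into two clusters")
```

Only three of the six tokens agree, so under this threshold the two lines would land in separate clusters even though they come from the same statement.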

The hybrid that works

The insight: use regex as a pre-normalization step, then feed the normalized output to Drain3.

import re

def extract_template(log_line: str) -> str:
    # Step 1: apply domain-specific normalization
    for pattern, replacement in MASTER_PATTERNS:
        log_line = re.sub(pattern, replacement, log_line)

    # Step 2: feed the normalized line to Drain3.
    # Assuming drain3_parser is a drain3 TemplateMiner, add_log_message
    # returns a dict and the learned template is under "template_mined".
    result = drain3_parser.add_log_message(log_line)
    return result["template_mined"]

The regex patterns don't need to be exhaustive — they just need to normalize the high-cardinality fields that would confuse Drain3. Drain3 handles the rest: clustering, template learning, unknown formats.
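Here is a sketch of the normalization step on its own. The masking rules are hypothetical stand-ins (the real master patterns are not reproduced here), the log lines are invented, and a `Counter` stands in for Drain3 so the example runs with only the standard library.

```python
import re
from collections import Counter

# Hypothetical masking rules. Order matters: UUIDs are masked before the
# generic hex rule, which would otherwise match pieces of a UUID.
MASKS = [
    (re.compile(r'\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b'), '<UUID>'),
    (re.compile(r'\b[0-9a-f]{7,40}\b'), '<HASH>'),
    (re.compile(r'(?:/[\w.-]+)+'), '<PATH>'),
]

def normalize(line: str) -> str:
    """Replace high-cardinality fields with fixed placeholders."""
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line

raw_lines = [
    "job 3f2b8c1e-9a4d-4e2b-8f6a-1c2d3e4f5a6b built a1b2c3d into /tmp/build/1842/out.tar",
    "job 7d9e0f12-3b4c-4d5e-9f60-a1b2c3d4e5f6 built 9f8e7d6 into /tmp/build/1843/out.tar",
]

# In the real pipeline each normalized line would go to Drain3 instead.
groups = Counter(normalize(line) for line in raw_lines)
for template, count in groups.items():
    print(count, template)
```

Both raw lines collapse to the same normalized string, which is the effect that keeps Drain3 from treating each UUID or hash as a reason to open a new cluster.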

Results

After switching to the hybrid approach:

Cluster count dropped from ~12,000 to ~340 for the same log volume. The regex normalization collapsed the UUID/hash cardinality that was fragmenting clusters.
Cluster stability (measured by template edit distance over rolling 24h windows) improved by 73%.
Novel pattern detection still works — Drain3 handles new log formats we haven't seen before, which pure regex would miss.
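The stability metric is not spelled out above. One plausible reading, and it is only an assumption, is a Levenshtein distance over whitespace tokens between the template a cluster had at the start of a window and the one it has at the end. `template_edit_distance` is a hypothetical helper illustrating that reading.

```python
def template_edit_distance(old: str, new: str) -> int:
    """Token-level Levenshtein distance between two templates."""
    a, b = old.split(), new.split()
    # prev[j] holds the distance between the tokens of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]

before = "502 BadGateway for <*> during retry <*> of 3"
after = "502 BadGateway for <*> during retry <*> of <*>"
print(template_edit_distance(before, after))
```

Under this reading, a stable cluster is one whose template distance stays at or near zero from one window to the next.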

The lesson

The "right tool for the job" is often both tools. Drain3 is the right algorithm for log template extraction. Our regex patterns encode domain knowledge that no generic algorithm can replicate. The combination is better than either alone.

This is a recurring theme in production ML: the published algorithm gets the approach right, but it ships without your domain knowledge. Hybrid systems that combine algorithmic strength with domain-specific pre-processing usually win.

For Testhide, the log mining pipeline now feeds clean, stable templates into every downstream model. The root-cause classifier's accuracy improved by 12 percentage points after the switch. That number is why the hybrid approach is staying.


Regex vs Drain3 for log mining: we kept both