How I Replaced a Messy Regex Engine with Radix Trees (and Made Domain Matching 70% Faster)
We had a domain matching service returning responses in under a millisecond. So naturally, I deleted it.
There is a golden rule in software engineering: if it ain't broke, don't touch it. This service wasn’t broken. It was fast. The servers weren’t actively on fire. But it lived directly in the hot path, handling thousands of requests per second. And when you are dealing with that kind of scale, your inner engineering demon wakes up. "Fast enough" starts to feel like a personal challenge.
You look at a 0.8ms response time and think: I bet I can make that 0.2ms.

But performance wasn’t the real problem. Under the hood, the system was slowly turning into a toxic, regex-powered house of cards. It worked, but it was fragile, opaque, and exactly one bad PR away from ruining someone’s week at 3 AM.
So I tore the whole thing out and rebuilt the core engine using Radix trees.
The result? We made the system ~70% faster in production, dropped our read-path memory allocations to absolute zero, and finally built a system I can reason about without summoning dark magic.
Here is the story of how our domain matching service evolved from a naive string comparator into a wildly fast routing engine.
Phase 1: The Dark Ages (Exact Matching)
In the beginning, life was simple. Our service did exactly one thing: exact string matching.
if query == "saifmahmud.blog" {
    return true
}
This ran inside a legacy NGINX and Lua setup that boasted absolutely zero maintainability. Touching those scripts felt like trying to defuse a bomb blindfolded. Eventually, we did what every team does in this situation: we burned it to the ground and rewrote it in Go.
Phase 2: "Regex Will Fix It" (Famous Last Words)
Since we were rewriting the engine from scratch, we wanted wildcard support. If others can do it, why can't we? We needed to allow things like *.saifmahmud.blog.
The most obvious solution? Regular expressions.
Every rule was dynamically compiled under the hood to something like this: ^[^.]+\.saifmahmud\.blog$
The problem? We did that... for every single rule.
If a user had 10,000 rules, we were holding 10,000 compiled regex objects in memory. Every request had to painstakingly walk that list in a linear loop until it found a match. If there was no match, it evaluated all 10,000 before giving up.
This wasn't just computationally expensive. It was basically a self-inflicted denial-of-service attack on our own CPUs.
And before you ask, "Why not just fan-out with goroutines?": adding a WaitGroup wouldn't save us. This was a purely CPU-bound problem with zero I/O wait. Throwing concurrency at it would just help us hit the hard limit of our CPU cores faster.
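To make the pain concrete, here is a minimal sketch of the per-rule approach (not the original service code): compile every rule into its own anchored regex, then linearly scan the whole list per query.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileRules turns each rule into its own anchored regex, wildcards and
// all. One compiled object per rule, exactly as described above.
func compileRules(rules []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, 0, len(rules))
	for _, r := range rules {
		pattern := `^` + regexp.QuoteMeta(r) + `$`
		if strings.HasPrefix(r, "*.") {
			// "*.x" matches exactly one extra label in front of "x".
			pattern = `^[^.]+\.` + regexp.QuoteMeta(r[2:]) + `$`
		}
		res = append(res, regexp.MustCompile(pattern))
	}
	return res
}

// matchAny is the O(N) hot path: walk every compiled regex until one hits.
func matchAny(res []*regexp.Regexp, query string) bool {
	for _, re := range res {
		if re.MatchString(query) {
			return true
		}
	}
	return false
}

func main() {
	res := compileRules([]string{"saifmahmud.blog", "*.saifmahmud.blog"})
	fmt.Println(matchAny(res, "api.saifmahmud.blog")) // true
	fmt.Println(matchAny(res, "a.b.saifmahmud.blog")) // false: wildcard is one label only
}
```

With 10,000 rules, a miss pays for all 10,000 `MatchString` calls. That loop is the villain of the next few phases.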
Phase 3: The LRU Cache Band-Aid
To stop the servers from melting down, we did what any self-respecting engineer does when faced with a bottleneck: we slapped an LRU cache on it.
Hot tenants got their compiled regexes cached in-memory. CPU usage dropped. Things looked stable again.
But this was a band-aid. We weren’t fixing the system, we were just making sure it failed less often. The core problem remained: we were still doing O(N) regex evaluations per request.
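The band-aid itself is nothing exotic. A minimal LRU over `container/list` looks like this (sketch only; the production cache held compiled regexes per tenant, not booleans per query):

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a tiny LRU: a doubly linked list for recency plus a map for
// O(1) lookup. Hot keys stay near the front; the back gets evicted.
type lruCache struct {
	cap   int
	ll    *list.List
	items map[string]*list.Element
}

type entry struct {
	key   string
	match bool
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, ll: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) (bool, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).match, true
	}
	return false, false
}

func (c *lruCache) Put(key string, match bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).match = match
		return
	}
	if c.ll.Len() >= c.cap {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.ll.PushFront(&entry{key, match})
}

func main() {
	c := newLRU(2)
	c.Put("a.example.com", true)
	c.Put("b.example.com", false)
	c.Put("c.example.com", true) // evicts a.example.com
	_, ok := c.Get("a.example.com")
	fmt.Println(ok) // false: evicted
}
```

The catch: a cache only changes *how often* you pay the O(N) cost, never the cost itself.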
Phase 4: Galaxy-Brain Prefix Matching
If you know anything about regex engines, you know that suffix matching makes it incredibly hard to share work across patterns. And domains are all about suffixes, which is exactly what our wildcards relied on.
So we got weird with it. We flipped them.
api.example.com became com.example.api.

By reversing the domain labels, everything became prefix matching. This sped up individual regex evaluations significantly, but it didn’t solve the real issue. We still had that pesky for loop cycling through thousands of separate objects. Still a loop. Still O(N).
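The reversal itself is a one-liner's worth of logic. A minimal sketch of the idea (not the production helper):

```go
package main

import (
	"fmt"
	"strings"
)

// reverseLabels turns "api.example.com" into "com.example.api" so that
// suffix matching on domains becomes prefix matching.
func reverseLabels(domain string) string {
	labels := strings.Split(domain, ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, ".")
}

func main() {
	fmt.Println(reverseLabels("api.example.com")) // com.example.api
}
```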
Phase 5: The Frankenstein Regex
To kill the linear loop, we did something slightly unhinged. We decided to compile the entire list of rules into a single, massive regex object using alternation (|).
Instead of looping through objects, we just asked the regex engine: Hey, does this query match (^rule1$|^rule2$|^rule3$...)?
But this created a new monster: comically long regex strings. To optimize it, we implemented Common Substring Merging. Since we were already reversing labels, we grouped shared prefixes:
Before: (^com\.example$|^com\.example\.api\.[^.]+$|^com\.example\.cdn\.[^.]+$)
After: ^com\.example(|\.api\.[^.]+|\.cdn\.[^.]+)$
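Building the monster is straightforward even if reading it isn't. Here is a sketch of the alternation step (rules shown un-reversed, and the common-prefix merging left out, to keep it short):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildGroupedRegex compiles the whole rule list into one alternation:
// one regexp object instead of N, one MatchString call per query.
func buildGroupedRegex(rules []string) (*regexp.Regexp, error) {
	parts := make([]string, len(rules))
	for i, r := range rules {
		if strings.HasPrefix(r, "*.") {
			parts[i] = `[^.]+\.` + regexp.QuoteMeta(r[2:])
		} else {
			parts[i] = regexp.QuoteMeta(r)
		}
	}
	return regexp.Compile(`^(?:` + strings.Join(parts, "|") + `)$`)
}

func main() {
	re, _ := buildGroupedRegex([]string{"example.com", "*.example.com", "*.saifmahmud.blog"})
	fmt.Println(re.MatchString("cdn.example.com")) // true
	fmt.Println(re.MatchString("example.org"))     // false
}
```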
It was faster. It was also an absolute nightmare to look at. Debugging now meant mentally executing a giant grouped regex string while someone is pinging you on Teams asking, "Why is this domain being allowed in production?"
Phase 6: The Final Boss (Negation)
Then came the final requirement: Negation. We needed to allow wildcard patterns, but explicitly deny specific subdomains.
*.example.com # Allow all 1-level subdomains
!api.example.com # Explicitly deny api.example.com
Now we had:
- One massive allow regex.
- One massive deny regex.
- A growing set of mental gymnastics about which one fires first and how priority collisions resolve.
It worked. But at this point, the system wasn’t just complex, it was fragile. Every change felt like pulling on a loose thread in a sweater. We had reached the absolute limit of what our cached, reversed, merged, double-barrel regex engine could sanely handle.
Phase 7: The Radix Rebellion
I snapped. The regex engine had to go.
Even though it was returning answers in sub-milliseconds, it felt like a house of cards. I ripped it out entirely and replaced it with a data structure actually designed for prefix routing: The Radix Tree.
Because we were already reversing our domain labels (api.example.com -> com.example.api.), a Radix tree was a flawless fit. com.example. sits at the root, and api. branches seamlessly off it. Prefixes are shared naturally. No duplication. No scanning. No guessing.
But how do you handle exact matches, wildcards, and deny rules all sitting on the same node without the universe collapsing?
We designed a custom node structure that explicitly tracks the state:
type RuleNode struct {
    LabelCount     int    // e.g., 2 for example.com
    HasExact       bool
    ExactAction    Action // Allow | Deny
    HasWildcard    bool
    WildcardAction Action // Allow | Deny
}
By separating Exact and Wildcard flags, a single node can represent both example.com and *.example.com simultaneously. Cleanly. No weird edge cases.
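To see the sharing in action, here is a toy version where a plain map keyed by the reversed base domain stands in for the radix tree (the node-sharing logic is what matters, not the lookup structure; the `addRule` helper is a made-up name):

```go
package main

import "fmt"

type Action int

const (
	Allow Action = iota
	Deny
)

type RuleNode struct {
	LabelCount     int
	HasExact       bool
	ExactAction    Action
	HasWildcard    bool
	WildcardAction Action
}

// addRule stores rules keyed by their reversed base domain, so
// "example.com" and "*.example.com" land on the same node with
// independent exact and wildcard flags.
func addRule(nodes map[string]*RuleNode, key string, labels int, wildcard bool, a Action) {
	n, ok := nodes[key]
	if !ok {
		n = &RuleNode{LabelCount: labels}
		nodes[key] = n
	}
	if wildcard {
		n.HasWildcard, n.WildcardAction = true, a
	} else {
		n.HasExact, n.ExactAction = true, a
	}
}

func main() {
	nodes := map[string]*RuleNode{}
	addRule(nodes, "com.example", 2, false, Allow) // example.com
	addRule(nodes, "com.example", 2, true, Deny)   // !*.example.com
	n := nodes["com.example"]
	fmt.Println(n.HasExact, n.HasWildcard) // true true: one node, both rules
}
```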
The Zero-Allocation Hot Path
Traversal is simple: walk the tree, evaluate rules as you go, and keep track of the best match. No regex engine. No backtracking. No surprises. Just deterministic evaluation.
To enforce DNS-style priority, we followed three mathematical rules:
- Specificity Wins: A rule with more labels (3 labels) overrides a rule with fewer labels (2 labels). Exact matches naturally beat wildcards this way.
- Deny Wins Ties: If an Allow rule and a Deny rule exist at the exact same specificity level, Deny takes no prisoners.
- Zero Heap Allocations: We track the "best" rule using strictly primitive variables on the stack.
Here is the actual Go implementation driving this core logic today:
// queryLabels and updateBestRule are captured from the enclosing scope.
m.tree.Root().WalkPath(searchKey, func(key []byte, rule RuleNode) bool {
    isExactMatch := bytes.Equal(key, searchKey)
    // 1. Evaluate exact matches
    if isExactMatch && rule.HasExact {
        updateBestRule(rule.ExactAction, rule.LabelCount)
    }
    // 2. Evaluate 1-level wildcard matches
    if !isExactMatch && rule.HasWildcard && queryLabels == rule.LabelCount+1 {
        updateBestRule(rule.WildcardAction, rule.LabelCount)
    }
    return false // keep traversing to find deeper, more specific rules
})
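The `updateBestRule` helper is where the priority rules live. Here is a self-contained sketch of that logic as a fold over rule hits (the `resolve` name and `matchHit` type are assumptions; in the real code this is a closure over two stack locals):

```go
package main

import "fmt"

type Action int

const (
	Allow Action = iota
	Deny
)

type matchHit struct {
	action Action
	labels int
}

// resolve folds a sequence of rule hits into a final verdict using nothing
// but stack-local primitives: more labels wins, and Deny wins an exact tie.
func resolve(hits []matchHit) Action {
	bestLabels, bestAction := -1, Allow
	for _, h := range hits {
		if h.labels > bestLabels || (h.labels == bestLabels && h.action == Deny) {
			bestLabels, bestAction = h.labels, h.action
		}
	}
	return bestAction
}

func main() {
	// *.example.com allows at 2 labels; !api.example.com denies at 3 labels.
	// Specificity wins, so the query is denied.
	fmt.Println(resolve([]matchHit{{Allow, 2}, {Deny, 3}}) == Deny) // true
}
```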
The Aftermath
I can already hear you typing in the comments: "You made those numbers up, my guy." Fair enough. I heard you like tables, so here is the raw, un-cached benchmark data for the read path. (Keep in mind, that "70% faster" metric I mentioned at the start was our production average because the LRU cache was heroically masking the regex engine's sins. When you strip away the cache, it gets violent):
Match Benchmark (Read Path)
| Rules | Radix | Regex Naive | Regex Grouped |
|---|---|---|---|
| 10 | 269 ns | 412 ns | 186 ns |
| 100 | 258 ns | 22,800 ns | 815 ns |
| 1,000 | 289 ns | 366,000 ns | 3,300 ns |
| 10,000 | 314 ns | 4,050,000 ns | 6,350 ns |
This table tells a beautiful engineering story.
Notice how Regex Grouped actually beats Radix at 10 rules? If you have a tiny static list, a well-written regex is incredibly fast.
But look what happens as it scales. The Naive Regex dies a horrible O(N) death, scaling to over 4 milliseconds per query. Our optimized Frankenstein regex survives much better, but at 10,000 rules, it is still 20x slower than Radix.
Why? Because regex cost scales with how many rules you have, while Radix cost scales with how many labels are in the domain, O(depth). Radix basically shrugs and stays completely flat at ~250-300 ns no matter how many rules exist. That's the whole game.
Memory (Where Things Get Ugly)
Speed is great, but what about the memory footprint during compilation?
| Rules | Radix | Regex Naive | Regex Grouped |
|---|---|---|---|
| 10 | 3 KB / 93 allocs | 35 KB / 176 allocs | 53 KB / 214 allocs |
| 100 | 60 KB / 930 allocs | 346 KB / 1.5K allocs | 140 KB / 703 allocs |
| 1,000 | 585 KB / 9K allocs | 5.4 MB / 15K allocs | 1.2 MB / 5K allocs |
| 10,000 | 5.9 MB / 90K allocs | 59 MB / 152K allocs | 13.6 MB / 50K allocs |
At 10,000 rules, the naive regex balloons to 59 MB in memory. Our optimized grouped regex cut that down to 13.6 MB. But the Radix tree requires a lean 5.9 MB footprint. Even the heavily optimized regex was still over 2x heavier.
Sometimes breaking the "don't touch it" rule pays off.
By stripping out the Frankenstein regex engine and replacing it with an immutable radix tree, we didn't just make it faster—we made it understandable.
Our memory allocations on the read path dropped to absolute zero. The garbage collector barely even knows this service exists anymore. No more decoding giant regex strings. No more relying on caches to hide structural problems. Just a system that behaves exactly the way it looks.
Regex is a fantastic tool. But this was never a regex problem. It was a data structure problem. And once we treated it that way, everything got simpler, faster, and a lot less cursed.
The best part? Not a single unit test complained!