The Simplest Science Behind Domain Similarity

Written by Sarah Dontogan | September 2, 2025

When attackers try to fool you, they often start with domains that look nearly identical to the real thing. You’ve probably seen it in phishing attempts: an email that looks like it’s from your bank or a well-known brand, but the sender’s address is just a little off. “paypai.com” instead of “paypal.com.” At a glance, most people won’t notice. That’s exactly what attackers are counting on.

To spot these tricks, security researchers use a measure called Levenshtein distance. It sounds complex, but it’s simply a way of counting how many small changes separate two strings, such as two domain names, and it’s one of the tools ThreatSTOP uses to proactively protect you. Read on for a look at this basic computing technique.

What Is Levenshtein Distance?

Levenshtein distance measures the minimum number of single-character edits it takes to turn one word into another. Each edit can be:

Adding a character
Removing a character
Replacing a character

Examples:

google.com → gooogle.com (one extra “o”)
netflix.com → netfli.com (one missing “x”)
paypal.com → paypai.com (an “l” swapped for an “i”)
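The three examples above can be checked with a few lines of Python. This is a minimal sketch of the standard dynamic-programming calculation, not ThreatSTOP's implementation; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # remove ca
                current[j - 1] + 1,            # add cb
                previous[j - 1] + (ca != cb),  # replace ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


pairs = [
    ("google.com", "gooogle.com"),   # one extra "o"
    ("netflix.com", "netfli.com"),   # one missing "x"
    ("paypal.com", "paypai.com"),    # "l" swapped for "i"
]
for real, fake in pairs:
    print(f"{real} -> {fake}: distance {levenshtein(real, fake)}")
```

Each pair is exactly one edit apart, so the script reports a distance of 1 for all three.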

Attackers also rely on Unicode “homograph” domains, for example by replacing Latin letters with visually identical Cyrillic characters. ThreatSTOP normalizes these to Punycode before scoring. That means раураl.com (written with Cyrillic look-alikes such as “р”) still shows a Levenshtein distance of 1, and is proactively blocked.
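To see what the Punycode form of such a name looks like, here is a small sketch using Python's built-in `idna` codec (which follows the older IDNA 2003 rules). The exact code points of the look-alike name are an assumption made for this example, and the snippet only shows the encoding step, not how ThreatSTOP scores the result.

```python
# Assumed spelling of the look-alike: Cyrillic er, a, u, er, a,
# followed by a Latin "l" and the ordinary ".com" suffix.
lookalike = "\u0440\u0430\u0443\u0440\u0430l.com"

# On screen the name resembles "paypal.com", but the strings are not equal.
print(lookalike == "paypal.com")

# The built-in codec converts each non-ASCII label to its ASCII
# ("xn--" prefixed) Punycode form; the ".com" label is left unchanged.
ascii_form = lookalike.encode("idna")
print(ascii_form)

# Decoding returns the original Unicode name.
print(ascii_form.decode("idna") == lookalike)
```

The encoded label begins with `xn--`, which makes the presence of non-ASCII characters obvious even when the rendered name looks legitimate.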

These one-edit domains are designed to deceive. ThreatSTOP makes sure they don’t get the chance.

Why It Matters to You

Attackers know people skim quickly. A single swapped letter is enough to trick someone into clicking, entering credentials, or downloading malware. Traditional blocklists may miss these variations, but similarity scoring closes that gap.

ThreatSTOP’s Security, Intelligence, and Research team applies this method to:

Catch phishing domains early—before they’re widely reported
Stop attackers from impersonating your brand
Protect employees from accidentally visiting malicious sites

Regex vs. Levenshtein: Better Together

You may be familiar with regex (regular expressions). Regex is excellent at spotting known threats and exact patterns, but it struggles when attackers invent unpredictable twists.

Technique	Best At…	Not Great When…
Regex	Quickly spotting known, exact threats	Attackers use creative, never-before-seen variations
Levenshtein	Detecting subtle, unknown variations fast	You only want to match exact patterns

Think of regex as a guard checking IDs against a list, while Levenshtein is the detective who notices suspicious behavior. Used together, they provide the strongest coverage.
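As a rough sketch of how the two checks might sit side by side: the protected name, the regex pattern, the threshold of one edit, and the `classify` helper below are all hypothetical choices for illustration, not ThreatSTOP's rules.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # remove
                current[j - 1] + 1,            # add
                previous[j - 1] + (ca != cb),  # replace
            ))
        previous = current
    return previous[-1]


PROTECTED = "paypal.com"                     # hypothetical brand domain
KNOWN_BAD = re.compile(r"paypa[1i]\.com")    # hypothetical pattern for variants already seen


def classify(domain: str, max_distance: int = 1) -> str:
    if domain == PROTECTED:
        return "legitimate"
    if KNOWN_BAD.fullmatch(domain):
        return "regex match"                 # the guard: a variant already on the list
    if levenshtein(domain, PROTECTED) <= max_distance:
        return "similar"                     # the detective: a new, near-identical name
    return "no match"


for candidate in ["paypal.com", "paypai.com", "paypol.com", "example.org"]:
    print(candidate, "->", classify(candidate))
```

Here "paypai.com" is caught by the pattern, while "paypol.com", which the pattern never anticipated, is still flagged because it is one edit away from the protected name.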

Let’s Make It Visual

Here's a simple visualization of how Levenshtein distance works, comparing the legitimate domain "paypal.com" and a sneaky impostor "paypai.com":

The number in the bottom-right corner (1) means there’s just one edit separating these domains, which is very suspicious!
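A grid of this kind can be generated with a short script. The sketch below fills in the usual edit-distance table for the two names and prints it; the function name `levenshtein_grid` and the layout of the printout are choices made for this example.

```python
def levenshtein_grid(a: str, b: str) -> list[list[int]]:
    """Build the full edit-distance table; grid[i][j] is the distance
    between the first i characters of a and the first j characters of b."""
    rows, cols = len(a) + 1, len(b) + 1
    grid = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        grid[i][0] = i          # i removals turn a[:i] into ""
    for j in range(cols):
        grid[0][j] = j          # j additions turn "" into b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            grid[i][j] = min(
                grid[i - 1][j] + 1,         # remove a character
                grid[i][j - 1] + 1,         # add a character
                grid[i - 1][j - 1] + cost,  # replace (or keep) a character
            )
    return grid


a, b = "paypal.com", "paypai.com"
grid = levenshtein_grid(a, b)

# Header row: the impostor's characters; row labels: the real domain's characters.
print("     " + " ".join(f"{ch:>2}" for ch in b))
for label, row in zip(" " + a, grid):
    print(f"{label} " + " ".join(f"{n:>2}" for n in row))

print("distance:", grid[-1][-1])
```

The last cell of the last row is the distance between the complete names, which is 1 for this pair.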

ThreatSTOP’s Approach

We’re rolling out similarity-based detection in a controlled way to deliver clear value.

Protective DNS (DNS Defense Cloud and DNS Defense): Our platforms automatically stop access to malicious look-alike domains before a user ever reaches them.
IP Defense: When phishing infrastructure is tied to IP addresses, our protections ensure your firewalls, routers, and cloud controls proactively block it.

You can expect:

Simple activation: No complex setup.
Clear visibility: See exactly why a domain was flagged.
Control: Opt in to evaluate, and opt out if needed.

This feature is experimental today, but it is already proving powerful in real-world testing. Your feedback helps us refine accuracy while keeping false positives low.

Staying Ahead of Subtle Tricks

The message is simple: attackers thrive on subtle changes, but ThreatSTOP’s protections remove that advantage. By using domain similarity scoring alongside proven threat intelligence, we stop phishing campaigns before they succeed, helping you stay protected without adding complexity to your security stack.

MITRE ATT&CK Framework Mapping

Threat Activity	ATT&CK Technique ID	Category
Phishing with look-alike domains	T1566.002	Initial Access: Spearphishing Link
Credential harvesting through fake login pages	T1056.003	Collection: Web Portal Capture
Command and Control over malicious domains	T1071.004	Command and Control: Application Layer Protocol: DNS
Data exfiltration via crafted domains	T1048.003	Exfiltration: Exfiltration Over Unencrypted/Obfuscated Non-C2 Protocol
Brand impersonation for malicious campaigns	T1583.001	Resource Development: Acquire Infrastructure: Domains
