HashSet in C#

Introducing the HashSet in C#

If you’re working with large datasets or need lightning-fast lookups, HashSet in C# might just be your secret weapon. Unlike traditional collections, HashSet offers some unique advantages in terms of performance, especially when it comes to checking for the existence of elements. In this article, we’ll look at why HashSet in C# is a game-changer — with real-world use cases, performance comparisons, and code examples that show exactly when and how to use it.

It’s really easy to reach for a standard List object when you’re building out your apps in .NET. It does pretty much everything you need (most of the time): it implements IEnumerable, works well with LINQ, has sorting built in, and lets you pull values out by index really easily. In most cases, it’s the right tool for the job because it just works, especially when you need to store and manipulate a collection of items, like imported datasets of one kind or another. But sometimes, there’s a more specialised choice that can work even better with certain types of data.

If you’ve ever needed to make sure a collection contains only unique values, or wanted a faster way to check whether something already exists, HashSet in C# might be exactly what you’re looking for. It’s a no-duplicates, no-fuss collection that includes built-in set operations like intersection, union, and difference. Let’s take a look at where it comes in handy, and a few examples that show how powerful it can be.

1. Remove Duplicates From a List

Starting with a regular list of strings, let’s use a HashSet<string> to dedupe emails:

var emails = new List<string> {
    "alex@jkrussell.dev", "sam@jkrussell.dev", "alex@jkrussell.dev", "chris@jkrussell.dev"
};

var uniqueEmails = new HashSet<string>(emails);

foreach (var email in uniqueEmails)
{
    Console.WriteLine(email);
}

Output:

alex@jkrussell.dev
sam@jkrussell.dev
chris@jkrussell.dev

A real-world scenario where you might use this is when you’ve accepted bulk user input and you want to dedupe email addresses or other data quickly.
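If you also want to tell the user how many duplicates were dropped, comparing the two counts is enough. Here’s a minimal sketch that reuses the sample addresses from above and builds the set with LINQ’s ToHashSet() extension method; the variable name removed is just for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var emails = new List<string> {
    "alex@jkrussell.dev", "sam@jkrussell.dev", "alex@jkrussell.dev", "chris@jkrussell.dev"
};

// ToHashSet() is the LINQ shorthand for new HashSet<string>(emails).
var uniqueEmails = emails.ToHashSet();

// Anything missing from the set was a duplicate in the original list.
int removed = emails.Count - uniqueEmails.Count;

Console.WriteLine($"{uniqueEmails.Count} unique, {removed} duplicate(s) removed");
// 3 unique, 1 duplicate(s) removed
```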

2. Fast Lookups in Large Lists

Lookup speed is one of the biggest advantages of using a HashSet in C#. Here’s how to check whether a set contains a given value:

var bannedEmails = new HashSet<string> {
    "admin@jkrussell.dev", "test@jkrussell.dev", "null@jkrussell.dev"
};

if (bannedEmails.Contains("admin@jkrussell.dev"))
{
    Console.WriteLine("Email is blocked.");
}

Output:

Email is blocked.

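For lookups, HashSet can be 10–100x faster than a List, or more, depending on how big the collection is: List.Contains() has to scan the items one by one, while HashSet.Contains() uses the hash code to jump straight to the right bucket. You won’t really feel it from small examples like this, but as the size of your collections grows there are real and tangible benefits to using them.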

Let’s look at a comparison of how some of the common collection types cope at scale:

| Collection Type | .Contains() Time Complexity | Notes |
| --- | --- | --- |
| List<T> | O(n), linear | Checks each item one by one; slower as the list gets longer |
| Array | O(n), linear | Same as List; no shortcuts for lookups |
| Dictionary<TKey,TValue> | O(1) on average, constant (via .ContainsKey()) | Finds keys using a hash; fast even with large data |
| HashSet<T> | O(1) on average, constant | Like Dictionary but for single values; stays fast as the set grows |

The Big-O notation might seem a bit complicated to understand, but it’s a really useful way of describing how fast (or slow) an algorithm is as the input grows — it’s a shorthand for performance scaling. Think of it as measuring how many steps something takes as the list or data gets bigger.
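To see the scaling for yourself, you can time both collections with a Stopwatch. This is only a rough sketch, not a proper benchmark: the sizes are arbitrary assumptions, and the timings will vary by machine and runtime, so no expected numbers are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int size = 100_000;   // arbitrary collection size
const int lookups = 1_000;  // arbitrary number of lookups

var list = Enumerable.Range(0, size).ToList();
var set = new HashSet<int>(list);

// Look up values near the end of the list: the worst case for a linear scan.
var sw = Stopwatch.StartNew();
int listHits = 0;
for (int i = 0; i < lookups; i++)
{
    if (list.Contains(size - 1 - i)) listHits++;
}
sw.Stop();
var listTime = sw.Elapsed;

// The same lookups against the HashSet.
sw.Restart();
int setHits = 0;
for (int i = 0; i < lookups; i++)
{
    if (set.Contains(size - 1 - i)) setHits++;
}
sw.Stop();
var setTime = sw.Elapsed;

Console.WriteLine($"List<int>:    {listHits} hits in {listTime.TotalMilliseconds:F2} ms");
Console.WriteLine($"HashSet<int>: {setHits} hits in {setTime.TotalMilliseconds:F2} ms");
```

For numbers you can trust, a dedicated benchmarking tool is a better fit than a one-off Stopwatch run.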

Here’s a chart that illustrates how lookup performance differs across common C# collections when using the .Contains() method:

[Chart: lookup performance of common C# collections when using the .Contains() method]

In the chart, the dashed lines represent constant-time performance as the number of elements grows. The green (HashSet<T>) and blue (Dictionary<K,V>) dashed lines illustrate their speed. The red and orange lines (List<T> and Array) climb steadily, showing that these data structures slow down as their size increases.

3. Find Shared Tags Between Two Collections

Compare two lists for common items:

var postTags = new HashSet<string> { "csharp", "aspnet", "linq", "backend" };
var userTags = new List<string> { "linq", "frontend", "Backend", "csharp", "api" };

postTags.IntersectWith(userTags);

foreach (var tag in postTags)
{
    Console.WriteLine($"Matched tag: {tag}");
}

Output:

Matched tag: csharp
Matched tag: linq

In the example, we used IntersectWith() to modify the postTags HashSet in place. If you want a non-destructive alternative, use LINQ’s Intersect(), which leaves postTags untouched and returns a new sequence:

var sharedTags = postTags.Intersect(userTags);

In both cases, it’s important to note that a HashSet<string> is case-sensitive by default. In the example above, one collection contains “backend” and the other contains “Backend”, so they are not considered a match and do not appear in the output. If you want to ignore casing differences, declare your HashSet with a case-insensitive comparer, like this (LINQ’s Intersect() accepts the same comparer as an optional second argument):

var postTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
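Here’s a small sketch of both routes with casing ignored, reusing the tags from the example above. The variable names caseSensitiveTags and shared are just for illustration, and the LINQ result is sorted only to make the printed order predictable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var userTags = new List<string> { "linq", "frontend", "Backend", "csharp", "api" };

// Non-destructive: LINQ's Intersect() ignores the set's own comparer,
// so the case-insensitive comparer has to be passed explicitly.
var caseSensitiveTags = new HashSet<string> { "csharp", "aspnet", "linq", "backend" };
var shared = caseSensitiveTags
    .Intersect(userTags, StringComparer.OrdinalIgnoreCase)
    .OrderBy(t => t, StringComparer.Ordinal)
    .ToList();

// Destructive: IntersectWith() uses the comparer the set was created with.
var postTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "csharp", "aspnet", "linq", "backend"
};
postTags.IntersectWith(userTags);

Console.WriteLine(string.Join(", ", shared));     // backend, csharp, linq
Console.WriteLine(postTags.Count);                // 3
Console.WriteLine(postTags.Contains("BACKEND"));  // True
```

Notice that the set keeps its own spelling (“backend”), not the one from the list it was compared against.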

4. Merge Two Collections (Union)

You’re probably used to using LINQ queries and calling .Distinct() to filter out duplicates in a list. But what about when you want to bring two datasets together? Using a HashSet in C# is one of the most efficient ways to merge two collections without duplicates, which is especially useful when email addresses or other identifiers must remain unique. Unlike List, HashSet enforces uniqueness automatically.

Here’s an example of how it can be used in a scenario bringing two sets of emails together:

var subscribedEmails = new HashSet<string> { "alex@jkrussell.dev", "sam@jkrussell.dev" };
var newEmails = new List<string> { "sam@jkrussell.dev", "bob@jkrussell.dev" };

subscribedEmails.UnionWith(newEmails);

foreach (var email in subscribedEmails)
{
    Console.WriteLine(email);
}

Output:

alex@jkrussell.dev
sam@jkrussell.dev
bob@jkrussell.dev

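A HashSet doesn’t guarantee any particular order. The output above happens to match insertion order, but you shouldn’t rely on that, so if ordering is something you need you’ll either have to post-sort or use a different data structure.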
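As a rough sketch of those two options: you can sort a snapshot with LINQ, or copy the items into a SortedSet<T>, which keeps them ordered as new ones arrive (at the cost of O(log n) lookups instead of O(1)). The address carol@jkrussell.dev is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var subscribedEmails = new HashSet<string> { "alex@jkrussell.dev", "sam@jkrussell.dev" };
subscribedEmails.UnionWith(new[] { "sam@jkrussell.dev", "bob@jkrussell.dev" });

// Option 1: post-sort a snapshot of the set.
var sortedList = subscribedEmails.OrderBy(e => e, StringComparer.Ordinal).ToList();

// Option 2: a SortedSet stays unique AND sorted as items are added.
var sortedSet = new SortedSet<string>(subscribedEmails, StringComparer.Ordinal);
sortedSet.Add("carol@jkrussell.dev"); // hypothetical new subscriber

Console.WriteLine(string.Join(", ", sortedList));
// alex@jkrussell.dev, bob@jkrussell.dev, sam@jkrussell.dev
Console.WriteLine(string.Join(", ", sortedSet));
// alex@jkrussell.dev, bob@jkrussell.dev, carol@jkrussell.dev, sam@jkrussell.dev
```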

5. Find What’s Missing (Difference)

Get items in list A not in list B:

var allFeatures = new HashSet<string> { "Login", "Signup", "DarkMode", "Export" };
var completedFeatures = new List<string> { "Login", "Signup" };

allFeatures.ExceptWith(completedFeatures);

foreach (var feature in allFeatures)
{
    Console.WriteLine($"Still to do: {feature}");
}

Output:

Still to do: DarkMode
Still to do: Export

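Remember earlier how we used IntersectWith() to find shared items, and Intersect() as a non-destructive alternative? The same principle applies here: ExceptWith() modifies the original set, while LINQ’s Except() returns a new sequence without altering the source:

// Here, emails is a HashSet<string> and unsubscribed is any collection of strings.

// Destructive: removes the unsubscribed addresses from emails itself
emails.ExceptWith(unsubscribed);

// Non-destructive: emails is left unchanged; filtered is a new sequence
var filtered = emails.Except(unsubscribed);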

6. Compare Two Lists for Exclusives (Symmetric Difference)

Compare two lists and return differences:

var listA = new HashSet<string> { "feature1", "feature2", "feature3" };
var listB = new HashSet<string> { "feature2", "feature3", "feature4" };

listA.SymmetricExceptWith(listB);

foreach (var item in listA)
{
    Console.WriteLine($"Unique to one list: {item}");
}

Output:

Unique to one list: feature1
Unique to one list: feature4

As in the previous examples, SymmetricExceptWith() is destructive. This time, though, LINQ has no SymmetricExcept() counterpart, so if you need to keep the original set intact, copy it first and call SymmetricExceptWith() on the copy.
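Since LINQ doesn’t ship a SymmetricExcept() method, one way to get the symmetric difference without touching listA is to work on a copy. A minimal sketch, reusing listA and listB from above (the name exclusives is just for illustration, and the output is sorted so the printed order is predictable):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var listA = new HashSet<string> { "feature1", "feature2", "feature3" };
var listB = new HashSet<string> { "feature2", "feature3", "feature4" };

// Copy listA (keeping its comparer) so the original stays untouched.
var exclusives = new HashSet<string>(listA, listA.Comparer);
exclusives.SymmetricExceptWith(listB);

Console.WriteLine(string.Join(", ", exclusives.OrderBy(x => x, StringComparer.Ordinal)));
// feature1, feature4
Console.WriteLine(listA.Count); // 3
```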

7. De-duplicate Custom Objects by Property

The final example is really interesting, and highlights just how a HashSet<T> determines if any two objects are the same. Check out this code, in which the Product class defines its own Equals() and GetHashCode() methods:

var products = new HashSet<Product>
{
    new Product { SKU = "123" },
    new Product { SKU = "123" },
    new Product { SKU = "456" }
};

foreach (var p in products)
{
    Console.WriteLine($"Product: {p.SKU}");
}

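// Two products are considered equal when their SKUs match.
class Product : IEquatable<Product>
{
    // Initialised to an empty string so GetHashCode() never dereferences null.
    // Avoid changing SKU while the object is in a HashSet: its hash would no longer match its bucket.
    public string SKU { get; set; } = string.Empty;

    // Must agree with Equals(): equal products have to produce equal hash codes.
    public override int GetHashCode() => SKU.GetHashCode();

    // Strongly typed equality from IEquatable<Product>.
    public bool Equals(Product? other) => other is not null && SKU == other.SKU;

    // Override of object.Equals(), forwarding to the typed version.
    public override bool Equals(object? obj)
    {
        return Equals(obj as Product);
    }
}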

Output:

Product: 123
Product: 456

You’ll notice from the code that there are two override members: Equals(object) and GetHashCode(). Both are inherited from System.Object; implementing IEquatable<Product> adds only the strongly typed Equals(Product). The overridden Equals(object) casts its argument to a Product and passes it to that typed implementation. HashSet<T> uses these methods to determine if two objects are the same:

  • GetHashCode() — to quickly group or locate candidates (bucket).
  • Equals() — to confirm actual equality within that group.

If you’re using a custom class like Product, and you don’t override these two methods, each new Product { SKU = "123" } will be treated as a different object, even if the SKU string is identical. That’s because, by default, equality for a class compares references, not content.
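If you can’t change the class (say it comes from a library), you can still de-duplicate by a property: keep a HashSet of the keys you’ve already seen and lean on the fact that Add() returns false for a value that’s already in the set. In this sketch the products are plain tuples with made-up names, purely to keep the example self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical product rows; only the SKU decides whether two rows are duplicates.
var products = new List<(string Sku, string Name)>
{
    ("123", "Keyboard"),
    ("123", "Keyboard (duplicate row)"),
    ("456", "Mouse")
};

// Add() returns true only the first time a SKU is seen,
// so the filter keeps the first product for each SKU.
var seenSkus = new HashSet<string>();
var uniqueProducts = products.Where(p => seenSkus.Add(p.Sku)).ToList();

foreach (var p in uniqueProducts)
{
    Console.WriteLine($"Product: {p.Sku} ({p.Name})");
}
// Product: 123 (Keyboard)
// Product: 456 (Mouse)
```

Overriding Equals() and GetHashCode() is still the better option when SKU-based equality should apply everywhere the type is used; this approach keeps the rule local to one query.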

HashSet vs List Comparison

If you’re considering doing a find and replace to swap every List<T> you’ve ever created for a HashSet<T>, it’s worth weighing up which one is better suited to your use case first:

| Feature | List<T> | HashSet<T> |
| --- | --- | --- |
| Duplicates allowed | Yes | No (enforced by default) |
| Ordering | Maintains insertion order | No guaranteed order |
| Lookup performance | O(n), linear | O(1), constant |
| Add performance | O(1), amortised | O(1), amortised |
| Remove performance | O(n) | O(1), on average |
| Memory usage | Lower (compact layout) | Higher (uses hash buckets) |
| Set operations | Manual (via LINQ) | Built-in (e.g. IntersectWith) |
| Best for | Ordered lists, duplicates, small datasets | Fast lookups, uniqueness, large datasets |

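In data structures, “amortised” refers to the average cost of an operation across a long run of operations. Most Add() calls are cheap, but an occasional one has to grow the underlying storage and takes longer; spread over all the calls, the average still works out at O(1).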

Thread Safety and Memory Considerations

One word of caution: HashSet is not thread-safe. If your app is multi-threaded, or you’re running parallel tasks that read and write to the same collection, you’ll need to add your own locking or use a concurrent structure. Calling Contains() from multiple threads is safe as long as nothing is modifying the set, but once any thread starts adding or removing items, you’ll need to wrap every access in a lock or move to something like ConcurrentDictionary.
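Here’s a minimal sketch of both approaches, under the assumption that several parallel tasks add to one shared collection. The names seen, gate, and concurrentSeen are illustrative, and the byte stored in the dictionary is just a throwaway placeholder since only the keys matter:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Option 1: guard every read and write of a plain HashSet with the same lock.
var seen = new HashSet<int>();
var gate = new object();

Parallel.For(0, 10_000, i =>
{
    lock (gate)
    {
        seen.Add(i % 1_000); // only 1,000 distinct values are ever added
    }
});

// Option 2: use the keys of a ConcurrentDictionary as a thread-safe set.
var concurrentSeen = new ConcurrentDictionary<int, byte>();

Parallel.For(0, 10_000, i =>
{
    concurrentSeen.TryAdd(i % 1_000, 0); // returns false if the key already exists
});

Console.WriteLine(seen.Count);           // 1000
Console.WriteLine(concurrentSeen.Count); // 1000
```

The lock is the simpler choice when contention is low; the dictionary tends to cope better when many threads write at once.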

Another thing to keep in mind is memory. HashSet is super-fast — but you pay a little for that speed in terms of memory usage. Internally, it uses buckets and hash codes, which makes lookups near-instant. But that structure has more overhead than something like a simple list, which just stores values linearly in memory. It’s not usually a problem unless you’re working with huge datasets or on memory-constrained systems, but it’s worth being aware of just in case.
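If memory does become a concern, two built-in knobs can help. This sketch assumes a runtime where HashSet<T> has the capacity constructor (.NET Core 2.0 or later, or .NET Framework 4.7.2 or later), and the sizes are arbitrary:

```csharp
using System;
using System.Collections.Generic;

const int expectedItems = 50_000; // hypothetical size, known up front

// Pre-sizing avoids repeated resize-and-rehash steps while the set fills up.
var ids = new HashSet<int>(expectedItems);
for (int i = 0; i < expectedItems; i++)
{
    ids.Add(i);
}

// After removing most of the items, TrimExcess() releases the spare capacity.
ids.RemoveWhere(id => id >= 1_000);
ids.TrimExcess();

Console.WriteLine(ids.Count); // 1000
```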

Wrapping Up

HashSets in .NET are cool. If you don’t use them already, you should try them out: they’re fast, simple to use, and incredibly useful when you’re dealing with uniqueness, lookups, or set-based operations. While a List might be your default go-to, it’s always worth asking: do I actually need ordering or duplicates? If not, HashSet could be a better choice.

That said, they won’t always be the right tool for every job. If you care about preserving item order, need indexed access, or are working with small datasets where performance isn’t a concern, a list might still be the better fit. But when speed and uniqueness matter and you want a super-scalable tool, give the trusty HashSet a go!
