Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctness of Assert.Contain Assert.DoesNotContain for collections #2872

Closed
TimLovellSmith opened this issue Jan 26, 2024 · 12 comments
Closed

Comments

@TimLovellSmith
Copy link
Contributor

Hopefully this issue might just a historical one already fixed? But anyway, here is the issue...

I'm using xunit 2.4.1, it looks like, so I'll have to check if this really repros with newer versions...

I got an interesting suggestion from an AI.

"The proposed code change can be improved by avoiding the use of Assert.DoesNotContain method to check if a key already exists in the dictionary. This method is not efficient because it iterates over the entire dictionary to check if the key exists. Instead, you can use the ContainsKey method of the dictionary which is more efficient because it uses hashing to check if the key exists."

My test is something like

                var dict = new Dictionary<string, (string sku, string region)>(StringComparer.OrdinalIgnoreCase);
                Assert.DoesNotContain<string, (string sku, string region)>(meter.MeterId, (IDictionary<string, (string sku, string region)>)dict);

Which could possibly call

    public static void DoesNotContain<TKey, TValue>(TKey expected, IDictionary<TKey, TValue> collection)

right?

I doubted this suggestion at first. Because it looks like DoesNotContain tries to optimize for this kind of thing!:

    public static void DoesNotContain<TKey, TValue>(TKey expected, IDictionary<TKey, TValue> collection)
    {
        GuardArgumentNotNull("expected", expected);
        GuardArgumentNotNull("collection", collection);
        DoesNotContain(expected, collection.Keys);
    }

    public static void DoesNotContain<T>(T expected, IEnumerable<T> collection)
    {
        if (collection is ICollection<T> collection2 && collection2.Contains(expected))
        {
            throw new DoesNotContainException(expected, collection);
        }
        // need an is ICollection<T> && !collection2.Contains(expected) clause here? See below

        DoesNotContain(expected, collection, GetEqualityComparer<T>());
    }

But when I applied the AI's suggestion, lo and behold, it runs a lot faster!

So apparently this optimization, if that is what it is, doesn't work for dictionaries. I wonder why...

I think the problem might be that even when the collection implements a Contains() method, the optimization only applies to the NEGATIVE case, i.e. when the assertion must fails, because the collection contains the element.

If the collection doesn't contain the element, then it should also be able to do a fast-path return, assuming that it implements .Contains in the expected fashion - but instead we'll continue enumerating the entire collection with the equality comparer, won't we.

I tried to see if the problem might also be something that applies to the current code, but I'm a little confused about it since the implementation seems to have changed.

Now there's an implementation which special cases HashSet...
However I'm a little bit suspicious it will have similar problems, plus not fix the bug that was fixed for consistency with HashSet.Contains.

I mean, it seems like Dictionary key collection will also need some special casing for the same reasons as HashSet even though it doesn't inherit HashSet..., and passing the dictionary key collection to .DoesNotContain would end up treating it as a general collection and using the default equality comparer

			// We special case HashSet<T> because it has a custom Contains implementation that is based on the comparer
			// passed into their constructors, which we don't have access to.
			var hashSet = collection as HashSet<T>;
			if (hashSet != null)
				Contains(expected, hashSet);
			else
				Contains(expected, collection, GetEqualityComparer<T>());

https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2.keycollection.contains?view=net-8.0

@TimLovellSmith
Copy link
Contributor Author

A bit of reproing later, the perf issue seems to be fixed in xunit.assert 2.5.0

@TimLovellSmith
Copy link
Contributor Author

TimLovellSmith commented Jan 26, 2024

The KeyCollection doesn't use dictionary's IComparer does seem to be a real phenomenon.
I suppose opinions may not all be the same on whether its a bug.
Here is an exploration of the behavior:

Console.WriteLine("Hello, World!");
Dictionary<string, string> dict = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
for (int i = 0; i < 1000000; i++)
{
    dict.Add("hello" + i.ToString(), i.ToString());
}

Assert.Contains("HELLO1", (IDictionary<string, string>)dict); // passes
Assert.Contains("HELLO1", dict.Keys); // fails!

@TimLovellSmith
Copy link
Contributor Author

TimLovellSmith commented Jan 26, 2024

Seems that there's also a more general phenomenon of not using custom .Contains overrides in collections (any more - this actually used to pass with xunit 2.4.2)

var c = new CustomCollection();
Assert.Contains("s", c);

class CustomCollection : ICollection<string>

{
    public int Count => 1;

    public bool IsReadOnly => true;

    public void Add(string item) => throw new NotImplementedException();

    public void Clear() => throw new NotImplementedException();
    public bool Contains(string item) { return item == "s"; }
    public void CopyTo(string[] array, int arrayIndex) => throw new NotImplementedException();
    public IEnumerator<string> GetEnumerator() => throw new NotImplementedException();
    public bool Remove(string item) => throw new NotImplementedException();
    IEnumerator IEnumerable.GetEnumerator() => throw new NotImplementedException();
}

Unhandled exception. System.NotImplementedException: The method or operation is not implemented.
at CustomCollection.GetEnumerator() in C:\repo\xunitDictionaryExperiment\Program.cs:line 25
at Xunit.Sdk.CollectionTracker1.GetEnumerator() in /_/src/xunit.assert/Asserts/Sdk/CollectionTracker.cs:line 694 at System.Linq.Enumerable.Contains[TSource](IEnumerable1 source, TSource value, IEqualityComparer1 comparer) at Xunit.Assert.Contains[T](T expected, IEnumerable1 collection, IEqualityComparer1 comparer) in /_/src/xunit.assert/Asserts/CollectionAsserts.cs:line 255 at Xunit.Assert.Contains[T](T expected, IEnumerable1 collection) in /_/src/xunit.assert/Asserts/CollectionAsserts.cs:line 227
at Program.

$(String[] args) in C:\repo\xunitDictionaryExperiment\Program.cs:line 11

@TimLovellSmith TimLovellSmith changed the title Performance of DoesNotContain [xunit 2.4.1] and/or correctness of Collection.DoesNotContain for Dictionary.KeyCollection [latest?] Performance of DoesNotContain [xunit 2.4.1] vs correctness of Collection.DoesNotContain for Dictionary.KeyCollection [latest?] Jan 26, 2024
@TimLovellSmith TimLovellSmith changed the title Performance of DoesNotContain [xunit 2.4.1] vs correctness of Collection.DoesNotContain for Dictionary.KeyCollection [latest?] Correctness of Assert.Contain Assert.DoesNotContain for collections Jan 26, 2024
@TimLovellSmith
Copy link
Contributor Author

TimLovellSmith commented Jan 26, 2024

I should probably split this into two issues, but then one would just be closed as fixed.

@bradwilson
Copy link
Member

bradwilson commented Jan 29, 2024

This was a lot for me to read, so I'm giving you a lot to read in return. 😂

The KeyCollection doesn't use dictionary's IComparer does seem to be a real phenomenon.
I suppose opinions may not all be the same on whether its a bug.

It's irrelevant for us now, as you'll see below. It may still be a framework bug, though whether it would get fixed or not is unclear to me (since it's likely been that way for a looooooong time).

passing the dictionary key collection to .DoesNotContain would end up treating it as a general collection and using the default equality comparer

This is true, and I think the issue lies with KeyCollection rather than us. The 99% use case will be people passing the value and the dictionary, rather than the value and the key collection, so it would probably be hard to motivate the .NET team that it's worth fixing from that angle.

Seems that there's also a more general phenomenon of not using custom .Contains overrides in collections (any more - this actually used to pass with xunit 2.4.2)

This behavior was definitely changed in 2.5.0. The usage of ICollection<T>.Contains is no longer in the code for Assert.Contains and Assert.DoesNotContain. The particulars of that implementation are a little bit odd, as they were intended to be a workaround for sets and dictionaries.

As noted, in 2.4.2 we used the dictionary's Keys collection and did a linear search across it, which doesn't have great performance, whereas we directly call .ContainsKey in 2.5.0+. This is simply a case of us not thinking over the performance implications before now. 😅 In our defense, a lot of this assertion code was written in 2006 and 2007 against .NET Framework 2.0. It never got reevaluated until I did the assertion library overhaul, which people first started seeing about 9 months ago.

It's also interesting to note that moving to .NET 6 and later dramatically increases the overload count (from 13 to 44). A good chunk of that is adding support for Memory<T> and Span<T> (and their read-only counterparts) and soon IAsyncEnumerable<T>, but also adding additional concrete overloads for the immutable container types. We've also made string-equivalent functions that accept Memory<char> or Span<char> (and their read-only counterparts) and we treat them the same way we treat strings, as opposed to treating them like linear character containers.

You'll also note that we don't offer overloads for sets and dictionaries which allow you to pass custom comparers, and that's because these containers generally require custom comparers to be passed to them during construction since Equals and GetHashCode must be aligned in order for the containers to function properly. I recently wrote a page about the differences between equality with sets vs. linear containers; it equally applies to searches with Assert.Contains with both sets and dictionaries.

AI: "The proposed code change can be improved by avoiding the use of Assert.DoesNotContain method to check if a key already exists in the dictionary. This method is not efficient because it iterates over the entire dictionary to check if the key exists. Instead, you can use the ContainsKey method of the dictionary which is more efficient because it uses hashing to check if the key exists."

AI was right (for < 2.5.0) and wrong (for >= 2.5.0). Does it stop suggesting that it was a performance problem if you upgrade to 2.5.0 or later? If so that would be exceptionally impressive, but I'm wagering that it's just going to throw the recommendation regardless. In 2.5.0, we do use ContainsKey so you don't have to. 😁

That said, I did notice a bit of what I think is oversight on my part. The implementation of the generic IEnumerable<T> version does a check against HashSet<T> in an attempt to preserve correct behavior with searches:

public static void DoesNotContain<T>(
    T expected,
    IEnumerable<T> collection)
{
    GuardArgumentNotNull(nameof(collection), collection);

    // We special case HashSet<T> because it has a custom Contains implementation that is based on the comparer
    // passed into their constructors, which we don't have access to.
    var hashSet = collection as HashSet<T>;
    if (hashSet != null)
        DoesNotContain(expected, hashSet);
    else
        DoesNotContain(expected, collection, GetEqualityComparer<T>());
}

There are a couple potential issues here:

  • Why did I choose HashSet<T> instead of the interfaces? And if we want this based on concrete type behavior, are there any other hash sets in .NET that would need to be checked for?
  • Why didn't I check for Dictionary<K,V> as well (and the other concrete dictionaries)?

It seems like an oversight on my part. I'm going to mull over if/how this should be fixed. If you have thoughts, please share them.

It's also worth noting that if you call the IEnumerable<T> overload that allows you to pass a custom comparer, we do a linear search, always. The optimizations are only for the non-comparer paths. We could further optimize here by re-creating the containers with the custom comparer, and repopulate them with the items in the original container (a strategy we do today with Assert.Equal with sets and custom comparers), but at least today nobody seems to have complained about the un-optimized path here.

@TimLovellSmith
Copy link
Contributor Author

Or ISet<T>. There seem to be other ISet<T> types which could be created with their own custom comparer, aside from hash sets. Unfortunately you don't really know which sets or collections support custom comparers by looking at the interfaces though.

We could just assume any ICollection<T> might use a custom comparer, since they can all override Contains().

And then I guess the way to check two containers are equal would logically be to check if they .Contains() all each other's elements... but if Contains is not O(1) the performance could be terrible.

@bradwilson
Copy link
Member

Yeah, it seems reasonable that any set (or dictionary) needs a custom comparer during construction and not just during search operations.

And now I'm wondering whether a Contains() search against ICollection<T> which is actually a set... what does that do? Throw? Do a linear search? Create a new temporary container?

@bradwilson
Copy link
Member

And now I'm wondering whether a Contains() search against ICollection which is actually a set... what does that do? Throw? Do a linear search? Create a new temporary container?

This was a dumb question. 😂 I had assumed ICollection<T>.Contains had an overload that allowed you to pass a comparer, but it doesn't. So of course a set wouldn't need to have any special logic.

@bradwilson
Copy link
Member

The decompiler in Visual Studio says this is the code that is backing LINQ's IEnumerable<T>.Contains(T value, IEqualityComparer<T> comparer):

public static bool Contains<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer)
{
    if (comparer == null)
        comparer = EqualityComparer<T>.Default;

    if (source == null)
        throw Error.ArgumentNull("source");

    foreach (TSource item in source)
        if (comparer.Equals(item, value))
            return true;

    return false;
}

That's what I thought it probably did, and I think justifies our implementation of Assert.Contains when we don't know anything better about a container than just IEnumerable<T>.

Now I just think I want to be a little more lenient than just hard-coding a check for HashSet<T>, and treat any sets or dictionaries specially when appropriate.

@bradwilson
Copy link
Member

Why didn't I check for Dictionary<K,V> as well (and the other concrete dictionaries)?

This is really a signature thing. The dictionary support is dependent on signature mapping a dictionary of TKey/TValue and accepting just a TKey key value. The generic IEnumerable for a dictionary isn't IEnumerable<TKey>, it's IEnumerable<KeyValuePair<TKey, TValue>>, so you can't get the same kind of generic signature match that you can with a set.

In general we don't support the non-generic collections for things like this (as they're antiquated), so the only thing to be done here is support sets via interface rather than concrete.

bradwilson added a commit to xunit/assert.xunit that referenced this issue Feb 4, 2024
@bradwilson
Copy link
Member

(Yes, these replies seem like stream of consciousness because I was poking around writing potential tests and implementations only to realize I should usually think more before posting 😂)

@bradwilson
Copy link
Member

The update to support ISet<T> and IReadOnlySet<T> is merged. https://xunit.net/docs/using-ci-builds

Available in v2: 2.6.7-pre.16
Available in v3: 0.1.1-pre.365

ViktorHofer added a commit to dotnet/arcade that referenced this issue Mar 28, 2024
…a6..574aebac4

574aebac4 xunit/xunit#2872: Expand special handling for sets in Assert.Contains/DoesNotContain
3b8edcbf1 xunit/xunit#2880: Update XML documentation for string-based Assert.Equal
d9f8361d2 Consolidate string and span-based Assert.Equal primary implementation
9ad71163e Move span-of-char assertions to StringAsserts and update docs/param names to indicate they're treated like strings
d70b34621 xunit/xunit#2871: Inner exception stack trace is missing from Assert.Collection failure
141681779 Missed #nullable enable in AsyncCollectionAsserts
22c89b0ea xunit/xunit#2367: Add IAsyncEnumerable<> overloads (.NET Core 3.0+)
d5c32630a While formatting type names in Assert.Equal/NotEqual, convert generated type names to '<generated>'
d7b807179 Add platform conditionals to support .NET 6 Roslyn Analyzers
6d9024665 xunit/xunit#2850: Assert.Equal failing value-slot collections of different concrete types (also fixed for KeyValuePair keys and values)
1f66b837a xunit/xunit#2811: Add SortedSet and ImmutableSortedSet overloads for Assert.Contains/DoesNotContain
1dab747d3 Update FuncEqualityComparer to throw if GetHashCode is called, and update EqualException/NotEqualException to process it
6e0a7cd70 xunit/xunit#2828: Prefer IEquatable<> over custom collection equality
c35ef46d5 xunit/xunit#2824: Assert.Equal fails with null values in dictionary
455865ac8 xunit/xunit#2821: Assert.Equal for collections of IEquatable objects don't call custom Equals
9af2c9c12 Clarify names for range comparer
2e6d9b267 Updates for .NET 8 SDK

git-subtree-dir: src/Microsoft.DotNet.XUnitAssert/src
git-subtree-split: 574aebac41dbbbf9e5e98bb9c65c6c5fab9b47f5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants