
sync-over-async in ConnectionMultiplexer #2223

Closed
gdoron opened this issue Aug 21, 2022 · 11 comments · Fixed by #2229

gdoron commented Aug 21, 2022

Hi,

While searching for deadlocks in our unit test projects, I found that several third-party libraries use the sync-over-async (anti-)pattern.
One of them is StackExchange.Redis:

var task = muxer.ReconfigureAsync(first: true, reconfigureAll: false, logProxy, null, "connect");
if (!task.Wait(muxer.SyncConnectTimeout(true)))

This is an even bigger problem considering that xUnit uses two SynchronizationContexts, even on .NET Core (xunit/xunit#2573), and that ASP.NET Core itself uses blocking APIs (dotnet/aspnetcore#43353).

Since DataProtection is added during application startup, in public void ConfigureServices(IServiceCollection services), we cannot use the async versions of the StackExchange.Redis API.

For reference, @mgravell's post from 8 years ago about the wonders of sync-over-async... 😉
https://blog.marcgravell.com/2014/03/beware-jamming-thread-pool.html

@NickCraver (Collaborator)

I'm trying to understand what the ask is here - what's the issue? We offer a fully async path in .ConnectAsync(), and beyond that we minimize sync-over-async in .Connect(). But you're talking about socket I/O across a network to another server - it's all inherently async. If you want to call it synchronously, there's no magical way out - it's going to be sync-over-async at some point.
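(For illustration only, not code from this thread: the fully async path avoids any blocking wait on the caller's side. The connection string below is a placeholder.)

using StackExchange.Redis;
using System.Threading.Tasks;

public static class RedisBootstrap
{
    public static async Task<ConnectionMultiplexer> ConnectAndPingAsync()
    {
        // No Task.Wait / .Result anywhere: the connect handshake is awaited.
        var muxer = await ConnectionMultiplexer
            .ConnectAsync("localhost:6379")
            .ConfigureAwait(false); // don't resume on a captured SynchronizationContext

        var db = muxer.GetDatabase();
        await db.PingAsync().ConfigureAwait(false);
        return muxer;
    }
}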

gdoron (Author) commented Aug 21, 2022

Yes, I was not clear at all, sorry.
I was thinking of adding ConfigureAwait(false) so it doesn't cause issues inside xUnit integration tests.

@NickCraver (Collaborator)

@gdoron are you seeing a deadlock around this? We can add a .ForAwait() - it won't hurt - but I'm surprised you're seeing a deadlock in this specific combination if so.

gdoron (Author) commented Aug 21, 2022

I did see a deadlock there.
Bear in mind we have this code running on 15 different threads, because our integration tests spin up web servers with WebApplicationFactory.

For now we got around this by simply not adding DataProtection with Redis in the unit tests, since it's a single device/pod there anyway.

Ideally, though, I think it would be better if StackExchange.Redis kept ConfigureAwait(false) on all of its blocking sync-over-async paths, so we can keep the test flow identical to the staging/production flow.
What do you think?

@NickCraver (Collaborator)

We do this in most places, but I agree this is one of the few spots where it isn't. For context: .ForAwait() does the same thing - it's our shorter extension internally.
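(For illustration, such an extension is typically just a thin wrapper over ConfigureAwait(false); the sketch below is indicative, not necessarily the exact internal implementation.)

using System.Runtime.CompilerServices;
using System.Threading.Tasks;

internal static class TaskExtensions
{
    // Shorthand for .ConfigureAwait(false): the continuation will not be
    // posted back to the captured SynchronizationContext.
    public static ConfiguredTaskAwaitable ForAwait(this Task task)
        => task.ConfigureAwait(false);

    public static ConfiguredTaskAwaitable<T> ForAwait<T>(this Task<T> task)
        => task.ConfigureAwait(false);
}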

mgravell added a commit that referenced this issue Aug 24, 2022
@mgravell (Collaborator)

This scenario is hugely contextual. I absolutely agree that sync-over-async is a dangerous anti-pattern that should be aggressively avoided; that doesn't mean it is entirely without massively caveated usage, nor does it mean that it is inherently doomed to deadlock - by which I mean: in the evil nasty times you do use it, it is still marginally possible to avoid the worst outcomes, for example deadlocks.

Now; the scenario you cite in the example here is ConnectImpl; importantly: ConnectImpl is only used as a helper from Connect; if you don't want to open that can of worms: don't call Connect - call ConnectAsync instead. As a fundamental statement of reality: if you use the synchronous API on a library that involves external IO, we're going to have to block the thread. The .Wait that you cite simply represents that block. This is actually a very unusual example from the library - usually, for individual commands, we use a Monitor-based wait rather than a task-based one, but in this case there is merit to reusing a huge chunk of handshake code that is implemented via async Task, rather than having two completely parallel implementations of the same thing.
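(For illustration, a Monitor-based wait of that general shape - a sketch, not the library's actual code - blocks the synchronous caller on a lock object and pulses it from the I/O completion path, with no Task or SynchronizationContext involved at all:)

using System;
using System.Threading;

// Sketch of a Monitor-based synchronous wait; names and shape are assumed.
sealed class PendingMessage
{
    private readonly object _gate = new();
    private bool _completed;
    private object? _result;

    // Called from the socket-read / completion path when the reply arrives.
    public void SetResult(object? result)
    {
        lock (_gate)
        {
            _result = result;
            _completed = true;
            Monitor.Pulse(_gate);
        }
    }

    // Called by the synchronous API: blocks the calling thread directly,
    // so no continuation ever needs to be posted anywhere.
    public object? Wait(TimeSpan timeout)
    {
        lock (_gate)
        {
            if (!_completed && !Monitor.Wait(_gate, timeout))
                throw new TimeoutException("No reply within the sync timeout.");
            return _result;
        }
    }
}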

So: the interesting question becomes: does the shared task-based code fail to guard against deadlocks? I'm very aware of the sync-context concerns here, so I've added some tests to investigate this; I have one unexpected failure surrounding ConfigureAsync, which I will investigate. ConnectAsync: seems fine.

To be explicit: the deadlock scenario you're referring to is presumably an async operation in the library calling back into the sync-context via an await that lacks continueOnCapturedContext: false; if you have a specific example of when you've seen this, I would be very interested - however, to emphasize: I cannot reproduce this for Connect/ConnectAsync. I will investigate the ConfigureAsync failure.
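(For anyone following along, the classic shape of that deadlock on a single-threaded SynchronizationContext looks roughly like the sketch below; the names are placeholders, not library code:)

using System.Threading.Tasks;

static class DeadlockSketch
{
    static async Task<string> FetchAsync()
    {
        // Without .ConfigureAwait(false), this continuation is posted back to
        // the captured SynchronizationContext when the delay completes...
        await Task.Delay(100);
        return "done";
    }

    static string FetchBlocking()
    {
        // ...but that context's single thread is blocked right here waiting on
        // the task, so the continuation can never run: deadlock.
        // Adding .ConfigureAwait(false) to the await above lets the continuation
        // run on a thread-pool thread instead, breaking the cycle.
        return FetchAsync().Result;
    }
}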

@mgravell (Collaborator)

^^^ tl;dr: I can't find a problem in the code you linked to, but I have found a tangentially related problem, which I will investigate; that problem is very specific to one API, so if you have more details on where you've observed a deadlock, please let me know so I can investigate that too.

mgravell (Collaborator) commented Aug 24, 2022

this specifically is the failing test/scenario (the first method - SyncConfigure - passes and is included purely for comparison):

        [Fact]
        public void SyncConfigure()
        {
            using var ctx = new MySyncContext();
            using var conn = Create();
            Assert.Equal(0, ctx.OpCount);
            Assert.True(conn.Configure());
            Assert.Equal(0, ctx.OpCount);
        }

        [Theory]
        [InlineData(true)] // net472: pass; net6.0: fail (expected 0, actual 1)
        [InlineData(false)] // net472\net6.0: fail (expected 0, actual 1)
        public async Task AsyncConfigure(bool continueOnCapturedContext)
        {
            using var ctx = new MySyncContext();
            using var conn = Create();
            Assert.Equal(0, ctx.OpCount);
            Assert.True(await conn.ConfigureAsync(Writer).ConfigureAwait(continueOnCapturedContext));
            if (continueOnCapturedContext)
            {
                Assert.True(ctx.OpCount > 0, $"Opcount: {ctx.OpCount}");
            }
            else
            {
                Assert.Equal(0, ctx.OpCount);
            }
        }

There seem to be two different potential failures here that I'm investigating:

  1. a possible sync-context usage inside ConfigureAsync
  2. on net6.0 (not net472), ConfigureAsync seems to be running to completion synchronously even though IO is involved (obviously not intended)

mgravell (Collaborator) commented Aug 24, 2022

Good news: the sync-context usage in ConfigureAsync has been resolved in the branch - just a missing ForAwait(); I've done a sweep of all missing uses, but again: if you have a specific API where you saw this, please let me know

I'm continuing to investigate the ran-to-completion issue, because that is even more of a head-scratcher (edit: this just turned out to be race conditions in the test, with local near-zero latency)

@mgravell (Collaborator)

@NickCraver ready for review ^^^

mgravell added a commit that referenced this issue Aug 24, 2022
…er (#2229)

* add test to investigate #2223

* add connect test

* assert zeros in SyncConfigure

* add missing ForAwait uses

* investigate rantocompletion fault

* fix brittle test

* stabilize tests

* release notes
gdoron (Author) commented Aug 30, 2022

Sorry for ghosting; for some reason GitHub did not send me notifications.

I've put the sync code that connects to Redis inside a (disgusting) if block to avoid this whole thing altogether, so it no longer reproduces (and even before that, like with all deadlocks, it was not easy to reproduce).

if (!IsInXunitTest.IsInTest) {
    // Using redis as our data protection persistency storage
    var redis = ConnectionMultiplexer.Connect(
        CommonConfiguration.RedisHybridCacheConfiguration.ConnectionString);
    services.AddDataProtection(options => {
        options.ApplicationDiscriminator = AuthenticationConsts.SharedApplicationName;
    })
        .SetApplicationName(AuthenticationConsts.SharedApplicationName)
        .PersistKeysToStackExchangeRedis(redis, AuthenticationConsts.SharedRedisKey);
}

But I believe the deadlock was on this line:
var redis = ConnectionMultiplexer.Connect(CommonConfiguration.RedisHybridCacheConfiguration.ConnectionString);
I'm not 100% sure anymore, but the stack trace definitely showed StackExchange.Redis frames, and it originated from this code in our startup class.

I hope it helps, and thanks for the fix! @mgravell
