
sync-over-async in ConnectionMultiplexer #2223

Closed
gdoron opened this issue Aug 21, 2022 · 11 comments · Fixed by #2229

gdoron commented Aug 21, 2022

Hi,

While searching for deadlocks in our unit test projects, I found that several third-party libraries use the sync-over-async (anti-)pattern.
One of them is StackExchange.Redis:

var task = muxer.ReconfigureAsync(first: true, reconfigureAll: false, logProxy, null, "connect");
if (!task.Wait(muxer.SyncConnectTimeout(true)))

This is an even bigger problem considering that xUnit uses two SynchronizationContexts, even on .NET Core (xunit/xunit#2573), and that ASP.NET Core itself uses blocking APIs (dotnet/aspnetcore#43353).

Since DataProtection is added during application startup, in public void ConfigureServices(IServiceCollection services), we cannot use the async versions of the StackExchange.Redis API.

For reference, @mgravell's post from 8 years ago about the wonders of sync-over-async... 😉
https://blog.marcgravell.com/2014/03/beware-jamming-thread-pool.html

@NickCraver (Collaborator)

I'm trying to understand what the ask is here - what's the issue? We offer a fully async path in .ConnectAsync(), and beyond that we minimize sync-over-async in .Connect(). But you're talking about socket I/O across a network to another server - it's all inherently async. If you want to call it synchronously, there's no magical way out - it's going to be sync-over-async at some point.
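(For illustration only, not code from this thread: the fully async path avoids any blocking wait on the caller's side. The connection string below is a placeholder.)

using StackExchange.Redis;
using System.Threading.Tasks;

public static class RedisBootstrap
{
    public static async Task<ConnectionMultiplexer> ConnectAndPingAsync()
    {
        // No Task.Wait / .Result anywhere: the connect handshake is awaited.
        var muxer = await ConnectionMultiplexer
            .ConnectAsync("localhost:6379")
            .ConfigureAwait(false); // don't resume on a captured SynchronizationContext

        var db = muxer.GetDatabase();
        await db.PingAsync().ConfigureAwait(false);
        return muxer;
    }
}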

gdoron (Author) commented Aug 21, 2022

Yes, I was not clear at all, sorry.
I was thinking of adding ConfigureAwait(false) so it doesn't cause issues inside xUnit integration tests.

@NickCraver (Collaborator)

@gdoron are you seeing a deadlock around this? We can add a .ForAwait() - it won't hurt - but I'm surprised you're seeing a deadlock in this specific combination if so.

gdoron (Author) commented Aug 21, 2022

I did see a deadlock there.
Bear in mind we have this code running on 15 different threads, because our integration tests spin up web servers with WebApplicationFactory.

For now we got around this by simply not adding DataProtection with Redis in the unit tests, since it's a single device/pod there anyway.

Ideally, though, I think it would be better if StackExchange.Redis kept ConfigureAwait(false) on all of its blocking sync-over-async paths, so we can keep the test flow identical to the staging/production flow.
What do you think?

@NickCraver (Collaborator)

We do this in most places, but I agree this is one of the few spots where it isn't. For context: .ForAwait() does the same thing - it's our shorter extension internally.
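(For illustration, such an extension is typically just a thin wrapper over ConfigureAwait(false); the sketch below is indicative, not necessarily the exact internal implementation.)

using System.Runtime.CompilerServices;
using System.Threading.Tasks;

internal static class TaskExtensions
{
    // Shorthand for .ConfigureAwait(false): the continuation will not be
    // posted back to the captured SynchronizationContext.
    public static ConfiguredTaskAwaitable ForAwait(this Task task)
        => task.ConfigureAwait(false);

    public static ConfiguredTaskAwaitable<T> ForAwait<T>(this Task<T> task)
        => task.ConfigureAwait(false);
}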

mgravell added a commit that referenced this issue Aug 24, 2022
@mgravell (Collaborator)

This scenario is hugely contextual. I absolutely agree that sync-over-async is a dangerous anti-pattern that should be aggressively avoided; that doesn't mean it is entirely without massively caveated usage, nor does it mean that it is inherently doomed to deadlock - by which I mean: in the evil nasty times you do use it, it is still marginally possible to avoid the worst outcomes, for example deadlocks.

Now; the scenario you cite in the example here is ConnectImpl; importantly: ConnectImpl is only used as a helper from Connect; if you don't want to open that can of worms: don't call Connect - call ConnectAsync instead. As a fundamental statement of reality: if you use the synchronous API on a library that involves external IO, we're going to have to block the thread. The .Wait that you cite simply represents that block. This is actually a very unusual example from the library - usually, for individual commands, we use a Monitor-based wait rather than a task-based one, but in this case there is merit to reusing a huge chunk of handshake code that is implemented via async Task, rather than having two completely parallel implementations of the same thing.
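(For illustration, a Monitor-based wait of that general shape - a sketch, not the library's actual code - blocks the synchronous caller on a lock object and pulses it from the I/O completion path, with no Task or SynchronizationContext involved at all:)

using System;
using System.Threading;

// Sketch of a Monitor-based synchronous wait; names and shape are assumed.
sealed class PendingMessage
{
    private readonly object _gate = new();
    private bool _completed;
    private object? _result;

    // Called from the socket-read / completion path when the reply arrives.
    public void SetResult(object? result)
    {
        lock (_gate)
        {
            _result = result;
            _completed = true;
            Monitor.Pulse(_gate);
        }
    }

    // Called by the synchronous API: blocks the calling thread directly,
    // so no continuation ever needs to be posted anywhere.
    public object? Wait(TimeSpan timeout)
    {
        lock (_gate)
        {
            if (!_completed && !Monitor.Wait(_gate, timeout))
                throw new TimeoutException("No reply within the sync timeout.");
            return _result;
        }
    }
}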

So: the interesting question becomes: does the shared task-based code fail to guard against deadlocks? I'm very aware of the sync-context concerns here, so I've added some tests to investigate this; I have one unexpected failure surrounding ConfigureAsync, which I will investigate. ConnectAsync: seems fine.

To be explicit: the deadlock scenario you're referring to is presumably an async operation in the library calling back into the sync-context via an await that lacks continueOnCapturedContext: false; if you have a specific example of when you've seen this, I would be very interested - however, to emphasize: I cannot reproduce this for Connect/ConnectAsync. I will investigate the ConfigureAsync failure.
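(For anyone following along, the classic shape of that deadlock on a single-threaded SynchronizationContext looks roughly like the sketch below; the names are placeholders, not library code:)

using System.Threading.Tasks;

static class DeadlockSketch
{
    static async Task<string> FetchAsync()
    {
        // Without .ConfigureAwait(false), this continuation is posted back to
        // the captured SynchronizationContext when the delay completes...
        await Task.Delay(100);
        return "done";
    }

    static string FetchBlocking()
    {
        // ...but that context's single thread is blocked right here waiting on
        // the task, so the continuation can never run: deadlock.
        // Adding .ConfigureAwait(false) to the await above lets the continuation
        // run on a thread-pool thread instead, breaking the cycle.
        return FetchAsync().Result;
    }
}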

@mgravell (Collaborator)

^^^ tl;dr: I can't find a problem in the code you linked to, but I have found a tangentially related problem, which I will investigate; that problem is very specific to one API, so if you have more details on where you've observed a deadlock, please let me know so I can investigate that too.

mgravell (Collaborator) commented Aug 24, 2022

this specifically is the failing test/scenario (the first method - SyncConfigure - passes and is included purely for comparison):

        [Fact]
        public void SyncConfigure()
        {
            using var ctx = new MySyncContext();
            using var conn = Create();
            Assert.Equal(0, ctx.OpCount);
            Assert.True(conn.Configure());
            Assert.Equal(0, ctx.OpCount);
        }

        [Theory]
        [InlineData(true)] // net472: pass; net6.0: fail (expected 0, actual 1)
        [InlineData(false)] // net472\net6.0: fail (expected 0, actual 1)
        public async Task AsyncConfigure(bool continueOnCapturedContext)
        {
            using var ctx = new MySyncContext();
            using var conn = Create();
            Assert.Equal(0, ctx.OpCount);
            Assert.True(await conn.ConfigureAsync(Writer).ConfigureAwait(continueOnCapturedContext));
            if (continueOnCapturedContext)
            {
                Assert.True(ctx.OpCount > 0, $"Opcount: {ctx.OpCount}");
            }
            else
            {
                Assert.Equal(0, ctx.OpCount);
            }
        }

There seem to be two different potential failures here that I'm investigating:

  1. a possible sync-context usage inside ConfigureAsync
  2. on net6.0 (not net472), ConfigureAsync seems to be running to completion synchronously even though IO is involved (obviously not intended)

mgravell (Collaborator) commented Aug 24, 2022

Good news: the sync-context usage in ConfigureAsync has been resolved in the branch - just a missing ForAwait(); I've done a sweep of all missing uses, but again: if you have a specific API where you saw this, please let me know

I'm continuing to investigate the ran-to-completion issue, because that is even more of a head-scratcher (edit: this just turned out to be race conditions in the test, with local near-zero latency)

@mgravell (Collaborator)

@NickCraver ready for review ^^^

mgravell added a commit that referenced this issue Aug 24, 2022
…er (#2229)

* add test to investigate #2223

* add connect test

* assert zeros in SyncConfigure

* add missing ForAwait uses

* investigate rantocompletion fault

* fix brittle test

* stabilize tests

* release notes
gdoron (Author) commented Aug 30, 2022

Sorry for ghosting; for some reason GitHub did not send me notifications.

I've put the sync code that connects to Redis inside a (disgusting) if block to avoid this whole thing altogether, so it no longer reproduces (and even before that, like with all deadlocks, it was not easy to reproduce).

if (!IsInXunitTest.IsInTest) {
    // Using redis as our data protection persistency storage
    var redis = ConnectionMultiplexer.Connect(
        CommonConfiguration.RedisHybridCacheConfiguration.ConnectionString);
    services.AddDataProtection(options => {
        options.ApplicationDiscriminator = AuthenticationConsts.SharedApplicationName;
    })
        .SetApplicationName(AuthenticationConsts.SharedApplicationName)
        .PersistKeysToStackExchangeRedis(redis, AuthenticationConsts.SharedRedisKey);
}

But I believe the deadlock was on this line:
var redis = ConnectionMultiplexer.Connect(CommonConfiguration.RedisHybridCacheConfiguration.ConnectionString);
I'm not 100% sure anymore, but the stack trace definitely showed StackExchange.Redis frames, and it originated from this code in our startup class.

I hope it helps, and thanks for the fix! @mgravell
