Fix ContentStore locking exceptions in async code #17246

ronaldbarendse · 2024-10-10T22:38:47Z

Prerequisites

I have added steps to test this contribution in the description below

Description

After large Deploy operations (like an environment restore) has completed and the CMS tries to update the content cache (NuCache) as part of completing the Umbraco scope, the following exceptions were thrown:

System.AggregateException: One or more errors occurred. (Write lock must be acquried.) (Exceptions were thrown by listed actions. (Object synchronization method was called from an unsynchronized block of code.) (Object synchronization method was called from an unsynchronized block of code.) (Object synchronization method was called from an unsynchronized block of code.))
   at Umbraco.Cms.Infrastructure.Scoping.Scope.TryFinally(Action[] actions)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.RobustExit(Boolean completed, Boolean onException)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.DisposeLastScope()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Deploy.Infrastructure.Core.AltScope.Dispose()
   at Umbraco.Deploy.Infrastructure.Work.WorkItems.SessionWorkItem.&lt;&gt;c__DisplayClass24_0.&lt;Complete&gt;b__0()
...
 System.InvalidOperationException: Write lock must be acquried.
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.EnsureLocked()
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.SetLocked(ContentNodeKit kit)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.NotifyLocked(IEnumerable`1 payloads, Boolean&amp; draftChanged, Boolean&amp; publishedChanged)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.Notify(JsonPayload[] payloads, Boolean&amp; draftChanged, Boolean&amp; publishedChanged)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.NotifyPublishedSnapshotService(IPublishedSnapshotService service, AppCaches appCaches, JsonPayload[] payloads)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.Refresh(JsonPayload[] payloads)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.DeliverLocal[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.Deliver[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.QueueRefresh[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, IEnumerable`1 payloads)
   at Umbraco.Extensions.DistributedCacheExtensions.RefreshContentCache(DistributedCache dc, IEnumerable`1 changes)
   at Umbraco.Cms.Core.Cache.ContentTreeChangeDistributedCacheNotificationHandler.Handle(IEnumerable`1 entities)
   at Umbraco.Cms.Core.Cache.DistributedCacheNotificationHandlerBase`2.Handle(IEnumerable`1 notifications)

System.AggregateException: One or more errors occurred. (Could not enter monitor before timeout in content store)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.TryFinally(Action[] actions)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.RobustExit(Boolean completed, Boolean onException)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.DisposeLastScope()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Deploy.Infrastructure.Core.AltScope.Dispose()
   at Umbraco.Deploy.Infrastructure.Work.WorkItems.SessionWorkItem.&lt;&gt;c__DisplayClass24_0.&lt;Complete&gt;b__0()
...
System.TimeoutException: Could not enter monitor before timeout in content store
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.Lock(WriteLockInfo lockInfo, Boolean forceGen)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.ScopedWriteLock..ctor(ContentStore store, Boolean scoped)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.&lt;GetScopedWriteLock&gt;b__29_0(Boolean scoped)
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.&lt;&gt;c__DisplayClass2_0`1.&lt;Get&gt;b__0()
   at Umbraco.Cms.Core.Scoping.ScopeContext.Enlist[T](String key, Func`1 creator, Action`2 action, Int32 priority)
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.Get[T](ICoreScopeProvider scopeProvider, String key, Func`2 ctor)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.GetScopedWriteLock(ICoreScopeProvider scopeProvider)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.Notify(JsonPayload[] payloads, Boolean&amp; draftChanged, Boolean&amp; publishedChanged)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.NotifyPublishedSnapshotService(IPublishedSnapshotService service, AppCaches appCaches, JsonPayload[] payloads)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.Refresh(JsonPayload[] payloads)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.DeliverLocal[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.Deliver[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.QueueRefresh[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, IEnumerable`1 payloads)
   at Umbraco.Extensions.DistributedCacheExtensions.RefreshContentCache(DistributedCache dc, IEnumerable`1 changes)
   at Umbraco.Cms.Core.Cache.ContentTreeChangeDistributedCacheNotificationHandler.Handle(IEnumerable`1 entities)
   at Umbraco.Cms.Core.Cache.DistributedCacheNotificationHandlerBase`2.Handle(IEnumerable`1 notifications)

This all originates from the ContentStore either not 'seeing' it already has a write lock or getting a timeout when trying to get it (essentially deadlocking). Looking at the locking implementation in this class, it uses Monitor, which has the following requirement:

Because the Monitor class has thread affinity, the thread that acquired a lock must release the lock by calling the Monitor.Exit method.

But looking at the code and stack trace, the lock/monitor is entered and exited in different methods and they can potentially be called from different threads, which is especially likely if it contains async calls in-between them (Deploy heavily uses async and does more work in a single lock, further increasing the chances of exiting on a different thread).

I've added a test showcasing one of the exceptions and will push a fix that uses SemaphoreSlim.

ronaldbarendse · 2024-10-11T07:38:32Z

As expected, the test without fix results in this exception:

Failed SetLocked [462 ms]
  Error Message:
   System.Threading.SynchronizationLockException : Object synchronization method was called from an unsynchronized block of code.
  Stack Trace:
     at System.Threading.Monitor.Exit(Object obj)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.Release(WriteLockInfo lockInfo, Boolean commit) in /_/src/Umbraco.PublishedCache.NuCache/ContentStore.cs:line 412
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.ScopedWriteLock.Release(Boolean completed) in /_/src/Umbraco.PublishedCache.NuCache/ContentStore.cs:line 311
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.Dispose() in /_/src/Umbraco.Infrastructure/Scoping/ScopeContextualBase.cs:line 76
   at Umbraco.Cms.Tests.Integration.Umbraco.Infrastructure.PublishedCache.ContentCacheTests.SetLocked() in /_/tests/Umbraco.Tests.Integration/Umbraco.Infrastructure/PublishedCache/ContentCacheTests.cs:line 75

I've applied the same change in SnapDictionary (which has more locking tests in SnapDictionaryTests that all still pass). I also noticed an unused MonitorLock class and obsoleted it.

The only remaining usage of Monitor that can potentially enter/exit on different threads is in HttpContextRequestAppCache:

Umbraco-CMS/src/Umbraco.Web.Common/Cache/HttpContextRequestAppCache.cs

Lines 215 to 263 in 7787af2

    
           protected override void EnterReadLock() 
        
           { 
        
               var locker = GetLock(); 
        
               if (locker == null) 
        
               { 
        
                   return; 
        
               } 
        
               Monitor.Enter(locker); 
        
           } 
        
           protected override void EnterWriteLock() 
        
           { 
        
               var locker = GetLock(); 
        
               if (locker == null) 
        
               { 
        
                   return; 
        
               } 
        
               Monitor.Enter(locker); 
        
           } 
        
           protected override void ExitReadLock() 
        
           { 
        
               var locker = GetLock(); 
        
               if (locker == null) 
        
               { 
        
                   return; 
        
               } 
        
               if (Monitor.IsEntered(locker)) 
        
               { 
        
                   Monitor.Exit(locker); 
        
               } 
        
           } 
        
           protected override void ExitWriteLock() 
        
           { 
        
               var locker = GetLock(); 
        
               if (locker == null) 
        
               { 
        
                   return; 
        
               } 
        
               if (Monitor.IsEntered(locker)) 
        
               { 
        
                   Monitor.Exit(locker); 
        
               } 
        
           }

ExamineIndexRebuilder also uses Monitor, but always exits within the same method and therefore doesn't have the same thread synchronization issues.

bergmania

Code looks good to me

* Add ContentCache test * Use SemaphoreSlim as write lock * Apply lock imrpovements to SnapDictionary * Obsolete unused MonitorLock (cherry picked from commit c3db345)

ronaldbarendse · 2024-10-11T07:50:17Z

Awesome, let's verify whether this not only fixes the test, but also the runtime exceptions!👍🏻

Looking more into HttpContextRequestAppCache, I can see the locks are all entered/exited within the same methods (and thus same threads): it's just that the EnterReadLock()/ExitReadLock() and EnterWriteLock()/ExitWriteLock() are protected methods and can potentially be called incorrectly, but that's currently not the case...

* Add ContentCache test * Use SemaphoreSlim as write lock * Apply lock imrpovements to SnapDictionary * Obsolete unused MonitorLock (cherry picked from commit c3db345)

sauron · 2024-11-11T14:27:57Z

Hello, I wanted to touch base with you all, because, since this was applied automatically by Umbraco Cloud,
we are getting the (Recursive locks not allowed) on tasks we didn't have before while doing simple things,
like moving nodes between folders

or while running recurring Jobs that pulls from APIs and store locally.
The RecurringJob is a simple request to an API then insert or update into Umbraco under a folder like:

var results = "pull from API and parse with the Proper class"
foreach (var item in results)
{
    IContent dbNode = null;
    IPublishedContent? found = _publishedContentQuery.Content("<parentFOlderID>").Children().Where(x => x.Value<string>("externalId") == item.Id.Trim()).FirstOrDefault();
    if(found != null){
        dbNode = ContentService.GetById(found.Id);
    } else {
        dbNode = ContentService.Create(track.Title, int.Parse(parentId), "<nodeType>");
    }
    // Set Values
    dbNode.SetValue("...." , item.Id);
    // .....

    try
    {
        ContentService.SaveAndPublish(dbNode);
    }
    catch (AggregateException ae)
    {
        Console.WriteLine("Caught aggregate exception behavior"+ae.Message);
    }
}

While using try/catch avoid raising the error, we are wondering if there is a way to "release" the node being updated/inserted or a different approach we should take to run this recurring jobs to avoid this error.

For moving content between folders, we are "workarounding" it by moving smaller groups of content.

Is this the intended behavoir? Should I open a new issue?

ronaldbarendse · 2024-11-15T08:03:12Z

Hello, I wanted to touch base with you all, because, since this was applied automatically by Umbraco Cloud,
we are getting the (Recursive locks not allowed) on tasks we didn't have before while doing simple things,

Looking at the stack trace, the error happens in a notification handler (as notifications are published in the TryFinally() method after completing/disposing the scope). Do you have any custom handlers registered that trigger when moving content (like ContentSavedNotification or ContentMovedNotification)? And the code that's running in the recurring job, is that creating its own scope before creating/saving content, as is shown in the documentation?

lauren-g-2025 · 2025-01-14T16:05:58Z

We are experiencing the same issue as user sauron, starting mid-December on one of our Umbraco Cloud sites. We have two Hangfire jobs that run at the same time and now seem to interfere with each other, causing Recursive locks not allowed.

We can probably get round it by making the jobs run at different times, but it'd be good to understand the root cause here. I can reproduce the issue locally by triggering the two jobs together. Adding ICoreScopeProvider as in the documentation has not helped.

sauron · 2025-01-14T17:45:07Z

@lauren-g-2025 After reviewing our code and verifying that we were not customizing the handler, we end up "workarounding" it.
We added a try/catch syntax around the saveAndPublish call and we also made sure that we were triggering async using the await Task.Run(() => ...

So basically:

                    try
                    {
                         await Task.Run(() => _contentService.SaveAndPublish(article));
                    }
                    catch (AggregateException ae)
                    {
                        Console.WriteLine("Caught aggregate exception behavior "+ae.Message);
                    }

But going back to the main problem it is definitively this change.
if we run the app with the previous version of Umbraco, the one that didn't include this change, we were able to run everything smoothly.

I hope this helps.

ronaldbarendse · 2025-01-15T10:23:56Z

@sauron Can you share the stack traces of the inner exceptions of that AggregateException? The fix in this PR ensures the lock is correctly seen by other threads and the current IScope ensures only a single write lock is taken (as recursive locks are indeed not supported). So it seems like some code is executing outside of the scope that took the write lock and is also trying to write to the content store.

The write lock is enlisted in the current scope here:

Umbraco-CMS/src/Umbraco.Infrastructure/Scoping/ScopeContextualBase.cs

Lines 47 to 58 in fd9c1a0

    
           // no scope context = create a non-scoped object 
        
           IScopeContext? scopeContext = scopeProvider.Context; 
        
           if (scopeContext == null) 
        
           { 
        
               return ctor(false); 
        
           } 
        
           // create & enlist the scoped object 
        
           T? w = scopeContext.Enlist( 
        
               "ScopeContextualBase_" + key, 
        
               () => ctor(true), 
        
               (completed, item) => { item?.Release(completed); });

If there's no scope, it will call the ctor() and create a new ScopedWriteLock:

Umbraco-CMS/src/Umbraco.PublishedCache.NuCache/ContentStore.cs

Lines 317 to 318 in ed0b236

    
           public IDisposable? GetScopedWriteLock(ICoreScopeProvider scopeProvider) => 
        
               ScopeContextualBase.Get(scopeProvider, _instanceId, scoped => new ScopedWriteLock(this, scoped));

Also, using await Task.Run() can cause other issues and in the above example only causes additional overhead and can potentially result in thread starvation. Also see this comment in the Scope.Dispose() method:

Umbraco-CMS/src/Umbraco.Infrastructure/Scoping/Scope.cs

Line 367 in fd9c1a0

    
           $"The {nameof(Scope)} {InstanceId} being disposed is not the Ambient {nameof(Scope)} {_scopeProvider.AmbientScope?.InstanceId.ToString() ?? "NULL"}. This typically indicates that a child {nameof(Scope)} was not disposed, or flowed to a child thread that was not awaited, or concurrent threads are accessing the same {nameof(Scope)} (Ambient context) which is not supported. If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().";

You might need to wrap the code in a using (ExecutionContext.SuppressFlow()), like the webhook background job does:

Umbraco-CMS/src/Umbraco.Infrastructure/BackgroundJobs/Jobs/WebhookFiring.cs

Lines 62 to 86 in fd9c1a0

    
           await Task.WhenAll(requests.Select(request => 
        
           { 
        
               using (ExecutionContext.SuppressFlow()) 
        
               { 
        
                   return Task.Run(async () => 
        
                   { 
        
                       IWebhook? webhook = await _webHookService.GetAsync(request.WebhookKey); 
        
                       if (webhook is null) 
        
                       { 
        
                           return; 
        
                       } 
        
                       using HttpResponseMessage? response = await SendRequestAsync(webhook, request.EventAlias, request.RequestObject, request.RetryCount, CancellationToken.None); 
        
                       if ((response?.IsSuccessStatusCode ?? false) || request.RetryCount >= _webhookSettings.MaximumRetries) 
        
                       { 
        
                           await _webhookRequestService.DeleteAsync(request); 
        
                       } 
        
                       else 
        
                       { 
        
                           request.RetryCount++; 
        
                           await _webhookRequestService.UpdateAsync(request); 
        
                       } 
        
                   }); 
        
               } 
        
           }));

@lauren-g-2025 Umbraco uses distributed write locks, so only a single write operation (like updating content) can be done at a time, so running 2 background jobs at the same isn't recommended for that reason alone. But otherwise, it seems like Hangfire executes jobs from the same thread, potentially causing issues with Umbraco contexts being shared between jobs, so the same SuppressFlow() needs to be added, see also: https://discuss.hangfire.io/t/use-of-asynclocal-t-in-a-job/9443

Add ContentCache test

Unverified

This user has not yet uploaded their public signing key.

GPG key ID: 399971774A2E75BA

Learn about vigilant mode

Loading
Loading status checks…

6ee6529

ronaldbarendse added type/bug affected/v13 labels Oct 10, 2024

ronaldbarendse added 3 commits October 11, 2024 01:18

Use SemaphoreSlim as write lock

Loading
Loading status checks…

970326b

Apply lock imrpovements to SnapDictionary

Loading
Loading status checks…

0864c14

Obsolete unused MonitorLock

Loading
Loading status checks…

cd73c8a

bergmania approved these changes Oct 11, 2024

View reviewed changes

bergmania merged commit c3db345 into v13/dev Oct 11, 2024
12 of 15 checks passed

bergmania deleted the v13/bugfix/contentstore-locking branch October 11, 2024 07:45

bergmania added release/13.5.2 and removed affected/v13 labels Oct 16, 2024

bergmania added the release/14.3.1 label Oct 16, 2024

ronaldbarendse mentioned this pull request Dec 3, 2024

Deploy Failure on GIT Push: "Recursive locks not allowed" umbraco/Umbraco.Deploy.Issues#108

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ContentStore locking exceptions in async code #17246

Fix ContentStore locking exceptions in async code #17246

ronaldbarendse commented Oct 10, 2024

ronaldbarendse commented Oct 11, 2024

bergmania left a comment

ronaldbarendse commented Oct 11, 2024

sauron commented Nov 11, 2024 •

edited

Loading

ronaldbarendse commented Nov 15, 2024

lauren-g-2025 commented Jan 14, 2025 •

edited

Loading

sauron commented Jan 14, 2025

ronaldbarendse commented Jan 15, 2025

Fix ContentStore locking exceptions in async code #17246

Fix ContentStore locking exceptions in async code #17246

Conversation

ronaldbarendse commented Oct 10, 2024

Prerequisites

Description

ronaldbarendse commented Oct 11, 2024

bergmania left a comment

Choose a reason for hiding this comment

ronaldbarendse commented Oct 11, 2024

sauron commented Nov 11, 2024 • edited Loading

ronaldbarendse commented Nov 15, 2024

lauren-g-2025 commented Jan 14, 2025 • edited Loading

sauron commented Jan 14, 2025

ronaldbarendse commented Jan 15, 2025

sauron commented Nov 11, 2024 •

edited

Loading

lauren-g-2025 commented Jan 14, 2025 •

edited

Loading