Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: race conditions in data store save #13372

Merged
merged 4 commits into from
Mar 18, 2025
Merged

fix: race conditions in data store save #13372

merged 4 commits into from
Mar 18, 2025

Conversation

ascorbic
Copy link
Contributor

@ascorbic ascorbic commented Mar 6, 2025

Changes

This fixes a few race condition bugs that manifested in sites with lots of very large markdown files.

Adds a method to the data store object that returns a promise that resolves when saving to disk is complete.

Resets the #dirty flag before saving to disk. The main cause of #13310 was that debounced saves that happened while the large data store file was being written to disk would be lost because the dirty flag was cleared once the file write was complete, missing the fact that more more entries need writing.

Fixes #13310

Testing

Tested with the #13310 repro

Docs

Copy link

changeset-bot bot commented Mar 6, 2025

🦋 Changeset detected

Latest commit: 8739a71

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions bot added the pkg: astro Related to the core `astro` package (scope) label Mar 6, 2025
Copy link

codspeed-hq bot commented Mar 6, 2025

CodSpeed Performance Report

Merging #13372 will not alter performance

Comparing data-save-race (8739a71) with main (d83f92a)

Summary

✅ 6 untouched benchmarks

@ascorbic
Copy link
Contributor Author

ascorbic commented Mar 6, 2025

Looks like this must be causing a slowdown, so there's a timeout. Setting back to draft

@ascorbic ascorbic marked this pull request as draft March 6, 2025 17:31
@ascorbic ascorbic self-assigned this Mar 7, 2025
@justin5267
Copy link

Looks like this must be causing a slowdown, so there's a timeout. Setting back to draft

Thank you for your hard work! The issue of the long synchronization time during the first run had already been reported in issue #13050 . After using caching in subsequent runs, the time became acceptable, so I proactively closed the issue. I truly appreciate your efforts!

ascorbic added 2 commits March 7, 2025 13:11
@ascorbic ascorbic marked this pull request as ready for review March 7, 2025 14:20
@justin5267
Copy link

Hello, @ascorbic I tested using https://github.com/justin5267/test and found that this PR @ did not resolve the issue. After running pnpm run dev, the project starts normally, but the data-store.json file is incomplete, with a size of only about 190MB, and some pages are missing.

As a comparison, modifying "SAVE_DEBOUNCE_MS" from 500 to 5000 and running pnpm run dev again resolves the issue. All pages generate correctly, and the size of data-store.json increases to approximately 540MB.

@ematipico ematipico added the pr preview This PR has a preview release label Mar 10, 2025
Copy link

pkg-pr-new bot commented Mar 10, 2025

astro

npm i https://pkg.pr.new/astro@13372

@astrojs/cloudflare

npm i https://pkg.pr.new/@astrojs/cloudflare@13372

@astrojs/netlify

npm i https://pkg.pr.new/@astrojs/netlify@13372

@astrojs/node

npm i https://pkg.pr.new/@astrojs/node@13372

@astrojs/vercel

npm i https://pkg.pr.new/@astrojs/vercel@13372

commit: 8739a71

@ematipico
Copy link
Member

I just tested the preview release against the reproduction provided by @justin5267, and the project is stuck at syncing content for a while...

Screenshot 2025-03-10 at 09 44 02

@justin5267
Copy link

justin5267 commented Mar 10, 2025

I just tested the preview release against the reproduction provided by @justin5267, and the project is stuck at syncing content for a while...

Screenshot 2025-03-10 at 09 44 02

Well, it's not actually stuck; it just takes a long time to synchronize all the content on the first startup, around 200,000ms. However, after caching, subsequent startups are much faster. I previously submitted a separate issue for this (#13050).

@ematipico
Copy link
Member

Can we improve the DX in the console? Since the log goes to a newline and nothing gets written, I thought everything was stuck 😓

@ascorbic
Copy link
Contributor Author

@ematipico something like printing "this may take some time", or periodically printing an update? Currently I don't think have the ability to display a progress indicator with our logger, do we?

@ascorbic
Copy link
Contributor Author

@justin5267 hmm. It did fix it for me. Let me look again and see.

@ematipico
Copy link
Member

@ematipico something like printing "this may take some time", or periodically printing an update? Currently I don't think have the ability to display a progress indicator with our logger, do we?

I think we have the yocto-spinner somewhere, but I don't remember where we use it (create-astro?). We can go with a simpler solution IMHO. When logging, it's possible to pass a boolean argument to not add a newline at the end of the log. We could printing the same phrase with three dots e.g. ., .., ..., ., .., etc.

@justin5267
Copy link

justin5267 commented Mar 10, 2025

@justin5267 hmm. It did fix it for me. Let me look again and see.

wechat_2025-03-10_174651_415

I have occasionally encountered a similar issue, where the first time I run pnpm run dev, all pages are generated correctly, but after modifying the content or deleting the .astro folder, running it again results in incomplete generation. Could you try deleting the .astro folder and then rerun pnpm run dev to check if the /docs/w1-1h.html page is generated correctly, or if the data-store.json file is around 540MB+ in size? If both answers are yes, it would indicate that the issue is resolved.

@ematipico
Copy link
Member

but after modifying the content

I modify the content, and it's correctly updated without issues.

or deleting the .astro folder

That's not a standard user flow, actually. Users shouldn't delete the cache folder because Astro is in charge of it. Not sure if we should account for such cases.

@justin5267
Copy link

it's correctly updated without issues.

Could you kindly help check if the http://localhost:54252/docs/w1-1.html page is accessible? When I ran pnpm run dev for testing, the project did not report any errors, but files in the folders starting with "w," "x," "y," and "z" were not generated, resulting in a 404 error when accessed. The same project was able to generate these pages correctly in Astro v4.

@justin5267
Copy link

The same project was able to generate these pages correctly in Astro v4

Even in Astro v5, if I modify SAVE_DEBOUNCE_MS to 5000 and run pnpm run dev again, these pages are generated correctly and can be accessed without issues.

@ascorbic
Copy link
Contributor Author

I'd ideally want to avoid needing to change the debounce, because it's working-around something that probably needs a proper fix.

Copy link
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be honest; I really don't know what we are fixing, even with the explanation 😅

It would be nice to have some comments that explain the new properties and the new code, if possible.

Otherwise, I tested it, and it works as expected, except for the deletion .astro folder, which I think is something that users shouldn't usually do. They should use --force instead.

@justin5267
Copy link

According to @ematipico 's suggestion, I retested using pnpm astro dev --force and found that this PR has fixed the issue. In comparison, without applying this PR, even running pnpm astro dev --force fails to generate the files completely. Therefore, this PR is effective. Thank you for your hard work!

@ascorbic
Copy link
Contributor Author

Thanks for the update. I'm still not totally happy with the situation, but I think fixing it is going to require a bigger refactor, probably to implement a state machine.

@ascorbic ascorbic merged commit 7783dbf into main Mar 18, 2025
29 of 30 checks passed
@ascorbic ascorbic deleted the data-save-race branch March 18, 2025 09:24
@astrobot-houston astrobot-houston mentioned this pull request Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: astro Related to the core `astro` package (scope) pr preview This PR has a preview release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SAVE_DEBOUNCE_MS Too Short Causes Incomplete Data in data-store.json
3 participants