
Break up request graph cache serialisation and run after build completion #9384

Merged: 24 commits into v2 from jlane2/write-request-graph-to-disk-background on Feb 16, 2024

Conversation

JakeLane (Contributor, PR author):

↪️ Pull Request

Currently in Jira Frontend, we have a major problem with developers becoming confused when shutting down Parcel, as it looks like the process has crashed. This is because the request tracker is serialised and written to the cache when the Parcel CLI is shut down.

To avoid this, we can utilise the idle time in the watch process by serialising after the build has completed. It's also important to break up the serialisation phase, as we need to be able to interrupt it at any time for a new build. I picked a sane constant value to split the nodes into groups that each took around ~10ms to process on my machine, which should free up the event loop well enough for most developers.
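In rough terms, the idea looks like this (a sketch with hypothetical names; JSON stands in for Parcel's real binary serialiser, and an AbortSignal stands in for the watcher interrupt):

```js
// Sketch: write the request graph nodes in fixed-size chunks, yielding
// to the event loop between chunks so a new build can interrupt us.
const NODES_PER_BLOB = 2 ** 14;

async function writeNodesToCache(nodes, cache, signal) {
  for (let i = 0; i * NODES_PER_BLOB < nodes.length; i++) {
    if (signal.aborted) return; // a new build started; stop writing
    const chunk = nodes.slice(i * NODES_PER_BLOB, (i + 1) * NODES_PER_BLOB);
    // Each chunk is sized to take ~10ms to serialise, so awaiting the
    // write between chunks keeps the event loop responsive.
    await cache.setLargeBlob(`requestGraph-nodes-${i}`, JSON.stringify(chunk));
  }
}
```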

🚨 Test instructions

  1. Start a Parcel build in watch mode
  2. Wait for the build to complete
  3. Observe the cache serialising to disk
  4. Close Parcel (observe near-instant shutdown)

@@ -361,6 +358,7 @@ export default class Parcel {
createValidationRequest({optionsRef: this.#optionsRef, assetRequests}),
{force: assetRequests.length > 0},
);
await this.#requestTracker.writeToCache();
JakeLane (author):

This is after the build has reported as complete. Open to suggestions for a more suitable location.

- let requestGraphKey = hashString(`${cacheKey}:requestGraph`);
- let snapshotKey = hashString(`${cacheKey}:snapshot`);
+ const cacheKey = getCacheKey(this.options);
+ const requestGraphKey = `requestGraph:${hashString(cacheKey)}`;
JakeLane (author):

I changed these as it's easier to debug what's happening if the cache folder has useful names

@JakeLane force-pushed the jlane2/write-request-graph-to-disk-background branch from f831368 to e5ffbea on November 16, 2023 04:33
@JakeLane force-pushed the jlane2/write-request-graph-to-disk-background branch from e5ffbea to 863b5bd on November 16, 2023 04:43
devongovett (Member):

Won't this end up spending more time/cpu serializing in general because it runs after every build instead of only when shutting down Parcel? Sure it starts earlier when you actually are shutting down but that's overall a pretty rare event vs doing a regular rebuild, in which case there's no need to save to disk.

JakeLane (author):

> Won't this end up spending more time/cpu serializing in general because it runs after every build instead of only when shutting down Parcel? Sure it starts earlier when you actually are shutting down but that's overall a pretty rare event vs doing a regular rebuild, in which case there's no need to save to disk.

Yeah, it's definitely a compromise as we'll use CPU more often, but it shouldn't add to the active waiting time for devs by slowing down shutdown of the process. The Parcel build should remain usable to the developer while this serialisation is happening (requests will be served), and it's interruptible, so new changes to source will cancel these writes.

I'm realising now my implementation isn't ideal here, as it doesn't get interrupted in the way I expected. I'll push up an update tomorrow which should make more sense.

devongovett (Member):

Even better would be to only serialize the parts of the graph that changed now that you have chunking...

@JakeLane force-pushed the jlane2/write-request-graph-to-disk-background branch from 858efed to d93e422 on November 17, 2023 00:42
JakeLane (author):

> Even better would be to only serialize the parts of the graph that changed now that you have chunking...

I wasn't able to come up with a way to know if it's safe to skip serialisation of a chunk efficiently :/

Ideally we need to know if a node has changed since the last build, and since it's possible for a node to be deleted, we'd need some sort of tracking for changed nodes.
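The compromise the PR later lands on (see the cachedRequestChunks diffs below) is coarser tracking: remember which fixed-size chunks are still valid on disk, and drop a chunk's entry whenever any node inside it is invalidated. A rough sketch, assuming node indices stay stable:

```js
const NODES_PER_BLOB = 2 ** 14;
// Indices of chunks whose serialised form on disk is still valid.
const cachedRequestChunks = new Set();

function onNodeInvalidated(nodeId) {
  // Any change to a node makes its whole chunk stale.
  cachedRequestChunks.delete(Math.floor(nodeId / NODES_PER_BLOB));
}

function staleChunkIndices(totalNodes) {
  const stale = [];
  for (let i = 0; i * NODES_PER_BLOB < totalNodes; i++) {
    if (!cachedRequestChunks.has(i)) stale.push(i);
  }
  return stale;
}
```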

@JakeLane force-pushed the jlane2/write-request-graph-to-disk-background branch from 1efe860 to d1e726d on December 8, 2023 03:41
marcins (Contributor) left a comment:

Just left some general feedback / questions for now.

}
}

// If there's already a file following this chunk, it's old and should be removed
Contributor:

Does it matter if it had extra chunks after that, or does that just become "garbage"?

i.e. if you had chunks 0, 1, 2, 3 (unlikely given the sizes - but still..), and next time have 0,1, you'll delete 2 but leave 3 dangling?

JakeLane (author):

Yeah, I left it dangling as the intention was purely to cut the edge case of joining two different chunks together. In retrospect, it's not complex to just delete everything, so I'll add that.
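For illustration, a sketch of "delete every trailing chunk file" (getFilePath is a stand-in for the cache's private #getFilePath helper):

```js
// Keep unlinking chunk files past the last one written until we run out.
async function deleteStaleChunks(fs, getFilePath, key, firstStaleIndex) {
  for (let i = firstStaleIndex; ; i++) {
    try {
      await fs.unlink(getFilePath(key, i));
    } catch {
      break; // no file at this index; nothing further to clean up
    }
  }
}
```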

}

// If there's already a file following this chunk, it's old and should be removed
if (await this.fs.exists(this.#getFilePath(key, chunks))) {
Contributor:

Note that async existence checking is long deprecated in Node, and can lead to race conditions: https://nodejs.org/api/fs.html#fsexistspath-callback (i.e. the recommended approach would be to just try and unlink and ignore failure)
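i.e. roughly this shape instead of the exists check (a sketch using the standard Node idiom):

```js
// Attempt the unlink directly and ignore only a missing file.
try {
  await this.fs.unlink(this.#getFilePath(key, chunks));
} catch (err) {
  if (err.code !== 'ENOENT') throw err; // real errors still propagate
}
```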

@@ -111,27 +112,47 @@ export class LMDBCache implements Cache {
return Buffer.concat(await Promise.all(buffers));
}

- async setLargeBlob(key: string, contents: Buffer | string): Promise<void> {
+ async setLargeBlob(
Contributor:

Is there any elegant way to share the implementation between the two Cache types as (AFAICT) they're identical?

Member:

Instantiate a FSCache instance inside lmdb cache and forward calls of setLargeBlob to that?

JakeLane (author):

That sounds good to me, will clean this up
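A sketch of that delegation (assuming FSCache's constructor takes a filesystem and cache directory; the real classes have more to them):

```js
import {FSCache} from '@parcel/cache';

export class LMDBCache {
  constructor(fs, cacheDir) {
    // Reuse FSCache's large-blob implementation instead of duplicating
    // the chunked read/write logic in both cache types.
    this.fsCache = new FSCache(fs, cacheDir);
  }

  setLargeBlob(key, contents, options) {
    return this.fsCache.setLargeBlob(key, contents, options);
  }

  getLargeBlob(key) {
    return this.fsCache.getLargeBlob(key);
  }
}
```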

@@ -846,6 +846,8 @@ export class RequestGraph extends ContentGraph<
}
}

const NODES_PER_BLOB = 2 ** 14;
Contributor:

How did we arrive at this number?

JakeLane (author):

I profiled locally on my machine how long it took to serialise n nodes. I tuned n with binary search until I reached approximately ~50ms serialisation time per blob. The goal is to free up the event loop often enough that the process still feels responsive.

I'll document this on the constant so it can be tuned in the future if required.
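e.g. something along these lines (wording illustrative):

```js
// Number of request graph nodes serialised into each cache blob.
// Tuned by profiling: chosen so one blob takes roughly ~50ms to
// serialise, which keeps the event loop responsive between chunks.
// Revisit if per-node serialisation cost changes significantly.
const NODES_PER_BLOB = 2 ** 14;
```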

- total,
- size: this.graph.nodes.length,
- });
+ report({type: 'cache', phase: 'end', total, size: this.graph.nodes.length});

await Promise.all(promises);
Contributor:

How many concurrent promises will be typical here for a Jira cache? Promise.all with a large set, especially when writing files or doing other async IO stuff, can have sub-optimal performance. If the concurrency is large enough, you might have better results using async/queue with a concurrency limit set.

Member:

(We also have PromiseQueue in the utils for exactly this use case, in case you haven't seen it in the codebase yet.)

Contributor:

> (We also have PromiseQueue in the utils for exactly this use case, in case you haven't seen it in the codebase yet.)

Ah, yeah, it's even touched in this PR 😅

I was thinking of Jira, where I have used async/queue in the past for this.
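For reference, bounded concurrency with @parcel/utils' PromiseQueue looks roughly like this (getChunkKey is a hypothetical helper; serialiseAndSet is the function this PR already uses):

```js
import {PromiseQueue} from '@parcel/utils';

// Cap concurrent blob writes instead of starting them all at once.
const queue = new PromiseQueue({maxConcurrent: 32});

for (let i = 0; i * NODES_PER_BLOB < nodes.length; i++) {
  const chunk = nodes.slice(i * NODES_PER_BLOB, (i + 1) * NODES_PER_BLOB);
  queue.add(() => serialiseAndSet(getChunkKey(i), chunk));
}

await queue.run(); // resolves when every queued write has finished
```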

- hashString(`${cacheKey}:requestGraph`) + '-RequestGraph';
- let snapshotKey = hashString(`${cacheKey}:snapshot`);
+ let requestGraphKey = `requestGraph-${hashString(cacheKey)}`;
+ let snapshotKey = `snapshot-${hashString(cacheKey)}`;
Contributor:

Not a biggy, but no point hashing the cache key twice.

queue
.add(() =>
serialiseAndSet(
`requestGraph-nodes-${i}-${hashString(cacheKey)}`,
Contributor:

Same here, we could re-use the hashed cache key from earlier.

)
) {
nodePromises.push(
getAndDeserialize(`requestGraph-nodes-${i}-${hashedCacheKey}`),
Contributor:

We generate this string at least 3 different times. Could move it to a function so it's clear they're tied together?
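e.g. a tiny helper (hypothetical name) so the call sites can't drift apart:

```js
// Single source of truth for the per-chunk cache key format.
function getRequestGraphNodeKey(index, hashedCacheKey) {
  return `requestGraph-nodes-${index}-${hashedCacheKey}`;
}
```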

opts,
),
)
.catch(() => {});
Contributor:

What's the point of this .catch?

JakeLane (author):

This is to ensure we don't crash Parcel if we interrupt a serialisation with the watcher (e.g. a new build is triggered)

Contributor:

Is this required even though we're not awaiting the result?

Member:

This is the "handler" to make sure it's not an "unhandled promise rejection"

JakeLane (author):

I'll bring this back; I think that's the main reason it's required.
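The failure mode in miniature: Node treats a rejected promise with no handler attached as an unhandled rejection, which can crash the process, so a fire-and-forget write needs a no-op handler (writeChunksToCache is a hypothetical stand-in):

```js
// Nobody awaits this promise; it runs in the background during watch.
const pending = writeChunksToCache();

// Without this, an interrupted write rejects with no handler attached
// and surfaces as an unhandled promise rejection. The no-op catch
// marks the rejection as handled.
pending.catch(() => {});
```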

for (let i = 0; i * NODES_PER_BLOB < cacheableNodes.length; i += 1) {
if (
this.cachedRequestsLastChunk !== null &&
i < this.cachedRequestsLastChunk
Contributor:

The assumption here is that nodes never change or get re-ordered?

JakeLane (author):

Yep, added a comment here. This idea was from @devongovett in the sync.

if (
this.cachedRequestsLastChunk !== null &&
i < this.cachedRequestsLastChunk
) {
continue;
Contributor:

Maybe it'd be nicer to calculate i with this expression rather than just calling continue at the start of the loop?

JakeLane (author):

Refactored this a bit, let me know what you reckon
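i.e. folding the skip into the loop initialiser, roughly (a sketch):

```js
// Start at the first chunk that isn't already cached on disk rather
// than iterating from 0 and skipping with `continue`.
const firstChunk = this.cachedRequestsLastChunk ?? 0;
for (let i = firstChunk; i * NODES_PER_BLOB < cacheableNodes.length; i++) {
  // ... serialise and write chunk i ...
}
```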

mattcompiles (Contributor) left a comment:

Good job bud 👏. This was a tricky one to get through. It all makes logical sense to me now. Happy to merge assuming you've tested it extensively.

JakeLane commented Feb 13, 2024

@JakeLane force-pushed the jlane2/write-request-graph-to-disk-background branch from a5a153e to 3965abb on February 15, 2024 05:41
@@ -269,6 +273,7 @@ export class RequestGraph extends ContentGraph<
optionNodeIds: this.optionNodeIds,
unpredicatableNodeIds: this.unpredicatableNodeIds,
invalidateOnBuildNodeIds: this.invalidateOnBuildNodeIds,
cachedRequestChunks: this.cachedRequestChunks,
Contributor:

Cool idea, this means we'll be able to skip work between different sessions of Parcel.

@@ -345,6 +351,9 @@ export class RequestGraph extends ContentGraph<
for (let parentNode of parentNodes) {
this.invalidateNode(parentNode, reason);
}

// If the node is invalidated, the cached request chunk on disk needs to be re-written
this.cachedRequestChunks.delete(Math.floor(nodeId / NODES_PER_BLOB));
Contributor:

I think we may need to do this from Request.completeRequest as well? I think it's possible that a node is invalidated but its request hasn't been re-run yet. Once it is re-run we'd want to update its result in the cache.
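i.e. something like this inside RequestGraph, called both from invalidateNode and when a request completes (a hedged sketch; the method name is illustrative):

```js
// The chunk holding this node is stale on disk and must be re-written,
// whether the node was just invalidated or its request just re-ran.
removeCachedRequestChunkForNode(nodeId) {
  this.cachedRequestChunks.delete(Math.floor(nodeId / NODES_PER_BLOB));
}
```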

mattcompiles (Contributor) left a comment:

Approval for the new invalidation changes.

@JakeLane merged commit 0560499 into v2 on Feb 16, 2024
14 of 16 checks passed
@mischnic deleted the jlane2/write-request-graph-to-disk-background branch on March 21, 2024 21:53