
chore(CI): speedup gh workflows, reduce E2E flake #8842

Merged: 1 commit merged into next on Mar 12, 2025
Conversation

@stipsan (Member) commented Mar 5, 2025

Description

  • Removes run_install: false, as it's the default option of pnpm/action-setup@v4.
  • Uses cache: pnpm on actions/setup-node@v4, which replaces the custom cache-node-modules commands and leaves it to GitHub to pick the best way to cache pnpm.
  • In order to use cache: pnpm we need to run pnpm/action-setup before actions/setup-node.
  • Uses node-version: lts/* so we're automatically on the latest stable LTS release of Node.js and don't have to bump majors manually every year. A lot of jobs were still on Node.js v18 instead of the faster v22.
  • e2e-ct.yml now uses Turborepo to cache build output, which is much faster than the custom actions/cache steps and doesn't require cleanup or garbage-collection steps.
  • e2e-ct.yml now caches the browser install for Chromium and Firefox; WebKit still needs to run the install on every run to succeed, although it's still unclear why.
  • The e2e:build command now uses Turborepo, significantly speeding up the e2e suite: it no longer runs sanity build on every single git push, only on changes that might affect the suite.
  • playwright-ct.config.ts is adjusted so flake is easier to detect and tests fail faster, instead of spending up to 30 minutes on CI before failing. Retries is set to just 1, since our trace and video options assume at least one retry on failure (for example on-first-retry). See the config sketch after this list.
  • TestForm.tsx, used by e2e-ct, is updated with the same fixes for unstable useRef usage as the production document providers.
  • Regular e2e now also uses 1 retry instead of 4, to better surface flaky and failing tests (we shouldn't allow tests to regularly need 4 retries to pass).
  • Refactored a number of tests to make them more resilient and less flaky, especially on Firefox.
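
To illustrate the playwright-ct.config.ts changes described above, here is a minimal sketch; the concrete values are assumptions for illustration and the actual config in this PR may differ (projects, reporters and other options are omitted):

```ts
// playwright-ct.config.ts (simplified sketch, not the exact config in this PR)
import {defineConfig} from '@playwright/experimental-ct-react'

export default defineConfig({
  // One retry is all that trace/video modes like 'on-first-retry' need,
  // and it surfaces flaky tests instead of hiding them behind 4+ retries.
  retries: 1,
  // Fail fast instead of letting a stuck test hold the CI job until the job-level timeout.
  timeout: 30_000,
  use: {
    trace: 'on-first-retry',
    video: 'on-first-retry',
  },
})
```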

What to review

Hopefully the changes make sense; there are inline code comments where they're helpful.
In general, the idea behind setting retries to just 1 instead of 4 or 6 is that global limits like these make it hard to track down flake, since the CI might regularly retry flaky tests 4+ times before they pass. It's better to set a higher retry limit on the specific flaky tests, which also makes them easier to find and keep track of (see the example below).
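
As a sketch of what that looks like in practice (the spec and test names here are made up), a known-flaky test can opt into extra retries locally while the global limit stays at 1:

```ts
// flaky.spec.ts (hypothetical example)
import {test} from '@playwright/test'

test.describe('publish document', () => {
  // Only this group gets extra retries; it's easy to grep for and track over time.
  test.describe.configure({retries: 3})

  test('shows the published status in the pane footer', async ({page}) => {
    // ...test body...
  })
})
```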

Testing

If existing tests pass we're good 🤞

Notes for release

N/A


vercel bot commented Mar 5, 2025

The latest updates on your projects.

Name | Status | Updated (UTC)
page-building-studio | ✅ Ready | Mar 11, 2025 11:11pm
performance-studio | ✅ Ready | Mar 11, 2025 11:11pm
test-studio | ✅ Ready | Mar 11, 2025 11:11pm

2 Skipped Deployments
Name | Status | Updated (UTC)
studio-workshop | ⬜️ Ignored | Mar 11, 2025 11:11pm
test-next-studio | ⬜️ Ignored | Mar 11, 2025 11:11pm


github-actions bot commented Mar 5, 2025

No changes to documentation


github-actions bot commented Mar 5, 2025

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 42% 54407 / 129513
🔵 Statements 42% 54407 / 129513
🔵 Functions 46.88% 2728 / 5819
🔵 Branches 79.35% 10185 / 12835
File Coverage: No changed files found.
Generated in workflow #32123 for commit 317d16c by the Vitest Coverage Report Action


github-actions bot commented Mar 5, 2025

⚡️ Editor Performance Report

Updated Tue, 11 Mar 2025 23:19:33 GMT

Benchmark | reference (latency of sanity@latest) | experiment (latency of this branch) | Δ % (latency difference)
article (title) 27.0 efps (37ms) 26.3 efps (38ms) +1ms (+2.7%)
article (body) 73.0 efps (14ms) 78.1 efps (13ms) -1ms (-/-%)
article (string inside object) 27.8 efps (36ms) 27.4 efps (37ms) +1ms (+1.4%)
article (string inside array) 25.0 efps (40ms) 23.8 efps (42ms) +2ms (+5.0%)
recipe (name) 31.3 efps (32ms) 47.6 efps (21ms) -11ms (-34.4%)
recipe (description) 33.3 efps (30ms) 50.0 efps (20ms) -10ms (-33.3%)
recipe (instructions) 99.9+ efps (5ms) 99.9+ efps (5ms) +0ms (-/-%)
synthetic (title) 16.4 efps (61ms) 19.2 efps (52ms) -9ms (-14.8%)
synthetic (string inside object) 17.2 efps (58ms) 19.2 efps (52ms) -6ms (-10.3%)

efps — editor "frames per second". The number of updates assumed to be possible within a second.

Derived from input latency. efps = 1000 / input_latency
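
For example, the reference run above measures article (title) at a median latency of 37ms, which works out to 1000 / 37 ≈ 27 efps.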

Detailed information

🏠 Reference result

The performance result of sanity@latest

Benchmark latency p75 p90 p99 blocking time test duration
article (title) 37ms 39ms 43ms 293ms 151ms 9.9s
article (body) 14ms 15ms 16ms 53ms 47ms 5.1s
article (string inside object) 36ms 40ms 46ms 75ms 31ms 6.4s
article (string inside array) 40ms 42ms 43ms 49ms 29ms 6.6s
recipe (name) 32ms 35ms 44ms 72ms 58ms 9.4s
recipe (description) 30ms 31ms 34ms 54ms 28ms 6.2s
recipe (instructions) 5ms 6ms 7ms 10ms 0ms 3.2s
synthetic (title) 61ms 68ms 76ms 102ms 1648ms 14.8s
synthetic (string inside object) 58ms 61ms 66ms 510ms 2216ms 9.8s

🧪 Experiment result

The performance result of this branch

Benchmark latency p75 p90 p99 blocking time test duration
article (title) 38ms 41ms 46ms 238ms 375ms 10.4s
article (body) 13ms 14ms 15ms 133ms 158ms 4.8s
article (string inside object) 37ms 38ms 47ms 76ms 35ms 6.4s
article (string inside array) 42ms 43ms 50ms 55ms 161ms 6.8s
recipe (name) 21ms 23ms 26ms 42ms 3ms 7.5s
recipe (description) 20ms 20ms 21ms 24ms 0ms 4.7s
recipe (instructions) 5ms 7ms 7ms 15ms 0ms 3.1s
synthetic (title) 52ms 54ms 61ms 100ms 639ms 12.5s
synthetic (string inside object) 52ms 54ms 60ms 401ms 1135ms 8.0s

📚 Glossary

column definitions

  • benchmark — the name of the test, e.g. "article", followed by the label of the field being measured, e.g. "(title)".
  • latency — the time between when a key was pressed and when it was rendered. derived from a set of samples. the median (p50) is shown to show the most common latency.
  • p75 — the 75th percentile of the input latency in the test run. 75% of the sampled inputs in this benchmark were processed faster than this value. this provides insight into the upper range of typical performance.
  • p90 — the 90th percentile of the input latency in the test run. 90% of the sampled inputs were faster than this. this metric helps identify slower interactions that occurred less frequently during the benchmark.
  • p99 — the 99th percentile of the input latency in the test run. only 1% of sampled inputs were slower than this. this represents the worst-case scenarios encountered during the benchmark, useful for identifying potential performance outliers.
  • blocking time — the total time during which the main thread was blocked, preventing user input and UI updates. this metric helps identify performance bottlenecks that may cause the interface to feel unresponsive.
  • test duration — how long the test run took to complete.

@stipsan force-pushed the e2e-flake-(again) branch from 7acb2ed to 0fd7c7f on March 5, 2025 12:38
@stipsan force-pushed the e2e-flake-(again) branch from 7fdd3b7 to e67fec7 on March 10, 2025 15:09
@stipsan force-pushed the e2e-flake-(again) branch from e67fec7 to 9648b79 on March 10, 2025 15:30
@stipsan force-pushed the e2e-flake-(again) branch from 9a87d73 to bceb701 on March 11, 2025 22:29
@stipsan force-pushed the e2e-flake-(again) branch from bceb701 to 78db2a8 on March 11, 2025 22:37
@stipsan force-pushed the e2e-flake-(again) branch from 78db2a8 to a4d9aa4 on March 11, 2025 22:40
@stipsan force-pushed the e2e-flake-(again) branch from a4d9aa4 to 7c4190d on March 11, 2025 22:52
@stipsan force-pushed the e2e-flake-(again) branch from 7c4190d to 616611b on March 11, 2025 23:01
Before:
await page.getByTestId('action-publish').click()
expect(await paneFooter.textContent()).toMatch(/published/i)
After:
await publishButton.click()
await expect(paneFooter).toContainText(/published/i, {useInnerText: true, timeout: 30_000})
Contributor:
I think this is all great Cody, and I really appreciate you taking the time to solve this.

One thing I wonder about these extended timeouts: have we seen them actually resolve after 30 seconds?

In the failing tests I'm seeing, the published state never shows up and the document is just stuck saving forever, delaying the whole test suite. Having to wait 30 seconds for an action to succeed doesn't feel right; if this is the experience in user land we should get it fixed.

Member Author (@stipsan):

I've seen some tests take over 1 minute and still show the saving spinner. I don't yet know exactly when it happens; it seems to happen when the entire e2e suite is running and there are a lot of mutations in the e2e dataset, so it's difficult to reproduce locally.
I'll look into this condition after this PR is merged :)

Member Author (@stipsan):

And I agree, delaying the whole suite by up to 30s is bad, but it's better than a 5s timeout combined with a suite that retries up to 5 times.
My goal is to find a better, more reliable way of waiting for the saving spinner to complete, and to understand the conditions causing it to take so long.

Contributor:

To add more on this: I agree with longer timeouts rather than multiple retries. We are adding some telemetry to test the speed of this across studios; maybe it will point us in the right direction.

Contributor:

Yeah, I've also seen it take long, but I wonder: do they ever resolve after this long, or are they just blocked in this state? In the ones I've found I've never seen it resolve, even after 30 seconds.

Member Author (@stipsan):

@pedrobonamin yeah, I've seen cases go beyond the default 5s, usually around 10s. Not in Chromium but in Firefox. It seems like Firefox generally needs longer timeouts; not sure why.

Member Author (@stipsan):

@pedrobonamin for example, if you download the full-html-report--attempt-1 artifact from the build that ran just after this PR merged, you'll see that it waited 5.1 seconds:

[screenshot: Playwright HTML report showing the 5.1 second wait]

Contributor:

This is a great way to see it, thank you!

@stipsan stipsan merged commit 2e1a214 into next Mar 12, 2025
61 checks passed
@stipsan stipsan deleted the e2e-flake-(again) branch March 12, 2025 08:57