
chore(CI): speedup gh workflows, reduce E2E flake #8842

Merged: 1 commit merged into next on Mar 12, 2025
Conversation

@stipsan (Member) commented Mar 5, 2025

Description

  • Removes run_install: false, as it's the default option of pnpm/action-setup@v4.
  • Uses cache: pnpm on actions/setup-node@v4, which replaces the custom cache-node-modules commands and leaves it to GitHub to pick the best way to cache pnpm.
  • In order to use cache: pnpm we need to run pnpm/action-setup before actions/setup-node.
  • Uses node-version: lts/* so we're automatically on the latest stable LTS release of Node.js and don't have to bump majors manually every year. A lot of jobs were still on Node.js v18 instead of the faster v22.
  • e2e-ct.yml now uses Turborepo to cache build output, which is much faster than the custom actions/cache steps and doesn't require cleanup or garbage-collection steps.
  • e2e-ct.yml now caches the browser install for Chromium and Firefox; WebKit still needs to run the install on every run to succeed, although it's still unclear why.
  • The e2e:build command now uses Turborepo, significantly speeding up the e2e suite: it no longer runs sanity build on every single git push, only on changes that might affect the suite.
  • playwright-ct.config.ts is adjusted so flake is easier to detect and tests fail faster, instead of spending up to 30 minutes on CI before failing. Retries is set to just 1, since our trace and video options assume at least one retry on failure (for example on-first-retry). See the config sketch after this list.
  • TestForm.tsx, used by e2e-ct, is updated with the same fixes for unstable useRef usage as the production document providers.
  • Regular e2e now also uses 1 retry instead of 4, to better surface flaky and failing tests (we shouldn't allow tests to regularly need 4 retries to pass).
  • Refactored a number of tests to make them more resilient and less flaky, especially on Firefox.
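
To illustrate the playwright-ct.config.ts changes described above, here is a minimal sketch; the concrete values are assumptions for illustration and the actual config in this PR may differ (projects, reporters and other options are omitted):

```ts
// playwright-ct.config.ts (simplified sketch, not the exact config in this PR)
import {defineConfig} from '@playwright/experimental-ct-react'

export default defineConfig({
  // One retry is all that trace/video modes like 'on-first-retry' need,
  // and it surfaces flaky tests instead of hiding them behind 4+ retries.
  retries: 1,
  // Fail fast instead of letting a stuck test hold the CI job until the job-level timeout.
  timeout: 30_000,
  use: {
    trace: 'on-first-retry',
    video: 'on-first-retry',
  },
})
```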

What to review

Hopefully the changes make sense; there are inline code comments where they're helpful.
In general, the idea behind setting retries to just 1 instead of 4 or 6 is that global limits like these make it hard to track down flake, since the CI might regularly retry flaky tests 4+ times before they pass. It's better to set a higher retry limit on the specific flaky tests, which also makes them easier to find and keep track of (see the example below).
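
As a sketch of what that looks like in practice (the spec and test names here are made up), a known-flaky test can opt into extra retries locally while the global limit stays at 1:

```ts
// flaky.spec.ts (hypothetical example)
import {test} from '@playwright/test'

test.describe('publish document', () => {
  // Only this group gets extra retries; it's easy to grep for and track over time.
  test.describe.configure({retries: 3})

  test('shows the published status in the pane footer', async ({page}) => {
    // ...test body...
  })
})
```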

Testing

If existing tests pass we're good 🤞

Notes for release

N/A


vercel bot commented Mar 5, 2025

The latest updates on your projects.

Name | Status | Updated (UTC)
page-building-studio | ✅ Ready | Mar 11, 2025 11:11pm
performance-studio | ✅ Ready | Mar 11, 2025 11:11pm
test-studio | ✅ Ready | Mar 11, 2025 11:11pm

2 Skipped Deployments
Name | Status | Updated (UTC)
studio-workshop | ⬜️ Ignored | Mar 11, 2025 11:11pm
test-next-studio | ⬜️ Ignored | Mar 11, 2025 11:11pm


github-actions bot commented Mar 5, 2025

No changes to documentation


github-actions bot commented Mar 5, 2025

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 42% 54407 / 129513
🔵 Statements 42% 54407 / 129513
🔵 Functions 46.88% 2728 / 5819
🔵 Branches 79.35% 10185 / 12835
File Coverage: No changed files found.
Generated in workflow #32123 for commit 317d16c by the Vitest Coverage Report Action


github-actions bot commented Mar 5, 2025

⚡️ Editor Performance Report

Updated Tue, 11 Mar 2025 23:19:33 GMT

Benchmark | reference (latency of sanity@latest) | experiment (latency of this branch) | Δ % (latency difference)
article (title) 27.0 efps (37ms) 26.3 efps (38ms) +1ms (+2.7%)
article (body) 73.0 efps (14ms) 78.1 efps (13ms) -1ms (-/-%)
article (string inside object) 27.8 efps (36ms) 27.4 efps (37ms) +1ms (+1.4%)
article (string inside array) 25.0 efps (40ms) 23.8 efps (42ms) +2ms (+5.0%)
recipe (name) 31.3 efps (32ms) 47.6 efps (21ms) -11ms (-34.4%)
recipe (description) 33.3 efps (30ms) 50.0 efps (20ms) -10ms (-33.3%)
recipe (instructions) 99.9+ efps (5ms) 99.9+ efps (5ms) +0ms (-/-%)
synthetic (title) 16.4 efps (61ms) 19.2 efps (52ms) -9ms (-14.8%)
synthetic (string inside object) 17.2 efps (58ms) 19.2 efps (52ms) -6ms (-10.3%)

efps — editor "frames per second". The number of updates assumed to be possible within a second.

Derived from input latency. efps = 1000 / input_latency
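
For example, the reference run above measures article (title) at a median latency of 37ms, which works out to 1000 / 37 ≈ 27 efps.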

Detailed information

🏠 Reference result

The performance result of sanity@latest

Benchmark latency p75 p90 p99 blocking time test duration
article (title) 37ms 39ms 43ms 293ms 151ms 9.9s
article (body) 14ms 15ms 16ms 53ms 47ms 5.1s
article (string inside object) 36ms 40ms 46ms 75ms 31ms 6.4s
article (string inside array) 40ms 42ms 43ms 49ms 29ms 6.6s
recipe (name) 32ms 35ms 44ms 72ms 58ms 9.4s
recipe (description) 30ms 31ms 34ms 54ms 28ms 6.2s
recipe (instructions) 5ms 6ms 7ms 10ms 0ms 3.2s
synthetic (title) 61ms 68ms 76ms 102ms 1648ms 14.8s
synthetic (string inside object) 58ms 61ms 66ms 510ms 2216ms 9.8s

🧪 Experiment result

The performance result of this branch

Benchmark latency p75 p90 p99 blocking time test duration
article (title) 38ms 41ms 46ms 238ms 375ms 10.4s
article (body) 13ms 14ms 15ms 133ms 158ms 4.8s
article (string inside object) 37ms 38ms 47ms 76ms 35ms 6.4s
article (string inside array) 42ms 43ms 50ms 55ms 161ms 6.8s
recipe (name) 21ms 23ms 26ms 42ms 3ms 7.5s
recipe (description) 20ms 20ms 21ms 24ms 0ms 4.7s
recipe (instructions) 5ms 7ms 7ms 15ms 0ms 3.1s
synthetic (title) 52ms 54ms 61ms 100ms 639ms 12.5s
synthetic (string inside object) 52ms 54ms 60ms 401ms 1135ms 8.0s

📚 Glossary

column definitions

  • benchmark — the name of the test, e.g. "article", followed by the label of the field being measured, e.g. "(title)".
  • latency — the time between when a key was pressed and when it was rendered. derived from a set of samples. the median (p50) is shown to show the most common latency.
  • p75 — the 75th percentile of the input latency in the test run. 75% of the sampled inputs in this benchmark were processed faster than this value. this provides insight into the upper range of typical performance.
  • p90 — the 90th percentile of the input latency in the test run. 90% of the sampled inputs were faster than this. this metric helps identify slower interactions that occurred less frequently during the benchmark.
  • p99 — the 99th percentile of the input latency in the test run. only 1% of sampled inputs were slower than this. this represents the worst-case scenarios encountered during the benchmark, useful for identifying potential performance outliers.
  • blocking time — the total time during which the main thread was blocked, preventing user input and UI updates. this metric helps identify performance bottlenecks that may cause the interface to feel unresponsive.
  • test duration — how long the test run took to complete.

@stipsan force-pushed the e2e-flake-(again) branch from 7acb2ed to 0fd7c7f on March 5, 2025 12:38
@stipsan force-pushed the e2e-flake-(again) branch from 7fdd3b7 to e67fec7 on March 10, 2025 15:09
@stipsan force-pushed the e2e-flake-(again) branch from e67fec7 to 9648b79 on March 10, 2025 15:30
@stipsan force-pushed the e2e-flake-(again) branch from 9a87d73 to bceb701 on March 11, 2025 22:29
@stipsan force-pushed the e2e-flake-(again) branch from bceb701 to 78db2a8 on March 11, 2025 22:37
@stipsan force-pushed the e2e-flake-(again) branch from 78db2a8 to a4d9aa4 on March 11, 2025 22:40
@stipsan force-pushed the e2e-flake-(again) branch from a4d9aa4 to 7c4190d on March 11, 2025 22:52
@stipsan force-pushed the e2e-flake-(again) branch from 7c4190d to 616611b on March 11, 2025 23:01
Before:
await page.getByTestId('action-publish').click()
expect(await paneFooter.textContent()).toMatch(/published/i)
After:
await publishButton.click()
await expect(paneFooter).toContainText(/published/i, {useInnerText: true, timeout: 30_000})
Contributor:
I think this is all great Cody, and I really appreciate you taking the time to solve this.

One thing I wonder about these extended timeouts: have we seen them actually resolve after 30 seconds?

In the failing tests I'm seeing, the published state never shows up and the document is just stuck saving forever, delaying the whole test suite. Having to wait 30 seconds for an action to succeed doesn't feel right; if this is the experience in user land we should get it fixed.

Member Author (@stipsan):

I've seen some tests take over 1 minute and still show the saving spinner. I don't yet know exactly when it happens; it seems to happen when the entire e2e suite is running and there are a lot of mutations in the e2e dataset, so it's difficult to reproduce locally.
I'll look into this condition after this PR is merged :)

Member Author (@stipsan):

And I agree, delaying the whole suite by up to 30s is bad, but it's better than a 5s timeout combined with a suite that retries up to 5 times.
My goal is to find a better, more reliable way of waiting for the saving spinner to complete, and to understand the conditions causing it to take so long.

Contributor:

To add more on this: I agree with longer timeouts rather than multiple retries. We are adding some telemetry to test the speed of this across studios; maybe it will point us in the right direction.

Contributor:

Yeah, I've also seen it take long, but I wonder: do they ever resolve after this long, or are they just blocked in this state? In the ones I've found I've never seen it resolve, even after 30 seconds.

Member Author (@stipsan):

@pedrobonamin yeah, I've seen cases go beyond the default 5s, usually around 10s. Not in Chromium but in Firefox. It seems like Firefox generally needs longer timeouts; not sure why.

Member Author (@stipsan):

@pedrobonamin for example, if you download the full-html-report--attempt-1 artifact from the build that ran just after this PR merged, you'll see that it waited 5.1 seconds:

[screenshot: Playwright HTML report showing the 5.1 second wait]

Contributor:

This is a great way to see it, thank you!

@stipsan stipsan merged commit 2e1a214 into next Mar 12, 2025
61 checks passed
@stipsan stipsan deleted the e2e-flake-(again) branch March 12, 2025 08:57