
[frontend] Latency when listing runs #10778

Closed
droctothorpe opened this issue May 3, 2024 · 7 comments

@droctothorpe
Contributor

droctothorpe commented May 3, 2024

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Using the AWS helm chart.
  • KFP version:
    2.0.5

Steps to reproduce

  • Create 10+ runs with the same experiment.
  • Go to the runs page.
  • The frontend will make 10+ requests to the same get experiment endpoint: https://<domain>/pipeline/apis/v2beta1/experiments/<experiment id>.
  • The concurrent requests often incur latency of 1+ second each, and the delays are cumulative. Some of our users are experiencing cumulative delays of 15+ seconds. Sometimes the runs page never loads at all.
  • It may take several refreshes to recreate; the error is somewhat intermittent.

Expected result

In an ideal world, the frontend would cache the experiment and not make the same request 10 times. In addition, even without caching, 10 experiment API requests probably shouldn't introduce this much latency. Also, this might be a red herring, but the experiment requests all include a Cache-Control: no-cache header. cc @owmasch.
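The caching idea above could be as simple as memoizing the in-flight request per experiment ID, so N runs that share one experiment trigger only one network call. A minimal sketch (the `Experiment` shape and `fetchExperiment` parameter are illustrative stand-ins for the real KFP client, not the actual frontend API):

```typescript
// Hypothetical sketch: dedupe/cache experiment requests by ID.
// Caching the Promise (not the resolved value) also collapses
// concurrent requests for the same ID into a single call.
type Experiment = { id: string; name: string };

const experimentCache = new Map<string, Promise<Experiment>>();

function getExperimentCached(
  id: string,
  fetchExperiment: (id: string) => Promise<Experiment>,
): Promise<Experiment> {
  let cached = experimentCache.get(id);
  if (!cached) {
    cached = fetchExperiment(id); // only the first caller hits the network
    experimentCache.set(id, cached);
  }
  return cached;
}
```

With this in place, 10 runs under the same experiment would issue one request instead of 10, regardless of whether the server honors Cache-Control.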


Impacted by this bug? Give it a 👍.

@rimolive
Member

rimolive commented May 4, 2024

I saw your comment in #10230, but I believe those are different issues. Although the old issue says it's a frontend issue, we discovered it was actually a database issue, and creating indexes improved performance.

Now the challenge is figuring out what's going on in the frontend that makes so many repetitive requests. Thank you for describing the steps to reproduce; I'll follow them and see if there's something we can do to propose a workaround until we have the actual fix.

@droctothorpe
Contributor Author

Thank you, @rimolive. We will continue investigating workarounds on our end as well.

@droctothorpe
Contributor Author

Found the following in the KFP Pipeline logs. It seems very relevant. I'm continuing to investigate, but I want to share it now in case it's helpful for anyone else who is debugging:

 Waited for 1.133348867s due to client-side throttling, not priority and fairness, request: POST:https://<obfuscated>/apis/authorization.k8s.io/v1/subjectaccessreviews

@droctothorpe
Contributor Author

I think we just need to adjust the QPS and burst when instantiating the K8s client for the KFP API. Will validate the fix and update here.

@droctothorpe
Contributor Author

After investigating more thoroughly with @tarat44, it seems like modifying the QPS is probably not the optimal solution.

We're going to try updating the list runs page to make a single call to https://www.kubeflow.org/docs/components/pipelines/v1/reference/api/kubeflow-pipeline-api-spec/#operation--apis-v1beta1-experiments-get instead of individual calls for each experiment ID, then parse the larger response, to reduce latency.

Please let us know if this seems reasonable. Thanks!
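The approach described above (one list call, then a local lookup) could look roughly like this. This is a sketch only; `listExperiments`, `resolveRunExperiments`, and the `Experiment` shape are illustrative names, not the actual KFP frontend code:

```typescript
// Hypothetical sketch: replace N getExperiment calls with one
// list-experiments call, then resolve each run's experiment from
// an in-memory id -> experiment map.
type Experiment = { id: string; display_name: string };

function indexExperiments(experiments: Experiment[]): Map<string, Experiment> {
  const byId = new Map<string, Experiment>();
  for (const exp of experiments) {
    byId.set(exp.id, exp);
  }
  return byId;
}

async function resolveRunExperiments(
  runExperimentIds: string[],
  listExperiments: () => Promise<Experiment[]>,
): Promise<(Experiment | undefined)[]> {
  const byId = indexExperiments(await listExperiments()); // single request
  return runExperimentIds.map(id => byId.get(id));
}
```

This trades N round trips (each subject to throttling) for one larger response plus O(1) map lookups per run.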

@thesuperzapper
Member

@droctothorpe that seems reasonable.

Are there any other places which are making repeated requests that can be either cached, or combined?

Also, I guess we could have the backend cache the result of the subject access review for a few seconds.
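The short-lived server-side cache suggested above could be sketched as a small TTL cache keyed on the authorization check. All names here are illustrative (this is not the actual KFP backend code, and the real check would issue a Kubernetes SubjectAccessReview):

```typescript
// Hypothetical sketch: cache authorization check results for a few
// seconds so repeated identical checks skip the Kubernetes API
// round trip that was triggering client-side throttling.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired; evict and miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

async function isAllowedCached(
  user: string,
  verb: string,
  cache: TtlCache<boolean>,
  check: (user: string, verb: string) => Promise<boolean>, // real SAR call
): Promise<boolean> {
  const key = `${user}:${verb}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const allowed = await check(user, verb);
  cache.set(key, allowed);
  return allowed;
}
```

A short TTL (a few seconds) keeps the window for stale authorization decisions small while collapsing the burst of identical checks the runs page generates.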

@droctothorpe
Contributor Author

Thanks, @thesuperzapper. I just submitted a preliminary PR. I still have to update the tests but am waiting for design review before doing so. Going through the frontend code, there do seem to be other opportunities to reduce latency. Here is another example where Promise.all is used when a single call to list pipelines could be used instead.
