Precompute or optimize queries against firefox_desktop.pageload #4957
Comments
➤ Andrew Creskey commented: cc: Denis Palmeiro who also makes use of queries against firefox_desktop.pageload. |
➤ Shell Escalante commented: We don’t have a focus area. George Kaberere, should we tag this in any way before moving to DENG? There wasn’t a data modelling area. |
➤ Denis Palmeiro commented: After using this table quite extensively recently, I think what would help us substantially is to have at least these subsets in separate tables to make lookups faster: firefox_desktop.pageload_nightly (nightly only). Andrew Creskey or Bas Schouten can maybe think of some other useful subsets. I don’t use the beta population often, but maybe that could also be useful to have. |
➤ Winnie Chan commented: Denis Palmeiro Andrew Creskey There are some questions I hope you can help answer:
cc George Kaberere |
➤ Denis Palmeiro commented:
Thanks! |
➤ Winnie Chan commented: Denis Palmeiro I have created the three views with data for April 2024 only at the moment. Could you take a look to make sure they fit your needs before I backfill the tables with more data (as you mentioned, perhaps from a year ago, around 2023-04-01)?
Note that the experiments table is still big: it is currently at 35TB with 1 month of data. With another 11 months of data it may not be that much smaller than the original table, which is 200TB in size. Let me know what you think. Thanks. |
➤ Denis Palmeiro commented: Winnie Chan Thanks, those tables look great. Since the experiments table is still so large, let’s just get rid of it and do the other 2 instead. Thanks! |
➤ Winnie Chan commented: Denis Palmeiro I have backfilled the following tables from 2023-05-01. Let me know if you need more data.
I can go ahead and delete moz-fx-data-shared-prod.firefox_desktop.pageload_experiments. However, I wonder if the two tables above would be sufficient for your use cases in querying for experiments? The ticket included some sample redash queries, 96384 (https://sql.telemetry.mozilla.org/queries/96384) and 92832 (https://sql.telemetry.mozilla.org/queries/92832), that may look at experiments in other channels? |
➤ Denis Palmeiro commented: Thanks Winnie Chan! The experiment subset is the biggest use case for us, and we are mostly just interested in performance experiments but unfortunately there is no good way to just isolate those. However, the nightly and 1% should help us when we're doing quick lookups of data. |
➤ Winnie Chan commented: Thanks Denis Palmeiro. In that case I will close this ticket and delete the experiments table moz-fx-data-shared-prod.firefox_desktop.pageload_experiments. You can start using the new tables where applicable, particularly in any scheduled redash queries or dashboards. I will continue to monitor usage/costs of the pageload tables for the next little while and see if there are more needs to optimize. Feel free to reach out if you have any questions! |
The performance team makes heavy use of the firefox_desktop.pageload table to analyze data from the pageload event. Some examples:
Colab query on experiment results.
Redash query to compare experiment results:
Redash query intended to be used on dashboards, but hits data limits
However, because we are querying a huge dataset, performance can be poor and the usage costs are likely quite high. Furthermore, we run into data query limits via redash.
Creating this ticket as [~accountid:70121:7c899675-2b52-4a02-b363-378de64acfe3] suggested that modelling to pre-compute results could be helpful here.
Discussion in #data-help: https://mozilla.slack.com/archives/C4D5ZA91B/p1706214488829459
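As a rough illustration of why narrower subsets help, here is a minimal Python sketch that builds a query string restricted by date range, channel, and sample percentage. It is not taken from this ticket: the helper `build_pageload_query` is hypothetical, and the column names `submission_timestamp`, `normalized_channel`, and `sample_id` (assumed to range from 0 to 99) are assumptions about the table schema that should be checked before use.

```python
from datetime import date
from typing import Optional


def build_pageload_query(
    table: str,
    start: date,
    end: date,
    channel: Optional[str] = None,
    sample_pct: int = 100,
) -> str:
    """Build a SQL string that limits how much of a pageload table is scanned.

    Hypothetical helper; column names are assumptions, not confirmed here.
    """
    if not 1 <= sample_pct <= 100:
        raise ValueError("sample_pct must be between 1 and 100")
    if start > end:
        raise ValueError("start must not be after end")

    # Always restrict the date range so only the needed days are read.
    filters = [
        f"DATE(submission_timestamp) BETWEEN '{start.isoformat()}' "
        f"AND '{end.isoformat()}'"
    ]
    # Optionally keep a single release channel (for example, nightly).
    if channel is not None:
        filters.append(f"normalized_channel = '{channel}'")
    # Optionally keep a sample, assuming sample_id takes values 0..99.
    if sample_pct < 100:
        filters.append(f"sample_id < {sample_pct}")

    where = "\n  AND ".join(filters)
    return f"SELECT COUNT(*) AS n_events\nFROM `{table}`\nWHERE {where}"


if __name__ == "__main__":
    print(
        build_pageload_query(
            "moz-fx-data-shared-prod.firefox_desktop.pageload",
            date(2024, 4, 1),
            date(2024, 4, 30),
            channel="nightly",
            sample_pct=1,
        )
    )
```

The sketch interpolates values directly into the SQL text for readability; a real scheduled query would more likely use query parameters, or simply point at a precomputed subset table so the filters are not needed at all.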
┆Issue is synchronized with this Jira Story