AD-310 Historical snapshots of event aggregates tables #5604

Open
wants to merge 2 commits into base: main

@curtismorales (Contributor) commented May 16, 2024

I'd like to backfill mobile suggest data in the event_aggregates and event_aggregates_suggest tables, but I can't backfill those tables directly: their other source data (topsites and desktop suggest) expires after 30 days, so rerunning the queries for older dates would lose it.

My thought is to create historical snapshot tables that combine the data already in these tables with the backfilled mobile suggest data. The view can then read from the historical table for everything through 2024-04-30 and from the regular table for everything from 2024-05-01 on.
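
A minimal sketch of what that view could look like, assuming the snapshot and the regular table share the same column order (`UNION ALL` matches columns by position). The view name `contextual_services.event_aggregates` is an assumption for illustration; the two table names come from the generated SQL in this PR.

```sql
-- Hypothetical view definition: the view name is assumed, not taken from this PR.
CREATE OR REPLACE VIEW
  `moz-fx-data-shared-prod.contextual_services.event_aggregates`
AS
-- Everything through 2024-04-30 comes from the frozen snapshot,
-- which already includes the backfilled mobile suggest rows.
SELECT
  *
FROM
  `moz-fx-data-shared-prod.contextual_services_derived.event_aggregates_historical_20240430`
WHERE
  submission_date < '2024-05-01'
UNION ALL
-- Everything from 2024-05-01 on comes from the regularly scheduled table.
SELECT
  *
FROM
  `moz-fx-data-shared-prod.contextual_services_derived.event_aggregates_v1`
WHERE
  submission_date >= '2024-05-01'
```

The two date predicates are complementary, so no date is served by both tables and nothing is double counted at the cutover.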

I don't know if this is the best way to do this, though; I'm open to other suggestions.

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up
    in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that the data is not already available (fully or partially), and prefer extending an existing dataset over creating a new one. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

Issue is synchronized with this Jira Task

@dataops-ci-bot

Integration report for "Merge branch 'main' into event-aggregates-historical"

sql.diff

Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived: event_aggregates_historical_20240430
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived: event_aggregates_suggest_historical_20240430
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/metadata.yaml	2024-05-16 19:57:41.000000000 +0000
@@ -0,0 +1,24 @@
+friendly_name: Contextual Services Event Aggregates, Historical (-> 2024-04-30)
+description: |-
+  Aggregated event and user counts for topsites and quicksuggest. Snapshot
+  of historical data through 2024-04-30
+owners:
+- cmorales@mozilla.com
+labels:
+  owner1: cmorales
+bigquery:
+  time_partitioning:
+    type: day
+    field: submission_date
+    require_partition_filter: true
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - source
+    - event_type
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:contextual-services
+references: {}
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/query.sql	2024-05-16 19:55:52.000000000 +0000
@@ -0,0 +1,133 @@
+{% if is_init() %}
+  WITH blocks AS (
+    SELECT
+      b.id,
+      b.queryType AS query_type,
+    FROM
+      `moz-fx-ads-prod.adm.blocks` b
+    QUALIFY
+      1 = ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date DESC)
+  ),
+  event_aggregates AS (
+    SELECT
+      *
+    FROM
+      `moz-fx-data-shared-prod.contextual_services_derived.event_aggregates_v1`
+    WHERE
+      submission_date <= '2024-04-30'
+  ),
+  mobile_suggest AS (
+    -- Suggest Android
+    SELECT
+      metrics.uuid.fx_suggest_context_id AS context_id,
+      DATE(submission_timestamp) AS submission_date,
+      'suggest' AS source,
+      IF(
+        metrics.string.fx_suggest_ping_type = "fxsuggest-click",
+        "click",
+        "impression"
+      ) AS event_type,
+      'phone' AS form_factor,
+      normalized_country_code AS country,
+      metadata.geo.subdivision1 AS subdivision1,
+      metrics.string.fx_suggest_advertiser AS advertiser,
+      client_info.app_channel AS release_channel,
+      metrics.quantity.fx_suggest_position AS position,
+      -- Only remote settings is in use on mobile
+      'remote settings' AS provider,
+      -- Only standard suggestions are in use on mobile
+      'firefox-suggest' AS match_type,
+      SPLIT(metadata.user_agent.os, ' ')[SAFE_OFFSET(0)] AS normalized_os,
+      -- This is the opt-in for Merino, not in use on mobile
+      CAST(NULL AS BOOLEAN) AS suggest_data_sharing_enabled,
+      blocks.query_type,
+    FROM
+      `moz-fx-data-shared-prod.fenix.fx_suggest` fs
+    LEFT JOIN
+      blocks
+      ON fs.metrics.quantity.fx_suggest_block_id = blocks.id
+    WHERE
+      metrics.string.fx_suggest_ping_type IN ("fxsuggest-click", "fxsuggest-impression")
+    UNION ALL
+    -- Suggest iOS
+    SELECT
+      metrics.uuid.fx_suggest_context_id AS context_id,
+      DATE(submission_timestamp) AS submission_date,
+      'suggest' AS source,
+      IF(
+        metrics.string.fx_suggest_ping_type = "fxsuggest-click",
+        "click",
+        "impression"
+      ) AS event_type,
+      'phone' AS form_factor,
+      normalized_country_code AS country,
+      metadata.geo.subdivision1 AS subdivision1,
+      metrics.string.fx_suggest_advertiser AS advertiser,
+      client_info.app_channel AS release_channel,
+      metrics.quantity.fx_suggest_position AS position,
+      -- Only remote settings is in use on mobile
+      'remote settings' AS provider,
+      -- Only standard suggestions are in use on mobile
+      'firefox-suggest' AS match_type,
+      SPLIT(metadata.user_agent.os, ' ')[SAFE_OFFSET(0)] AS normalized_os,
+      -- This is the opt-in for Merino, not in use on mobile
+      CAST(NULL AS BOOLEAN) AS suggest_data_sharing_enabled,
+      blocks.query_type,
+    FROM
+      `moz-fx-data-shared-prod.firefox_ios.fx_suggest` fs
+    LEFT JOIN
+      blocks
+      ON fs.metrics.quantity.fx_suggest_block_id = blocks.id
+    WHERE
+      metrics.string.fx_suggest_ping_type IN ("fxsuggest-click", "fxsuggest-impression")
+  ),
+  mobile_suggest_with_event_count AS (
+    SELECT
+      *,
+      COUNT(*) OVER (
+        PARTITION BY
+          submission_date,
+          context_id,
+          source,
+          event_type,
+          form_factor
+      ) AS user_event_count,
+    FROM
+      mobile_suggest
+    ORDER BY
+      context_id
+  )
+  SELECT
+    *
+  FROM
+    `moz-fx-data-shared-prod.contextual_services_derived.event_aggregates_v1`
+  WHERE
+    submission_date <= '2024-04-30'
+  UNION ALL
+  SELECT
+    * EXCEPT (context_id, user_event_count, query_type),
+    COUNT(*) AS event_count,
+    COUNT(DISTINCT(context_id)) AS user_count,
+    query_type,
+  FROM
+    mobile_suggest_with_event_count
+  WHERE
+    submission_date <= '2024-04-30'
+    -- Filter out events associated with suspiciously active clients.
+    AND NOT (user_event_count > 50 AND event_type = 'click')
+  GROUP BY
+    submission_date,
+    source,
+    event_type,
+    form_factor,
+    country,
+    subdivision1,
+    advertiser,
+    release_channel,
+    position,
+    provider,
+    match_type,
+    normalized_os,
+    suggest_data_sharing_enabled,
+    query_type
+{% endif %}
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_historical_20240430/schema.yaml	2024-05-16 19:55:52.000000000 +0000
@@ -0,0 +1,49 @@
+fields:
+- mode: NULLABLE
+  name: submission_date
+  type: DATE
+- mode: NULLABLE
+  name: source
+  type: STRING
+- mode: NULLABLE
+  name: event_type
+  type: STRING
+- mode: NULLABLE
+  name: form_factor
+  type: STRING
+- mode: NULLABLE
+  name: country
+  type: STRING
+- mode: NULLABLE
+  name: subdivision1
+  type: STRING
+- mode: NULLABLE
+  name: advertiser
+  type: STRING
+- mode: NULLABLE
+  name: release_channel
+  type: STRING
+- mode: NULLABLE
+  name: position
+  type: INTEGER
+- mode: NULLABLE
+  name: provider
+  type: STRING
+- mode: NULLABLE
+  name: match_type
+  type: STRING
+- mode: NULLABLE
+  name: normalized_os
+  type: STRING
+- mode: NULLABLE
+  name: suggest_data_sharing_enabled
+  type: BOOLEAN
+- mode: NULLABLE
+  name: event_count
+  type: INTEGER
+- mode: NULLABLE
+  name: user_count
+  type: INTEGER
+- mode: NULLABLE
+  name: query_type
+  type: STRING
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/metadata.yaml	2024-05-16 19:57:41.000000000 +0000
@@ -0,0 +1,21 @@
+friendly_name: Contextual Services Event Aggregates for Suggest, Historical (-> 2024-04-30)
+description: |-
+  Aggregated event counts for suggest. Snapshot of historical data through
+  2024-04-30
+owners:
+- cmorales@mozilla.com
+labels:
+  owner1: cmorales
+bigquery:
+  time_partitioning:
+    type: day
+    field: submission_date
+    require_partition_filter: true
+    expiration_days: null
+  range_partitioning: null
+  clustering: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:contextual-services
+references: {}
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/query.sql	2024-05-16 19:55:52.000000000 +0000
@@ -0,0 +1,117 @@
+{% if is_init() %}
+  WITH blocks AS (
+    SELECT
+      b.id,
+      b.queryType AS query_type,
+    FROM
+      `moz-fx-ads-prod.adm.blocks` b
+    QUALIFY
+      1 = ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date DESC)
+  ),
+  mobile_suggest AS (
+    -- Suggest Android
+    SELECT
+      metrics.uuid.fx_suggest_context_id AS context_id,
+      DATE(submission_timestamp) AS submission_date,
+      'phone' AS form_factor,
+      normalized_country_code AS country,
+      metrics.string.fx_suggest_advertiser AS advertiser,
+      SPLIT(metadata.user_agent.os, ' ')[SAFE_OFFSET(0)] AS normalized_os,
+      client_info.app_channel AS release_channel,
+      metrics.quantity.fx_suggest_position AS position,
+      -- Only remote settings is in use on mobile
+      'remote settings' AS provider,
+      -- Only standard suggestions are in use on mobile
+      'firefox-suggest' AS match_type,
+      -- This is the opt-in for Merino, not in use on mobile
+      CAST(NULL AS BOOLEAN) AS suggest_data_sharing_enabled,
+      IF(
+        metrics.string.fx_suggest_ping_type = "fxsuggest-click",
+        "click",
+        "impression"
+      ) AS event_type,
+      blocks.query_type,
+    FROM
+      `moz-fx-data-shared-prod.fenix.fx_suggest` fs
+    LEFT JOIN
+      blocks
+      ON fs.metrics.quantity.fx_suggest_block_id = blocks.id
+    WHERE
+      metrics.string.fx_suggest_ping_type IN ("fxsuggest-click", "fxsuggest-impression")
+    UNION ALL
+    -- Suggest iOS
+    SELECT
+      metrics.uuid.fx_suggest_context_id AS context_id,
+      DATE(submission_timestamp) AS submission_date,
+      'phone' AS form_factor,
+      normalized_country_code AS country,
+      metrics.string.fx_suggest_advertiser AS advertiser,
+      SPLIT(metadata.user_agent.os, ' ')[SAFE_OFFSET(0)] AS normalized_os,
+      client_info.app_channel AS release_channel,
+      metrics.quantity.fx_suggest_position AS position,
+      -- Only remote settings is in use on mobile
+      'remote settings' AS provider,
+      -- Only standard suggestions are in use on mobile
+      'firefox-suggest' AS match_type,
+      -- This is the opt-in for Merino, not in use on mobile
+      CAST(NULL AS BOOLEAN) AS suggest_data_sharing_enabled,
+      IF(
+        metrics.string.fx_suggest_ping_type = "fxsuggest-click",
+        "click",
+        "impression"
+      ) AS event_type,
+      blocks.query_type,
+    FROM
+      `moz-fx-data-shared-prod.firefox_ios.fx_suggest` fs
+    LEFT JOIN
+      blocks
+      ON fs.metrics.quantity.fx_suggest_block_id = blocks.id
+    WHERE
+      metrics.string.fx_suggest_ping_type IN ("fxsuggest-click", "fxsuggest-impression")
+  ),
+  mobile_suggest_with_event_count AS (
+    SELECT
+      *,
+      COUNT(*) OVER (
+        PARTITION BY
+          submission_date,
+          context_id,
+          event_type,
+          form_factor
+      ) AS user_event_count,
+    FROM
+      mobile_suggest
+    ORDER BY
+      context_id
+  )
+  SELECT
+    *
+  FROM
+    `moz-fx-data-shared-prod.contextual_services_derived.event_aggregates_suggest_v1`
+  WHERE
+    submission_date <= '2024-04-30'
+  UNION ALL
+  SELECT
+    * EXCEPT (context_id, user_event_count, event_type, query_type),
+    COUNTIF(event_type = "impression") AS impression_count,
+    COUNTIF(event_type = "click") AS click_count,
+    query_type,
+  FROM
+    mobile_suggest_with_event_count
+  WHERE
+    submission_date <= '2024-04-30'
+    -- Filter out events associated with suspiciously active clients.
+    AND NOT (user_event_count > 50 AND event_type = 'click')
+  GROUP BY
+    submission_date,
+    form_factor,
+    country,
+    advertiser,
+    normalized_os,
+    release_channel,
+    position,
+    provider,
+    match_type,
+    suggest_data_sharing_enabled,
+    query_type
+{% endif %}
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services_derived/event_aggregates_suggest_historical_20240430/schema.yaml	2024-05-16 19:55:52.000000000 +0000
@@ -0,0 +1,40 @@
+fields:
+- mode: NULLABLE
+  name: submission_date
+  type: DATE
+- mode: NULLABLE
+  name: form_factor
+  type: STRING
+- mode: NULLABLE
+  name: country
+  type: STRING
+- mode: NULLABLE
+  name: advertiser
+  type: STRING
+- mode: NULLABLE
+  name: normalized_os
+  type: STRING
+- mode: NULLABLE
+  name: release_channel
+  type: STRING
+- mode: NULLABLE
+  name: position
+  type: INTEGER
+- mode: NULLABLE
+  name: provider
+  type: STRING
+- mode: NULLABLE
+  name: match_type
+  type: STRING
+- mode: NULLABLE
+  name: suggest_data_sharing_enabled
+  type: BOOLEAN
+- mode: NULLABLE
+  name: impression_count
+  type: INTEGER
+- mode: NULLABLE
+  name: click_count
+  type: INTEGER
+- mode: NULLABLE
+  name: query_type
+  type: STRING
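
Both queries above drop clicks from any `context_id` that logged more than 50 click events on one day, using a window count computed before aggregation. The toy query below is a sketch of that filter on inline rows; the `events` CTE and the `busy` and `normal` IDs are made up for illustration and do not come from the PR.

```sql
-- Toy data (hypothetical): "busy" sends 51 clicks and 60 impressions,
-- "normal" sends 3 clicks, all on the same day.
WITH events AS (
  SELECT
    DATE '2024-04-01' AS submission_date,
    'busy' AS context_id,
    'click' AS event_type
  FROM
    UNNEST(GENERATE_ARRAY(1, 51))
  UNION ALL
  SELECT DATE '2024-04-01', 'busy', 'impression' FROM UNNEST(GENERATE_ARRAY(1, 60))
  UNION ALL
  SELECT DATE '2024-04-01', 'normal', 'click' FROM UNNEST(GENERATE_ARRAY(1, 3))
),
with_event_count AS (
  SELECT
    *,
    -- Events per client, per day, per event type.
    COUNT(*) OVER (PARTITION BY submission_date, context_id, event_type) AS user_event_count,
  FROM
    events
)
SELECT
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT context_id) AS user_count,
FROM
  with_event_count
WHERE
  -- Same predicate as in the PR: only clicks are filtered, impressions are kept.
  NOT (user_event_count > 50 AND event_type = 'click')
GROUP BY
  event_type
ORDER BY
  event_type
```

With these rows the result is 3 clicks from 1 user and 60 impressions from 1 user: the 51 clicks from `busy` are removed, while its impressions still count even though they exceed 50.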
