Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data Checks Bug]: ETL check checks__fail_firefox_ios_derived__new_profile_activation__v2 failiing to execute due to incorrect check path #4524

Open
kik-kik opened this issue Nov 6, 2023 · 3 comments

Comments

@kik-kik
Copy link
Contributor

kik-kik commented Nov 6, 2023

What is the issue?

What was observed, any logs or relevant links?

ETL task (checks__fail_firefox_ios_derived__new_profile_activation__v2) responsible for validating that the firefox_ios_derived.new_profile_activation_v2 dataset is correctly being updated currently fails silently with the following error:

[2023-11-06, 13:48:00 UTC] {pod_manager.py:235} INFO - No checks.sql file found in sql//None/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2

Example failed run: https://workflow.telemetry.mozilla.org/dags/bqetl_firefox_ios/grid?root=&dag_run_id=scheduled__2023-10-29T04%3A00%3A00%2B00%3A00&tab=logs&task_id=checks__fail_firefox_ios_derived__new_profile_activation__v2

This is unintended behavior, and only got caught by chance when manually attempting to debug an issue with a downstream dataset. This resulted in the task being marked as successful without the dataset checks being executed.

How can we reproduce the bug

Re-running the task produces the same result, please keep in mind that the process exits with status code `0` therefore the task is marked as success in Airflow (hence why this is a silent failure).

It appears other ETL checks in the same DAG are running and detecting issues (as intended), it is unclear what's causing this specific task to behave this way.

┆Issue is synchronized with this Jira Task

@kik-kik
Copy link
Contributor Author

kik-kik commented Nov 6, 2023

Failed to reproduce this locally. Tried running the following command:

./script/bqetl check run sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-30

And correctly seeing the following error:

Error in query string: Error processing job 'moz-fx-data-shared- prod:bqjob_r42bead83008a3193_0000018ba50ab3e2_1': Less than 0 rows found (expected more than 1) 

(currently, this is expected as indeed the dataset is missing data for this specific submission_date)

This appears to mimic what should be happening in the prod deployment:
https://github.com/mozilla/telemetry-airflow/blob/0e83b178dc105a897487b911898f5c94ff67c329/utils/gcp.py#L389-L397

@kik-kik
Copy link
Contributor Author

kik-kik commented Nov 6, 2023

./script/bqetl check run --marker fail sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-30

Yielded:

/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
Error in query string: Error processing job 'moz-fx-data-shared- prod:bqjob_r354728dfd7c4e974_0000018ba514808b_1': Less than 0 rows found (expected more than 1) 
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
./script/bqetl check run --marker fail sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-29

Yielded:

/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
No checks.sql file found in sql//None/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2

It appears the same command for different submission_dates resulted in different outcomes for the command. For submission dates:

  • 2023-10-29 - we see No checks.sql file found in sql//None/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 and no indication of a BQ job being run.
  • 2023-10-30 - we see Error in query string: Error processing job 'moz-fx-data-shared- prod:bqjob_r354728dfd7c4e974_0000018ba514808b_1': Less than 0 rows found (expected more than 1) indicating a bq job was executed.

@kik-kik
Copy link
Contributor Author

kik-kik commented Nov 6, 2023

It appears the opposite is true for the submission dates when running the checks__fail_firefox_ios_derived__new_profile_activation__v2 task via Airflow. For 2023-10-30 we see no checks.sql message and for 2023-10-29 we see a check running. Tried to re-run the task for execution_date 2023-10-29 and we see a BQ job running: Waiting on bqjob_r253c142020408433_0000018ba5218b78_1 ... (2s) Current status: DONE along with the no checks.sql message.

When investigating the BQ job id to see further details of the query executed, it appears the query used is completely off from what it is expected to be:
image

This is what the bq console shows as executed SQL:

#fail
WITH non_unique AS (
  SELECT
    COUNT(*) AS total_count
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  GROUP BY
    client_id
  HAVING
    total_count > 1
)
SELECT
  IF(
    (SELECT COUNT(*) FROM non_unique) > 0,
    ERROR(
      "Duplicates detected (Expected combined set of values for columns ['client_id'] to be unique.)"
    ),
    NULL
  );

This is the template we expect to be used: https://github.com/mozilla/bigquery-etl/blob/a30788cdd53d0236dcff139987f2e29db15bde6a/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2/checks.sql

Which should yield the following query:

#fail
WITH non_unique AS (
  SELECT
    COUNT(*) AS total_count
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  GROUP BY
    client_id
  HAVING
    total_count > 1
)
SELECT
  IF(
    (SELECT COUNT(*) FROM non_unique) > 0,
    ERROR(
      "Duplicates detected (Expected combined set of values for columns ['client_id'] to be unique.)"
    ),
    NULL
  );

#fail
WITH min_row_count AS (
  SELECT
    COUNT(*) AS total_rows
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  WHERE
    `date` = @submission_date
)
SELECT
  IF(
    (SELECT COUNTIF(total_rows < 1) FROM min_row_count) > 0,
    ERROR(
      CONCAT(
        "Less than ",
        (SELECT total_rows FROM min_row_count),
        " rows found (expected more than 1)"
      )
    ),
    NULL
  );

#fail
SELECT
  IF(
    COUNTIF(is_new_profile) <> COUNT(*),
    ERROR("Number of is_new_profile TRUE values should be the same as the row count."),
    NULL
  )
FROM
  `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
WHERE
  `date` = @submission_date;

#fail
SELECT
  IF(
    DATE_DIFF(`date`, first_seen_date, DAY) <> 6,
    ERROR("Day difference between values inside `date` and submission_date fields should be 6."),
    NULL
  )
FROM
  `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
WHERE
  `date` = @submission_date;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant