Schema updates for tables with no associated query.sql #5379

Open
quiiver opened this issue Apr 16, 2024 · 3 comments

Comments

@quiiver
Contributor

quiiver commented Apr 16, 2024

We have a table whose definition is managed by bqetl but which does not have a corresponding query.sql file.

https://github.com/mozilla/bigquery-etl/blob/main/sql/moz-fx-data-shared-prod/search_terms_derived/sanitization_job_metadata_v2/schema.yaml

When the schema is updated in the schema.yaml file, the artifact deploy does not update the table schema (it does update the table metadata). Additionally, running `bqetl query schema update <table_ref>` appears to complete but does not update the table schema.

┆Issue is synchronized with this Jira Task

@whd
Member

whd commented Apr 16, 2024

For that table specifically, I'd expect `bqetl query schema update` to be runnable only by DSRE/Airflow, or possibly workgroup:search-terms/sanitized-writer, since access to that dataset is heavily restricted.

I would expect (or hope) that artifact deployment via Airflow would automatically update this schema.

@quiiver
Contributor Author

quiiver commented Apr 16, 2024

Yeah, I would expect the CLI command to fail when it either doesn't have a schema to update or the executor does not have the correct permissions.

@BenWu
Contributor

BenWu commented Apr 22, 2024

`bqetl query schema update` looks for a corresponding query.sql and does nothing if that file doesn't exist (code). I agree that it should fail in that case.
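The fail-fast behavior suggested here could look roughly like the sketch below. It is not bqetl's actual implementation: the helper name `require_query_file`, the exception class, and the assumption that the table directory is passed as a path are all hypothetical.

```python
from pathlib import Path


class MissingQueryFileError(Exception):
    """Hypothetical error for a table directory that has no query.sql."""


def require_query_file(table_dir: Path) -> Path:
    """Return the path to query.sql, or raise instead of silently skipping.

    Hypothetical helper for illustration; not part of bqetl.
    """
    query_file = table_dir / "query.sql"
    if not query_file.is_file():
        # Surface the problem to the caller rather than exiting successfully.
        raise MissingQueryFileError(
            f"No query.sql found in {table_dir}; cannot update schema from a query."
        )
    return query_file
```

A CLI wrapper could catch this error and exit with a non-zero status, so that a run which changed nothing is not mistaken for a successful update.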

`bqetl query schema deploy` looks for query.* or script.sql and runs the SQL generators if none are found. It will fail if there are still no queries to deploy.
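The file lookup described for the deploy command can be pictured with a small sketch. This only illustrates the matching rule (query.* or script.sql) under the assumption that it is applied per table directory; the function name `find_query_files` is hypothetical, and the SQL generator step is not shown.

```python
from pathlib import Path


def find_query_files(table_dir: Path) -> list[Path]:
    """Return query.* and script.sql files found in table_dir, sorted by path.

    Hypothetical helper for illustration; not part of bqetl.
    """
    # query.* matches query.sql, query.py, and any other query file type.
    candidates = list(table_dir.glob("query.*"))
    script = table_dir / "script.sql"
    if script.is_file():
        candidates.append(script)
    return sorted(p for p in candidates if p.is_file())
```

A directory that contains only schema.yaml yields an empty list, which is the case reported in this issue.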

It's kind of hacky, but as a workaround you can add a query.py stub so that the schema is picked up by the schema deploy. I did that here: https://github.com/mozilla/bigquery-etl/blob/main/sql/moz-fx-data-shared-prod/telemetry_derived/public_data_report_hardware_v1/query.py
I agree that we should consider making this unnecessary, though.
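For illustration, a stub of this kind might be as small as the following. This is a hypothetical sketch, not the content of the linked file; it assumes that the mere presence of a query.py next to schema.yaml is enough for the table to be discovered, and that the table's data is written by some process outside bqetl.

```python
"""Placeholder query.py so that schema deploy picks up the adjacent schema.yaml.

Hypothetical stub: the table is assumed to be populated outside bqetl,
so this script intentionally does nothing.
"""


def main() -> None:
    """No-op entry point; the table's data is written by another process."""
    return None


if __name__ == "__main__":
    main()
```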
