Describe the bug
It's impossible to delete DataProcessInstance objects in bulk (by filtering by entity type). The command either raises an error or finds nothing, depending on whether you provide an additional filter (e.g. --platform=airflow).
Only deleting by --urn works, which isn't convenient for managing DataHub content.
To Reproduce
Deploy datahub locally:
datahub docker quickstart
Ingest a sample job and its "start" event:
import time

from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.api.entities.dataprocess.dataprocess_instance import DataProcessInstance
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Connect to the local quickstart instance
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Emit a flow, a job inside it, and one run (DataProcessInstance) of that job
flow = DataFlow(env="prod", orchestrator="airflow", id="flow_api_simple")
flow.emit(graph)
job = DataJob(flow_urn=flow.urn, id="job1", name="My Job 1")
job.emit(graph)
run = DataProcessInstance.from_datajob(datajob=job, id=f"{flow.id}-1")
run.emit(graph)

# optional: the DataProcessInstance is created even without a start event
run.emit_process_start(graph, int(time.time() * 1000))
Try to delete the job run using the CLI:
datahub delete --platform airflow --entity-type dataProcessInstance
# outputs:
[2024-05-18 18:54:15,396] INFO {datahub.cli.delete_cli:341} - Using DataHubGraph: # configured to talk to http://localhost:8080
Found no urns to delete. Maybe you want to change your filters to be something different?
datahub delete --entity-type dataProcessInstance
# outputs
[2024-05-18 18:53:54,557] INFO {datahub.cli.delete_cli:341} - Using DataHubGraph: configured to talk to http://localhost:8080
[2024-05-18 18:53:55,059] ERROR {datahub.entrypoints:205} - Command failed: Error executing graphql query: [{'message': "The field at path '/scrollAcrossEntities/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 0, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[1]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 1, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[2]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 2, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[3]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 3, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[4]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 4, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[5]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 5, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[6]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. 
The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 6, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[7]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 7, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[8]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 8, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[9]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 9, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[10]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 10, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[11]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 11, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[12]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 12, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[13]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 13, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}]
Traceback (most recent call last):
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/entrypoints.py", line 192, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 396, in async_wrapper
loop.run_until_complete(run_func_check_upgrade())
File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 383, in run_func_check_upgrade
ret = await main_func_future
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 378, in run_inner_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
raise e
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/cli/delete_cli.py", line 367, in by_filter
urns = list(
^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 782, in get_urns_by_filter
for entity in self._scroll_across_entities(graphql_query, variables):
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 795, in _scroll_across_entities
response = self.execute_graphql(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 883, in execute_graphql
raise GraphError(f"Error executing graphql query: {result['errors']}")
datahub.configuration.common.GraphError: Error executing graphql query: [{'message': "The field at path '/scrollAcrossEntities/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 0, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[1]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 1, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[2]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 2, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[3]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 3, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[4]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 4, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[5]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 5, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[6]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. 
The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 6, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[7]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 7, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[8]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 8, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[9]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 9, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[10]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 10, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[11]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 11, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[12]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 12, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[13]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 13, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}]
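To make the long error above easier to read, here is a minimal sketch that condenses a GraphQL `errors` list of the shape shown in the log into a count per classification plus the affected `searchResults` indexes. The function name `summarize_graphql_errors` and the sample data are hypothetical and only for illustration; they are not part of the DataHub client.

```python
from collections import Counter
from typing import Any


def summarize_graphql_errors(errors: list[dict[str, Any]]) -> dict[str, Any]:
    """Condense a GraphQL 'errors' list (hypothetical helper, not a DataHub API)."""
    # Count errors by the 'classification' found under 'extensions'.
    classifications = Counter(
        err.get("extensions", {}).get("classification", "unknown") for err in errors
    )
    # A path looks like ['scrollAcrossEntities', 'searchResults', 3, 'entity'];
    # the integer element is the index of the search result whose entity was null.
    indexes = sorted(
        part
        for err in errors
        for part in err.get("path", [])
        if isinstance(part, int)
    )
    return {"classifications": dict(classifications), "result_indexes": indexes}


# Two entries copied in shape (not in full) from the log above.
sample_errors = [
    {
        "path": ["scrollAcrossEntities", "searchResults", 0, "entity"],
        "extensions": {"classification": "NullValueInNonNullableField"},
    },
    {
        "path": ["scrollAcrossEntities", "searchResults", 1, "entity"],
        "extensions": {"classification": "NullValueInNonNullableField"},
    },
]
summary = summarize_graphql_errors(sample_errors)
print(summary)
```

Applied to the full log, this would show that all 14 errors share the `NullValueInNonNullableField` classification and cover result indexes 0 through 13, i.e. every returned search result had a null `entity`.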
Expected behavior
Step 3 from the section above should find and delete relevant DataProcessInstance objects.
Screenshots
Desktop (please complete the following information):
Additional context
We tried to find a workaround for this problem by providing additional arguments or using GraphQL directly, but had no luck. Here is a thread from DataHub's Slack: https://datahubspace.slack.com/archives/C029A3M079U/p1715184338329459
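Since deleting by --urn is reported to work, one possible stopgap is to record run URNs at emit time and delete them one by one. The sketch below is an assumption-laden illustration, not a confirmed fix: it assumes the client object exposes a `delete_entity(urn, hard=False)` method (as `DataHubGraph` is believed to), and the names `SupportsDelete`, `delete_runs_by_urn`, and `_RecordingGraph` are hypothetical. The stand-in graph only exists so the example runs without a DataHub server.

```python
from typing import Iterable, Protocol


class SupportsDelete(Protocol):
    """Anything with a delete_entity method; assumed to match DataHubGraph."""

    def delete_entity(self, urn: str, hard: bool = False) -> None: ...


def delete_runs_by_urn(
    graph: SupportsDelete, urns: Iterable[str], hard: bool = False
) -> tuple[list[str], dict[str, str]]:
    """Delete each URN individually; return (deleted URNs, failures by URN)."""
    deleted: list[str] = []
    failed: dict[str, str] = {}
    for urn in urns:
        try:
            graph.delete_entity(urn=urn, hard=hard)
        except Exception as exc:  # keep going so one bad URN does not stop the rest
            failed[urn] = str(exc)
        else:
            deleted.append(urn)
    return deleted, failed


class _RecordingGraph:
    """Hypothetical stand-in for a real client; it only records the calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, bool]] = []

    def delete_entity(self, urn: str, hard: bool = False) -> None:
        if not urn.startswith("urn:li:"):
            raise ValueError(f"not a URN: {urn}")
        self.calls.append((urn, hard))


fake_graph = _RecordingGraph()
# Placeholder URNs for illustration only; real ones would be collected at emit time.
deleted, failed = delete_runs_by_urn(
    fake_graph, ["urn:li:dataProcessInstance:example-1", "bad-value"]
)
print(deleted, failed)
```

In the reproduction script, the URN could be collected right after `run.emit(graph)`, for example as `str(run.urn)`, assuming the instance exposes its URN that way. This does not help with runs whose URNs were never recorded, which is why filter-based deletion is still needed.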