DataProcessInstance delete doesn't work (crash) #10538

Open
obaltian opened this issue May 18, 2024 · 0 comments
Labels
bug Bug report

obaltian commented May 18, 2024

Describe the bug
It's impossible to delete DataProcessInstance objects in bulk (by filtering on entity type). The command either raises an error or finds nothing, depending on whether you provide an additional filter (e.g. --platform=airflow).

Only deleting by --urn works, which is inconvenient for managing DataHub content.
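
Since deleting by --urn is the only path reported to work, a stopgap is to run one delete per URN. Below is a minimal sketch of that idea, assuming you already have the list of DataProcessInstance URNs from somewhere else (the URNs shown are placeholders, and the helper names are hypothetical, not part of the datahub package). It only uses the `datahub delete --urn` form mentioned above; the CLI may still ask for confirmation on each call.

```python
import shlex
import subprocess
from typing import Iterable, List


def build_delete_commands(urns: Iterable[str]) -> List[List[str]]:
    """Return one `datahub delete --urn <urn>` argv list per unique, non-empty URN."""
    seen = set()
    commands: List[List[str]] = []
    for urn in urns:
        urn = urn.strip()
        if not urn or urn in seen:
            continue  # skip blanks and duplicates
        seen.add(urn)
        commands.append(["datahub", "delete", "--urn", urn])
    return commands


def run_delete_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one by one."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first failing delete
            subprocess.run(argv, check=True)


# Placeholder URNs for illustration only; real ones must come from your instance.
example_urns = [
    "urn:li:dataProcessInstance:placeholder-1",
    "urn:li:dataProcessInstance:placeholder-2",
]
run_delete_commands(build_delete_commands(example_urns), dry_run=True)
```

The dry run prints the commands so they can be reviewed before anything is deleted; pass `dry_run=False` to execute them.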

To Reproduce

  1. Deploy datahub locally:
datahub docker quickstart
  2. Ingest a sample job and its "start" event:
from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.api.entities.dataprocess.dataprocess_instance import DataProcessInstance
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

flow = DataFlow(env="prod", orchestrator="airflow", id="flow_api_simple")
flow.emit(graph)
job = DataJob(flow_urn=flow.urn, id="job1", name="My Job 1")
job.emit(graph)
run = DataProcessInstance.from_datajob(datajob=job, id=f"{flow.id}-1")
run.emit(graph)

# optional: the DataProcessInstance is created even without a start event
import time
run.emit_process_start(graph, int(time.time() * 1000))
  3. Try to delete the job run using the CLI:
datahub delete --platform airflow --entity-type dataProcessInstance
# outputs: 
[2024-05-18 18:54:15,396] INFO     {datahub.cli.delete_cli:341} - Using DataHubGraph: # configured to talk to http://localhost:8080
Found no urns to delete. Maybe you want to change your filters to be something different?

datahub delete --entity-type dataProcessInstance
# outputs
[2024-05-18 18:53:54,557] INFO     {datahub.cli.delete_cli:341} - Using DataHubGraph: configured to talk to http://localhost:8080
[2024-05-18 18:53:55,059] ERROR    {datahub.entrypoints:205} - Command failed: Error executing graphql query: [{'message': "The field at path '/scrollAcrossEntities/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 0, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[1]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 1, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[2]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 2, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[3]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 3, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[4]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 4, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[5]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 5, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[6]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. 
The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 6, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[7]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 7, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[8]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 8, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[9]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 9, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[10]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
 The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 10, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[11]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 11, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[12]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 12, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[13]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 13, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}]
Traceback (most recent call last):
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/entrypoints.py", line 192, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 396, in async_wrapper
    loop.run_until_complete(run_func_check_upgrade())
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 383, in run_func_check_upgrade
    ret = await main_func_future
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/upgrade/upgrade.py", line 378, in run_inner_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/cli/delete_cli.py", line 367, in by_filter
    urns = list(
           ^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 782, in get_urns_by_filter
    for entity in self._scroll_across_entities(graphql_query, variables):
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 795, in _scroll_across_entities
    response = self.execute_graphql(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/obaltian/maklai/datacatalog-ingestion/.venv/lib/python3.12/site-packages/datahub/ingestion/graph/client.py", line 883, in execute_graphql
    raise GraphError(f"Error executing graphql query: {result['errors']}")
datahub.configuration.common.GraphError: Error executing graphql query: [{'message': "The field at path '/scrollAcrossEntities/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 0, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[1]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 1, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[2]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 2, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[3]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  
The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 3, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[4]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 4, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[5]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 5, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[6]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. 
The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 6, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[7]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 7, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[8]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 8, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[9]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 9, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[10]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. 
 The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 10, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[11]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 11, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[12]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 12, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}, {'message': "The field at path '/scrollAcrossEntities/searchResults[13]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'", 'path': ['scrollAcrossEntities', 'searchResults', 13, 'entity'], 'extensions': {'classification': 'NullValueInNonNullableField'}}]
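
The error list above is long but repetitive: every entry has the same classification and differs only in the search result index (0 through 13). A small sketch like the following can condense such a list when triaging; it assumes the errors have the shape shown in the log (`path` and `extensions.classification` keys), and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_null_entity_errors(errors: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count errors by classification and collect indices of null search result entities."""
    classifications = Counter(
        err.get("extensions", {}).get("classification", "unknown") for err in errors
    )
    null_indices = sorted(
        err["path"][2]
        for err in errors
        if err.get("extensions", {}).get("classification") == "NullValueInNonNullableField"
        and len(err.get("path", [])) == 4
        and err["path"][:2] == ["scrollAcrossEntities", "searchResults"]
        and err["path"][3] == "entity"
    )
    return {
        "classifications": dict(classifications),
        "null_entity_indices": null_indices,
    }


# Two abbreviated entries in the same shape as the log above (messages shortened).
sample_errors = [
    {
        "message": "The field at path '/scrollAcrossEntities/searchResults[0]/entity' ...",
        "path": ["scrollAcrossEntities", "searchResults", 0, "entity"],
        "extensions": {"classification": "NullValueInNonNullableField"},
    },
    {
        "message": "The field at path '/scrollAcrossEntities/searchResults[1]/entity' ...",
        "path": ["scrollAcrossEntities", "searchResults", 1, "entity"],
        "extensions": {"classification": "NullValueInNonNullableField"},
    },
]
summary = summarize_null_entity_errors(sample_errors)
print(summary)
```

Applied to the full list from the log, this would show that all returned search results resolved to a null entity, rather than only some of them.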

Expected behavior
Step 3 from the section above should find and delete relevant DataProcessInstance objects.

Screenshots
Screenshot 2024-05-18 at 19 05 36

Desktop (please complete the following information):

  • OS: Mac OS
  • Browser: Safari (version 17.4.1)
  • acryl-datahub, version 0.13.1.3

Additional context
We tried to find a workaround by providing additional arguments or by using GraphQL directly, but had no luck. Here is a thread from DataHub's Slack: https://datahubspace.slack.com/archives/C029A3M079U/p1715184338329459

@obaltian obaltian added the bug Bug report label May 18, 2024
@obaltian obaltian changed the title DataProcessInstance delete doesn't work DataProcessInstance delete doesn't work (crash) May 19, 2024