
Superset ingestor is inefficient to ingest large amount of data #10536

Open
jeff-xu-z opened this issue May 18, 2024 · 1 comment
Labels
bug Bug report

Comments

@jeff-xu-z

jeff-xu-z commented May 18, 2024

Describe the bug
The Superset ingestor should handle large amounts of data more efficiently; otherwise the access_token can expire before the ingestion completes.

To Reproduce
Ingestion context:

  • The system I work on has a relatively large amount of data:
    • ~17k datasets
    • ~3k dashboards
    • ~30k charts
  • My Superset's access token TTL is 15 minutes.
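Back-of-the-envelope math shows why this combination fails. The per-call latency below is a hypothetical figure for illustration, not a measurement from my deployment, but even optimistic numbers put a full run well past the token's TTL:

```python
# Rough illustration only; the 50 ms per-call latency is an assumed
# value, not measured on my system.
entities = 17_000 + 3_000 + 30_000   # datasets + dashboards + charts
seconds_per_call = 0.05              # hypothetical API round-trip time
runtime_s = entities * seconds_per_call
token_ttl_s = 15 * 60                # 15-minute access token TTL

print(runtime_s)                 # 2500.0 seconds, i.e. ~42 minutes
print(runtime_s > token_ttl_s)   # True: the token expires mid-ingestion
```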

The snippet below is my Superset ingestion recipe:

source:
  type: superset
  config:
    # Coordinates
    connect_uri: https://<hostname>

    # Credentials
    username: <username>
    password: <password>

sink:
  type: file
  config:
    filename: /tmp/superset.json

Running "datahub ingest -c /tmp/superset.yml" fails with "401 Unauthorized" roughly 15 minutes after the ingestion starts on my system.
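One way to address this (a minimal sketch, not the DataHub implementation; `login` and `get_with_reauth` are hypothetical helper names) is to re-authenticate and retry whenever a request comes back 401, rather than fetching the token once up front. The login endpoint shown is Superset's standard `POST /api/v1/security/login`:

```python
import json
import urllib.request


def login(base_url, username, password):
    """Fetch a fresh JWT from Superset's login endpoint
    (POST /api/v1/security/login, per Superset's public REST API)."""
    body = json.dumps({"username": username, "password": password,
                       "provider": "db", "refresh": True}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/v1/security/login",
        data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def get_with_reauth(do_get, do_login, max_retries=1):
    """Run `do_get`; on a 401, call `do_login` and retry.
    `do_get` must return an object with a `status_code` attribute,
    so the retry logic is independent of the HTTP client used."""
    resp = do_get()
    for _ in range(max_retries):
        if resp.status_code != 401:
            break
        do_login()   # token expired: re-authenticate
        resp = do_get()
    return resp
```

With this pattern the ingestor can survive a token TTL shorter than the total ingestion time, at the cost of one extra request per expiry.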

Expected behavior
Superset ingestion should complete successfully.

Desktop (please complete the following information):

  • OS: Linux
  • Browser: Chrome
  • DataHub: 0.13.0

Additional context
N/A

@jeff-xu-z jeff-xu-z added the bug Bug report label May 18, 2024
@jeff-xu-z
Author

I should be able to contribute a proposed fix/improvement.
