SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.
PawMark is a platform for developers to build, schedule, and monitor data pipelines.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
My personal project for the Data Engineering Zoomcamp
Cool DE Projects
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. See "How to get any Ethereum smart contract into BigQuery": https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
Workflow Engine for Kubernetes
The leading data integration platform for ETL / ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Open Source Feature Flagging and A/B Testing Platform
"api-to-dataframe" is a Python library that simplifies obtaining data from API endpoints by converting them directly into Pandas DataFrames. This library offers robust features, including retry strategies for failed requests and automatic generation of detailed reports on the received data.
Distributed DataFrame for Python designed for the cloud, powered by Rust
Apache Superset is a Data Visualization and Data Exploration Platform
Snowflake Snowpark Python API
Snowflake infrastructure-as-code: provision environments, automate deploys and CI/CD, and manage RBAC, users, roles, and data access through a declarative Python resource API. A change-management tool for the Snowflake data warehouse.
An orchestration platform for the development, production, and observation of data assets.
SageWorks: An easy-to-use Python API for creating and deploying AWS SageMaker models
Turns Data and AI algorithms into production-ready web applications in no time.
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.