Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,306 public repositories matching this topic...
SageWorks: An easy to use Python API for creating and deploying AWS SageMaker Models
Updated Jun 2, 2024 - Python
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated Jun 2, 2024 - C++
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Jun 2, 2024 - Python
Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook
Updated Jun 2, 2024 - Python
DataPulse is a platform for developers to build, schedule and monitor data pipelines.
Updated Jun 2, 2024 - JavaScript
An open-source AI and model-as-a-service platform.
Updated Jun 2, 2024 - TypeScript
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Jun 2, 2024 - Scala
ELTL pipeline to monitor air quality in the Paris Île-de-France area
Updated Jun 2, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia