# Apache Spark

Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,306 public repositories matching this topic...

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated Jun 2, 2024
Scala

apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator

rust spark arrow datafusion

Updated Jun 2, 2024
Rust

SuperCowPowers / sageworks

SageWorks: An easy to use Python API for creating and deploying AWS SageMaker Models

python aws machine-learning big-data spark pandas data-engineering

Updated Jun 2, 2024
Python

fabiogouw / spark-aws-messaging

A custom sink provider for Apache Spark that sends the content of a dataframe to an AWS SQS

spark aws-sqs spark-sql

Updated Jun 2, 2024
Java

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated Jun 2, 2024
C++

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Jun 2, 2024
Python

alvertogit / bigdata_docker

Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook

python docker data-science machine-learning scala big-data spark jupyter-notebook jupyter-lab spark3

Updated Jun 2, 2024
Python

Ophiase / Big-Data-Project-IFEBY310

Analysis website for the New York shared bike system (Citibikes 🚲️) dataset. Extract, Load, Transform (ELT) using PySpark and the Parquet format.

Updated Jun 2, 2024
Jupyter Notebook

iimeta / fastapi-admin

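智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.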

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated Jun 2, 2024
Go

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated Jun 2, 2024
Java

iimeta / fastapi-web

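智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.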

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated Jun 2, 2024
Vue

iimeta / fastapi

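智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.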

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated Jun 2, 2024
Go

iimeta / fastapi-sdk

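智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.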

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated Jun 2, 2024
Go

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Jun 2, 2024
Java

xuwenyihust / DataPulse

DataPulse is a platform for developers to build, schedule and monitor data pipelines.

kubernetes workflow spark jupyter-notebook gcp orchestration data-engineering data-platform mlflow delta-lake

Updated Jun 2, 2024
JavaScript

uni-openai / uniai-maas

An open-source AI and model-as-a-service platform.

ai spark gpt moonshot midjourney chatgpt stability-ai chatglm uniai kimichat

Updated Jun 2, 2024
TypeScript

scalactic / spark-insert-into-tests

spark-insert-into-tests

sql spark table scalatest hms insertinto

Updated Jun 2, 2024
Scala

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Jun 2, 2024
Scala

AliMarzouk / Paris-AQ

ELTL pipeline to monitor air quality in the Paris Île-de-France area

bigquery airflow big-data spark gcs airquality dataengineering

Updated Jun 2, 2024
Python

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Jun 2, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 417
Repository: apache/spark
Website: spark.apache.org
Wikipedia: Wikipedia

Related Topics