big-data

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Mar 20, 2024
Python

apache / flink

Apache Flink

python java scala sql big-data flink

Updated Dec 4, 2024
Java

amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.

Updated Dec 1, 2024
JavaScript

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

java data query sql big-data presto hive hadoop lakehouse

Updated Dec 4, 2024
Java

heibaiying / BigData-Notes

Big Data Getting Started Guide ⭐

phoenix scala kafka big-data spark yarn hive hadoop storm bigdata hbase zookeeper hdfs mapreduce flume azkaban sqoop

Updated Jan 5, 2024
Java

questdb / questdb

QuestDB is an open source time-series database for fast ingest and SQL queries

java iot postgres sql database big-data time-series analytics cpp grafana postgresql simd low-latency financial-analysis tsdb hacktoberfest time-series-database questdb

Updated Dec 4, 2024
Java

andkret / Cookbook

The Data Engineering Cookbook

big-data best-practices cookbook data-engineering data-engineer

Updated Nov 28, 2024
Python

apache / predictionio

PredictionIO, a machine learning server for developers and ML engineers.

scala big-data predictionio

Updated Jan 9, 2021
Scala

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

scala kafka big-data cluster-management

Updated Aug 2, 2023
Scala

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

distributed-systems database big-data cpp graph raft scalability distributed graph-database graphdb hacktoberfest nebula nebula-graph nebulagraph

Updated Nov 18, 2024
C++

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Dec 4, 2024
Java

provectus / kafka-ui

Open-Source Web UI for Apache Kafka Management

opensource kafka big-data web-ui streams kafka-connect apache-kafka kafka-producer kafka-client kafka-streams hacktoberfest streaming-data kafka-manager kafka-cluster event-streaming cluster-management kafka-ui kafka-brokers

Updated Jul 26, 2024
Java

cython / cython

The most widely used Python to C compiler

python c performance big-data cpp cython cpython cpython-extensions

Updated Nov 27, 2024
Python

StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Updated Dec 4, 2024
Java

quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

rust open-source search-engine big-data logs cloud-storage cloud-native log-management distributed-tracing tantivy

Updated Dec 4, 2024
Rust

catboost / catboost

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated Dec 5, 2024
Python

apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.

python java golang streaming sql big-data beam batch

Updated Dec 5, 2024
Java


Here are 4,311 public repositories matching this topic...

binhnguyennus / awesome-scalability

apache / spark

ClickHouse / ClickHouse

donnemartin / data-science-ipython-notebooks
