This small project uses Apache Kafka and Python to explore event streaming.
Reference: https://kafka.apache.org/.
In this project, we build a real-time social media streaming pipeline with Apache Kafka and Python. The pipeline ingests and processes sample social media posts and engagements, enabling dynamic analyses such as sentiment analysis and engagement metrics calculation.
Website: https://kafka.apache.org/downloads
# Add Kafka to your PATH (e.g. in your .bashrc)
export KAFKA_HOME=/path/to/kafka
export PATH=$KAFKA_HOME/bin:$PATH
# Start ZooKeeper
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
# Start Kafka
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
The Python library faker is used to generate fake data.
For more details, see the faker docs or its GitHub repository.
Check the content of generate_social_media_data.py.
apache_kafka_folder/bin/kafka-topics.sh --create --topic social_media_stream --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
- Producer: a client application that publishes (writes) events to Kafka.
- Consumer: a client application that subscribes to (reads and processes) these events.
The Python library confluent-kafka is used to create basic clients.
For more details, see the confluent-kafka docs or its GitHub repository.
Check the content of kafka_producer.py and kafka_consumer.py.
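A minimal sketch of what kafka_producer.py and kafka_consumer.py might contain (the topic matches the one created above; the group id and helper names are illustrative, and the real scripts may differ):

```python
import json


def serialize(post: dict) -> bytes:
    """Encode a post dict as UTF-8 JSON bytes, the message value format."""
    return json.dumps(post).encode("utf-8")


def delivery_report(err, msg):
    """Delivery callback: report success or failure per produced message."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")


def run_producer(posts, topic="social_media_stream", servers="localhost:9092"):
    # Imported here so the pure helpers above work without a broker running.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": servers})
    for post in posts:
        producer.produce(topic, value=serialize(post), callback=delivery_report)
        producer.poll(0)  # serve delivery callbacks
    producer.flush()      # block until all messages are delivered


def run_consumer(topic="social_media_stream", servers="localhost:9092"):
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": servers,
        "group.id": "social-media-demo",       # illustrative group id
        "auto.offset.reset": "earliest",       # read the topic from the start
    })
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to 1 s for a message
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            post = json.loads(msg.value().decode("utf-8"))
            print(f"Received post from {post.get('username')}")
    finally:
        consumer.close()
```

The producer's `poll(0)` call is what triggers the delivery callbacks; forgetting it (or the final `flush()`) is a common reason messages appear to vanish.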
- First, generate the social media data.
- Simultaneously, run the Kafka producer and consumer scripts to see the data ingestion and processing in real time.
pip install pre-commit
Once the .pre-commit-config.yaml is complete, we need to set up the git hook scripts.
pre-commit install
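A minimal .pre-commit-config.yaml might look like this (the hooks below are illustrative defaults, not necessarily the ones this project uses):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
```

After `pre-commit install`, these checks run automatically on every `git commit`.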