Reddit Sentiment Analysis with Apache Kafka-Based Microservices

Streaming Audio: Apache Kafka® & Real-Time Data

Treść dostarczona przez Confluent, founded by the original creators of Apache Kafka® and Founded by the original creators of Apache Kafka®. Cała zawartość podcastów, w tym odcinki, grafika i opisy podcastów, jest przesyłana i udostępniana bezpośrednio przez Confluent, founded by the original creators of Apache Kafka® and Founded by the original creators of Apache Kafka® lub jego partnera na platformie podcastów. Jeśli uważasz, że ktoś wykorzystuje Twoje dzieło chronione prawem autorskim bez Twojej zgody, możesz postępować zgodnie z procedurą opisaną tutaj https://pl.player.fm/legal.

1+ y ago 35:23

MP3•Źródło odcinka

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey.

Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog—Data Enrichment in Existing Data Pipelines Using Confluent Cloud.

The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. The project consists of four microservices:

A user input service that collects requests in a Kafka topic, which consist of the desired subreddit, along with the dates between which data should be collected
An API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topic
A sentiment analysis service that analyzes the appended topic from the API polling service using the Python library NLTK; it calculates averages with ksqlDB
A results-displaying service that consumes from a topic with the calculations

Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it.

EPISODE LINKS

265 odcinków

#Tech #Tech News #News #Confluent #Event Stream Processing #Data #Event Driven Architecture #Open Source #Data In Motion #Kafka Cloud Native #Data Mesh #Data Pipeline #Serverless Kafka #Podcasting Education #Confluent, original creators of Apache Kafka® #original creators of Apache Kafka® #Apache Kafka® #Cloud IT #Real Time