Reddit Sentiment Analysis

This project is an early-stage machine learning pipeline for sentiment analysis of Reddit posts and comments. It uses the Reddit API (via the praw library) to collect text from selected subreddits, drawing on either posts or comments. The raw text is cleaned and preprocessed by converting it to lowercase and removing URLs, punctuation, and common stopwords. The cleaned text is then vectorized with TF-IDF and used to train a logistic regression classifier on labeled data (e.g., positive, negative, or neutral). After training, the model is evaluated using accuracy and other classification metrics, then saved to disk for reuse. The script can also load the trained model and predict sentiment on new, unlabeled Reddit data. This setup supports scalable sentiment monitoring of Reddit communities and could be extended for use in dashboards, research, or moderation tools.
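The steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names (fetch_post_texts, clean_text, train_and_save, load_and_predict), the short stopword list, the 80/20 train/test split, and the model file name are all assumptions made for the example. It relies on praw, scikit-learn, and joblib being installed, and on Reddit API credentials supplied by the caller.

```python
import re
import string

# Assumed, deliberately tiny stopword list for illustration only; a real
# project would likely use a fuller list (e.g. from NLTK or scikit-learn).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
             "in", "it", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, then strip URLs, punctuation, and stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def fetch_post_texts(client_id, client_secret, user_agent, subreddit, limit=100):
    """Collect title + body text from a subreddit's hot posts (hypothetical helper)."""
    import praw  # third-party; imported here so clean_text works without it

    reddit = praw.Reddit(client_id=client_id,
                         client_secret=client_secret,
                         user_agent=user_agent)
    return [f"{post.title} {post.selftext}"
            for post in reddit.subreddit(subreddit).hot(limit=limit)]


def train_and_save(texts, labels, path="sentiment_model.joblib"):
    """Train TF-IDF + logistic regression, print metrics, save the model."""
    import joblib
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    cleaned = [clean_text(t) for t in texts]
    # Stratifying keeps the label mix similar in both splits; it needs
    # at least a few labeled examples per class.
    X_train, X_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=0.2, random_state=42, stratify=labels)

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, preds))
    print(classification_report(y_test, preds, zero_division=0))

    joblib.dump(model, path)  # vectorizer and classifier are saved together
    return model


def load_and_predict(path, new_texts):
    """Load a saved model and predict labels for new, unlabeled text."""
    import joblib

    model = joblib.load(path)
    return model.predict([clean_text(t) for t in new_texts])
```

Bundling the vectorizer and classifier in one scikit-learn pipeline means a single saved file is enough to score new text later. The same clean_text function has to be applied at prediction time, since the model was trained on cleaned text.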
