Blog | Jack Leasure

Reddit Sentiment Analysis

This is the start of a machine learning pipeline for sentiment analysis of Reddit posts and comments. It begins by using the Reddit API (via the praw library) to collect text from selected subreddits, drawing on either posts or comments. The raw text is then cleaned and preprocessed: everything is converted to lowercase, and URLs, punctuation, and common stopwords are removed. Once cleaned, the data is vectorized using TF-IDF and fed into a logistic regression classifier, which is trained on labeled data (e.g., positive, negative, or neutral). After training, the model is evaluated using accuracy and classification metrics, then saved to disk for reuse. The script also includes functionality to load the trained model and predict sentiment on new, unlabeled Reddit data. This setup allows for scalable sentiment monitoring of Reddit communities and could be extended for use in dashboards, research, or moderation tools.
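To make the cleaning step concrete, here is a minimal sketch of what it might look like using only the Python standard library. The function name `clean_text`, the regular expression for URLs, and the tiny stopword list are my own assumptions for illustration, not the code from the project itself; a real run would likely use a fuller stopword list.

```python
import re
import string

# Assumption: a tiny illustrative stopword list. A real project would
# probably use a much fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "or",
             "to", "of", "in", "on", "for"}

# Assumption: treat anything starting with http://, https://, or www. as a URL.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

# Map every ASCII punctuation character to a space so words stay separated.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase the text, then strip URLs, punctuation, and stopwords."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)    # remove URLs before punctuation
    text = text.translate(PUNCT_TABLE)   # punctuation becomes whitespace
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    sample = "This is GREAT!!! See https://example.com/post?id=1 for more."
    print(clean_text(sample))
```

URLs are removed before punctuation on purpose: stripping punctuation first would break a link into separate word fragments that the URL pattern could no longer match. One side effect of replacing punctuation with spaces is that contractions such as "don't" are split into two tokens.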

Interactive Python Map

After my first blog post, where I calculated the driving distance to four office locations using Python, I decided to extend the project and build an interactive visual to display the data. I settled on creating a map in Python. The idea was to plot every zip code from the first post and the four office locations, along with some way to display the driving distances I previously calculated.

Driving Distance to Office Locations

For the first post on my blog, I wanted to come up with a practical project that has real-world applications. My initial goal is to improve my coding skills in Python, as I am relatively new to the language.