Imbalanced Product Review Sentiment Classification

Class Imbalance and Problem Statement: Class imbalance is a common problem when building classifiers in the machine learning world, and our awesome previously scraped Crocs reviews data is unfortunately not so awesome from a class balance standpoint. Soon, we’ll assign binary class labels based on the rating a customer gave with their review: ratings of 2 stars (out of 5) or less count as negative sentiment, and the remaining reviews count as positive sentiment. As you’ll see in a moment, the vast majority of reviews belong to the positive sentiment class, and I think that’s great! ...

August 29, 2024 · 13 min · Chandler Underwood

Running Data Dashboard

About: This post contains the dashboard I built to go along with my data cleaning project where I clean up my own running data from college. Shoutout to Andy Kriebel and his awesome YouTube video for getting me started! The dashboard allows for dynamic exploration of the time series data from years down to days. I would recommend opening up the visualization to full screen, so you can see the “timeframe snapshots” along with my most important running stats. ...

August 14, 2024 · 1 min · Chandler Underwood

Using Unsupervised ML to 'Typicalize' Product Reviews

Motivation: In my last post, where I scraped reviews for Crocs Clogs, I mentioned that I often find myself wishing for a succinct summary of the reviews for a product. Let’s flesh that out a bit more. What I mean when I say “succinct summary” is that I want a quick understanding of a specific aspect of a given product. For example, I already know that Crocs come in amazing colors. I can see that in the photos. But how do they fit? What about their comfort? I often find myself most concerned with a specific aspect of a product such as those. I want to know what people are typically saying about fit and comfort. Many retailers offer a search bar for reviews, so you can filter reviews on a keyword. BUT, searching for “fit” across all Crocs reviews would return a ton of samples, and how can we know which ones are representative of the general sentiment people have with regard to fit? What if we could give consumers a snapshot of the reviews containing a word or phrase they search for? Could we show them a small set of reviews that best represent all the reviews that mention the word “fit”, for example? I think we can! ...

July 10, 2024 · 16 min · Chandler Underwood

Building a Product Reviews Webscraper

Motivation: Before we get into things, here’s a link to download the dataset if you would like. Do you ever find yourself poring over a product’s reviews trying to decide if it’s right for you? I sure do. Many times I’ve wished there was a quick summary of the reviews I could read to speed up the decision process. I’ve yet to see any online retailers doing exactly what I’m looking for, so I’ve decided to make my own review summarizer. But we need some data for training such a tool. ...

June 15, 2024 · 6 min · Chandler Underwood

Cleaning Up Years of Daily Running Data

Introduction: For those of you who don’t know me personally, I managed to run collegiately for 6 years (thanks Covid?). Over that time I logged a lot of miles and a lot of Garmin activities. I still run quite a bit, but my barn-burning days are behind me. I’d like to build a dashboard to get some insights into my running trends during that time as a sort of “last hurrah”, but sadly, a lot of my running data is missing and messy. I think cleaning it up will make for a great project to test my skills! Follow along here as I clean up and fill in my running data using various techniques such as pulling outside data sources and training some ML models to predict missing values. ...

May 3, 2024 · 13 min · Chandler Underwood