Text Pre-processing Basics with Pandas

2017-01-30T13:47:13-08:00May 10th, 2016|4 Comments

In this post, we'll take a look at the data provided in Kaggle's Home Depot Product Search Relevance challenge to demonstrate some techniques that may be helpful in getting started with feature generation for text data. Dealing with text data is considerably different than numerical data, so there are a few basic approaches that are an excellent place to start. As always, before we start creating features we'll need to clean and massage the data! In the Home Depot challenge, we have a few files which provide attributes and descriptions of each of the products on their website. The [...]