Using the Category Encoders library in Scikit-learn

December 15th, 2016

I recently found a relatively new library on GitHub for handling categorical features, named categorical_encoding, and decided to give it a spin. As a reminder, categorical features are variables in your data that have a finite (ideally small) set of possible values, for example months of the year or hair color. You can't feed these into predictive models as raw text, so some conversion is needed to make them usable. Typically, you create a new, separate column for each possible value (or alternatively, depending on the intended model, n-1 of them) and each of these new [...]
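To make the "one column per value" idea concrete, here is a minimal sketch in plain Python. It assumes the usual one-hot scheme, where each new column holds 1 when the row has that value and 0 otherwise. The helper `one_hot` and its `drop_first` flag are hypothetical names made up for this illustration; they are not part of categorical_encoding or scikit-learn.

```python
def one_hot(values, drop_first=False):
    """Return (column_names, rows) for a one-hot encoding of `values`.

    Hypothetical helper for illustration only. With drop_first=True the
    first category (in sorted order) gets no column of its own, which
    gives the n-1 column variant.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    # One row per input value, one 0/1 indicator per remaining category.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


hair = ["brown", "blond", "red", "brown"]

full_cols, full_rows = one_hot(hair)
reduced_cols, reduced_rows = one_hot(hair, drop_first=True)

print(full_cols, full_rows)
print(reduced_cols, reduced_rows)
```

In the n-1 variant the dropped category is still recoverable: a row of all zeros means "blond" here. That is why some models, linear ones in particular, are often given the reduced form, since the full set of columns always sums to 1 and is therefore redundant.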