random forest

3 Dec, 2014

Kaggle Titanic Competition Part VIII – Hyperparameter Optimization


In the last post, we generated our first Random Forest model with mostly default parameters so that we could get an idea of how important the features are. From that, we can further reduce the dimensionality of our data set by throwing out an arbitrary number of the weakest features. We could keep experimenting with the threshold used to remove "weak" features, or even go back and adjust the correlation and PCA thresholds to change how many parameters we end up with... but we'll move forward with what we've got. Now that we've got our [...]
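The thresholding idea described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the post's actual code; the `threshold` value of 0.05 is an arbitrary stand-in for whatever cutoff you settle on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Keep only features whose importance clears an arbitrary cutoff
threshold = 0.05
keep = forest.feature_importances_ > threshold
X_reduced = X[:, keep]

print(X_reduced.shape)  # fewer (or equal) columns than the original X
```

Importances sum to 1 across all features, so a cutoff well below the uniform value (here 1/10 = 0.1) only drops features that contribute almost nothing to the splits.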

1 Dec, 2014

Kaggle Titanic Competition Part VII – Random Forests and Feature Importance


In the last post we took a look at how to reduce noisy variables from our data set using PCA, and today we'll actually start modelling! Random Forests are one of the easiest models to run, and highly effective as well. A great combination for sure. If you're just starting out with a new problem, this is a great way to quickly build a reference model. There aren't a whole lot of parameters to tune, which makes it very user friendly. The primary parameters include how many decision trees to include in the forest, how much data to include in [...]
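The parameters mentioned above map directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch on synthetic data (not the post's actual pipeline); the specific values chosen here are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared Titanic data
X, y = make_classification(n_samples=300, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,      # how many decision trees in the forest
    max_features="sqrt",   # features considered at each split
    max_samples=0.8,       # fraction of rows bootstrapped per tree
    random_state=0,
)

scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 3))
```

More trees rarely hurt accuracy (only training time), which is part of what makes the model so forgiving as a first baseline.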
