Hyperparameter Tuning
Machine learning models have hyperparameters that you can tune to change the way learning occurs. The hyperparameters differ from model to model, and different datasets call for different settings, so they need to be adjusted for the problem at hand.
For example, in a decision tree classifier, some of the hyperparameters are (see the sketch after this list):
- max_depth (the maximum depth the tree is allowed to grow)
- min_samples_split (the minimum number of samples required to split an internal node)
- min_samples_leaf (the minimum number of samples required to be at a leaf node)
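As a minimal sketch (the specific values here are arbitrary, picked only for illustration), this is how those knobs are set when the estimator is created:
from sklearn.tree import DecisionTreeClassifier
# Hyperparameters are passed in when the estimator is created, before it sees any data
tree = DecisionTreeClassifier(max_depth=4,          # grow at most 4 levels deep
                              min_samples_split=10, # need 10+ samples to split a node
                              min_samples_leaf=5)   # every leaf needs 5+ samples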
Hyperparameters vs. Parameters
As mentioned earlier, hyperparameters are the knobs you turn and have control over; they are set during instantiation and do not change during training. Many models also have ‘normal’ parameters that are learned during training when you call the fit method. These parameters are different from hyperparameters. An example of ‘normal’ parameters are the slope and intercept in linear regression, which are learned while the model is fit.
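To make the distinction concrete, here is a small sketch with made-up toy data: fit_intercept is a hyperparameter we choose up front, while coef_ (the slope) and intercept_ are parameters the model learns from the data.
import numpy as np
from sklearn.linear_model import LinearRegression
# Toy data (invented for illustration): y is roughly 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
# fit_intercept is a hyperparameter: set at instantiation, never changed by training
lr = LinearRegression(fit_intercept=True)
# coef_ and intercept_ are 'normal' parameters: learned when fit is called
lr.fit(X, y)
print(lr.coef_, lr.intercept_)  # roughly 2 and 1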
Meta-estimator
We use GridSearchCV, which is known as a meta-estimator, to iterate over all the combinations of hyperparameters for a learning model, given a set of values for each hyperparameter. It fits each combination on the data, computes the cross-validated mean score, and finds which combination performs best.
GridSearchCV is not a learning model by itself, although it is a scikit-learn estimator: it takes an estimator as input during instantiation and follows the same three steps (import, instantiate, fit) as an ordinary estimator would.
How to use GridSearchCV
First, you must create a dictionary whose keys are the hyperparameter names and whose values are the lists of settings you want to try. The example I will use is a decision tree with max_depth and min_samples_leaf:
grid = {'max_depth': [2, 3, 4, 5],
        'min_samples_leaf': [2, 5, 10, 20, 50, 100]}
In this case, we are saying we want to try each of those values for ‘max_depth’ and ‘min_samples_leaf’ and see how well every combination performs. It will look something like:
- Combination 1: max_depth = 2 and min_samples_leaf = 2
- Combination 2: max_depth = 2 and min_samples_leaf = 5
- Combination 3: max_depth = 2 and min_samples_leaf = 10
This will continue until all combinations have been exhausted. For each one we will see how well the model performs.
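If you want to see the full list that GridSearchCV will work through, scikit-learn’s ParameterGrid expands the same dictionary; with the grid above that is 4 x 6 = 24 combinations:
from sklearn.model_selection import ParameterGrid
# Expand the grid defined above into every individual combination
combinations = list(ParameterGrid(grid))
print(len(combinations))   # 24 combinations in total
print(combinations[0])     # e.g. {'max_depth': 2, 'min_samples_leaf': 2}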
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, KFold
dtr = DecisionTreeRegressor()
kf = KFold(n_splits=5, shuffle=True, random_state=123)
gs = GridSearchCV(dtr, grid, cv=kf)
We import the necessary libraries, instantiate our decision tree, and set up KFold, which splits the data into training and validation folds for cross-validation (read more on K-Fold Cross-Validation here: https://machinelearningmastery.com/k-fold-cross-validation/).
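To see what the KFold step actually does, here is a tiny sketch using the kf created above (X_demo is dummy data invented for illustration, not the real dataset): each fold hands out a different set of training and validation row indices.
import numpy as np
# 10 dummy rows; with n_splits=5 each fold trains on 8 rows and validates on 2
X_demo = np.arange(20).reshape(10, 2)
for fold, (train_idx, val_idx) in enumerate(kf.split(X_demo)):
    print(f"Fold {fold}: {len(train_idx)} training rows, {len(val_idx)} validation rows")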
Then we instantiate the GridSearchCV, passing in the model, the grid of hyperparameter combinations, and the cross-validation strategy.
gs.fit(X, y)
gs.best_params_
gs.best_score_
We then fit GridSearchCV on the data. best_params_ tells us which combination was best (in this case it may be something like {‘max_depth’: 4, ‘min_samples_leaf’: 20}), and best_score_ tells us the best cross-validated mean score for that combination.
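Because GridSearchCV refits the best combination on the whole dataset by default (refit=True), you can also pull out the trained model itself and use it straight away; a small sketch:
# best_estimator_ is the DecisionTreeRegressor refit with the winning hyperparameters
best_model = gs.best_estimator_
predictions = best_model.predict(X)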
If you are working with a lot of hyperparameters, it can also be helpful to organize everything in a DataFrame by loading cv_results_ into pandas like so:
import pandas as pd
df_results = pd.DataFrame(gs.cv_results_)
For even further analysis, you can plot a hyperparameter against the mean test score to see how performance changes as the hyperparameter increases (note that in cv_results_ the column is named param_max_depth, not max_depth):
df_results.plot(x='param_max_depth', y='mean_test_score', kind='line', figsize=(10, 4));
You can see that at around param_max_depth = 3 the mean test score starts to decrease, so a good max_depth may be a value around 3–4.
There are a lot of different hyperparameters you can choose from, but be careful: the more hyperparameters you tune, the more combinations there are, and the longer the search takes. It is best to start with a small number of combinations and see which direction the results move in. You may notice, for example, that max_depth = 3 always performs best regardless of the other settings, so you can keep it constant while tuning the other hyperparameters.
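For instance, a hypothetical follow-up search (the specific values are made up for illustration) might hold max_depth at 3 and explore the other hyperparameters:
# Hypothetical second-round grid: max_depth held constant at 3
grid_round2 = {'max_depth': [3],
               'min_samples_leaf': [10, 15, 20, 25, 30],
               'min_samples_split': [2, 5, 10, 20]}
gs2 = GridSearchCV(dtr, grid_round2, cv=kf)
gs2.fit(X, y)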
Overall, you can improve your model’s results by applying hyperparameter tuning, and an easy way to do that is with the meta-estimator GridSearchCV.