Hyperparameter tuning
In brief
Article summary
Hyperparameter tuning is the process of optimizing the settings that control the behavior of a machine learning model. Unlike model parameters, hyperparameters are set before training and significantly influence a model’s performance and efficiency.
Main takeaways
- Hyperparameters govern the training process and structure of machine learning models.
- Tuning involves finding the best combination of hyperparameter values to optimize performance.
- Common techniques include grid search, random search, and advanced methods like Bayesian optimization and genetic algorithms.
Article contents
1 — Hyperparameters and tuning them
1.1 — What are hyperparameters?
Hyperparameters are the settings that define how a machine learning model is structured and trained. Unlike model parameters, which are learned directly from the data during training (e.g., weights in a neural network), hyperparameters must be specified before training begins and remain constant throughout the process. They govern key aspects of the model’s behavior, such as its complexity, training dynamics, or the structure of the algorithm.
Examples of hyperparameters include the learning rate for gradient descent, the depth of decision trees in a random forest, the number of clusters in kMeans, or the number of epochs for training a neural network. Choosing the right hyperparameter values is critical because they directly influence the model’s ability to learn patterns effectively, avoid overfitting, and generalize well to unseen data. Hyperparameter tuning is the process of systematically searching for the best combination of these values to optimize model performance.
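To make the distinction concrete, here is a minimal sketch assuming scikit-learn: the hyperparameters are fixed when the model is constructed and stay constant, while the parameters (here, the tree’s split thresholds and leaf values) are learned from the data by fit().

```python
# Minimal sketch (scikit-learn assumed): hyperparameters are set before
# training and stay constant, while parameters are learned by fit().
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: chosen by the practitioner, constant during training.
model = DecisionTreeClassifier(max_depth=3, min_samples_split=10)

# Parameters: split thresholds and leaf values, learned from the data.
model.fit(X, y)
print(model.get_depth(), model.tree_.node_count)
```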
1.2 — Why is hyperparameter tuning important?
Hyperparameter tuning is a critical step in machine learning that can dramatically improve a model’s performance. Hyperparameters differ from model parameters: while parameters are learned from the data during training (e.g., weights in a neural network), hyperparameters must be specified before training begins. Examples include the learning rate, regularization strength, number of trees in a random forest, or the number of clusters in kMeans.
The goal of hyperparameter tuning is to identify the combination of values that optimizes a chosen evaluation metric on a validation dataset (e.g., minimizing loss or mean squared error, or maximizing accuracy). Improperly tuned hyperparameters can lead to underfitting or overfitting, negatively impacting the model’s ability to generalize to new data.
2 — Key hyperparameter tuning techniques
2.1 — Grid search
Grid search is a straightforward and systematic method for hyperparameter tuning that involves exhaustively evaluating all possible combinations of hyperparameter values within a predefined search space. For each combination, the model is trained and validated, and the configuration that yields the best performance on a validation metric (e.g., accuracy, F1-score, or mean squared error) is selected.
For example, in a support vector machine (SVM), grid search might explore various combinations of kernel types, regularization parameters, and gamma values. While grid search is simple to implement and guarantees finding the best configuration within the search space, it becomes computationally expensive as the number of hyperparameters and their possible values increase. This method is best suited for smaller, well-defined hyperparameter spaces where exhaustive evaluation is feasible. To mitigate the computational cost, practitioners often combine grid search with parallel processing or use a coarse-to-fine approach by narrowing the search space iteratively.
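As an illustration, a grid search over the SVM example above could look like the following sketch, assuming scikit-learn’s GridSearchCV; the value grids are illustrative, not recommendations.

```python
# Minimal grid search sketch with scikit-learn (illustrative value grids).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],        # regularization strength
    "gamma": [0.01, 0.1, 1],  # RBF kernel coefficient
}

# Every combination is trained and scored with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```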
2.2 — Random search
Random search is a hyperparameter tuning method that explores the hyperparameter space by sampling random combinations of values from predefined distributions. Unlike grid search, which evaluates all possible combinations systematically, random search selects configurations at random, allowing it to cover a broader range of the search space in fewer evaluations. This approach is especially effective when only a few hyperparameters have a significant impact on model performance, as it can often find near-optimal configurations faster than grid search.
For example, in tuning a neural network, random search might sample learning rates from a log-uniform distribution and the number of hidden units from a uniform distribution. While it may not guarantee the absolute best solution within the search space, random search is computationally more efficient and scales well with larger spaces. Additionally, its simplicity and flexibility make it a practical choice for initial explorations of hyperparameter tuning.
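Here is a minimal sketch of this idea, assuming scikit-learn’s RandomizedSearchCV and SciPy’s loguniform distribution; the small MLP and the sampling ranges are illustrative.

```python
# Minimal random search sketch: the learning rate is sampled log-uniformly,
# the hidden-layer size uniformly (illustrative ranges).
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)

param_distributions = {
    "learning_rate_init": loguniform(1e-4, 1e-1),
    "hidden_layer_sizes": [(h,) for h in range(16, 129, 16)],
}

# Only n_iter random configurations are evaluated, not the full grid.
search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions,
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```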
2.3 — Bayesian optimization
Bayesian optimization is an advanced hyperparameter tuning technique that uses probabilistic models to intelligently explore the hyperparameter space. Unlike grid or random search, which blindly evaluate configurations, Bayesian optimization builds a surrogate model (often using Gaussian Processes or Tree-structured Parzen Estimators) to approximate the relationship between hyperparameters and the performance metric. It then uses this surrogate model to identify promising hyperparameter combinations, balancing exploration of unknown areas and exploitation of high-performing regions.
For example, when tuning a neural network, Bayesian optimization might predict the performance of a given learning rate and number of layers before actually training the model. This approach significantly reduces the number of evaluations needed to find optimal or near-optimal hyperparameters, making it ideal for expensive-to-train models like deep neural networks. Bayesian optimization can itself become demanding in very high-dimensional spaces or with noisy objectives, but its efficiency and ability to focus on the most promising regions make it a powerful tool for hyperparameter tuning.
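As an illustration, here is a minimal sketch using the Optuna library, whose default sampler is a Tree-structured Parzen Estimator; the model, the search ranges, and the number of trials are illustrative assumptions.

```python
# Minimal Bayesian-optimization sketch with Optuna (TPE sampler by default).
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)

def objective(trial):
    # The surrogate model proposes promising values within these ranges.
    lr = trial.suggest_float("learning_rate_init", 1e-4, 1e-1, log=True)
    hidden = trial.suggest_int("hidden_units", 16, 128)
    model = MLPClassifier(learning_rate_init=lr,
                          hidden_layer_sizes=(hidden,),
                          max_iter=500, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```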
2.4 — Genetic algorithms
Genetic algorithms (GAs) are a hyperparameter tuning method inspired by the principles of natural evolution, such as selection, mutation, and crossover. They treat hyperparameter optimization as an evolutionary process, where each “individual” in a population represents a set of hyperparameters. The algorithm starts by randomly initializing a population of configurations, which are evaluated based on their performance (fitness) on a validation metric. The best-performing configurations are then “selected” to form the basis for the next generation, with new configurations generated through processes like crossover (combining aspects of two configurations) and mutation (introducing small random changes).
This iterative process continues until a predefined stopping criterion is met, such as a maximum number of generations or convergence to an optimal solution. Genetic algorithms are particularly effective for exploring large and complex hyperparameter spaces, as they can discover unconventional yet effective combinations. However, they can be computationally expensive, as they require multiple evaluations per generation, and careful tuning of GA-specific parameters, like mutation rate and population size, is necessary for optimal performance.
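To make the loop concrete, here is a minimal from-scratch sketch of a genetic algorithm over two hyperparameters; the fitness function is a stand-in for training and validating a real model, and the GA settings (population size, mutation rate, number of generations) are illustrative.

```python
# Minimal genetic-algorithm sketch over two hyperparameters (illustrative).
import random

random.seed(0)

def random_individual():
    # One "individual" is a full set of hyperparameters.
    return {"learning_rate": 10 ** random.uniform(-4, -1),
            "max_depth": random.randint(2, 12)}

def fitness(ind):
    # Placeholder fitness: in practice, train a model with these
    # hyperparameters and return its validation score.
    return -abs(ind["learning_rate"] - 0.01) - 0.01 * abs(ind["max_depth"] - 6)

def crossover(a, b):
    # Combine aspects of two parents: each hyperparameter comes from either one.
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(ind, rate=0.2):
    # Introduce small random changes with probability `rate`.
    if random.random() < rate:
        ind["learning_rate"] *= 10 ** random.uniform(-0.5, 0.5)
    if random.random() < rate:
        ind["max_depth"] = max(2, ind["max_depth"] + random.choice([-1, 1]))
    return ind

population = [random_individual() for _ in range(20)]
for generation in range(10):
    # Selection: keep the best-performing half as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Reproduction: refill the population through crossover and mutation.
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]
    population = parents + children

print(max(population, key=fitness))
```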
3 — Practical tips for hyperparameter tuning
Here are some practical tips to help you navigate the hyperparameter tuning process effectively:
- Start with sensible defaults – Begin by using the default hyperparameter values provided by the algorithm’s library. Many libraries like scikit-learn and XGBoost offer well-tested defaults that work reasonably well for most datasets.
- Understand the impact of key hyperparameters – Focus on tuning the most influential hyperparameters first. For example:
  - For decision trees, prioritize max_depth and min_samples_split.
  - For neural networks, focus on the learning rate, number of layers, and units per layer.
  - For SVMs, concentrate on the kernel type, C, and gamma.
- Use a coarse-to-fine strategy – Start with a broad search across a large range of hyperparameter values to identify promising regions, then narrow the search to fine-tune the optimal values in these regions (see the sketch after this list).
- Evaluate with cross-validation – Use cross-validation to ensure that the selected hyperparameters generalize well to unseen data. This reduces the risk of overfitting to the validation set.
- Automate with libraries – Leverage tools like:
  - Scikit-learn (GridSearchCV or RandomizedSearchCV for grid/random search).
  - Optuna or Hyperopt for advanced methods like Bayesian optimization.
  - Ray Tune for distributed and scalable hyperparameter tuning.
- Log and track experiments – Use experiment tracking tools like MLflow or Weights & Biases to log hyperparameter configurations, training metrics, and results. This helps in comparing experiments systematically.
- Balance exploration and exploitation – Use methods like Bayesian optimization to intelligently search the hyperparameter space. These methods help focus on promising regions while occasionally exploring new areas.
- Parallelize when possible – Take advantage of parallel processing or cloud services to evaluate multiple configurations simultaneously, saving time during computationally expensive searches.
- Use learning curves for early stopping – When tuning iterative models like neural networks, monitor performance metrics (e.g., validation loss) during training. Use early stopping to avoid wasting time on poorly performing configurations.
- Normalize and scale features – Many algorithms, such as SVMs and neural networks, are sensitive to the scale of input features. Ensure data is normalized or standardized before tuning hyperparameters.
- Leverage domain knowledge – Incorporate knowledge about the data and task to set reasonable bounds for hyperparameters. For example, the number of clusters in kMeans can be informed by prior understanding of the dataset.
- Avoid overfitting – Regularize models by tuning hyperparameters like alpha in Lasso or Ridge regression, or lambda in XGBoost. Use dropout or L2 regularization in neural networks.
- Iterate and refine – Hyperparameter tuning is an iterative process. Use insights from earlier rounds to refine your search strategy and focus on configurations that improve performance.
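As an illustration of the coarse-to-fine and cross-validation tips above, here is a minimal sketch assuming scikit-learn: a broad random search first locates a promising region for the regularization strength, and a narrower grid search then refines it.

```python
# Minimal coarse-to-fine sketch: broad random search, then a focused grid search.
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Coarse pass: broad log-uniform range for the regularization strength C.
coarse = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(1e-4, 1e4)},
    n_iter=20, cv=5, random_state=0,
)
coarse.fit(X, y)
best_C = coarse.best_params_["C"]

# Fine pass: dense grid around the best coarse value, still cross-validated.
fine = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": np.linspace(best_C / 3, best_C * 3, 10)},
    cv=5,
)
fine.fit(X, y)
print(fine.best_params_, fine.best_score_)
```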
To go further
Looks like this section is empty!
Anything you would have liked to see here? Let us know on the Discord server! Maybe we can add it quickly. Otherwise, it will help us improve the course for next year!
To go beyond
- Neural architecture search – Wikipedia article on neural architecture search, an advanced technique for automating the design of neural network architectures.