Artificial intelligence

Programming – Session 5

  • OOP and ML using sklearn
  • Model ensembling
  • Hyperparameter tuning

OOP and ML using sklearn

How do we use sklearn for ML models?

Training phase

  • The training phase is implemented by the m.fit() method, which must be called first on a model m
  • It estimates the parameters of the model from the training data

Inference phase

  • Supervised learning: A m.predict(X) method is used to generate predictions of labels using data X
  • Unsupervised learning: A m.transform(X) method may be defined to put data into a new space (e.g., distances to clusters, or a new coordinate system)
  • In the specific case of clustering, the class may also implement a m.predict(X) method to assign cluster labels to input data
  • A m.score(X, y) method may also be implemented to generate a performance measure given data and labels
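A minimal sketch of this API on a tiny synthetic dataset, using a $K$-NN classifier for the supervised case and $K$-means for the unsupervised one (the data values are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])

# Supervised: fit() learns from (X, y), predict() infers labels,
# score() returns a performance measure (here: accuracy)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[0.05]]))  # label of the nearest training sample
print(knn.score(X, y))        # accuracy on (X, y)

# Unsupervised: transform() maps data into a new space
# (distances to cluster centers), predict() assigns cluster labels
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
print(km.transform(X).shape)  # (n_samples, n_clusters)
print(km.predict(X))          # a cluster label for each sample
```

Note that the same method names (fit, predict, transform, score) apply across very different models, which is exactly what the shared object-oriented interface buys us.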

Model ensembling

Many models voting

Principle

  • Aggregate the predictions of (many) different classifiers to boost overall performance

Training

  • The different classifiers need to be trained independently
  • Ideally, each classifier is trained on a different subset of the training dataset

Inference

  • Inference is performed separately on each classifier
  • The final prediction is a majority vote between the predictions of all the classifiers
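The scheme above can be sketched with sklearn's VotingClassifier, where voting="hard" means the final prediction is a majority vote over the classifiers' individual predictions (the dataset is synthetic and the choice of base classifiers is just an example):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # final prediction = majority vote
)
ensemble.fit(X, y)           # each classifier is trained independently
print(ensemble.score(X, y))  # accuracy of the ensemble
```

VotingClassifier trains every classifier on the full training set; training each one on a different subset of the data is what bagging does (see e.g. sklearn's BaggingClassifier).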

Hyperparameter tuning

What are hyperparameters?

Motivation

  • Hyperparameters (HP) are settings used to configure machine learning algorithms before training
  • A few examples from algorithmics session 6:
    • Number of nearest neighbors ($K$) to consider for a $K$-NN
    • Number of clusters ($K$) in $K$-means
  • Good HP values can be difficult to find by hand, motivating methods that search for the best HP automatically

Remark

  • HP are different from the parameters of the model, which are learned during the training phase
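This distinction can be seen directly in sklearn: a hyperparameter such as n_clusters is chosen before training, while the model's parameters are computed by fit() and exposed as attributes with a trailing underscore (the data values below are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0], [0.2], [10.0], [10.2]])

# n_clusters is a hyperparameter: chosen by us, before training
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)

# cluster_centers_ are parameters: learned from the data during fit()
print(km.cluster_centers_)
```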

Hyperparameter tuning

Finding values for hyperparameters

Grid search with cross-validation

  • A “grid” of hyperparameters is defined from all combinations of parameter values (e.g., a range of values of $K$ for a $K$-NN)
  • We consider that the grid contains $P$ possibilities
  • Cross-validation (CV):

    • Split the original training dataset into train and validation subsets
    • Train and evaluate a model for each of the $P$ HP possibilities
    • Keep the model and HP that achieve the best score on the validation subset

  • Many other methods exist to tune hyperparameters, such as random search (see articles)
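The procedure above is what sklearn's GridSearchCV implements; here is a sketch tuning $K$ (n_neighbors) for a $K$-NN classifier, with a grid of $P = 5$ possibilities (the dataset and grid values are just an example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The "grid": P = 5 possibilities, one per candidate value of K
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# cv=5: each HP possibility is evaluated with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # HP with the best mean validation score
print(search.best_score_)   # that best mean validation score
```

After fitting, search also behaves as a model refitted with the best HP on the full training set, so search.predict(X) works directly.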

Recap of the session

Main elements to remember

  • Tools such as sklearn use object-oriented programming to define classes that are well suited to machine learning

  • Model ensembling methods combine multiple models to improve the quality of predictions

  • Hyperparameter tuning is the process of finding the best hyperparameters for a model

  • Many approaches exist for ensembling and hyperparameter tuning (check the articles for more details)

What’s next?

Practical activity (~2h30)

OOP & sklearn

  • Use OOP and sklearn to implement ensembling
  • Tune hyperparameters

After the session

  • Review the articles of the session
  • Check your understanding with the quiz
  • Complete the non-optional practical activity
  • Check next session’s “Before the class” section

Evaluation

  • Get ready for the final evaluation
  • Everything in algo 1–5 and prog 1–4