Practical session
Duration: 1h30

Presentation & objectives
The goal of this session is to put into practice object-oriented programming in the context of artificial intelligence, in particular with supervised learning approaches.
It will also help you master important notions in computer science. An intelligent programming assistant such as GitHub Copilot, which you may have installed already, will be able to provide you with a solution to these exercises based only on a wisely chosen file name.
For the sake of training, we advise you to disable such tools first.
At the end of the practical activity, we suggest you work on the exercises again with these tools activated. Following these two steps will improve both your fundamental and your practical skills.
We also provide the solutions to the exercises. Make sure to check them only after you have a solution of your own, for comparison purposes! Even if you are sure your solution is correct, please have a look at them, as they sometimes provide additional elements you may have missed.
1 — Implement a generic classifier using OOP
Implement an object that will be a generic classifier with methods fit, predict and score. The class must be defined in a file named classifiers.py and be named GenericClassifier.
The goal of this object is to define how a classifier should behave; we will use this class in the next exercises to actually implement machine learning algorithms.
- The class must have a constructor without parameters.
- The method fit takes two parameters X and y (both of type np.ndarray) and will be used to train the model (here, it does not do anything).
- The method fit sets a private boolean attribute self._isfitted of the class to indicate that the model has been trained.
- The method predict takes a single parameter X (of type np.ndarray) and returns predicted labels predictions.
- The method score takes two parameters X and y (both of type np.ndarray) and returns the accuracy of the model on the given data X according to the ground truth labels y.
- The methods predict and score raise an error if they are called while the model has not been trained yet (see the sketch after this list).
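As a starting point, here is a minimal sketch of classifiers.py. It assumes that raising a RuntimeError counts as "returning an error" (the exercise does not impose an exception type) and that the generic predict returns dummy constant labels, since no actual learning happens at this level:
import numpy as np

class GenericClassifier:
    def __init__(self):
        # Tracks whether fit() has been called
        self._isfitted = False

    def fit(self, X: np.ndarray, y: np.ndarray):
        # The generic classifier does not learn anything here
        self._isfitted = True

    def predict(self, X: np.ndarray) -> np.ndarray:
        if not self._isfitted:
            raise RuntimeError("The model has not been trained yet")
        # No actual model at this level: return dummy constant labels
        predictions = np.zeros(len(X), dtype=int)
        return predictions

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        if not self._isfitted:
            raise RuntimeError("The model has not been trained yet")
        # Accuracy: fraction of correctly predicted labels
        return float(np.mean(self.predict(X) == y))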
Then, create a test file named test_classifier.py that tests the three methods of the class individually.
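For instance, using pytest (the choice of testing framework is ours, not imposed by the exercise), the test file could look like:
import numpy as np
import pytest
from classifiers import GenericClassifier

def test_fit():
    clf = GenericClassifier()
    clf.fit(np.zeros((4, 2)), np.zeros(4, dtype=int))
    assert clf._isfitted

def test_predict_requires_fit():
    clf = GenericClassifier()
    with pytest.raises(RuntimeError):
        clf.predict(np.zeros((4, 2)))

def test_score_requires_fit():
    clf = GenericClassifier()
    with pytest.raises(RuntimeError):
        clf.score(np.zeros((4, 2)), np.zeros(4, dtype=int))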
2 — Implement a k-NN classifier using the generic classifier
Implement the KNN classifier coded in session algo S6 using a class inheriting from GenericClassifier. The class must be defined in a file named knn_classifier.py and be named KNNClassifier.
The parameter k is specific to an instance of the class.
- The class must have a constructor inheriting from GenericClassifier, which takes a single parameter k, the number of neighbors to consider.
- The method fit takes two parameters X and y (both of type np.ndarray) and trains the model using the algorithm described in session algo S6.
- The method predict takes a single parameter X (of type np.ndarray) and returns the predicted labels.
- The method fit sets a private boolean attribute self._isfitted of the class to indicate that the model has been trained.
- The method predict raises an error if it is called while the model has not been trained yet.
- You don't have to implement the method score again, as its definition is inherited from GenericClassifier (see the sketch after this list).
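Here is one possible sketch of knn_classifier.py. It uses Euclidean distances and a majority vote over the k nearest neighbors, which is the standard k-NN algorithm; the details may differ slightly from the version seen in session algo S6:
import numpy as np
from classifiers import GenericClassifier

class KNNClassifier(GenericClassifier):
    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def fit(self, X: np.ndarray, y: np.ndarray):
        # k-NN is a lazy learner: training just stores the data
        self._X_train = X
        self._y_train = y
        self._isfitted = True

    def predict(self, X: np.ndarray) -> np.ndarray:
        if not self._isfitted:
            raise RuntimeError("The model has not been trained yet")
        predictions = np.empty(len(X), dtype=self._y_train.dtype)
        for i, x in enumerate(X):
            # Euclidean distances from x to all training points
            distances = np.linalg.norm(self._X_train - x, axis=1)
            # Labels of the k nearest neighbors, then majority vote
            nearest = self._y_train[np.argsort(distances)[:self.k]]
            predictions[i] = np.bincount(nearest).argmax()
        return predictions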
Then, create a file named main.py that uses the class to train a model on the digits dataset and print the accuracy of the model on the test set.
Here is a snippet of code to load the digits dataset:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Load the digits dataset
digits = load_digits()
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=0)
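The rest of main.py could then look like this (a sketch; the value k=3 is an arbitrary choice):
from knn_classifier import KNNClassifier

# Train a k-NN model and evaluate it on the held-out test set
model = KNNClassifier(k=3)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))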
Ideally, you should also create a test file named test_knn_classifier.py that tests the three methods of the class individually.
3 — Implement model ensembling
Now that you have implemented these classifiers, you can use them to create an ensemble model.
Code a class ModelEnsemble that inherits from GenericClassifier, takes as argument a list of trained classifiers, and performs a majority vote over all classifiers.
The class must be defined in a file named model_ensemble.py and be named ModelEnsemble.
- The class takes as argument a list of trained classifiers.
- At initialisation, the class raises an error if the provided list of classifiers is empty.
- The class must implement a fit method that checks that all the classifiers in the list have already been trained.
- The class must have a method predict that takes a single parameter X (of type np.ndarray) and returns the predicted labels. This method raises an error if the classifiers in the list have not been trained (see the sketch after this list).
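A possible sketch of model_ensemble.py, assuming integer class labels (true for the digits dataset) so that numpy's bincount can be used for the majority vote:
import numpy as np
from classifiers import GenericClassifier

class ModelEnsemble(GenericClassifier):
    def __init__(self, classifiers: list):
        super().__init__()
        if not classifiers:
            raise ValueError("The list of classifiers must not be empty")
        self.classifiers = classifiers

    def fit(self, X: np.ndarray = None, y: np.ndarray = None):
        # The members are already trained: just check that they all are
        if not all(clf._isfitted for clf in self.classifiers):
            raise RuntimeError("All classifiers must be trained beforehand")
        self._isfitted = True

    def predict(self, X: np.ndarray) -> np.ndarray:
        if not self._isfitted:
            raise RuntimeError("The ensemble has not been fitted yet")
        # Stack member predictions: shape (n_classifiers, n_samples)
        all_preds = np.stack([clf.predict(X) for clf in self.classifiers])
        # Majority vote, one column (sample) at a time
        return np.array([np.bincount(col).argmax() for col in all_preds.T])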
Then, create a file named main.py that uses the class to train an ensemble model on the digits dataset and print the accuracy of the model on the test set. As an ensemble model, you can use a list of KNNs with different values of K.
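For instance, main.py could assemble the ensemble as follows (a sketch reusing X_train, X_test, y_train, y_test from the previous exercise; the values of K are arbitrary):
from knn_classifier import KNNClassifier
from model_ensemble import ModelEnsemble

# Train several k-NN classifiers with different numbers of neighbors
members = [KNNClassifier(k=k) for k in (1, 3, 5, 7, 9)]
for clf in members:
    clf.fit(X_train, y_train)

ensemble = ModelEnsemble(members)
ensemble.fit()
print("Ensemble accuracy:", ensemble.score(X_test, y_test))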
4 — Cross-validated model ensembling
Analyze the performance of the ensemble model from the previous section. Accuracy should be very high, and the ensemble model should not be significantly better than the individual models.
In order to better illustrate the benefit of ensemble models, and also to show a more realistic use case of model ensembling, we will simulate a situation in which each classifier sees a different split of the training data:
- Keep the test set X_test, y_test the same as in the previous exercise.
- Split the previous training set X_train, y_train into P different subsets.
- Train P k-NN classifiers independently, one on each subset, all with the same value of K. As the splits are different, each k-NN will have a slightly different performance on the test set.
- Evaluate the performance of an ensemble model taking all P k-NN classifiers (see the sketch after this list).
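One possible way to set this up (a sketch reusing the classes and data from the previous exercises; shuffling the indices before splitting is our own choice):
import numpy as np
from knn_classifier import KNNClassifier
from model_ensemble import ModelEnsemble

P, K = 5, 1  # suggested starting values

# Split the training set into P disjoint subsets
subsets = np.array_split(np.random.permutation(len(X_train)), P)

members = []
for idx in subsets:
    clf = KNNClassifier(k=K)
    clf.fit(X_train[idx], y_train[idx])
    # Each member only sees its own subset of the training data
    print("Individual accuracy:", clf.score(X_test, y_test))
    members.append(clf)

ensemble = ModelEnsemble(members)
ensemble.fit()
print("Ensemble accuracy:", ensemble.score(X_test, y_test))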
You can experiment with this framework by using a different number of splits P and different values of K. We suggest starting with P=5 and K=1, but feel free to experiment.
5 — Optimize your solutions
What you can do now is use AI tools such as GitHub Copilot or ChatGPT, either to generate the solution or to improve the first solution you came up with! Try to do this for all the exercises above, to see the differences with your solutions.
To go further
6 — Implement cross-validated grid search
Code a class CVGridSearch that takes a dataset (data and labels) and a range of a hyperparameter (e.g. a range of integer values of K), divides the dataset into a training and a validation set, and uses the validation set to select the best hyperparameter value. You can check its validity with your KNN implementation.
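A possible sketch of such a class (the 80/20 split ratio and the best_k method name are our own choices):
import numpy as np
from knn_classifier import KNNClassifier

class CVGridSearch:
    def __init__(self, X: np.ndarray, y: np.ndarray, k_values, train_ratio=0.8):
        self.k_values = k_values
        # Split the data into a training and a validation set
        perm = np.random.permutation(len(X))
        split = int(train_ratio * len(X))
        self.X_train, self.X_val = X[perm[:split]], X[perm[split:]]
        self.y_train, self.y_val = y[perm[:split]], y[perm[split:]]

    def best_k(self) -> int:
        # Train one k-NN per candidate value, keep the best on validation
        scores = {}
        for k in self.k_values:
            clf = KNNClassifier(k=k)
            clf.fit(self.X_train, self.y_train)
            scores[k] = clf.score(self.X_val, self.y_val)
        return max(scores, key=scores.get)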
To go beyond
7 — Multilayer perceptron
The multilayer perceptron (MLP) is a basic building block very commonly used in machine learning solutions based on Deep Learning. This page uses a class definition to build an MLP from scratch using numpy.
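To give an idea of what such a class can look like, here is a minimal sketch of the forward pass of a two-layer MLP in numpy (the layer structure and the ReLU activation are our own choices; the page mentioned above goes much further, including training):
import numpy as np

class MLP:
    def __init__(self, n_in: int, n_hidden: int, n_out: int):
        # Small random weights, zero biases
        self.W1 = np.random.randn(n_in, n_hidden) * 0.01
        self.b1 = np.zeros(n_hidden)
        self.W2 = np.random.randn(n_hidden, n_out) * 0.01
        self.b2 = np.zeros(n_out)

    def forward(self, X: np.ndarray) -> np.ndarray:
        # Hidden layer with ReLU activation, then a linear output layer
        h = np.maximum(0, X @ self.W1 + self.b1)
        return h @ self.W2 + self.b2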
8 — Automatic differentiation
Another interesting (but complex) application of OO programming is given by pytorch, with the notion of automatic differentiation. In summary, pytorch performs operations on tensors just like numpy does, but it also automatically tracks the history of, and dependencies between, all computations, so that gradients can be computed automatically. This enables a very straightforward implementation of deep learning architectures. More information can be found in the official pytorch documentation.
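As a small illustration of the idea:
import torch

# requires_grad=True asks pytorch to record operations on this tensor
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x

# backward() walks the recorded computation graph to compute gradients
y.backward()
print(x.grad)  # dy/dx = 2x + 2, i.e. 8.0 at x = 3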
You can also find a full pytorch tutorial here