Self-assessment quiz
Presentation & objectives
The following quizzes are here to help you check that you understood the articles you had to study. At the end of a quiz, you will be given explanations of your answers. If some of them are wrong, you can click on the questions you missed and try them again.
These quizzes are provided for self-assessment and will not be graded or stored.
Don’t hesitate to reach out on the Discord server if you need any clarification or further explanation!
Quizzes
OOP and ML using sklearn
# What does the `.fit()` method do in scikit-learn?
- [x] Trains the model on input data
> ✅ The `.fit()` method trains the model by learning patterns from the provided data.
- [ ] Predicts the labels for input data
> ❌ Prediction is handled by the `.predict()` method, not `.fit()`.
- [ ] Assigns clusters to data points
> ❌ While clustering models use `.fit()` to learn the clusters, assigning points to clusters is done with `.predict()` (or read from the `labels_` attribute), not by `.fit()` itself.
- [x] Initializes the parameters of the model
> ✅ During `.fit()`, the model's parameters are initialized and then adjusted based on the training data.
- [ ] Visualizes the data points in 2D
> ❌ Visualization is not a purpose of the `.fit()` method.
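For a concrete illustration, here is a minimal sketch (using the toy Iris dataset bundled with scikit-learn) showing that `.fit()` is where the parameters are learned, while prediction is a separate step:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters (e.g. max_iter) are set at construction time.
model = LogisticRegression(max_iter=1000)

# .fit() learns the model parameters from the training data.
model.fit(X, y)

print(model.coef_.shape)     # learned coefficients (one row per class)
print(model.predict(X[:5]))  # prediction is handled by .predict(), not .fit()
```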
# What distinguishes a predictor class from a transformer class in scikit-learn?
- [x] Predictor classes are used for supervised learning, while transformer classes are used for unsupervised learning
> ✅ Predictor classes handle supervised tasks, while transformer classes handle unsupervised tasks such as scaling and dimensionality reduction.
- [x] Predictor classes require labeled data for training, while transformer classes do not
> ✅ Predictor classes rely on labeled data (`X, y`), while transformer classes use only input data (`X`).
- [ ] Transformer classes cannot perform predictions
> ❌ Some transformer classes can also make predictions: `KMeans`, for instance, implements both `.transform()` and `.predict()` for assigning cluster labels.
- [x] Predictor classes implement `.predict()`, while transformer classes implement `.transform()`
> ✅ Predictor classes generate predictions, and transformer classes transform data into a new space.
- [ ] Transformer classes are only used for visualization
> ❌ Transformer classes are used for a range of tasks, not just visualization.
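To make the distinction concrete, here is a small sketch contrasting a predictor (`LogisticRegression`) with a transformer (`StandardScaler`): both are fitted the same way, but one then predicts labels while the other transforms the data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Predictor: fitted on (X, y), then used to predict labels.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))       # class labels

# Transformer: fitted on X only, then used to transform the data.
scaler = StandardScaler().fit(X)
print(scaler.transform(X[:3]))  # standardized features, not labels
```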
# Why is object-oriented programming (OOP) beneficial in scikit-learn?
- [x] It enables modular and reusable components
> ✅ OOP allows models to be encapsulated as classes, making them modular and reusable in different contexts.
- [ ] It eliminates the need for training data
> ❌ Training data is still required for model training, regardless of OOP.
- [x] It standardizes interfaces for models
> ✅ Using OOP, scikit-learn ensures that all models share common methods like `.fit()`, `.predict()`, and `.transform()`.
- [ ] It simplifies data preprocessing
> ❌ While OOP organizes functionality, preprocessing is handled by specific classes like `StandardScaler`.
- [ ] It requires less memory during training
> ❌ Memory usage depends on the model and data, not on OOP itself.
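The payoff of this standardized interface is that very different models can be swapped without touching the surrounding code, as in this sketch (the accuracy values printed are illustrative and will vary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Because every estimator implements .fit()/.predict(), the same loop works for all of them.
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              KNeighborsClassifier()):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```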
Model ensembling
# What is the main advantage of model ensembling?
- [x] Improves predictive accuracy
> ✅ Model ensembling combines predictions from multiple models to improve overall accuracy.
- [ ] Reduces computational complexity
> ❌ Ensembling often increases computational requirements as it involves multiple models.
- [x] Enhances generalization to unseen data
> ✅ By aggregating predictions, ensembles generalize better to unseen data compared to individual models.
- [ ] Always guarantees better performance on training data
> ❌ There is no such guarantee: the improvements from ensembling show up mainly on test data, not on training data.
- [x] Reduces the impact of overfitting for high-variance models
> ✅ Techniques like bagging help reduce overfitting by averaging predictions from diverse models.
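As a rough illustration (a sketch on a synthetic dataset, so the exact numbers will vary), a random forest, which is an ensemble of trees, typically generalizes better than a single decision tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)                      # single high-variance model
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # ensemble of trees

print("single tree:", cross_val_score(tree, X, y, cv=5).mean().round(3))
print("forest     :", cross_val_score(forest, X, y, cv=5).mean().round(3))
```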
# What is the core idea behind bagging?
- [x] Training multiple models on different subsets of the data
> ✅ Bagging involves bootstrapping to create varied training subsets, reducing variance.
- [ ] Correcting errors made by previous models
> ❌ This describes boosting, not bagging.
- [x] Aggregating predictions to reduce variance
> ✅ Bagging reduces variance by averaging or voting on predictions from individual models.
- [ ] Using a meta-model to combine base models
> ❌ Meta-models are used in stacking, not bagging.
- [ ] Assigning weights to individual models based on accuracy
> ❌ Weighting models is characteristic of weighted ensembling, not bagging.
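In scikit-learn, bagging is available through `BaggingClassifier`; the sketch below trains many trees on bootstrap samples and aggregates their votes (recent scikit-learn versions use the `estimator` argument, older ones call it `base_estimator`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each tree is trained on a bootstrap sample of the data (bootstrap=True),
# and the predictions are combined by majority vote to reduce variance.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
print(cross_val_score(bagging, X, y, cv=5).mean().round(3))
```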
# What distinguishes boosting from bagging?
- [x] Boosting focuses on reducing bias by sequentially training models
> ✅ Boosting reduces bias by training models iteratively, with each correcting the errors of the previous one.
- [ ] Boosting trains models on the same data subsets
> ❌ Boosting assigns more weight to difficult instances rather than using identical data subsets.
- [x] Boosting gives more weight to misclassified data points
> ✅ Boosting emphasizes harder-to-predict instances during training.
- [ ] Bagging uses weighted averages for predictions
> ❌ Bagging typically uses simple averaging or voting, not weighted combinations.
- [x] Boosting requires sequential training, while bagging trains models independently
> ✅ Boosting trains models iteratively, while bagging trains them in parallel.
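For comparison, a boosting sketch with `AdaBoostClassifier`: weak learners are trained one after another, and the samples misclassified by the previous models receive a larger weight at each step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Sequential training: each new weak learner focuses on the samples
# that the previous ones got wrong (via sample re-weighting).
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)
print(cross_val_score(boosting, X, y, cv=5).mean().round(3))
```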
# How does voting work in model ensembling?
- [x] Combines predictions from multiple models to make a decision
> ✅ Voting aggregates predictions from all base models to arrive at a final decision.
- [ ] Uses a meta-model to integrate predictions
> ❌ Meta-model integration is a feature of stacking, not voting.
- [x] Can use hard voting or soft voting depending on the task
> ✅ Hard voting selects the majority class, while soft voting averages probabilities.
- [ ] Requires retraining the models after aggregation
> ❌ Voting does not require retraining; it simply combines the predictions of pre-trained models.
- [ ] Assigns weights to models based on their performance
> ❌ Weighting is an aspect of weighted ensembling, not simple voting.
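Here is a small sketch of both voting strategies with `VotingClassifier` (the base models and dataset are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# Hard voting: the majority class wins. Soft voting: class probabilities are averaged.
hard = VotingClassifier(estimators=estimators, voting="hard")
soft = VotingClassifier(estimators=estimators, voting="soft")

print("hard:", cross_val_score(hard, X, y, cv=5).mean().round(3))
print("soft:", cross_val_score(soft, X, y, cv=5).mean().round(3))
```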
# What makes diversity among models important in ensembling?
- [x] Diverse models are less likely to make the same errors
> ✅ Diversity ensures that errors from different models cancel out, improving overall performance.
- [ ] Diverse models always require weighted combinations
> ❌ While diversity is beneficial, it does not mandate weighted combinations.
- [x] Combining diverse models improves generalization
> ✅ Diverse models capture different patterns in the data, enhancing generalization to unseen inputs.
- [ ] Diversity guarantees better training performance
> ❌ Diversity improves generalization but does not always guarantee better training performance.
- [ ] Requires identical hyperparameters for all models
> ❌ Diverse models often have different architectures, hyperparameters, or data subsets.
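A quick way to see why diversity matters (again a sketch with illustrative models): an ensemble of identical models cannot outperform a single one, while models from different families make different errors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Three copies of the same tree make exactly the same mistakes: voting cannot fix them.
clones = VotingClassifier([(f"dt{i}", DecisionTreeClassifier(random_state=0)) for i in range(3)])

# Different model families tend to err on different samples, so their errors partly cancel out.
diverse = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
])

print("identical models:", cross_val_score(clones, X, y, cv=5).mean().round(3))
print("diverse models  :", cross_val_score(diverse, X, y, cv=5).mean().round(3))
```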
Hyperparameter tuning
# What are hyperparameters in machine learning?
- [x] Parameters that are set before training and control the model's behavior
> ✅ Hyperparameters are predefined values, such as learning rate or tree depth, that govern the training process and structure of the model.
- [ ] Parameters learned during the training process
> ❌ Parameters learned during training are called model parameters, not hyperparameters.
- [x] Examples include learning rate, number of layers, and regularization strength
> ✅ These are common examples of hyperparameters that need to be optimized for model performance.
- [ ] Values that change automatically during training
> ❌ Hyperparameters remain fixed throughout training unless manually adjusted in iterative experiments.
- [ ] Metrics used to evaluate the performance of a model
> ❌ Performance metrics evaluate models, while hyperparameters control the training process.
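To illustrate the distinction: hyperparameters are passed to the constructor and stay fixed, while model parameters are estimated by `.fit()` (a minimal sketch on the Iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C (regularization strength) and max_iter are hyperparameters: chosen by us, fixed during training.
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(X, y)

# coef_ and intercept_ are model parameters: learned from the data by .fit().
print(model.coef_)
print(model.intercept_)
```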
# Why is hyperparameter tuning important?
- [x] To improve a model's ability to generalize to unseen data
> ✅ Proper hyperparameter tuning enhances the model's performance on test data, avoiding overfitting or underfitting.
- [ ] To make the model train faster
> ❌ While tuning may influence training efficiency, its primary goal is to optimize model performance.
- [x] To minimize the error metric on the validation dataset
> ✅ The tuning process aims to find hyperparameters that achieve the lowest validation error.
- [ ] To ensure the model always achieves 100% accuracy
> ❌ Achieving perfect accuracy is unrealistic and often indicates overfitting.
- [ ] To reduce the number of features in the dataset
> ❌ Feature reduction is a preprocessing task, not related to hyperparameter tuning.
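A small sketch of why tuning matters (synthetic data, illustrative values): the same model family can underfit or overfit depending on a single hyperparameter, and cross-validation reveals which setting generalizes best.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# max_depth controls model complexity: too small underfits, too large overfits.
for depth in (1, 3, 10, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_depth={depth}: {score:.3f}")
```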
# What is grid search?
- [x] A systematic method that evaluates all possible combinations of hyperparameters
> ✅ Grid search exhaustively explores the predefined search space for hyperparameters.
- [ ] A method that selects random combinations of hyperparameters
> ❌ This describes random search, not grid search.
- [ ] A probabilistic approach to finding optimal hyperparameters
> ❌ Grid search is deterministic, not probabilistic.
- [x] Suitable for smaller, well-defined hyperparameter spaces
> ✅ Grid search works best when the number of combinations is manageable.
- [ ] Guaranteed to find the global optimum for any dataset
> ❌ Grid search only finds the best parameters within the specified search space, which may not include the global optimum.
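A minimal grid search sketch with `GridSearchCV` (the parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of C and kernel below is evaluated with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```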
# How does random search differ from grid search?
- [x] Random search samples hyperparameter values from distributions
> ✅ Random search explores the hyperparameter space by randomly sampling values, which can be more efficient for large spaces.
- [ ] Random search evaluates all combinations systematically
> ❌ This is a characteristic of grid search, not random search.
- [x] It can cover a broader range of values with fewer evaluations
> ✅ Random search often identifies good configurations faster, as it explores a diverse set of values.
- [ ] Random search guarantees finding the best configuration
> ❌ Random search does not guarantee finding the best combination but is computationally efficient.
- [x] It is well-suited for high-dimensional hyperparameter spaces
> ✅ Random search scales better to high-dimensional spaces than grid search.
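The same search with `RandomizedSearchCV`, sampling `C` from a continuous distribution instead of a fixed grid (requires SciPy; the values are illustrative):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C is sampled from a log-uniform distribution; n_iter sets the budget,
# not the size of the search space.
param_distributions = {"C": loguniform(1e-3, 1e3), "kernel": ["linear", "rbf"]}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=5, random_state=0)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```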
# What is a practical tip for hyperparameter tuning?
- [x] Start with default hyperparameter values
> ✅ Defaults are well-tested for many algorithms and serve as a good starting point for tuning.
- [ ] Always use grid search for large hyperparameter spaces
> ❌ Grid search becomes computationally expensive for large spaces; random search or Bayesian optimization may be better.
- [x] Use cross-validation to avoid overfitting
> ✅ Cross-validation ensures that tuned hyperparameters generalize well to unseen data.
- [ ] Focus on all hyperparameters equally during tuning
> ❌ It's more efficient to prioritize influential hyperparameters.
- [x] Parallelize hyperparameter evaluations where possible
> ✅ Parallel processing speeds up the search for optimal hyperparameter configurations.
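Putting several of these tips together in one sketch (illustrative values): tune only a few influential hyperparameters, rely on cross-validation, and parallelize the evaluations with `n_jobs=-1`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Focus on a few influential hyperparameters, starting from values close to the defaults.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=5,        # cross-validation guards against overfitting the tuning process
    n_jobs=-1,   # run evaluations in parallel across CPU cores
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```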