Success in any field can be distilled into a set of small rules and fundamentals that produce great results when coupled together.
Machine learning and image classification are no different, and engineers can showcase best practices by taking part in competitions like those hosted on Kaggle.
In this article, I’m going to give you a lot of resources to learn from, focusing on the best Kaggle kernels from 13 Kaggle competitions – with the most prominent competitions being:
We’ll go through three main areas of tweaking a Deep learning solution:
- Loss function
…and there will be a lot of example projects (and references) for you to check out along the way.
Image Pre-processing + EDA
Every Machine Learning/Deep Learning Solution starts with raw data. There are 2 essential steps in the data processing pipeline.
The first step is Exploratory Data Analysis (EDA). It helps us analyse the entire dataset and summarise its main characteristics, like class distribution, size distribution, and so on. Visual methods are often used to display the results of this analysis.
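To make the EDA step concrete, here is a minimal sketch that summarises class distribution and image-size distribution. It assumes you have already collected a list of `(label, (width, height))` pairs for your images; the `summarize_dataset` helper and the toy data are hypothetical, not taken from any of the linked notebooks.

```python
from collections import Counter

def summarize_dataset(samples):
    """Summarise a list of (label, (width, height)) tuples."""
    class_counts = Counter(label for label, _ in samples)
    size_counts = Counter(size for _, size in samples)
    total = sum(class_counts.values())
    # Share of the dataset taken up by each class, useful for spotting imbalance
    class_share = {label: count / total for label, count in class_counts.items()}
    return class_counts, class_share, size_counts

# Hypothetical toy metadata standing in for a real dataset
samples = [
    ("cat", (224, 224)),
    ("cat", (224, 224)),
    ("cat", (640, 480)),
    ("dog", (224, 224)),
]

class_counts, class_share, size_counts = summarize_dataset(samples)
print(class_counts)                 # how many images per class
print(class_share)                  # fraction of the dataset per class
print(size_counts.most_common(1))   # the most frequent image size
```

In a real project you would usually plot these counts, but even the raw numbers tell you whether you need to deal with class imbalance or mixed image sizes.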
The second step is Image Pre-Processing, where the aim is to take the raw image and improve image data (also known as image features) by suppressing unwanted distortions, resizing and/or enhancing important features, making the data more suited to the model and improving performance.
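As a rough illustration of pre-processing, the sketch below crops an image to a fixed square and normalises its pixel values with NumPy. The crop size, the mean and the standard deviation are placeholder assumptions, and a center crop stands in for a proper resize; the right choices depend on your dataset and model.

```python
import numpy as np

def center_crop(image, size):
    """Cut a size x size square from the middle of an (H, W, C) image.
    Assumes the image is at least `size` pixels in both dimensions."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

def normalize(image, mean, std):
    """Scale uint8 pixels to [0, 1], then standardise each channel."""
    scaled = image.astype(np.float32) / 255.0
    return (scaled - mean) / std

# Random pixels standing in for a raw photo
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)

# Placeholder statistics; in practice compute them on the training set
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.25, 0.25, 0.25], dtype=np.float32)

processed = normalize(center_crop(raw, 224), mean, std)
print(processed.shape, processed.dtype)
```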
You can dig into these Kaggle notebooks to check out a few examples of Image Pre-Processing and EDA techniques:
Data augmentation can expand our dataset by generating more training data from existing training samples. New samples are generated via a number of random transformations that not only yield believable-looking images but also reflect real-life scenarios—more on this later.
This technique is widely used, and not just when there are too few data samples to train the model. With too few samples, the model starts to memorize the training set and is unable to generalize (it performs poorly on data it has never seen).
Usually, when a model performs great on training data but poorly on validation data, we call this condition overfitting. To solve this problem, we usually try to get new data, and if new data isn’t available, data augmentation comes to the rescue.
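One simple way to spot this condition is to compare training and validation loss per epoch. The sketch below flags the first epoch where training loss keeps falling while validation loss goes up. The `overfitting_onset` helper and the loss values are made up for illustration, and real curves are noisier, so treat the result as a hint rather than a verdict.

```python
def overfitting_onset(train_losses, val_losses):
    """Return the first epoch index where training loss fell but validation
    loss rose, or None if that never happens. Both lists must have equal length."""
    for epoch in range(1, len(train_losses)):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        val_worsened = val_losses[epoch] > val_losses[epoch - 1]
        if train_improved and val_worsened:
            return epoch
    return None

# Made-up loss curves: validation loss turns around after epoch 2
train_losses = [1.0, 0.7, 0.5, 0.35, 0.2]
val_losses = [1.1, 0.8, 0.7, 0.75, 0.9]

onset = overfitting_onset(train_losses, val_losses)
print(onset)
```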
Note: A general rule of thumb is to always use data augmentation, even with a large dataset, because it exposes the model to more variations and helps it generalize better. It does come at the cost of slower training, because augmentations are applied on the fly (that is, during training).
Plus, for each task or dataset, we have to use augmentation techniques that reflect possible real-life scenarios (e.g. for a cat/dog detector we can use horizontal flip, crop, brightness and contrast, because these augmentations match differences in how photos are taken).
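The four augmentations mentioned for the cat/dog example can be sketched in plain NumPy as follows. This is only an illustration under assumed jitter ranges (the `max_brightness` and `max_contrast` defaults are arbitrary); in practice you would more likely reach for the augmentation utilities of your framework.

```python
import numpy as np

def augment(image, rng, crop_size, max_brightness=0.2, max_contrast=0.2):
    """Random horizontal flip, random crop, brightness and contrast jitter.
    `image` is a float array in [0, 1] with shape (H, W, C)."""
    # Flip left-right half of the time
    if rng.random() < 0.5:
        image = image[:, ::-1]
    # Pick a random crop_size x crop_size window
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    image = image[top:top + crop_size, left:left + crop_size]
    # Stretch or squeeze values around the mean (contrast), then shift (brightness)
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = image.mean()
    image = (image - mean) * contrast + mean + brightness
    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(42)
image = np.random.default_rng(0).random((64, 64, 3))  # stand-in for a photo
augmented = augment(image, rng, crop_size=48)
print(augmented.shape)
```

Because the transformations are drawn at random on every call, the model sees a slightly different version of each image in every epoch.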
Here are a few Kaggle competition notebooks for you to check out popular data augmentation techniques in practice:
Develop a baseline (example project)
Here we create a basic model using a very simple architecture, without any regularization or dropout layers, and see if we can beat the baseline score of 50% accuracy. Although we can’t always get there, if we can’t beat the baseline after trying multiple reasonable architectures, maybe the input data doesn’t hold the information required for our model to make a prediction.
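A 50% baseline corresponds to a balanced two-class problem. For other datasets, one common reference point (an assumption here, not something the linked project prescribes) is the accuracy of always predicting the most frequent class, which the hypothetical helper below computes.

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

balanced = ["cat"] * 50 + ["dog"] * 50
imbalanced = ["cat"] * 90 + ["dog"] * 10

balanced_label, balanced_acc = majority_class_baseline(balanced)
imbalanced_label, imbalanced_acc = majority_class_baseline(imbalanced)
print(balanced_acc, imbalanced_acc)
```

On the imbalanced toy labels, a model has to clear 90% accuracy before it has learned anything beyond the class frequencies.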
In the wise and paraphrased words of Jeremy Howard:
“You should be able to quickly test if you are going into a promising direction, in 15 minutes using 50% or less of the dataset, if not you have to rethink everything.”
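To run that kind of quick check, you need a smaller dataset that still looks like the full one. Here is a minimal sketch of a stratified subset that keeps roughly the same class proportions; the `stratified_subset` helper, the fraction and the seed are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a subset that keeps about `fraction` of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

labels = ["cat"] * 80 + ["dog"] * 20
subset = stratified_subset(labels, fraction=0.5)
print(len(subset))
```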
Develop a model large enough that it overfits (example project)
Once our model has enough capacity to beat the baseline score, we can increase its capacity until it overfits the dataset, and then move on to applying regularization. We can increase model capacity by:
- Adding more layers
- Using a better architecture
- Better training procedures
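As a rough proxy for capacity, you can count trainable parameters. The sketch below does this for a plain fully connected network, which is a simplifying assumption (image models are usually convolutional), just to show how adding or widening layers changes the count.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

small = dense_param_count([784, 32, 10])        # one hidden layer
deeper = dense_param_count([784, 32, 32, 10])   # one extra hidden layer
wider = dense_param_count([784, 128, 10])       # wider hidden layer

print(small, deeper, wider)
```

Parameter count is only a proxy: two models with the same number of parameters can behave very differently depending on the architecture and the training procedure.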
According to literature, the architecture refinements below improve model capacity, but barely change the computational complexity. They’re still pretty interesting if you want to dig into the linked examples:
Most of the time, model capacity and accuracy are positively correlated to each other – as the capacity increases, the accuracy increases too, and vice-versa.
Here are some training procedures you can use to tweak your model, with example projects to see how they work:
Unlike parameters, hyperparameters are specified by you when you configure the model (e.g. learning rate, number of epochs, number of hidden units, batch size).
Instead of trying different model configurations manually, you can automate the process with hyperparameter tuning libraries such as scikit-learn's grid search or Keras Tuner. They search the hyperparameter combinations within the ranges you specify (grid search tries all of them) and return the best-performing configuration.
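To show what such a library does under the hood, here is a minimal grid search written with the standard library only. It is a sketch, not the API of scikit-learn or Keras Tuner: `grid_search` is a hypothetical helper, and `toy_validation_score` is a made-up stand-in for training a model and measuring validation accuracy.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    n_trials = 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        n_trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials

def toy_validation_score(params):
    # Made-up score that peaks at learning_rate=0.01 and batch_size=32
    return (
        1.0
        - abs(params["learning_rate"] - 0.01) * 10
        - abs(params["batch_size"] - 32) / 1000
    )

param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
}

best_params, best_score, n_trials = grid_search(param_grid, toy_validation_score)
print(best_params, n_trials)
```

The number of trials is the product of the list lengths (3 x 3 = 9 here), which is why every extra hyperparameter makes the search noticeably slower.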
The more hyperparameters you need to tune, the slower the process, so it’s good to select a minimum subset of model hyperparameters to tune.
Not all model hyperparameters are equally important. Some hyperparameters have an outsized effect on the behaviour, and in turn the performance, of a machine learning algorithm. You should carefully pick the ones that impact your model’s performance the most, and tune them for maximum performance.
Regularization forces the model to learn a meaningful and generalisable representation of the data by penalising memorization (overfitting), making the model more robust when dealing with data it has never seen before.
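Two widely used regularization techniques are an L2 weight penalty and dropout. The NumPy sketch below shows the core of each; the `weight_decay` and `rate` values are arbitrary assumptions, and deep learning frameworks ship their own implementations that you would normally use instead.

```python
import numpy as np

def l2_penalty(weights, weight_decay):
    """Extra loss term that grows with the squared size of the weights."""
    return weight_decay * sum(float(np.sum(w ** 2)) for w in weights)

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the activations and
    rescale the rest so the expected value stays the same."""
    if not training or rate == 0.0:
        return activations
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

rng = np.random.default_rng(0)
weights = [np.ones((2, 2))]
activations = np.ones((4, 8))

penalty = l2_penalty(weights, weight_decay=0.1)
dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
print(penalty, dropped.mean())
```

The penalty is added to the training loss, while dropout is switched off at evaluation time, which is why the `training=False` call returns the activations untouched.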
One simple method to solve the problems stated above is to get more training data, because a model trained on more data will naturally generalize better.
Also known as the cost function or objective function, the loss function measures the difference between the model's output and the target output, and training adjusts the model to minimize that difference.
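For classification, the most common choice is categorical cross-entropy. Here is a minimal NumPy sketch that assumes the model outputs raw scores (logits) and the targets are integer class indices; the example values are made up.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked)))

targets = np.array([0, 3])
uniform = np.zeros((2, 4))                      # model has no preference
confident = np.array([[5.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 5.0]])    # model favours the right class

uniform_loss = cross_entropy(uniform, targets)
confident_loss = cross_entropy(confident, targets)
print(uniform_loss, confident_loss)
```

A model that spreads its probability evenly over four classes gets a loss of ln(4), and the loss shrinks as more probability lands on the correct class.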
Here are some of the most popular loss functions, with project examples where you’ll find tricks to improve your model capacity:
Evaluation + error analysis
Here, we do an ablation study and analyse our experiment results. We identify our model’s weaknesses and strengths, and pick areas to improve in the future. You can use the techniques below at this stage, and see how they’re implemented in the linked examples:
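A simple starting point for error analysis is to count which classes get confused with which, and how often each class is recognised correctly. The helpers and toy predictions below are hypothetical and only meant as a sketch.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the model predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Made-up labels and predictions
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]

confusion = confusion_counts(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
errors = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}
print(recall)
print(errors)
```

Classes with low recall, and the pairs that show up most often among the errors, are usually the first places to look for labelling problems or missing training data.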
There are many experiment tracking and management tools that need only minimal setup to save all the data for you automatically, which makes the ablation study easier – Neptune.ai does a great job here.
There are many ways to tweak your models, and new ideas come out all the time. Deep learning is a fast-moving field and there are no silver bullets. We have to experiment a lot, and enough trial and error leads to breakthroughs. This article already contains a lot of links, but for the most knowledge-hungry readers, I also added a long reference section below for you to read more and run some notebooks.
- Wide Residual Networks
- mixup: BEYOND EMPIRICAL RISK MINIMIZATION
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Searching for Activation Functions
- Residual Attention Network for Image Classification
- Mixed Precision Training
- Self-training with Noisy Student improves ImageNet classification
- When Does Label Smoothing Help?
- Additive Angular Margin Loss for Deep Face Recognition
- Grad-CAM: Why did you say that? Visual Explanations from Deep Networks…
- A Comparative Study of Deep Learning Loss Functions for Multi-Label Remote Sensing Image Classification
- Wide Residual Nets: “Why deeper isn’t always better…”
- Tune Hyperparameters for Classification Machine Learning Algorithms
- Image Pre-Processing
- Understanding Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss…
- Noisy student
- Overfitting and Underfitting With Machine Learning Algorithms
- Developing AI projects under pressure
- Understanding Neural Networks
- Kaggle competitions
- Intel Image Classification
- Recursion Cellular Image Classification
- SIIM-ISIC Melanoma Classification
- APTOS 2019 Blindness Detection
- Diabetic Retinopathy Detection
- ML Project — Image Classification
- Cdiscount’s Image Classification Challenge
- Plant seedlings classifications
- Aesthetic Visual Analysis
- Data Science Bowl 2017
- Plant Pathology 2020 – FGVC7
- Lyft Motion Prediction for Autonomous Vehicles
- Humpback Whale Identification
- Distributed Training
- 3D Image classification
- Ultimate Image Classification Guide 2020 🔥
- Protein Atlas – Exploration and Baseline
- Intel Image Classification (CNN – Keras)
- APTOS : Eye Preprocessing in Diabetic Retinopathy
- Lyft Level5: EDA + Training + Inference
- [BEG][TUT]Intel Image Classification[93.76% Accur]
- pretrained ResNet34 with RGBY (0.460 public LB)
- Fold1h4r3 ArcENetB4/2 256px RCIC
- Quick Visualization + EDA
- Analysis of Melanoma Metadata and EffNet Ensemble
- Triple Stratified KFold with TFRecords
- Melanoma. Pytorch starter. EfficientNet
- InceptionV3 for Retinopathy (GPU-HR)
- Fastai tutorial for image classification
- Google image classification v4
- Chest X-Ray Image Classification – TF Hub ResNet50