Imagine if you could get all the tips and tricks you need to tackle a binary classification problem on Kaggle or anywhere else. I have gone over 10 Kaggle competitions including:
- Toxic Comment Classification Challenge $35,000
- TalkingData AdTracking Fraud Detection Challenge $25,000
- IEEE-CIS Fraud Detection $20,000
- Jigsaw Multilingual Toxic Comment Classification $50,000
- RSNA Intracranial Hemorrhage Detection $25,000
- SIIM-ACR Pneumothorax Segmentation $30,000
- Jigsaw Unintended Bias in Toxicity Classification $65,000
- Santander Customer Transaction Prediction $65,000
- Microsoft Malware Prediction $25,000
- Humpback Whale Identification $25,000
– and pulled out that information for you.
Dealing with class imbalance
- BCE and Dice-based losses
- Focal Loss-based losses
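To make the loss-based approaches above concrete, here is a minimal NumPy sketch of the binary focal loss (Lin et al., 2017). This is an illustration, not code from any of the solutions; the `gamma` and `alpha` defaults are the common values from the original paper.

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: down-weights easy examples so the loss
    focuses on hard, often minority-class, ones."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)    # prob. of true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class weighting
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.1, 0.6, 0.4])
print(binary_focal_loss(y_true, y_pred))
```

With `gamma=0` and `alpha=0.5` this reduces to (half) the ordinary binary cross-entropy; increasing `gamma` shrinks the contribution of well-classified examples.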
Cross-validation + proper evaluation
- Use Adversarial validation
- Apply GroupKFold cross-validation
- Use a simple time split, taking roughly the last 100k records as the validation set
- Generate predictions using unshuffled KFold
- Use stratified 5-fold without early stopping when predicting test data
- Implement LightGBM on 10 KFolds with no shuffle
- If using pseudo labeling, don’t validate on the pseudo labels to avoid overfitting
- Use standard 10-fold stratified cross-validation with multiple seeds for the final blend
- Use the history of submissions to tweak test set predictions
- Select a random 30% of the CV data, optimize thresholds on that 30%, apply them to the remaining 70%, and check how far they are from the 70%’s own optimal thresholds
- Re-scale predictions >0.8 and <0.01 using probabilistic random noise that introduces a small penalty
- Scale up the predicted probability of comments that contain curse words in different languages
- Label the test samples using the best-performing ensemble, add them to the train set, and train to convergence
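The adversarial validation idea from the list above can be sketched with plain scikit-learn. This is a toy illustration, not code from any solution; the RandomForest model, 5-fold setup, and synthetic data are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def adversarial_validation(X_train, X_test, n_valid=1000, seed=0):
    """Train a classifier to tell train rows (label 0) from test rows (label 1).
    AUC ~ 0.5 -> similar distributions; high AUC -> distribution shift,
    so the most 'test-like' train rows make a better validation set."""
    X = np.vstack([X_train, X_test])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold "looks like test" score for every row
    oof = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    auc = roc_auc_score(y, oof)
    train_scores = oof[: len(X_train)]
    valid_idx = np.argsort(train_scores)[-n_valid:]  # most test-like train rows
    return auc, valid_idx

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(500, 5))
X_te = rng.normal(1.0, 1.0, size=(500, 5))  # shifted, so easy to tell apart
auc, valid_idx = adversarial_validation(X_tr, X_te, n_valid=100)
print(round(auc, 3))
```

Because the synthetic test set is deliberately shifted, the classifier separates the two easily and the AUC comes out well above 0.5, signaling that a random holdout would be a poor proxy for the test distribution.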
Averaging over multiple seeds
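A minimal scikit-learn sketch of seed averaging (illustrative only; the model choice, `subsample` value, and number of seeds are arbitrary, and `subsample < 1` is what makes each run seed-dependent):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Same model, different random seeds: average the predicted probabilities
preds = []
for seed in range(5):
    model = GradientBoostingClassifier(subsample=0.8, random_state=seed)
    model.fit(X_tr, y_tr)
    preds.append(model.predict_proba(X_te)[:, 1])

avg_pred = np.mean(preds, axis=0)
auc = roc_auc_score(y_te, avg_pred)
print(round(auc, 4))
```

Averaging over seeds reduces the variance contributed by random initialization and subsampling, which usually gives a small but nearly free boost over any single run.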
Average different models
- Stack Bi-LSTM, Bert-Large-Uncased with WWM, XLNET, with the meta model as ExtraTreesClassifier
- LightGBM Stacking
- Stack LightGBM with heavy Bayesian optimization
- Stack models using pystacknet and MLxtend
- An ensemble of RNN, CNN, LightGBM, and NBSVM
- Use 5× time-bagged XGBoost
- CV scores with heavy Bayesian optimization
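The stacking recipes above can be sketched with scikit-learn's `StackingClassifier`. This is a toy stand-in: simple base models replace the Bi-LSTM/BERT/LightGBM bases, with `ExtraTreesClassifier` as the meta-model as in the Jigsaw entry listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models produce out-of-fold probabilities; the meta-model
# learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=ExtraTreesClassifier(n_estimators=200, random_state=0),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(round(auc, 4))
```

The important detail is `cv=5`: the meta-model is trained on out-of-fold base predictions, which is what keeps the stack from simply memorizing the base models' training-set fit.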
Repositories and open solutions
Repos with open source solutions
Image based solutions
- Humpback Whale Identification 1st Place Code
- Data Science Bowl 2nd Place Solution
- Forecasting Lung Cancer Diagnoses with Deep Learning
- Kaggle data science bowl 2017
- RSNA Intracranial Hemorrhage Detection 1st Place Solution
- 2nd Place Solution — RSNA Intracranial Hemorrhage Detection
- 3rd place solution RSNA Intracranial Hemorrhage Detection
- 4th Place Solution with code RSNA Intracranial Hemorrhage Detection
- 5th place solution for RSNA Intracranial Hemorrhage Detection
- RSNA Intracranial Hemorrhage Detection Entrypoint for the 5th-place-solution
- SIIM-ACR Pneumothorax Segmentation 1st Place Solution
- SIIM-ACR Pneumothorax Segmentation 3rd Place Solution
- 5th place solution SIIM-ACR Pneumothorax Segmentation
- Humpback Whale Identification 5th Place Solution
- 4th Place Solution Humpback Whale Identification
- Kaggle Humpback Whale Identification Challenge 2019 2nd place code
Tabular based solutions
- How to implement LibFM in Keras and how it was used in the TalkingData competition on Kaggle
- XGB Fraud Detection Solution
- Fraud Detection Feature Engineering
- 2nd Place Solution Santander Customer Transaction Prediction
- Santander Customer Transaction Prediction 5th Place Solution
- Solution to the Kaggle Santander Customer Transaction Prediction competition
- 2nd Place Solution to the Microsoft Malware Prediction Challenge on Kaggle
Text classification based solutions
- Toxic Comment Classification Challenge, 12th place solution
- Code and write-up for the Kaggle Toxic Comment Classification Challenge
- Jigsaw Unintended Bias in Toxicity Classification 4th Place Solution
- An open solution to the Toxic Comment Classification Challenge
- TalkingData AdTracking Fraud Detection Challenge 4th Place Solution
- Bronze medal Jigsaw Solution
- 2nd place solution for the 2017 national data science bowl
- Jigsaw Unintended Bias in Toxicity Classification 10th Place Solution
- Code for 3rd place solution in Kaggle Humpback Whale Identification Challenge
Hopefully, this article gave you some background into binary classification tips and tricks, as well as some tools and frameworks that you can use to start competing.
We’ve covered tips on:
- dealing with class imbalance,
- cross-validation and proper evaluation,
- averaging and ensembling models,
- tools and frameworks.
If you want to go deeper, simply follow the links and see how the best binary classification models are built.
F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?
9 mins read | Author Jakub Czakon | Updated July 13th, 2021
PR AUC and F1 Score are very robust evaluation metrics that work great for many classification problems, but in my experience the more commonly used metrics are Accuracy and ROC AUC. Are they better? Not really. As with the famous “AUC vs Accuracy” discussion: there are real benefits to using both. The big question is when.
There are many questions that you may have right now:
- When is accuracy a better evaluation metric than ROC AUC?
- What is the F1 score good for?
- What is the PR curve and how do you actually use it?
- If my problem is highly imbalanced, should I use ROC AUC or PR AUC?
As always it depends, but understanding the trade-offs between different metrics is crucial when it comes to making the correct decision.
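To see the trade-offs concretely, here is a small scikit-learn example (synthetic data, not from the article) where a do-nothing majority-class predictor fools accuracy on an imbalanced problem, while F1 and PR AUC expose it:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)

# Heavily imbalanced labels: roughly 5% positives
rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.05).astype(int)

# A useless model that scores every sample near zero,
# i.e. it always predicts the majority (negative) class
y_score = rng.random(2000) * 0.1
y_pred = (y_score > 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)             # looks great (~0.95)
f1 = f1_score(y_true, y_pred, zero_division=0)   # exposes it (0.0)
roc = roc_auc_score(y_true, y_score)             # ~0.5, i.e. random ranking
pr = average_precision_score(y_true, y_score)    # ~ the positive base rate
print(acc, f1, roc, pr)
```

Accuracy is high simply because negatives dominate; F1 collapses to zero because no positive is ever predicted, and PR AUC sits near the positive base rate, which is exactly the behavior the questions above are probing.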
In this blog post I will:
- Talk about some of the most common binary classification metrics like F1 score, ROC AUC, PR AUC, and Accuracy
- Compare them using an example binary classification problem
- Tell you what you should consider when deciding to choose one metric over the other (F1 score vs ROC AUC)
Ok, let’s do this! Continue reading ->