
Binary Classification: Tips and Tricks from 10 Kaggle Competitions

Imagine if you could get all the tips and tricks you need to tackle a binary classification problem on Kaggle or anywhere else. I have gone over 10 Kaggle competitions and pulled out that information for you.

Dive in.

Modeling

Dealing with imbalance problems
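
When one class dominates, a common first move (one option among several, alongside over/under-sampling) is to up-weight the rare class in the loss. A minimal PyTorch sketch, assuming the usual n_negative / n_positive heuristic; the counts below are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical class counts; in practice compute them from your training labels.
n_pos, n_neg = 1_000, 99_000

# pos_weight multiplies the loss on positive examples, so the rare class
# contributes roughly as much gradient signal as the frequent one.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))
```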

Metrics

Loss

BCE and Dice Based
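
As a rough illustration of this family of losses (a sketch, not any particular winner's code), here is a weighted BCE + soft Dice combination in PyTorch; `bce_weight` and `smooth` are assumed defaults you would tune:

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Weighted sum of binary cross-entropy and soft Dice loss."""
    def __init__(self, bce_weight=0.5, smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.bce_weight = bce_weight
        self.smooth = smooth

    def forward(self, logits, targets):
        bce_loss = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        intersection = (probs * targets).sum()
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum() + targets.sum() + self.smooth
        )
        # 1 - dice turns the overlap score into a loss to minimize.
        return self.bce_weight * bce_loss + (1.0 - self.bce_weight) * (1.0 - dice)
```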

Focal Loss Based
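
Focal loss down-weights examples the model already classifies easily, which also helps with class imbalance. A minimal sketch of the standard binary formulation, with the alpha/gamma defaults from the original paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: the (1 - p_t)^gamma factor shrinks easy examples."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing term
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```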

Custom Losses

Others

Cross-validation + proper evaluation
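
For binary targets, stratified folds keep the class ratio stable across splits, and out-of-fold (OOF) predictions give you an honest overall score. A minimal scikit-learn sketch, assuming NumPy inputs; LogisticRegression is just a stand-in for whatever model you actually use:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate_oof(X, y, n_splits=5, seed=42):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(y))
    for fold, (tr, va) in enumerate(skf.split(X, y)):
        model = LogisticRegression(max_iter=1000)   # stand-in model
        model.fit(X[tr], y[tr])
        oof[va] = model.predict_proba(X[va])[:, 1]  # out-of-fold probabilities
        print(f"fold {fold}: AUC = {roc_auc_score(y[va], oof[va]):.4f}")
    print(f"OOF AUC = {roc_auc_score(y, oof):.4f}")
    return oof
```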

Post-processing
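
One common post-processing step for threshold-sensitive metrics like F1 is tuning the decision threshold on out-of-fold predictions instead of defaulting to 0.5. A minimal sketch:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, oof_probs):
    # Sweep candidate thresholds and keep the one with the best OOF F1.
    thresholds = np.linspace(0.05, 0.95, 91)
    scores = [f1_score(y_true, oof_probs > t) for t in thresholds]
    return thresholds[int(np.argmax(scores))]
```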

Ensembling

Averaging 

Averaging over multiple seeds

Geometric mean

Average different models
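
Whether you average over seeds or over different models, the mechanics are the same; the geometric mean is often preferred for rank-based metrics like AUC. A minimal NumPy sketch, assuming `preds` is a list of probability arrays (one per seed or model):

```python
import numpy as np

def arithmetic_mean(preds):
    # Simple average of predicted probabilities across seeds/models.
    return np.mean(preds, axis=0)

def geometric_mean(preds, eps=1e-7):
    # exp(mean(log(p))); clipping keeps log() away from 0 and 1.
    logs = np.log(np.clip(preds, eps, 1.0 - eps))
    return np.exp(logs.mean(axis=0))
```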

Stacking
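
In stacking, a meta-model is trained on the base models' out-of-fold predictions. A minimal sketch, assuming `oof_preds` and `test_preds` are (n_samples, n_models) arrays you collected during cross-validation:

```python
from sklearn.linear_model import LogisticRegression

def stack(oof_preds, y, test_preds):
    # Fit the meta-model on OOF predictions only, to avoid leaking the target.
    meta = LogisticRegression()
    meta.fit(oof_preds, y)
    return meta.predict_proba(test_preds)[:, 1]
```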

Blending 

Others

Repositories and open solutions

Repos with open source solutions

Image based solutions

Tabular based solutions 

Text classification based solutions

Final thoughts

Hopefully, this article gave you some background on binary classification tips and tricks, as well as some tools and frameworks that you can use to start competing.

We’ve covered tips on:

  • architectures,
  • losses,
  • post-processing,
  • ensembling,
  • tools and frameworks.

If you want to go deeper, simply follow the links and see how the best binary classification models are built.

