It’s the last day of the year, so we decided to go with the end-of-the-year flow and create our own blog summary! It was a very active time on Neptune’s blog – tons of articles written by amazing authors, thousands of blog visitors, and many important topics covered.
We always say that visualization is the key to data understanding, so here are a few highlights of the year 2020. 😉
But, that’s not all. I checked which articles were most visited and read, and prepared a list of the top posts in various categories.
This post was only published in November but already managed to gain a huge audience. I guess that’s because the topic is not covered enough in the industry literature. Experiment tracking is a part of MLOps which focuses on the iterative model development phase when you try many things to get your model performance to the level you need. Here, Jakub Czakon explains its purpose, best practices, and implementation process.
ML projects are messy. You may start clean but, for some reason, things get in the way. This is why in this article, Dhruvil Karani shared some key pointers, guidelines, tips and tricks that can help you stay on top of things and keep your NLP projects (mostly) in check. And it looks like it was very much needed!
If you’re curious about how Reinforcement Learning can be used in real life (and we noticed a lot of you are interested in it), here’s an article for you. In this post, you’ll find 10 examples from areas like engineering, news recommendation, gaming, robotics, and more.
Derrick Mwiti looks at a major problem with using random forest for regression: extrapolation. He compares random forest regression with linear regression, explains the random forest extrapolation problem, and presents potential solutions.
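To see the core issue in miniature (a minimal sketch, not code from the article): a random forest can only predict values within the range of targets it saw during training, while a linear model follows the trend beyond it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Train both models on a simple linear trend: y = 2x for x in [0, 10)
X_train = np.arange(0, 10, 0.1).reshape(-1, 1)
y_train = 2 * X_train.ravel()

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

# Predict far outside the training range
X_new = np.array([[20.0]])
rf_pred = rf.predict(X_new)[0]    # stuck near the training maximum (~19.8)
lin_pred = lin.predict(X_new)[0]  # follows the trend to ~40
```

The forest averages leaf values it has already seen, so it cannot produce a prediction above the training maximum — exactly the extrapolation problem the article digs into.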
A few months ago, a Natural Language model (GPT-3) wrote an article for The Guardian. Understandably, this caused a flurry of apocalyptic, terminator-esque social media buzz. In this article, Cathal Horan wonders what the limitations of such models are.
Tips and Tricks from Kaggle Competitions
We have a few posts on the blog where we gathered tips & tricks from Kaggle competitions. There’s a huge audience visiting them every month and they seem to be very helpful for Kagglers.
- Image Classification
- Tabular Data Binary Classification
- Binary Classification
- Text Classification
- Image Segmentation
Google Colab comes with (almost) all the setup you need to start coding, but what it doesn’t have out of the box is your datasets. In this post, Siddhant Sadangi explains how to load data to Colab from a multitude of data sources and how to write back to those data sources from within Colab. It seems that it was a useful tutorial for many of you.
In this article, Jakub Cieślik explores his go-to algorithm for most tabular data problems – LightGBM. After reading it, you’ll know which parameters are important in general, which regularization parameters need to be tuned, how to tune LightGBM parameters in Python, and more.
In the past, creating a custom object detector looked like a time-consuming and challenging task. Now, with tools like TensorFlow Object Detection API, we can create reliable models quickly and with ease. TensorFlow Object Detection API got a lot of attention this year, and this tutorial was definitely one of the favorites.
This in-depth TensorBoard tutorial covers visualizing images in TensorBoard, visualizing the model’s architecture, sending custom diagnostic charts to TensorBoard, using it with Keras, PyTorch, and XGBoost, and more. We heard it’s one of the best TensorBoard tutorials online! 😉
How to Make Sense of the Reinforcement Learning Agents? What and Why I Log During Training and Debug
Whether you’re just starting out in Reinforcement Learning or you already have some experience under your belt, this article will help you figure out what to keep track of to inspect and debug your agent’s learning trajectory.
This is a complete guide on Keras loss functions. You’ll get to know available functions, how to use them, how you can define your own custom loss function in Keras, how to avoid nans in the loss, how you can monitor the loss function via plotting and callbacks, and more.
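As a taste of one trick covered there: a common source of NaNs in cross-entropy-style losses is taking log(0), and clipping predictions is the usual fix. Here’s a NumPy illustration of the idea (not the article’s Keras code):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy with clipping to avoid log(0) -> inf/NaN."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 1.0])  # the exact 1.0 would break an unclipped log
loss = binary_crossentropy(y_true, y_pred)  # finite, thanks to the clip
```

The same clipping pattern carries over to a custom Keras loss via the backend’s clip function.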
Looks like people are really interested in loss functions! In this article, Alfrick Opidi talks about popular loss functions in PyTorch, and about building custom loss functions. You’ll find out what loss functions are, how to add PyTorch loss functions, which loss functions are available in PyTorch, and how to create a custom loss function in PyTorch.
Tuning the hyperparameters of Machine Learning or Deep Learning models is common practice for squeezing the last bit of performance out of them. Read about how to do it well in this guide prepared by Shahul Es.
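In its simplest form, the idea boils down to sampling configurations and keeping the best one. A bare-bones random search sketch (the objective function here is a made-up stand-in for a real validation score):

```python
import random

random.seed(0)

def objective(lr, depth):
    """Stand-in for a validation score; in practice you'd train and evaluate a model."""
    return -(lr - 0.1) ** 2 - 0.01 * (depth - 5) ** 2

# Random search: sample configurations from the space, keep the best seen so far
search_space = {
    "lr": lambda: random.uniform(0.001, 0.3),
    "depth": lambda: random.randint(2, 10),
}

best_score, best_params = float("-inf"), None
for _ in range(50):
    params = {name: sample() for name, sample in search_space.items()}
    score = objective(**params)
    if score > best_score:
        best_score, best_params = score, params
```

Dedicated libraries replace the naive sampling loop with smarter strategies (Bayesian optimization, early stopping of bad trials), but the interface — a search space plus an objective — stays the same.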
In this article, we explain why data scientists and machine learning engineers need a tool for tracking machine learning experiments and what is the best software for that. We received super positive feedback especially about the comparison table that’s in the article – it’s a breakdown of all the important features and integrations of 15 different experiment tracking tools.
MLOps has been a hot topic this year, so it’s not a surprise that people are looking for the best tools for that. Here, we recommend the best MLOps tools divided into 6 categories: data and pipeline versioning, run orchestration, experiment tracking and organization, hyperparameter tuning, model serving, and production model monitoring.
Jakub Czakon asked the authors of 6 high-level training APIs in the PyTorch Ecosystem to explain the differences between them. The result is lots of first-hand information about really great tools.
To wrap it up
It was a great year on our blog, and we’re very happy that so many people found something interesting here! Thanks for visiting our place on the Internet, and stay tuned for more in the New Year. We are not slowing down!
ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It
Jakub Czakon | Posted November 26, 2020
Let me share a story that I’ve heard too many times.
“…We were developing an ML model with my team, we ran a lot of experiments and got promising results…
…unfortunately, we couldn’t tell exactly what performed best because we forgot to save some model parameters and dataset versions…
…after a few weeks, we weren’t even sure what we had actually tried, and we needed to re-run pretty much everything.”
– unfortunate ML researcher.
And the truth is, when you develop ML models, you will run a lot of experiments.
Those experiments may:
- use different models and model hyperparameters,
- use different training or evaluation data,
- run different code (including that small change you wanted to test quickly),
- run the same code in a different environment (not knowing which PyTorch or TensorFlow version was installed).
And as a result, they can produce completely different evaluation metrics.
Keeping track of all that information can very quickly become really difficult, especially if you want to organize and compare those experiments and feel confident that you know which setup produced the best result.
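Even before reaching for a dedicated tool, the remedy starts with writing every run’s parameters and metrics somewhere durable. A homegrown sketch of the idea (not Neptune’s API) might look like:

```python
import hashlib
import json
import time
from pathlib import Path

def log_run(params: dict, metrics: dict, log_dir: str = "runs") -> Path:
    """Save one experiment run as a JSON file keyed by its parameters."""
    run = {"timestamp": time.time(), "params": params, "metrics": metrics}
    # Hash the parameters so identical configurations map to the same file
    run_id = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    log_path = Path(log_dir)
    log_path.mkdir(exist_ok=True)
    out = log_path / f"run_{run_id}.json"
    out.write_text(json.dumps(run, indent=2))
    return out

# Every experiment records what it ran with and what it scored
log_run({"model": "rf", "n_estimators": 100}, {"accuracy": 0.92})
```

A real tracking tool adds what this sketch lacks — code and dataset versioning, environment capture, and a UI for comparing runs — which is exactly the gap the article explores.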
This is where ML experiment tracking comes in.