
When to Choose CatBoost Over XGBoost or LightGBM [Practical Guide]

8 min
8th May, 2025

Boosting algorithms have become one of the most powerful techniques for training models on structured (tabular) data. The three most popular boosting implementations, which have provided many recipes for winning ML competitions, are:

  1. CatBoost
  2. XGBoost
  3. LightGBM

In this article, we will primarily focus on CatBoost, how it fares against other algorithms, and when you should choose it over others.

Overview of gradient boosting

To understand boosting, we must first understand ensemble learning, a set of techniques that combine the predictions from multiple models (weak learners) to get better predictive performance. Its strategy is strength in unity, as efficient combinations of weak learners can generate more accurate and robust models. The three main classes of ensemble learning methods are:

  • Bagging: This technique builds different models in parallel using random subsets of data and deterministically aggregates the predictions of all predictors.
  • Boosting: This technique is iterative, sequential, and adaptive as each predictor fixes its predecessor’s error.
  • Stacking: A meta-learning technique that combines the predictions of multiple machine learning models, which may themselves be built with techniques like bagging and boosting.

In 1988, Michael Kearns, in his paper Thoughts on Hypothesis Boosting, raised the question of whether a relatively poor hypothesis can be converted into a good one. In other words, can a weak learner be modified to become better? Since then, there have been multiple successful applications of this idea, resulting in some powerful boosting algorithms. 

The most popular boosting algorithms: Catboost, XGBoost, LightGBM
The most popular boosting algorithms: Catboost, XGBoost, LightGBM | Source: Author

The three algorithms in scope (CatBoost, XGBoost, and LightGBM) are all variants of gradient boosting algorithms. A good understanding of gradient boosting will be beneficial as we progress. Gradient boosting algorithms can be a regressor (predicting continuous target variables) or a classifier (predicting categorical target variables). 

This technique trains learners by minimizing a differentiable loss function with a gradient descent optimization process, in contrast to Adaptive Boosting (AdaBoost), which adapts the weights of the training instances; in gradient boosting, all training instances keep equal weights. Gradient boosting uses decision trees connected in series as weak learners. Due to this sequential architecture, it is a stage-wise additive model: decision trees are added one at a time, and existing decision trees are not changed.  

Gradient boosting is primarily used to reduce the bias error of the model. With respect to the bias-variance tradeoff, it is a greedy algorithm that can overfit a training dataset quickly. However, this overfitting can be controlled by shrinkage, tree constraints, regularization, and stochastic gradient boosting.
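
To make these controls concrete, here is a minimal sketch using scikit-learn's GradientBoostingClassifier on synthetic data (CatBoost itself appears later in the article); the parameter values are arbitrary and only illustrate shrinkage, tree constraints, and stochastic gradient boosting.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data, purely for demonstration
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=200,    # number of sequential trees (stage-wise additive model)
    learning_rate=0.05,  # shrinkage: scales each tree's contribution
    max_depth=3,         # tree constraint: limits the complexity of each weak learner
    subsample=0.8,       # stochastic gradient boosting: fit each tree on 80% of the rows
    random_state=0,
)
gbm.fit(X_train, y_train)
print("Test accuracy:", gbm.score(X_test, y_test))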

Overview of CatBoost 

CatBoost is an open-source machine learning (gradient boosting) algorithm, with its name coined from “Category” and “Boosting.” It was developed by Yandex in 2017. According to Yandex, CatBoost has been applied to a wide range of areas such as recommendation systems, search ranking, self-driving cars, forecasting, and virtual assistants. It is the successor of MatrixNet, which was widely used within Yandex products.

CatBoost logo
CatBoost logo | Source

Key features of CatBoost 

Let’s take a look at some of the key features that make CatBoost better than its counterparts:

  1. Symmetric trees: CatBoost builds symmetric (balanced) trees, unlike XGBoost and LightGBM. At every step, leaves from the previous tree are split using the same condition. The feature-split pair that yields the lowest loss is selected and used for all the nodes at that level. This balanced tree architecture enables efficient CPU implementation, decreases prediction time, and controls overfitting, as the structure itself acts as regularization. 
Symmetric trees
Asymmetric tree (left) vs symmetric tree (right) | Source: Author
  2. Ordered boosting: Classic boosting algorithms are prone to overfitting on small/noisy datasets due to a problem known as prediction shift. When calculating the gradient estimate of a data instance, these algorithms use the same data instances that the model was built on, so the model never experiences unseen data. CatBoost, on the other hand, uses ordered boosting, a permutation-driven approach that trains the model on one subset of data while calculating residuals on another, thus preventing target leakage and overfitting.  
  3. Native feature support: CatBoost natively supports numeric, categorical, and text features, saving the time and effort that would otherwise go into preprocessing. 

Numerical features

CatBoost handles numeric features like other tree-based algorithms by selecting the best possible split based on the information gain.

Numerical features
Decision tree splitting by numerical features | Source: Author

Categorical features

Decision trees split categorical features based on classes rather than a threshold on a continuous variable. The split criterion is intuitive: the classes are divided among the sub-nodes.

Categorical features
Decision tree splitting by categorical features | Source: Author

Handling categorical features becomes more complex with high-cardinality features such as IDs. Since machine learning algorithms require input and output variables in numerical form, categorical values must be encoded in some way. 

CatBoost provides several native strategies for handling categorical variables (a minimal sketch follows the list): 

  • One-hot encoding: By default, CatBoost represents all binary (two-category) features with one-hot encoding. This strategy can be extended to features with N categories by setting the training parameter one_hot_max_size = N. Because CatBoost knows which features are categorical and which categories they contain, it can apply one-hot encoding faster and with better quality than generic preprocessing.
  • Statistics based on category: CatBoost applies target encoding with random permutations to handle categorical features. This strategy can be very efficient for high-cardinality columns, as it creates just one new feature to encode the category. The random permutations are added to prevent the overfitting that target encoding would otherwise cause through data leakage and feature bias.
  • Greedy search for combinations: CatBoost also automatically combines categorical features, usually two or three at a time. To keep the number of possible combinations limited, CatBoost does not enumerate all of them but only some of the best, using statistics such as category frequency. For each tree split, CatBoost combines all categorical features (and their combinations) already used for previous splits in the current tree with all categorical features in the dataset.
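
In practice, these strategies are activated simply by declaring which columns are categorical. Here is a minimal sketch with hypothetical column names and parameter values, just to illustrate the interface:

from catboost import CatBoostClassifier, Pool
import pandas as pd

# Hypothetical dataset with raw (unencoded) categorical columns
df = pd.DataFrame({
    "city": ["NYC", "LA", "NYC", "SF", "LA", "SF"] * 50,
    "device": ["mobile", "desktop"] * 150,
    "price": [10.5, 20.0, 15.0, 30.0, 12.5, 25.0] * 50,
    "clicked": [1, 0, 1, 0, 0, 1] * 50,
})

cat_features = ["city", "device"]  # passed as-is, no manual encoding needed
train_pool = Pool(df[["city", "device", "price"]], df["clicked"], cat_features=cat_features)

model = CatBoostClassifier(
    iterations=100,
    one_hot_max_size=2,  # features with <= 2 categories are one-hot encoded;
                         # higher-cardinality ones get category-based target statistics
    verbose=False,
)
model.fit(train_pool)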

Text features

CatBoost also handles text features (columns containing regular text) with built-in text preprocessing: it uses Bag-of-Words (BoW), Naive Bayes, and BM-25 (for multiclass) to extract words from text data, create dictionaries (letters, words, n-grams), and transform them into numeric features. This text transformation is fast, customizable, production-ready, and can also be used with other libraries, including neural networks.
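
Here is a minimal sketch of passing a text column to CatBoost directly, assuming a recent CatBoost version; the dataset and column names are hypothetical.

from catboost import CatBoostClassifier, Pool
import pandas as pd

# Hypothetical text-classification data
df = pd.DataFrame({
    "review": ["great flight", "lost my luggage", "friendly crew", "delayed again"] * 50,
    "label": [1, 0, 1, 0] * 50,
})

# Declaring the column as a text feature triggers CatBoost's built-in tokenization,
# dictionary building, and text-to-numeric feature calculation
train_pool = Pool(df[["review"]], df["label"], text_features=["review"])

model = CatBoostClassifier(iterations=100, verbose=False)
model.fit(train_pool)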

Ranking

Ranking techniques are mostly applied to search engines to solve search relevancy problems. Ranking can broadly be done under three objective functions: pointwise, pairwise, and listwise. At a high level, the difference between these objectives is the number of instances considered at a time while training the model.

CatBoost has a ranking mode, CatBoostRanking, just like the XGBoost and LightGBM rankers; however, it provides more powerful variations than both (a minimal sketch follows the list of variations below). The variations are:

  • Ranking (YetiRank, YetiRankPairwise)
  • Pairwise (PairLogit, PairLogitPairwise)
  • Ranking + Classification (QueryCrossEntropy)
  • Ranking + Regression (QueryRMSE)
  • Select top 1 candidate (QuerySoftMax)
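
To make the ranking mode concrete, here is a minimal sketch on a tiny synthetic learning-to-rank dataset. YetiRank is one of the objectives listed above; the data, group sizes, and iteration count are arbitrary placeholders.

import numpy as np
from catboost import CatBoostRanker, Pool

# Tiny synthetic learning-to-rank setup: 100 documents spread over 10 queries,
# 5 numeric features each, and graded relevance labels in {0, 1, 2}
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
relevance = rng.integers(0, 3, size=100)
query_ids = np.repeat(np.arange(10), 10)  # group_id ties each document to its query

train_pool = Pool(data=X, label=relevance, group_id=query_ids)

# YetiRank is one of CatBoost's ranking objectives
ranker = CatBoostRanker(loss_function="YetiRank", iterations=200, verbose=False)
ranker.fit(train_pool)

scores = ranker.predict(X)  # higher score = ranked higher within its query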

CatBoost also provides ranking benchmarks comparing CatBoost, XGBoost, and LightGBM with different ranking variations, which include:

  • CatBoost: RMSE, QueryRMSE, PairLogit, PairLogitPairwise, YetiRank, YetiRankPairwise
  • XGBoost: reg:linear, xgb-lmart-ndcg, xgb-pairwise
  • LightGBM: lgb-rmse, lgb-pairwise 

These benchmark evaluations used four popular ranking datasets:

  1. Million queries dataset from TREC 2008, MQ2008, (train and test folds).
  2. Microsoft LETOR dataset (WEB-10K), MSLR (First set, train, and test folds).
  3. Yahoo LETOR dataset (C14), Yahoo (First set, set1.train.txt and set1.test.txt files).
  4. Yandex LETOR dataset, Yandex (features.txt.gz and featuresTest.txt.gz files).

The results were as follows using the mean NDCG metric for performance evaluation:

Performance of different learning models on the MQ2008 dataset
Performance of different learning models on the MQ2008 dataset | Source
Performance of different learning models on the MSLR dataset
Performance of different learning models on the MSLR dataset | Source
Performance of different learning models on the Yahoo dataset
Performance of different learning models on the Yahoo dataset | Source
Performance of different learning models on the Yandex dataset
Performance of different learning models on the Yandex dataset | Source

It can be seen that CatBoost outperforms LightGBM and XGBoost in all cases. More details on the ranking mode variations and their respective performance metrics can be found in the CatBoost documentation. These techniques can be run both on CPU and GPU.  

CatBoost provides scalability by supporting multi-server distributed GPUs (enabling multiple hosts for accelerated learning) and accommodating older GPUs. It has set CPU and GPU training speed benchmarks on large datasets like Epsilon and Higgs. Its prediction time turned out to be faster than that of XGBoost and LightGBM, which is extremely important for low-latency environments.

Benchmarking learning speed on the Epsilon dataset (400K samples, 2000 features). Parameters: 128 bins, 64 leaves, 400 iterations
Benchmarking learning speed on the Epsilon dataset (400K samples, 2000 features). Parameters: 128 bins, 64 leaves, 400 iterations | Source: Author
Benchmarking learning speed on the Higgs dataset (4M samples, 28 features). Parameters: 128 bins, 64 leaves, 400 iterations
Benchmarking learning speed on the Higgs dataset (4M samples, 28 features). Parameters: 128 bins, 64 leaves, 400 iterations | Source: Author
Prediction time on CPU and GPU respectively on the Epsilon dataset
Prediction time on CPU and GPU respectively on the Epsilon dataset | Source: Author

CatBoost provides inherent model analysis tools to help understand, diagnose and refine machine learning models with the help of efficient statistics and visualization. Some of them are:

Feature importance

CatBoost has some intelligent techniques for finding the best features for a given model:

  • PredictionValuesChange: This shows how much, on average, the prediction changes when the feature value changes. The bigger the average change in prediction due to a feature, the higher its importance. Feature importance values are normalized so that they are non-negative and sum to 100. This method is cheap to compute but can give misleading results for ranking problems.
Feature Importance based on PredictionValuesChange
Feature Importance based on PredictionValuesChange | Source: Author
  • LossFunctionChange: This is a computationally heavy technique that obtains feature importance by taking the difference between the loss of a model that includes a given feature and the loss of a model without it. The higher the difference, the more important the feature.
Feature Importance based on LossFunctionChange
Feature Importance based on LossFunctionChange | Source: Author
  • InternalFeatureImportance: This technique calculates values for each input feature and various combinations using the split values in the nodes on the paths to the symmetric tree’s leaves. 
Pairwise feature importance scores for various feature combinations
Pairwise feature importance scores for various feature combinations | Source: Author
  • SHAP: CatBoost uses SHAP (SHapley Additive exPlanations) to break a prediction value into contributions from each feature. It calculates feature importance by measuring the impact of a feature on a single prediction value compared to the baseline prediction. This technique provides visual explanations of features that make the most impact on your model’s decision-making. SHAP can be applied in two ways: per data instance and for all the features.

Per data instance 

First prediction explanation (Waterfall plot)
First prediction explanation (Waterfall plot) | Source: Author

The above visualization shows the features pushing the model output from the base value (the average model output over the training dataset) to the model output. The red features are the ones pushing the prediction higher, while the blue features push the prediction lower. This concept can be visualized using the force plot.

First prediction explanation (Force plot)
First prediction explanation (Force plot) | Source: Author

Whole dataset

SHAP provides plotting capabilities to highlight the most important features of a model. The plot sorts features by the sum of SHAP value magnitudes over all data instances and uses SHAP values to highlight the impact distribution of each feature on the model output.

Summarized effects of all the features
Summarized effects of all the features | Source: Author
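
Here is a minimal sketch of how these importance scores and SHAP plots can be produced. It assumes a trained CatBoostClassifier called model, plus the X_test, y_test, and cat_features_index objects from the flight-delay example later in this article, and the shap package installed.

import shap
from catboost import Pool

# Assumes a trained CatBoostClassifier (`model`) and the flight-delay test data below
test_pool = Pool(X_test, y_test, cat_features=cat_features_index)

# Built-in importances (PredictionValuesChange is the default for most models)
print(model.get_feature_importance(prettified=True))

# SHAP values: CatBoost returns shape (n_samples, n_features + 1);
# the last column is the expected (baseline) value
shap_values = model.get_feature_importance(test_pool, type="ShapValues")
expected_value = shap_values[0, -1]
shap_values = shap_values[:, :-1]

shap.initjs()
shap.force_plot(expected_value, shap_values[0, :], X_test.iloc[0, :])  # single prediction
shap.summary_plot(shap_values, X_test)                                 # whole dataset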

Feature analysis chart

This is another unique capability that CatBoost has integrated into recent versions. It calculates and plots feature-specific statistics and visualizes how CatBoost is splitting the data for each feature (a short sketch follows the list below). More specifically, the statistics are:

  • Mean target value for each bin (bins group continuous features) or category (currently supported only for one-hot encoded features)
  • Mean prediction value for each bin
  • Number of data instances (objects) in each bin
  • Predictions for various feature values
Statistics (prediction and target) for each feature
Statistics (prediction and target) for each feature | Source: Author
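
A minimal sketch of producing this chart with calc_feature_statistics, the CatBoost method behind these plots; it assumes a trained model and the flight-delay data used later in the article, and the feature name is an arbitrary choice.

# Assumes a trained CatBoostClassifier (`model`) and the flight-delay data below;
# plot=True renders the binned target/prediction statistics chart for the feature
model.calc_feature_statistics(
    X_train,
    y_train,
    feature="DISTANCE",  # hypothetical choice; any feature the model was trained on works
    plot=True,
)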

CatBoost parameters 

CatBoost shares common training parameters with XGBoost and LightGBM but provides a much more flexible interface for parameter tuning. The following table provides a quick comparison of the parameters offered by the three boosting algorithms. 

| Function | CatBoost | XGBoost | LightGBM |
| --- | --- | --- | --- |
| Parameters controlling overfitting | learning_rate, depth, l2_leaf_reg | learning_rate, max_depth, min_child_weight | learning_rate, max_depth, num_leaves, min_data_in_leaf |
| Parameters for handling categorical values | cat_features, one_hot_max_size | N/A | categorical_feature |
| Parameters for controlling speed | rsm, iterations | colsample_bytree, subsample, n_estimators | feature_fraction, bagging_fraction, num_iterations |

Also, as evident from the following image, CatBoost’s default parameters provide an excellent baseline model, considerably better than those of the other boosting algorithms. 

Log loss values (the lower the better) for classification mode. The percentage is the metric difference measured against the tuned CatBoost results
Log loss values (the lower the better) for classification mode. The percentage is the metric difference measured against the tuned CatBoost results | Source: Author

In short, CatBoost’s parameters give you direct control over overfitting, categorical feature handling, and training speed.

Other useful features

  • Overfitting detector: CatBoost’s algorithm structure inhibits gradient boosting biases and overfitting. In addition, CatBoost has an overfitting detector that can stop training earlier than the training parameters dictate if overfitting occurs. It is activated by setting od_type in the parameters and implements two strategies (a short sketch of the detector and of cross-validation follows this list):
    • Iter: Stop training after the specified number of iterations has passed since the iteration with the optimal metric value. This strategy uses the early_stopping_rounds parameter, just like other gradient boosting libraries such as LightGBM and XGBoost. 
    • IncToDec: Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value. This helps produce more generalized models.
  • Missing value support: CatBoost provides three inherent strategies for processing missing values:
    • “Forbidden”: Missing values are interpreted as an error, as they are not supported.
    • “Min”: Missing values are processed as the minimum value (less than all other values) for the feature under observation. 
    • “Max”: Missing values are processed as the maximum value (greater than all other values) for the feature under observation. Note that CatBoost imputes missing values only for numerical features, and the default mode is Min.
  • CatBoost viewer: In addition to the CatBoost model analysis tool, CatBoost has a standalone executable application for plotting charts with different training statistics in a browser.
  • Cross-validation: CatBoost allows you to perform cross-validation on a given dataset. In cross-validation mode, the training data is split into learning and evaluation folds. 
  • Community support: CatBoost has a vast and growing open-source community that provides a lot of tutorials on theories and applications.
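
Here is a minimal sketch of the overfitting detector and the built-in cross-validation, assuming the X_train, y_train, X_test, and y_test splits defined later in the article; the parameter values are arbitrary.

from catboost import CatBoostClassifier, Pool, cv

# Overfitting detector: stop if the eval metric has not improved for 50 iterations
model = CatBoostClassifier(
    iterations=1000,
    od_type="Iter",  # "Iter" or "IncToDec"
    od_wait=50,      # iterations to wait after the best metric value
    verbose=False,
)
model.fit(X_train, y_train, eval_set=(X_test, y_test))

# Built-in cross-validation on the same data
cv_results = cv(
    pool=Pool(X_train, y_train),
    params={"iterations": 200, "loss_function": "Logloss", "verbose": False},
    fold_count=5,
)
print(cv_results.head())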

CatBoost vs XGBoost and LightGBM: hands-on comparison of performance and speed

The previous sections covered some of CatBoost’s features that will serve as potent criteria in choosing CatBoost over LightGBM and XGBoost. This section will provide hands-on experience as we compare performance and speed using a flight delay prediction problem. 

Dataset and environment

The dataset contains on-time performance data of domestic flights operated by large air carriers in 2015, provided by the U.S. Department of Transportation (DOT), and can be found on Kaggle. This comparative analysis explores and models flight delays from the available independent features using CatBoost, LightGBM, and XGBoost. A subset (25%) of this data was used for modeling, and the generated models are evaluated using the ROC AUC score. The analysis covers default and tuned settings while measuring training, prediction, and parameter tuning times. 

For ease of comparison, we will be using neptune.ai, a metadata store for MLOps, built for projects that may involve a lot of experiments (like ours). Specifically, we will be using Neptune for:

  • Experiment tracking: To log, display, organize, and compare the experiments in a single place
  • Monitoring ML runs live: Record and monitor model training, evaluation, or production runs live

So, without further ado, let’s get started!

Disclaimer

Please note that this article references a deprecated version of Neptune.

For information on the latest version with improved features and functionality, please visit our website.

First, we have to install the required libraries:

!pip install -U xgboost lightgbm catboost scikit-learn neptune pandas python-dotenv

Import the installed libraries:

# Importing machine learning algorithms
import lightgbm as lgb
import xgboost as xgb
import catboost as cb

# Importing other packages
import timeit
import pandas as pd
import neptune

# Importing packages for machine learning operations
from sklearn.model_selection import train_test_split # type: ignore
from sklearn.metrics import roc_auc_score # type: ignore

import warnings
warnings.filterwarnings('ignore')

Now, we create a function to log the project’s metadata appropriately using your credentials. You can read more about how to set up your Neptune credentials in the quickstart:

import os
from dotenv import load_dotenv

load_dotenv()

def create_neptune_run(tags=[]):
    """ Initialize a new Neptune run and connect your script to Neptune
    """
    run = neptune.init_run(
        project=os.getenv("NEPTUNE_PROJECT"),
        api_token=os.getenv("NEPTUNE_API_TOKEN"),
        tags=tags
    )
    return run

Let’s load the data (make sure you have downloaded it from Kaggle), perform some preprocessing, and split the data:

# Download the flights.csv from https://www.kaggle.com/datasets/usdot/flight-delays

# Importing the dataset
data_df = pd.read_csv("data/flights.csv")

# Displaying the first 5 rows of the dataset
data_df.head()

# Selecting features (i.e., removing highly correlated features, redundant features, and features with a high percentage of missing values)
data_df = data_df[
    [
        "MONTH",
        "DAY",
        "DAY_OF_WEEK",
        "AIRLINE",
        "DESTINATION_AIRPORT",
        "ORIGIN_AIRPORT",
        "AIR_TIME",
        "DEPARTURE_TIME",
        "DISTANCE",
        "ARRIVAL_DELAY",
        "DIVERTED",
        "CANCELLED",
        "ARRIVAL_TIME",
    ]
]


# Filling missing values with mean
data_df["DEPARTURE_TIME"] = data_df["DEPARTURE_TIME"].fillna(
    data_df["DEPARTURE_TIME"].mean()
)
data_df["AIR_TIME"] = data_df["AIR_TIME"].fillna(data_df["AIR_TIME"].mean())
data_df["ARRIVAL_DELAY"] = data_df["ARRIVAL_DELAY"].fillna(
    data_df["ARRIVAL_DELAY"].mean()
)
data_df["ARRIVAL_TIME"] = data_df["ARRIVAL_TIME"].fillna(data_df["ARRIVAL_TIME"].mean())

# Change some features to categorical data type
cat_cols = ["AIRLINE", "DESTINATION_AIRPORT", "ORIGIN_AIRPORT"]
for item in cat_cols:
    data_df[item] = data_df[item].astype("category").cat.codes + 1

# Encoding the target
data_df["ARRIVAL_DELAY"] = data_df["ARRIVAL_DELAY"].apply(lambda x: 1 if x > 15 else 0)

X = data_df.drop(columns=["ARRIVAL_DELAY"])
y = data_df["ARRIVAL_DELAY"]

# Splitting the dataset with a test size of 30%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state = 2021, test_size = 0.3 
)

Models 

Next, let’s define the metric evaluation function and model execution function. The metric evaluation function logs the ROC AUC score.

# Metric evaluation
def log_metrics(run, y_pred_test):
    score = roc_auc_score(y_test, y_pred_test)
    run["ROC AUC score"] = score

Now on to the model execution function, which accepts the following arguments: 

  • run: The run object created for the model
  • model: The respective machine learning model to train, i.e., LightGBM, XGBoost, or CatBoost
  • name: The namespace under which the training and prediction times are logged
  • key: Specifies the model training setup, especially which categorical feature parameters to apply
  • cat_features: The categorical feature names (for LightGBM) or indices (for CatBoost)

The function calculates and logs the metadata including training time, prediction time, and ROC AUC score.

import functools

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = timeit.default_timer()
        result = func(*args, **kwargs)
        stop = timeit.default_timer()
        return result, stop - start
    return wrapper

def run_model(run, model, name, key, cat_features=None):
    """
    Function to train a model, log training and prediction time, and track performance.
    """
    @timer
    def train_model():
        if key == "LGB":
            return model.fit(X_train, y_train, categorical_feature=cat_features)
        elif key == "CAT":
            return model.fit(
                X_train,
                y_train,
                eval_set=(X_test, y_test),
                cat_features=cat_features,
                use_best_model=True,
            )
        else:
            return model.fit(X_train, y_train)

    @timer
    def predict_model():
        return model.predict(X_test)

    # Training session
    _, training_time = train_model()
    run[f"{name}/Training time"] = training_time

    # Prediction session
    y_pred_test, prediction_time = predict_model()
    run[f"{name}/Prediction time"] = prediction_time

    # Performance evaluation
    log_metrics(run, y_pred_test)

Let’s run the function with the respective models in two settings:

1. CatBoost vs XGBoost vs LightGBM: default hyperparameters

# Default LightGBM without categorical features support
run = create_neptune_run(["LightGBM_Default"])
model_lgb_def = lgb.LGBMClassifier()
run_model(
    run,
    model_lgb_def,
    "LGB",
    key="LGB",
)
run.stop()

# Default LightGBM with categorical feature support
run = create_neptune_run(["LightGBM_Categorical"])
model_lgb_cat_def = lgb.LGBMClassifier()
run_model(
    run,
    model_lgb_cat_def,
    "LGB",
    key="LGB",
    cat_features=cat_cols,
)
run.stop()

# Default XGBoost
run = create_neptune_run(["XGBoost_Default"])
model_xgb_def = xgb.XGBClassifier(tree_method='gpu_hist')
run_model(
    run,
    model_xgb_def,
    "XGB",
    key="XGB",
)
run.stop()

# Default CatBoost without categorical feature support
run = create_neptune_run(["CatBoost_Default"])
model_cat_def = cb.CatBoostClassifier(task_type='GPU', verbose=False, devices='0')
run_model(
    run,
    model_cat_def,
    "CAT",
    key="CAT",
)
run.stop()

# Default CatBoost with categorical feature support
run = create_neptune_run(["CatBoost_Categorical"])
model_cat_cat_def = cb.CatBoostClassifier(task_type='GPU', verbose=False, devices='0')
cat_features_index = [X_train.columns.get_loc(col) for col in cat_cols]
run_model(
    run,
    model_cat_cat_def,
    "CAT",
    key="CAT",
    cat_features=cat_features_index,
)
run.stop()

Comparative analysis based on the default setting of the LightGBM, XGBoost, and CatBoost algorithms can be viewed on your Neptune dashboard (experiments 1-5).

The default setting comparative analysis in Neptune

Results: default setting

As evident from the dashboard: 

  • CatBoost had the fastest prediction time without categorical support. 
  • CatBoost also had the best score for the AUC metric (the higher the AUC score, the better the model’s performance at distinguishing between the classes) for the test data with categorical support. 
  • LightGBM had the lowest ROC AUC score with default settings despite having the same speed as XGBoost.

2. CatBoost vs XGBoost vs LightGBM: tuned hyperparameters

Following are the tuned hyperparameters that we will be using in this run. The selected parameters are quite similar across the three algorithms: 

  • max_depth and depth control the depth of the tree model. 
  • learning_rate accounts for the magnitude of the modification each tree adds and determines how fast the model learns. 
  • n_estimators and iterations account for the number of trees (rounds), i.e., the number of boosting iterations. CatBoost’s l2_leaf_reg is the L2 regularization coefficient, which discourages learning an overly complex or flexible model and so prevents overfitting. 
  • LightGBM’s num_leaves parameter sets the maximum number of leaves per tree, and XGBoost’s min_child_weight represents the minimum sum of instance weights required in a child node. 

These parameters were tuned to control overfitting and learning speed.

 
|  | LightGBM | XGBoost | CatBoost |
| --- | --- | --- | --- |
| Parameters used | max_depth: 7, learning_rate: 0.08, num_leaves: 100, n_estimators: 1000 | max_depth: 5, min_child_weight: 6, n_estimators: 1000, learning_rate: 0.08 | depth: 10, learning_rate: 0.5, l2_leaf_reg: 5, iterations: 1000 |
| Parameter tuning time | 2919.72435 | 12587.39855 | 4353.38733 |

The hyperparameter tuning section can be found in the reference notebook.

Now let’s run these models with the aforementioned tuned settings.

# Tuned parameters for LightGBM
params = {
    "max_depth": 7,
    "learning_rate": 0.08,
    "num_leaves": 100,
    "n_estimators": 1000,
}

# Without Categorical Features
run = create_neptune_run(["LightGBM_Tuned"])
model_lgb_tun = lgb.LGBMClassifier(
    boosting_type="gbdt", objective="binary", metric="auc", **params, verbose=-1
)
run_model(
    run,
    model_lgb_tun,
    "LGB",
    key="LGB",
)
run.stop()

# With Categorical Features
run = create_neptune_run(["LightGBM_Tuned_Categorical"])
model_lgb_cat_tun = lgb.LGBMClassifier(
    boosting_type="gbdt", objective="binary", metric="auc", **params, verbose=-1
)
run_model(
    run,
    model_lgb_cat_tun,
    "LGB",
    key="LGB",
    cat_features=cat_cols,
)
run.stop()

# Tuned parameters for XGBoost
params = {
    "max_depth": 5,
    "learning_rate": 0.8,
    "min_child_weight": 6,
    "n_estimators": 1000,
    "tree_method": "gpu_hist",  # Enable GPU for XGBoost
}

# Tuned XGBoost
run = create_neptune_run(["XGBoost_Tuned"])
model_xgb_tun = xgb.XGBClassifier(**params)
run_model(
    run,
    model_xgb_tun,
    "XGB",
    key="XGB",
)
run.stop()

# Tuned parameters for CatBoost
params = {
    "depth": 10,
    "learning_rate": 0.5,
    "iterations": 1000,
    "l2_leaf_reg": 5,
    "task_type": "GPU",  # Enable GPU for CatBoost
    "devices": "0",
}

# Tuned CatBoost with no categorical feature support
run = create_neptune_run(["CatBoost_Tuned"])
model_cat_tun = cb.CatBoostClassifier(verbose=False, **params)
run_model(
    run,
    model_cat_tun,
    "CAT",
    key="CAT",
)
run.stop()

# Tuned CatBoost with categorical feature support
run = create_neptune_run(["CatBoost_Tuned_Categorical"])
model_cat_cat_tun = cb.CatBoostClassifier(verbose=False, **params)
cat_features_index = [X_train.columns.get_loc(col) for col in cat_cols]
run_model(
    run,
    model_cat_cat_tun,
    "CAT",
    key="CAT",
    cat_features=cat_features_index,
)
run.stop()

Again, the comparative analysis based on the tuned settings can be viewed in your Neptune dashboard.

Tuned setting comparative analysis in Neptune

Results: tuned setting

As evident from the dashboard:

  • CatBoost still retained the fastest prediction time and best performance score with categorical feature support.
  • Despite the hyperparameter tuning, the difference between the default and tuned results is not that large for LightGBM and XGBoost, which also highlights that CatBoost’s default settings yield a great result. 
  • LightGBM still shows the lowest ROC AUC performance. 

Conclusion

CatBoost’s algorithmic design might be similar to the “older” generation of GBDT models; however, it has some key attributes, such as: 

  • a ranking objective function,
  • native categorical feature preprocessing,
  • model analysis tools,
  • the fastest prediction time.

CatBoost also offers significant performance potential: it performs remarkably well with default parameters and improves significantly when tuned. This article aimed to help you decide when to choose CatBoost over LightGBM or XGBoost by walking through these crucial features and the advantages they offer. I hope that the next time you are faced with such a choice, you will be able to make an informed decision. 
