Neptune Blog

Hugging Face Pre-trained Models: Find the Best One for Your Task

9 min
29th April, 2025

When tackling machine learning problems, pre-trained models can significantly speed up the process. Repurposing existing solutions saves time and computational costs for both engineers and companies. Launched in 2017 (originally as a chat interface), Hugging Face is a key provider of open-source libraries with pre-trained models, making it a valuable resource in this space.

Soon after, they released the Transformers library and other NLP resources like datasets and tokenizers, making high-quality NLP models accessible to everyone. This move quickly gained traction, especially among major tech companies.

Hugging Face specializes in Natural Language Processing (NLP) tasks, focusing on models that not only recognize words but also understand their meaning and context. Unlike humans, computers need a structured pipeline—a series of processing steps—to interpret language meaningfully. Hugging Face’s models and tools provide this structure, making it easier for companies to integrate NLP technologies that enable natural, human-like interactions.

As more companies focus on better user experiences, NLP tools like Hugging Face are becoming essential. In the following sections, we’ll explore this tool and its transformers in more depth, with hands-on examples to help you start building your own projects.

Getting started

A transformer is a deep learning model that adopts the mechanism of attention, differentially weighting the significance of each part of the input data. It is used primarily in the field of natural language processing. Wikipedia

Before diving into model selection, it’s essential to clearly define your use case and the specific goals you want to achieve. Hugging Face offers a range of transformers and models tailored to different tasks. Their platform provides an efficient model search tool with various filters to help you find exactly what you need.

On the Hugging Face website, the model page includes filters like Tasks, Libraries, Datasets, and Languages:

List of models and filters on the Hugging Face website
List of models and filters on the Hugging Face website | Source

Let’s say you are looking for models that can satisfy the following requirements:

  • Translates text from one language to another
  • Supports PyTorch

Once you have selected these filters, you will get a list of pre-trained models as shown below:

Selecting a model from the Hugging Face website
Selecting a model from the Hugging Face website | Source

You will also need to make sure you provide inputs in the same format the pre-trained model was trained on. Select a model from the list, and let’s start setting up the environment for it.

Setting up your environment

Hugging Face supports over 20 libraries, including popular ones like TensorFlow, PyTorch, and FastAI. Here’s how to install the necessary libraries using pip:

1. Install PyTorch:

!pip install torch

2. Install the Transformers library:

!pip install transformers

Once installed, you can start working with the Hugging Face NLP library. There are two main ways to get started:

  • Using Pipelines: A simple, high-level approach that provides pre-configured tasks.
  • Using Pre-trained Models Directly: Load any available model and adapt it to your specific task.

Note that these models can be large, so it’s recommended to experiment in cloud environments like Google Colab or Kaggle Notebooks, where downloading and storage are more manageable.

In the following sections, we’ll explore using pipelines and working directly with pre-trained models.

Basic NLP tasks supported by Hugging Face

Hugging Face offers powerful, pre-trained models for a wide range of NLP tasks. Here’s a quick look at the key tasks it supports and why they matter:

1. Sequence classification

Sequence classification assigns an input sequence to one of several predefined classes, which is useful for applications like sentiment analysis, spam detection, and grammatical acceptability checks. For example, sequence classification can determine whether a review is positive or negative.

2. Question answering

This task focuses on generating answers to contextual questions, whether open- or closed-ended. A question-answering model can search through a structured database or unstructured text to provide accurate answers, much like a virtual assistant.

3. Named entity recognition (NER)

NER identifies specific entities—like people, places, or organizations—within the text, enabling applications like automated document tagging and information extraction.

4. Summarization

Summarization makes long documents shorter and easier to read. Hugging Face supports both extractive summarization, which pulls key sentences, and abstractive summarization, which rephrases text to capture the essence of the original content.

5. Translation

Translation tasks involve converting text from one language to another. Unlike simple word substitution, effective translation requires a deep understanding of syntax, idioms, and linguistic context to produce human-like translations.

6. Language modeling

Language modeling predicts likely word sequences and completes sentences in meaningful ways. Hugging Face supports masked language modeling, where certain words are hidden for the model to predict, and causal language modeling, where the model predicts future words based on past context.

Beyond these, Hugging Face supports tasks such as speech recognition, computer vision, and transcription generation, expanding its utility to audio and visual data. 
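To make these tasks concrete, here is a minimal sketch that runs a few of them through the high-level pipeline API. Each pipeline downloads whatever default checkpoint the library ships for that task, so the exact models and outputs may differ on your machine:

from transformers import pipeline

# Sequence classification: is a review positive or negative?
classifier = pipeline("sentiment-analysis")
print(classifier("This movie was surprisingly good!"))

# Named entity recognition: find people, places, and organizations
ner = pipeline("ner")
print(ner("Hugging Face was founded in New York."))

# Masked language modeling: predict the hidden word
fill_mask = pipeline("fill-mask")
masked_text = f"Hugging Face makes NLP {fill_mask.tokenizer.mask_token} for everyone."
print(fill_mask(masked_text))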

Hugging Face Transformers and how to use them

The Transformer in NLP is a novel architecture that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease. KDnuggets

Transformers, introduced in 2017, revolutionized NLP by enabling models to handle long-range dependencies in text. This architecture consists of an encoder-decoder structure (that will be explained later on), and it facilitates sequence-to-sequence tasks, making it ideal for translation, summarization, and text generation.

The evolution of the Transformer architecture, 2018-2021
The evolution of the Transformer architecture, 2018-2021 | Source

Transformers are language models pre-trained on vast amounts of text in a self-supervised fashion and then adapted to downstream tasks through transfer learning. In self-supervised learning, models learn by predicting parts of the data based on other parts, allowing them to train effectively without labeled data. This approach enables transformers to develop a deep understanding of language structure and context, making them powerful tools for various NLP tasks.

Transformer architecture

The transformer language model uses an encoder-decoder architecture whose components can operate together or independently:

  • Encoder: The encoder takes in the input sequence, processes it iteratively, and identifies the relationships among different parts of the input, building a rich internal representation.
  • Decoder: The decoder then generates an output sequence using the encoder’s representation, drawing on contextual information to produce meaningful and coherent output.

The transformer model architecture
The transformer model architecture | Source

A critical component of the transformer architecture is the attention layer. This layer enables the model to focus on specific words or details within the input, improving its ability to understand context. It works by mapping a query and a set of key-value pairs to an output, where the queries, keys, values, and outputs are all vectors. This mapping allows the model to decide which parts of the input are most relevant at each step.
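To make this concrete, here is a small, self-contained sketch of scaled dot-product attention, the core computation inside an attention layer. It is a simplified illustration, not the library’s actual implementation:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    # Similarity between each query and each key
    scores = query @ key.transpose(-2, -1)
    # Scale by the square root of the key dimension to keep values stable
    scores = scores / (key.size(-1) ** 0.5)
    # Softmax turns scores into attention weights that sum to 1
    weights = F.softmax(scores, dim=-1)
    # Each output vector is a weighted average of the value vectors
    return weights @ value

# Toy example: a batch of one sequence with 4 tokens and 8-dimensional vectors
q = k = v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])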

For a deeper dive into the architecture, refer to the influential paper Attention Is All You Need and the Illustrated Transformer blog.

With this foundational understanding, let’s explore how Hugging Face simplifies using transformers in practice.

Introduction to transformers and pipelines

The Hugging Face Transformers library offers a variety of models for different tasks through high-level APIs. Building transformer models from scratch is complex and resource-intensive, often involving billions of parameters and extensive training. Hugging Face created this library to make working with these sophisticated models easier, more flexible, and user-friendly by providing access through a single API. With this library, you can load, train, and save models seamlessly.

Creating an NLP solution typically involves several steps, from gathering data to fine-tuning the model for optimal performance. Hugging Face’s library streamlines this process by offering tools to simplify each step.

An example of a typical NLP machine learning pipeline
An example of a typical NLP machine learning pipeline | Source: Author

Using pre-defined pipelines

The Hugging Face Transformers library offers pipelines that handle all pre- and post-processing steps of the input text data. These pipelines encapsulate the overall process of each NLP solution. By connecting a model with the necessary pre- and post-processing steps, pipelines allow you to focus only on providing input texts, making it quick and easy to use pre-trained models for various tasks.

Steps encapsulated by a Hugging Face pipeline
Steps encapsulated by a Hugging Face pipeline | Source

With pipelines, you don’t need to manage each processing step individually. Simply select the relevant pipeline for your use case, and you can quickly create, for instance, a machine translator with minimal code:

from transformers import pipeline

translator = pipeline("translation_en_to_de")
text = "Hello world! Hugging Face is the best NLP tool."
translation = translator(text)

print(translation)

Sample pipeline output
Sample pipeline output | Source: Author

Pipelines offer an easy entry point into using Hugging Face, allowing you to create language models with pre-trained and fine-tuned transformers quickly. Hugging Face provides pipelines for key NLP tasks, as well as additional specialized pipelines for different applications.
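Pipelines also accept an explicit model name, so you are not limited to the task defaults. As a minimal sketch, the English-to-Dutch model used later in this article can be loaded directly:

from transformers import pipeline

# Pass a specific hub model instead of relying on the task default
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-nl")
print(translator("Hello world! Hugging Face is the best NLP tool."))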

Create a custom translation pipeline with Hugging Face

The default pipelines only support a few basic scenarios. What if you want to translate to a different language? Here’s how you can set up a translation pipeline for any language pair supported by Hugging Face’s model hub:

1. Import and Initialize the tokenizer

Transformer models work on tokenized text: sentences are split into tokens and mapped to numbers the model can interpret.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-nl")

2. Import the model

Download and initialize the model for the translation task, which contains the necessary transformer layers for sequence-to-sequence learning:

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-nl")

3. Tokenize and encode the text

Use the tokenizer to convert text into tokens, which includes:

  • Splitting the text into sub-words
  • Mapping each token to a unique integer

The output includes:

  • attention_mask: indicates whether each token is relevant (1) or ignored (0)
  • input_ids: integer IDs representing each token
text = "Hello my friends! How are you doing today?"
tokenized_text = tokenizer(text, return_tensors="pt")
print(tokenized_text)

The output:

{'input_ids': tensor([[ 147, 2105,  121, 2108,   54,  457,   56,   23,  728, 1042,   17,    0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

4. Translate and decode

Feed the tokenized text to the model, then decode the output to get the translated text.

translation = model.generate(**tokenized_text)
translated_text = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]
print(translated_text)

As we can see, beyond the default pipelines, which only cover English-German, English-French, and English-Romanian translation, we can create a translation pipeline from any pre-trained Seq2Seq model available on the Hugging Face hub. Let’s see which transformer models support translation tasks.
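If you prefer the pipeline interface, you can also wrap the model and tokenizer loaded in the steps above into a translation pipeline. This sketch assumes both objects are still in scope:

from transformers import pipeline

# Reuse the Helsinki-NLP/opus-mt-en-nl model and tokenizer from above
custom_translator = pipeline("translation", model=model, tokenizer=tokenizer)
print(custom_translator("Hello my friends! How are you doing today?"))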

Language transformer models

Transformers have become essential tools in NLP due to their use of attention mechanisms, which allow models to assign importance to different parts of the input data. Hugging Face offers a variety of transformer models, many based on this architecture, including popular translation and text-to-text models. 

Here, we’ll explore some widely used language translation models, particularly the multilingual BART (mBART) model, which provides robust support for translation tasks.

Overview of mBART

mBART is a sequence-to-sequence denoising auto-encoder model, adapted from BART and trained on large, monolingual text corpora in multiple languages. While BART is designed to reconstruct a corrupted document by mapping it back to its original form, mBART extends this functionality to multiple languages, making it highly effective for translation tasks.

The mBART input encoder and output decoder
The mBART input encoder and output decoder | Source

The input encoder in BART allows for different document transformations, such as token masking, token deletion, text infilling (filling in blanks within the text), sentence permutation, and document rotation. These transformations prepare the model to handle corrupted or incomplete input data and still generate meaningful output. This versatility enables BART to support a range of downstream NLP tasks, including sequence classification, token classification, sequence generation, and machine translation, making it a powerful tool for diverse applications in language processing.

Transformations for noising the input
Transformations for noising the input| Source

The mBART model was designed for multilingual denoising pre-training in neural machine translation. Unlike earlier models that focused only on the encoder, decoder, or partial text transformations, mBART introduced the capability to denoise entire texts across multiple languages, making it a significant advance in multilingual text generation.

mBART is a multilingual encoder-decoder (sequence-to-sequence) model specifically built for translation tasks. Being multilingual, it requires sequences to follow a specific format: a unique language ID token is added to both the source and target texts to specify the language, ensuring the model understands the translation context.

Illustration of mBART's Multilingual Denoising Pre-Training and Fine-Tuning for Machine Translation. The left panel demonstrates pre-training using corrupted input sentences across multiple languages, where the transformer model learns to reconstruct the original text. The right panel shows fine-tuning for specific machine translation tasks, including sentence-level (Sent-MT) and document-level (Doc-MT) translation, with the encoder-decoder architecture translating between English and Japanese.
Illustration of mBART’s Multilingual Denoising Pre-Training and Fine-Tuning for Machine Translation. The left panel demonstrates pre-training using corrupted input sentences across multiple languages, where the transformer model learns to reconstruct the original text. The right panel shows fine-tuning for specific machine translation tasks, including sentence-level (Sent-MT) and document-level (Doc-MT) translation, with the encoder-decoder architecture translating between English and Japanese | Source

The mBART model is trained once across all supported languages, providing a shared set of parameters that can be fine-tuned for both supervised (sentence- and document-level) and unsupervised machine translation without specific task- or language-specific modifications. Let’s see how it handles machine translation in these scenarios:

  • Sentence-level translation: mBART was evaluated on sentence-level translation to minimize representation differences between source and target sentences. It is fine-tuned using bi-text (aligned bilingual sentence pairs that define translation relationships between languages) and enhanced with back translation. These techniques allow mBART to achieve significant performance gains compared to other models.
  • Document-level translation: For translating documents, mBART learns dependencies across sentences to handle entire paragraphs or documents. For training, sentences are separated by symbols, and each example ends with a language ID token. During translation, the model generates output until it encounters the language ID token, which signals the end of the document.
  • Unsupervised translation: mBART also supports unsupervised translation when bi-text is not available. In this case, the model uses back translation to generate synthetic data for training or, if a target language has related data in other language pairs, it applies language transfer to learn from these relationships and improve translation quality.

With its unique framework, mBART doesn’t require parallel data across all languages. Instead, it uses directional training data and shared representation across languages. This feature improves scalability, even for languages with limited resources or scarce domain-specific data.

The T5 model

The T5 model was introduced in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. In this study, researchers tested the effectiveness of transfer learning by designing a unified framework that transforms all language tasks into a text-to-text format. Built on an encoder-decoder architecture, T5 takes text as input and generates new text as output.

Diagram of text-to-text framework with the T5 model
Diagram of text-to-text framework with the T5 model | Source

How T5 works 

T5 follows the same foundational principles as the original transformer architecture:

  • Encoder: The input text is tokenized and embedded, then processed by blocks consisting of a self-attention layer and a feed-forward network.
  • Decoder: The decoder has a similar structure but includes a standard attention layer after each self-attention layer to focus on the encoder’s output. It also uses autoregressive (causal) self-attention, which lets it consider past outputs for generating predictions.

T5 was trained on unlabeled text data using a cleaned version of Common Crawl, known as the Colossal Clean Crawled Corpus (C4). By leveraging this extensive dataset and the text-to-text transformer framework, T5 has shown broad applicability.

T5 can handle many language tasks by adding specific prefixes to the input sequence. For instance:

  • For translation: “translate English to French: …”
  • For summarization: “summarize: …”
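Here is a minimal sketch of this prefixing in action with the publicly available t5-small checkpoint (output quality from the small model is limited, so treat the result as illustrative):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The task is selected purely through the text prefix
text = "translate English to German: Hugging Face makes NLP accessible."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))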

The MarianMT model

The MarianMT model is built on an encoder-decoder architecture and was originally trained by Jörg Tiedemann using the Marian library. Marian is a neural machine translation framework written in C++ and designed for efficiency and simplicity, with features that support high-speed training.

Some NLP problems solved with the Marian toolkit include:

  • Automatic Post-Editing: Using dual-attention over two encoders, Marian helps recover missing words in raw machine translation output.
  • Grammatical Error Correction (GEC): In GEC, Marian uses low-resource neural translation to correct grammatical errors. It can induce noise in the source text, specify weighted objectives, and incorporate transfer learning with pre-trained models.

With the flexibility of Marian’s framework, MarianMT was developed to streamline translation efforts. MarianMT was trained on the Open Parallel Corpus (OPUS), a vast collection of parallel texts from the web.

Hugging Face offers around 1,300 MarianMT models for different language pairs, all named following the format Helsinki-NLP/opus-mt-{src}-{target}, where src and target are language codes. Each model is approximately 298 MB, making them relatively lightweight and ideal for experimentation, fine-tuning, and integration in various applications. For new multilingual models, Marian uses three-character language codes.
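Thanks to this consistent naming scheme, you can assemble a MarianMT checkpoint name from the language codes you need and load it directly, assuming that particular pair exists on the hub:

from transformers import MarianMTModel, MarianTokenizer

src, target = "en", "de"  # language codes; the pair must exist on the hub
model_name = f"Helsinki-NLP/opus-mt-{src}-{target}"

tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["How are you today?"], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))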

Create your machine learning translator using Hugging Face

To build a translator from English to German, we’ll use pre-trained models from Hugging Face and fine-tune them on a relevant dataset. We’ll use models such as T5, MarianMT, and mBART.

1. Load the dataset

First, we’ll load an English-to-German dataset using Hugging Face’s datasets library.

from datasets import load_dataset
raw_datasets = load_dataset("wmt16", "de-en")

Here’s a sample of our dataset:

A sample of the WMT16 English-German dataset | Source: Author

You can see the data is already split into training, validation, and test sets. The training set is large, so model training and fine-tuning will take time.

2. Pre-process the dataset

Next, we need to tokenize the dataset so the model can process it.

from transformers import AutoTokenizer, MBart50TokenizerFast

# Define models and tokenizers for MarianMT, mBART, and T5
model_marianMT = "Helsinki-NLP/opus-mt-en-de"
tokenizer_marianMT = AutoTokenizer.from_pretrained(model_marianMT, use_fast=False)

model_mbart = "facebook/mbart-large-50-one-to-many-mmt"
tokenizer_mbart = MBart50TokenizerFast.from_pretrained(model_mbart, src_lang="en_XX", tgt_lang="de_DE")

model_t5 = "t5-small"
tokenizer_t5 = AutoTokenizer.from_pretrained(model_t5, use_fast=False)

# Define parameters for tokenization
max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "de"

# Select the tokenizer matching the model you want to fine-tune
# (repeat the preprocessing with tokenizer_mbart or tokenizer_t5 for the other models)
tokenizer = tokenizer_marianMT
prefix = ""  # T5 expects a task prefix; set it to "translate English to German: " when using tokenizer_t5

# Preprocessing function
def preprocess_function(examples):
    inputs = [prefix + ex[source_lang] for ex in examples["translation"]]
    targets = [ex[target_lang] for ex in examples["translation"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Tokenize the targets with the tokenizer's target-language settings
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

3. Create a data subset

To speed up training, let’s create smaller subsets for training and validation.

small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

4. Fine-tune the model

Now, we’ll load the models and configure the training.

from transformers import AutoModelForSeq2SeqLM, MBartForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq

# Load models
model_marianMT = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model_mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
model_t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Set training parameters
# Adjust the 'per_device_train_batch_size' parameter based on your available GPU memory
training_args = Seq2SeqTrainingArguments(
   "translator-finetuned-en-de",
   evaluation_strategy="epoch",
   learning_rate=2e-5,
   per_device_train_batch_size=8,
   per_device_eval_batch_size=8,
   weight_decay=0.01,
   save_total_limit=3,
   num_train_epochs=1,
   predict_with_generate=True,
)

# Select the model to fine-tune (repeat with model_mbart or model_t5 as needed)
model = model_marianMT

# Data collator for padding inputs and labels
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# Metrics function
import numpy as np
import evaluate

metric = evaluate.load("sacrebleu")
meteor = evaluate.load("meteor")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace the -100 padding used for labels before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    preds = [pred.strip() for pred in tokenizer.batch_decode(preds, skip_special_tokens=True)]
    labels = [[label.strip()] for label in tokenizer.batch_decode(labels, skip_special_tokens=True)]
    result = metric.compute(predictions=preds, references=labels)
    meteor_result = meteor.compute(predictions=preds, references=[label[0] for label in labels])
    return {"bleu": result["score"], "meteor": meteor_result["meteor"]}
    
# Initialize Trainer
trainer = Seq2SeqTrainer(
   model=model,
   args=training_args,
   train_dataset=small_train_dataset,
   eval_dataset=small_eval_dataset,
   data_collator=data_collator,
   tokenizer=tokenizer,
   compute_metrics=compute_metrics
)

# Train and save the model
trainer.train()
trainer.save_model("fine-tuned-translator")

This approach allows you to create and fine-tune a translator model for English-to-German translation. You can also upload the model to the Hugging Face hub for sharing.
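One way to share the fine-tuned model is to push it to the hub. This is a minimal sketch that assumes you have already authenticated (for example with huggingface-cli login); the repository name is a placeholder:

# "your-username/translator-finetuned-en-de" is a hypothetical repository name
model.push_to_hub("your-username/translator-finetuned-en-de")
tokenizer.push_to_hub("your-username/translator-finetuned-en-de")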

Evaluating and choosing the best translation model

After training and fine-tuning our models, we need to evaluate their performance. We’ll compare the translations generated by each fine-tuned model against the pre-trained versions and Google Translate.

1. Pre-trained model vs. fine-tuned vs. Google Translator

In the previous section, we saved our fine-tuned models to local directories. We’ll now test those models and compare their translations against the pre-trained versions and Google Translate.

Here’s how you can load your fine-tuned models from local storage and generate translations:

MarianMT model

from transformers import MarianMTModel, MarianTokenizer

model_name = 'opus-mt-en-de-finetuned-en-to-de'  # local directory containing the fine-tuned MarianMT checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

src_text = ["USA Today is an American daily middle-market newspaper that is the flagship publication of its owner, Gannett. Founded by Al Neuharth on September 15, 1982."]
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

mBART50 model

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = 'mbart-large-50-one-to-many-mmt-finetuned-en-to-de'
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_name)

src_text = ["USA Today is an American daily middle-market newspaper that is the flagship publication of its owner, Gannett. Founded by Al Neuharth on September 15, 1982."]
model_inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**model_inputs, forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"])
translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
translation

T5 model

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = 't5-small-finetuned-en-to-de'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src_text = ["USA Today is an American daily middle-market newspaper that is the flagship publication of its owner, Gannett. Founded by Al Neuharth on September 15, 1982."]
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Let’s compare the translated text for the MarianMT, mBART, and T5 models:  

Input text: USA Today is an American daily middle-market newspaper that is the flagship publication of its owner, Gannett. Founded by Al Neuharth on September 15, 1982.

Fine-tuned MarianMT: USA Today ist eine amerikanische Tageszeitung im mittleren Markt, die das Flaggschiff ihrer Eigentümerin Gannett ist. Gegründet von Al Neuharth am 15. September 1982.

Pre-trained mBART: USA Today ist eine amerikanische Tageszeitung für den mittleren Markt, die die Flaggschiffpublikation ihres Besitzers Gannett ist. Gegründet von Al Neuharth am 15. September 1982.

Google Translate: USA Today ist eine amerikanische Tageszeitung für den Mittelstand, die das Flaggschiff ihres Eigentümers Gannett ist. Gegründet von Al Neuharth am 15. September 1982.

The fine-tuned MarianMT model produced more accurate translations than its pre-trained version and was close to the quality of Google Translate. However, some minor grammatical errors persisted across all models.

The pre-trained mBART model captured additional word nuances compared to MarianMT and Google Translate, but the translation quality was similar overall. However, mBART’s fine-tuning was computationally intensive and yielded results nearly identical to the pre-trained version.

The T5 model performed the worst, often failing to translate the full paragraph. Both its pre-trained and fine-tuned versions struggled with accuracy, making it less suitable for this translation task than the other models.

2. Evaluation metrics: comparing the models

We’ll use the following evaluation metrics to assess the quality and accuracy of our translations:

BLEU (bilingual evaluation understudy)

BLEU measures how closely machine translations match human translations. It compares machine-generated text to professional reference translations, with higher scores indicating closer alignment with human quality. The metric is widely used because it correlates well with human judgment while being cheap to compute.

Meteor

METEOR is another automatic metric for machine translation. It evaluates translations by matching individual unigrams (words) in the machine output against reference human translations, while also accounting for stemming and synonyms, which yields a more nuanced match score.
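To see what these metrics look like on a single sentence pair, here is a small sketch using the evaluate library (the toy strings are for illustration only, so the scores are not meaningful benchmarks):

import evaluate

bleu = evaluate.load("sacrebleu")
meteor = evaluate.load("meteor")

predictions = ["USA Today ist eine amerikanische Tageszeitung."]
references = [["USA Today ist eine amerikanische Tageszeitung im mittleren Markt."]]

print(bleu.compute(predictions=predictions, references=references)["score"])
# METEOR here receives one reference string per prediction
print(meteor.compute(predictions=predictions, references=[r[0] for r in references])["meteor"])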

Our compute_metrics() function includes these metrics, so you’ll see the results as the code runs. However, tracking them over time in a clearer, user-friendly format can improve readability. I recommend using neptune.ai, which automatically logs and displays these metrics for easier analysis.

3. Track your model data: parameters, training loss, CPU usage, metrics, and more

Neptune offers a powerful user interface for tracking model training metrics, CPU and RAM usage, and performance metrics, simplifying model management.

Disclaimer

Please note that this article references a deprecated version of Neptune.

For information on the latest version with improved features and functionality, please visit our website.

Set up your Neptune account

  1. Create an account, 
  2. Retrieve your API key, and 
  3. Start a new Neptune project.

Add tracking code in your notebook

Add the below code in your notebook to create an experiment on the Neptune platform:

!pip install neptune
import neptune

run = neptune.init_run(
    project="YOUR_WORKSPACE/YOUR_PROJECT",
    api_token="YOUR_NEPTUNE_API_TOKEN",
)

Log evaluation metrics

After training, log metrics to Neptune using run["metric_name"].append(value):

evaluate_results = trainer.evaluate()
run["epoch"].append(evaluate_results["epoch"])
run["bleu"].append(evaluate_results["bleu"])
run["meteor"].append(evaluate_results["meteor"])

View metrics in the Neptune UI

Let’s see what it looks like in the Neptune UI:

Standard view of the project and experiments in the Neptune UI

In the UI, you can see BLEU and METEOR scores for all the pre-trained models. These metrics suggest that even after fine-tuning, T5 could not predict accurately. 

💡Neptune has released a guide about Neptune’s integration with HuggingFace Transformers, so you can now use it to start tracking even quicker.

Additional Neptune tracking features

Neptune’s HuggingFace integration makes tracking setup faster with report_to="neptune" in Seq2SeqTrainingArguments. You can also view detailed CPU and RAM usage, model logs, and metadata for each experiment, as shown below:

Model metadata, logs, and monitoring in the Neptune UI
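Here is a minimal sketch of the report_to="neptune" setup, assuming the neptune package is installed and the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables are set so the integration can create the run:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    "translator-finetuned-en-de",
    evaluation_strategy="epoch",
    num_train_epochs=1,
    predict_with_generate=True,
    report_to="neptune",  # Transformers' built-in Neptune integration
)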

Compare models side-by-side

Neptune’s interface allows easy side-by-side comparisons, offering insights beyond BLEU and METEOR scores:

Side-by-side model comparison from the Neptune UI

Based on this analysis, MarianMT and mBART models (pre-trained and fine-tuned) outperform T5, with mBART showing slightly better performance, likely due to recognizing more words in the input.

Final thoughts 

In this article, we explored how Hugging Face simplifies the integration of NLP tasks by providing intuitive APIs and pre-trained models for different use cases. With this tool, we can ease the creation of specialized pipelines using pre-trained models for custom needs.

Focusing on language translation, we examined three popular models: MarianMT, mBART, and T5. We also walked through training and fine-tuning these models on new data to enhance translation quality. While mBART showed slightly higher resource usage than MarianMT and T5, its results were comparable.

Hugging Face offers extensive tutorials to support the learning and fine-tuning of these models, and its models’ hub offers a variety of additional multilingual transformers for NLP translation tasks, including XLM, BERT, and T5. These models continue to advance language translation towards human-like accuracy.
