MLOps Blog

Sentiment Analysis in Python: TextBlob vs Vader Sentiment vs Flair vs Building It From Scratch

Shahul ES

30th August, 2023

Sentiment analysis is one of the most widely known Natural Language Processing (NLP) tasks. This article aims to give the reader a clear understanding of sentiment analysis and the different methods used to implement it in NLP. So let’s dive in.

The field of NLP has evolved considerably over the last five years. Open-source packages such as spaCy and TextBlob provide ready-to-use functionality for NLP tasks like sentiment analysis. So many of these packages are freely available that it can be confusing to decide which one to use for your application.

In this article, I will discuss the most popular sentiment analysis packages and approaches:

TextBlob
VADER
Flair
Custom model

At the end, I will also compare the performance of each of them on a common dataset.

What is sentiment analysis?

Sentiment analysis is the task of determining the emotional value of a given expression in natural language.

It is essentially a multiclass text classification task in which the given input text is classified as positive, neutral, or negative. The number of classes can vary according to the nature of the training dataset.

For example, it is sometimes formulated as a binary classification problem, with 1 as the positive sentiment label and 0 as the negative sentiment label.

Applications of sentiment analysis

Sentiment analysis has applications in a wide variety of domains, such as user reviews and tweets. Let’s go through some of them here:

Movie reviews: analyzing online movie reviews to get insights from the audience about the movie.
News sentiment analysis: analyzing news sentiment about a particular organization to get insights.
Social media sentiment analysis: analyzing the sentiment of Facebook posts, tweets, etc.
Online food reviews: analyzing the sentiment of food reviews from user feedback.

Sentiment analysis in Python

There are many packages available in Python that use different methods to do sentiment analysis. In the next section, we shall go through some of the most popular methods and packages.

Rule-based sentiment analysis

Rule-based sentiment analysis is one of the most basic approaches to calculating text sentiment. It requires minimal pre-work and the idea is simple: the method does not use any machine learning to determine the sentiment of a text. For example, we could estimate the sentiment of a tweet by counting how many times the user used the word “sad” in it.
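To make the idea concrete, here is a minimal sketch of a rule-based scorer. The two word lists and the `rule_based_sentiment` function are hypothetical, made up for illustration; real packages such as TextBlob and VADER rely on much larger, curated lexicons plus extra rules.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE_WORDS = {"great", "good", "happy", "love", "excellent"}
NEGATIVE_WORDS = {"sad", "bad", "terrible", "hate", "awful"}


def rule_based_sentiment(text: str) -> str:
    """Label text by counting lexicon hits; no machine learning involved."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1, each negative word subtracts 1.
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The food was great!"))    # positive
print(rule_based_sentiment("So sad, so sad today."))  # negative
```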

Now, let’s check out some Python packages that work using this method.

TextBlob

TextBlob is a simple Python library that offers API access to different NLP tasks such as sentiment analysis, spelling correction, etc.

TextBlob’s sentiment analyzer returns two properties for a given input sentence:

Polarity is a float in the range [-1, 1], where -1 indicates negative sentiment and +1 indicates positive sentiment.
Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective. Subjective sentences generally express personal opinion, emotion, or judgment.

Let’s see how to use TextBlob:

from textblob import TextBlob

testimonial = TextBlob("The food was great!")
# .sentiment returns a named tuple with polarity and subjectivity
print(testimonial.sentiment)

 Sentiment(polarity=1.0, subjectivity=0.75)

TextBlob ignores words it doesn’t know: it considers only the words and phrases it can assign a polarity to, and averages their scores to get the final score.
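The averaging idea can be sketched as follows. This is a simplified illustration, not TextBlob’s actual implementation: the `LEXICON` values and the `average_polarity` helper are hypothetical, and TextBlob additionally handles modifiers such as intensifiers and negation.

```python
# Hypothetical mini-lexicon mapping words to polarity scores (not TextBlob's real lexicon).
LEXICON = {"great": 0.75, "good": 0.5, "slow": -0.25, "terrible": -1.0}


def average_polarity(text: str) -> float:
    """Average the polarity of known words; unknown words are skipped entirely."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    # With no known words there is nothing to average, so return a neutral 0.0.
    return sum(scores) / len(scores) if scores else 0.0


print(average_polarity("The food was great!"))           # 0.75
print(average_polarity("Great food, but slow service."))  # 0.25
```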

VADER sentiment

Valence Aware Dictionary and sEntiment Reasoner (VADER) is another popular rule-based sentiment analyzer.

It calculates the sentiment of a text using a list of lexical features (e.g., words) that are labeled as positive or negative according to their semantic orientation.

VADER returns the proportion of a given input sentence that is positive, negative, and neutral, together with a normalized compound score between -1 (most negative) and +1 (most positive).

For example:

“The food was great!”
Positive: 59.4%
Negative: 0%
Neutral: 40.6%

These three proportions add up to 100% (or very close to it, because of rounding).

Let’s see how to use VADER:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentence = "The food was great!"
# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' scores
vs = analyzer.polarity_scores(sentence)
print(vs)

{'neg': 0.0, 'neu': 0.406, 'pos': 0.594, 'compound': 0.6588}
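The compound score is often turned into a single label. The vaderSentiment README suggests treating compound scores >= 0.05 as positive, <= -0.05 as negative, and anything in between as neutral. The sketch below applies that convention; `vader_label` is a hypothetical helper name, not part of the package.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a discrete label.

    The +/-0.05 cut-off follows the convention suggested in the vaderSentiment README.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Using the compound score from the example output above.
print(vader_label(0.6588))  # positive
```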

VADER is specifically attuned to sentiments expressed in social media and can yield good results on data from Twitter, Facebook, etc.

The main drawback of the rule-based approach to sentiment analysis is that it only cares about individual words and completely ignores the context in which they are used.

For example, “the party was savage” (slang praising the party) is likely to be scored as negative by a token-based algorithm, because “savage” on its own has a negative connotation.
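The sketch below illustrates this limitation with a hypothetical token-level lexicon (the scores and the `token_score` helper are invented for illustration). Because each token’s score is fixed, the slang compliment and a genuinely negative sentence receive exactly the same score.

```python
# Hypothetical token-level lexicon: "savage" always carries the same negative score.
LEXICON = {"savage": -2.0, "great": 3.0}


def token_score(text: str) -> float:
    """Sum per-token scores; the surrounding words never change a token's score."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())


# Both sentences score identically, even though the first one is praise.
print(token_score("the party was savage"))   # -2.0
print(token_score("the attack was savage"))  # -2.0
```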

Embedding-based models

Text embeddings are a form of word representation in NLP in which semantically similar words are represented by similar vectors, so that they lie close to each other in an n-dimensional space.
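Closeness between embedding vectors is commonly measured with cosine similarity. The following sketch computes it in plain Python on made-up 3-dimensional vectors; the `EMBEDDINGS` values are hypothetical, whereas real embeddings are learned from large corpora and typically have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "good": [0.9, 0.8, 0.1],
    "great": [0.85, 0.9, 0.15],
    "awful": [-0.8, -0.7, 0.2],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: 1.0 means they point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["great"]))  # close to 1
print(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["awful"]))  # negative
```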

Embedding-based Python packages use this form of text representation to predict text sentiment. Because embeddings capture similarity in meaning, they give a richer text representation than word lists and generally yield better model performance.

One such package is Flair.

Flair

Flair is a simple-to-use framework for state-of-the-art NLP.

It provides various functionalities, such as:

pre-trained sentiment analysis models,
text embeddings,
named entity recognition (NER),
and more.

Let’s see how easily sentiment analysis can be done with Flair.

Flair’s pre-trained sentiment analysis model is trained on the IMDB dataset. To load it and make a prediction, simply do:

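from flair.models import TextClassifier
from flair.data import Sentence

# Download (on first use) and load Flair's pre-trained English sentiment model
classifier = TextClassifier.load('en-sentiment')

# Wrap the raw text in a Flair Sentence and run the classifier on it
sentence = Sentence('The food was great!')
classifier.predict(sentence)

# print sentence with predicted labels
print('Sentence above is: ', sentence.labels)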

Sentence above is:  [POSITIVE (0.9961)]

If you would like a custom sentiment analyzer for your domain, you can train a Flair classifier on your own dataset.

The drawback of using Flair’s pre-trained model for sentiment analysis is that it is trained on IMDB data, so it might not generalize well to data from other domains, such as Twitter.

Building a sentiment analysis model from scratch

In this section, you will learn when and how to build a sentiment analysis model from scratch using TensorFlow.

Why a custom model?

Let’s first understand when you need a custom sentiment analysis model. For example, you may have a niche application, such as analyzing the sentiment of airline reviews, where pre-trained models built on data from other domains might not generalize well.

By building a custom model, you also get more control over the output.

TensorFlow Hub

TensorFlow Hub is a repository of trained machine learning models ready for fine-tuning and deployable anywhere.

For our purpose, we will use the Universal Sentence Encoder (USE), which encodes text into high-dimensional (512-dimensional) vectors. You can also use any other text representation model you prefer, such as GloVe, fastText, or word2vec.

Model

Because we use the Universal Sentence Encoder to vectorize the input text, we don’t need an embedding layer in the model. If you plan to use another embedding model, such as GloVe, feel free to follow one of my previous posts for a step-by-step guide. Here I will just build a simple model for our purpose.
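As an illustration, a simple classifier of this kind might look like the following Keras sketch. It assumes the tweets have already been encoded into 512-dimensional USE vectors (for example, by loading the encoder from TensorFlow Hub), so the first layer receives those vectors directly. The layer sizes, dropout rate, and the `build_classifier` name are assumptions for illustration, not necessarily what the notebook uses.

```python
import numpy as np
import tensorflow as tf

# Assumption: inputs are precomputed 512-dimensional Universal Sentence Encoder
# vectors, so the model needs no Embedding layer.
EMBEDDING_DIM = 512


def build_classifier(embedding_dim: int = EMBEDDING_DIM) -> tf.keras.Model:
    """A small binary sentiment classifier on top of fixed sentence embeddings.

    Layer sizes and dropout rate are illustrative choices, not tuned values.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(embedding_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of positive sentiment
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_classifier()
# Random vectors stand in for real USE embeddings here, just to check the shapes.
dummy_batch = np.random.rand(4, EMBEDDING_DIM).astype("float32")
print(model(dummy_batch).shape)  # (4, 1)
```

In practice you would call `model.fit` on the USE vectors of the training tweets and their 0/1 labels; the random batch above only confirms that the shapes line up.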

Dataset

For our example, I will be using the Twitter sentiment analysis dataset from Kaggle. This dataset contains 1.4 million labeled tweets.

You can download the dataset from Kaggle.

To run the example in Colab, just upload your Kaggle API key when prompted by the notebook, and it will download the dataset for you automatically.

Example: Twitter sentiment analysis with Python

Here is the link to the Colab notebook.

In the same notebook, I have implemented all the algorithms we discussed above.

Comparing results

Now, let’s compare the results from the notebook.

Algorithm	Accuracy
TextBlob	56%
VADER	56%
Flair	50%
USE model	77.5%

As you can see, our custom model yields the best results, even without any hyperparameter tuning.

Note that I only trained the USE model on the Twitter data; the other methods were used out of the box.
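Because TextBlob and VADER produce continuous scores, those scores have to be mapped to the dataset’s binary labels before accuracy can be computed. The exact mapping used in the notebook isn’t described here; the sketch below assumes a simple threshold at 0, and the `to_binary` and `accuracy` helpers, scores, and labels are all hypothetical.

```python
def to_binary(scores: list[float], threshold: float = 0.0) -> list[int]:
    """Map continuous polarity scores to 1 (positive) or 0 (negative)."""
    return [1 if s > threshold else 0 for s in scores]


def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical polarity scores and gold labels, for illustration only.
scores = [0.8, -0.4, 0.0, 0.35]
labels = [1, 0, 1, 1]
print(accuracy(to_binary(scores), labels))  # 0.75
```

With this threshold, a score of exactly 0 (which a lexicon-based tool returns when it recognizes no words) is counted as negative, so the choice of threshold can affect the reported accuracy.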

None of the out-of-the-box packages generalizes well to Twitter data. I have been working on an open-source project to develop a package specifically for Twitter data, and it is under active development.

Feel free to check out my project on GitHub.