How AI and ML Can Solve Business Problems in Tourism – Chatbots, Recommendation Systems, and Sentiment Analysis

25th August, 2023

Tourism has enjoyed massive growth over the years, as people seek to spend time away from home in pursuit of recreation, relaxation, and pleasure. At least before COVID times, tourism was a fast-growing sector that played a big role in the global economy.

According to the United Nations World Tourism Organization, there were an estimated 25 million international arrivals in 1950. 68 years later, that figure had grown to about 1.4 billion international arrivals, an approximately 56-fold increase.

According to Statista, travel and tourism directly contributed around $2.9 trillion to the global economy in 2019.

Like every other industry, tourism keeps improving the methods it uses to serve clients and keep them satisfied, so that they keep coming back.

In this article, we will explore:

  • ways in which machine learning improves customer satisfaction and solves business problems,
  • how companies leverage machine learning to improve the tourist experience. 

We’ll also go through the process of building a machine learning model (data collection, data cleaning, and model building) for a tourism-related problem.



Machine Learning and the tourism industry

Modern technologies make travelling easy. You can book flights and hotels in mobile apps, easily find restaurants and entertainment, and pay for everything online.

This also means that a lot of data is generated all the time from mobile devices. Industries leverage this through Big Data solutions to improve services and make things easier for consumers.

Apart from just analyzing this data to find consumer patterns, machine learning and AI are used to predict future outcomes, which helps to solve issues before they happen.

Data has become the most valuable asset in the world, and a key driver of growth. The impact of machine learning in tourism is heavily geared towards customer satisfaction and engagement.

How can Machine Learning solve problems in tourism?

1. Chatbots

Customers today want to stay up to date on information from companies serving them, and need to be able to ask questions and get answers quickly. 

Tourism companies used to rely solely on front desk attendants and customer care representatives. This limited their ability to help customers, and sometimes led to churn when human customer service agents behaved badly.

With the creation of chatbots, companies started using them as personal assistants for their customers, through existing platforms like web browsers and messenger applications (WhatsApp or Facebook). 

Chatbots can answer common questions and recommend places to visit or things to do while touring a city, all very quickly and without the hassle of scrolling through a website or waiting to speak to a customer service agent.

Chatbot benefits:

  • Time savings,
  • Personalized services,
  • Very low financial cost for companies,
  • Chats can be analysed to understand what customers talk about and to plan future improvements.
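To make the idea of answering common questions concrete, here is a minimal sketch of a keyword-matching FAQ bot. It is only an illustration: the `FAQ` entries, the canned replies, and the `answer` function are hypothetical, and production chatbots typically rely on trained language models rather than keyword overlap.

```python
import re

# Hypothetical FAQ: each entry maps trigger keywords to a canned reply.
FAQ = [
    ({"checkin", "checkout", "check"},
     "Check-in starts at 2 pm and check-out is by 11 am."),
    ({"restaurant", "restaurants", "eat", "food"},
     "Ask for our dining list to find restaurants near you."),
    ({"cancel", "refund"},
     "Bookings can be cancelled up to 24 hours before arrival."),
]
FALLBACK = "Sorry, I am not sure. Let me connect you to an agent."


def answer(message: str) -> str:
    """Return the reply whose keywords overlap most with the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best_reply, best_overlap = FALLBACK, 0
    for keywords, reply in FAQ:
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_reply, best_overlap = reply, overlap
    return best_reply


print(answer("Where can I eat good food?"))
print(answer("What is the weather like?"))
```

Falling back to a human agent when nothing matches keeps the bot from guessing, which matters for the churn problem described above.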



2. Recommendation systems

Recommendation systems are everywhere. They suggest relevant items to users based on different factors and data. Top companies including Netflix, LinkedIn, and Amazon utilize the power of recommendation systems to suggest personalized items to users.

The tourism industry is no different. Here, these systems reduce customer churn and transaction costs, and save time for both customers and service providers. 

Companies use customer data and machine learning algorithms to build recommendation models that can accurately suggest the best places to visit, so customers don't have to manually check catalogs and websites or reach out to customer service agents.

These models are built on data like past expenses, travel destinations, ratings, and previously chosen offers.

Recommendation system benefits:

  • Quickly provides personalized suggestions,
  • Supports precise marketing,
  • Facilitates smarter travel for tourists.
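As a rough illustration of how such a model can rank options, the sketch below scores destinations by cosine similarity between a traveller's preference profile and each destination's feature vector. The destinations, the four feature scores, and the `recommend` helper are all made up for this example; this is one simple content-based approach, not the method any particular company uses.

```python
import math

# Hypothetical feature scores per destination, in this order:
# (beach, culture, nightlife, budget-friendliness), each in [0, 1].
DESTINATIONS = {
    "Santorini": [0.9, 0.6, 0.5, 0.1],
    "Prague": [0.0, 0.9, 0.7, 0.8],
    "Ibiza": [0.8, 0.2, 1.0, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(profile, top_n=2):
    """Rank destinations by similarity to the traveller's profile."""
    ranked = sorted(
        DESTINATIONS,
        key=lambda name: cosine(profile, DESTINATIONS[name]),
        reverse=True,
    )
    return ranked[:top_n]


# A traveller who mostly cares about culture and low prices.
culture_lover = [0.1, 1.0, 0.4, 0.9]
print(recommend(culture_lover))
```

In practice the profile would be derived from data such as past expenses, destinations, and ratings rather than typed in by hand.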

3. Social media sentiment analysis

Social media has become a crucial way of getting reviews from people, and this can affect how new users perceive your company.

Analyzing sentiments, locating trouble spots, and fixing them in tourism companies can help drive growth. Some customers might be dissatisfied with services while others can be delighted, and companies can use this information to their advantage. 

How can we analyze these reviews in an automated way to check if they’re good or bad?

With sentiment analysis. It uses Natural Language Processing, a sub-field of AI and ML that automates the process of examining relationships and meaning in reviews from customers.

Benefits of Social Media Sentiment Analysis:

  • Provides an efficient performance indicator,
  • Helps to understand customers,
  • Helps measure results of marketing campaigns.
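A very small lexicon-based scorer shows the basic mechanics of labelling a review as good or bad. The word lists and the `sentiment` function here are hypothetical and deliberately tiny; real sentiment analysis usually relies on trained models or established libraries, and this sketch ignores negation ("not clean") and sarcasm.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"great", "amazing", "friendly", "clean", "delicious", "loved"}
NEGATIVE = {"bad", "dirty", "rude", "slow", "terrible", "noisy"}


def sentiment(review: str) -> str:
    """Label a review by counting positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "The staff were friendly and the food was delicious",
    "Dirty room and rude staff",
    "We arrived on Monday",
]
for review in reviews:
    print(sentiment(review), "-", review)
```

Aggregating these labels per hotel, tour, or campaign is what turns individual reviews into the performance indicator mentioned in the list above.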



4. Targeting the right audience

Understanding clients and knowing what to market to which customer has proved to be a very effective strategy for marketing. 

Clients have different characteristics: they live in different locations, work different jobs, and earn different salaries. In tourism, some customers can afford that luxurious Santorini vacation while others cannot, and marketing it to the wrong class of clients only increases marketing spend without yielding any results.

Manually predicting client behavior and segmenting can be a burden. With thousands of data points generated daily, only machines can do this efficiently.

How can machine learning help target marketing in the tourism industry?

Machine learning can help identify segments of clients using clustering algorithms, where clients with similar characteristics are grouped based on features like travel frequency, duration of stay, amount spent, and so on. 
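The clustering idea can be sketched with scikit-learn's `KMeans` on a handful of made-up clients described by the features named above. The client values are invented for illustration, and the choice of three clusters and of `StandardScaler` for putting features on a comparable scale are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical clients: trips per year, nights per stay, spend per trip (USD).
clients = np.array([
    [1, 3, 400], [2, 4, 550], [1, 2, 350],         # occasional budget trips
    [8, 2, 900], [10, 1, 1000], [9, 2, 950],       # frequent short stays
    [2, 14, 6000], [1, 12, 5500], [2, 10, 5000],   # rare, long, high spend
])

# Scale features so that spend does not dominate the distance computation.
scaled = StandardScaler().fit_transform(clients)

# Group the clients into three segments.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

Each label can then drive a different campaign, for example keeping luxury island offers for the high-spend segment only.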

Machine learning can also help predict client behavior, so that companies avoid targeting clients with a high probability of not engaging with a tourism offer.

Benefits of targeting the right audience:

  • Better recommendations,
  • Increased conversions,
  • Better and more fruitful ad campaigns.

Now that we’ve talked about how machine learning and AI can improve service and marketing in tourism, let’s imagine an example company using machine learning and AI, and how they would do it.

Quick demo

Business problem

Humtourist, a tourism company, has been getting bad reviews from customers for poor suggestions of entertainment spots, restaurants, and relaxation centers in Manhattan, New York.

This has increased customer churn and drastically reduced revenue. Management has heard that data science and machine learning can increase customer satisfaction and improve revenue.

Applying a Machine Learning solution

We could build a recommendation and segmentation system for Humtourist clients, to ensure they have the best experience during their stay in Manhattan, New York. 

Part of this job includes recommending the best neighborhoods for different activities and venue types, from relaxation spots and busy areas to restaurants, parks, and more. All this is meant to increase customer satisfaction and provide fun experiences while in Manhattan.

Note: the aim of this tutorial isn’t to build a 100% accurate model. Here we will be building a simple model.

To build this model, we will:

  • connect to an external source to get data, 
  • work with the JSON data format, 
  • clean the data in a form ready for modeling, 
  • build a simple weighted average recommendation system as well as a segmentation system. 


Prerequisites

  • Python 3.0+ programming knowledge,
  • Data preprocessing knowledge,
  • Basic Machine Learning knowledge,
  • A Foursquare account.

We will connect to an external data source and extract data through its Application Programming Interface (API).

Foursquare is a location technology platform dedicated to improving how people move around the world. Companies such as Uber, Facebook, Apple, and Samsung leverage Foursquare’s developer tools to help make sense of phone locations. 

We will use Foursquare’s API to get necessary data, but first you need to:

  1. Create an account on the Foursquare developer page.
  2. Copy both the client ID and the client secret key to a note.

Coding steps

First, we will add New York borough data. During preprocessing, we will extract the Borough, Neighborhood, Latitude, and Longitude data.

Since we’ll be focusing on just the Manhattan borough, we will extract only data for Manhattan.

import json  # library to handle JSON files

import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
import requests  # library to make HTTP requests
from sklearn.cluster import KMeans  # clustering algorithm

#download dataset (append the dataset URL to this notebook command)
!wget -q -O 'newyork_data.json'
#Open the json dataset.
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

#Each entry in 'features' describes one neighbourhood; inspecting it shows which keys to read
neighborhoods_data = newyork_data['features']

column_names = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude']

#Collect one row per neighbourhood, then build the table in a single step
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    #coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({
        'Borough': borough,
        'Neighbourhood': neighborhood_name,
        'Latitude': neighborhood_lat,
        'Longitude': neighborhood_lon,
    })
neighborhoods = pd.DataFrame(rows, columns=column_names)

# Since we are working with only manhattan data
manhattan = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
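Since the dataset itself is not shown, the following sketch uses a hand-written sample in the shape the loop above reads (a `features` list whose entries carry `properties` and `geometry`). The two sample entries and their rounded coordinates are illustrative assumptions, not rows copied from the real file.

```python
import json

# Hand-written sample mimicking the structure the tutorial code expects.
sample = json.loads("""
{
  "features": [
    {"properties": {"borough": "Manhattan", "name": "Chinatown"},
     "geometry": {"coordinates": [-73.994, 40.716]}},
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847, 40.895]}}
  ]
}
""")

rows = []
for feature in sample["features"]:
    # The coordinate pair is ordered longitude first, then latitude.
    lon, lat = feature["geometry"]["coordinates"]
    rows.append({
        "Borough": feature["properties"]["borough"],
        "Neighbourhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    })

# Keep only the Manhattan entries, as in the tutorial.
manhattan_rows = [row for row in rows if row["Borough"] == "Manhattan"]
print(manhattan_rows)
```

Swapping latitude and longitude is an easy mistake with this layout, which is why the tutorial code reads index 1 for latitude and index 0 for longitude.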

Next we will employ the Foursquare API to extract venues, and preprocess this data in a form ready for modeling. 

We will need the client ID and client secret key here to build the segmentation system first, and the recommendation system after it. Check the Foursquare Documentation on venues.
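Before making live requests, it can help to see the nesting that the tutorial code indexes into: `response`, then the first entry of `groups`, then `items`, each holding a `venue`. The sketch below parses a hand-made payload with that shape; the venue values and the `parse_items` helper are hypothetical, and skipping venues without a category is a defensive choice of this sketch rather than something the original code does.

```python
# Hand-made payload mimicking the nesting read by the tutorial code.
# All venue values are invented for illustration.
mock_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe", "id": "abc123",
                   "location": {"lat": 40.716, "lng": -73.994},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Example Spa", "id": "def456",
                   "location": {"lat": 40.715, "lng": -73.995},
                   "categories": []}},
    ]}]}
}


def parse_items(payload, neighbourhood):
    """Flatten venue items into tuples, skipping venues with no category."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        if not venue["categories"]:
            continue  # indexing categories[0] would raise IndexError here
        rows.append((
            neighbourhood,
            venue["name"],
            venue["id"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            venue["categories"][0]["name"],
        ))
    return rows


print(parse_items(mock_payload, "Chinatown"))
```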

#Define Foursquare credentials and version
CLIENT_ID = '*****************'
CLIENT_SECRET = '************************'
VERSION = '20200202'
LIMIT = 100
#Endpoint prefix of the request URL, up to and including 'client_id=';
#it is not shown here, so take it from the Foursquare documentation
API_URL_PREFIX = '*****'

#Create a function to extract neighbourhoods and venues within a 500 m radius
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = API_URL_PREFIX + '{}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only relevant information for each nearby venue
        venues_list.append([(
            name, lat, lng, v['venue']['name'], v['venue']['id'],
            v['venue']['location']['lat'], v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue', 'id', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

    return nearby_venues

manhattan_venues = getNearbyVenues(names=manhattan['Neighbourhood'],
                                   latitudes=manhattan['Latitude'],
                                   longitudes=manhattan['Longitude'])

The code block above produces the output below:


Now that we have our venues and venue categories ready, we will extract only the necessary data, one-hot encode it, find the top 10 most common venues, and build our segmentation system.
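The encoding and grouping steps are easier to follow on a toy table first. In this sketch the neighbourhood names "A" and "B" and the five venues are invented; averaging the one-hot columns per neighbourhood gives the share of venues in each category, and sorting those shares yields the most common categories.

```python
import pandas as pd

# Toy venue table (values invented for illustration).
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Gym", "Cafe"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of the indicators = share of venues per category in each neighbourhood.
freq = onehot.groupby("Neighbourhood").mean()

# Two most common categories per neighbourhood.
top = {
    name: row.sort_values(ascending=False).index[:2].tolist()
    for name, row in freq.iterrows()
}
print(freq)
print(top)
```

Because the rows of `freq` are shares rather than raw counts, neighbourhoods with many venues and neighbourhoods with few become comparable for clustering.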

#one hot encoding
manhattan_seg_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

#add neighborhood column back to dataframe,  and adjust dataframe columns
manhattan_seg_onehot['Neighbourhood'] = manhattan_venues['Neighbourhood']
fixed_columns = [manhattan_seg_onehot.columns[-1]] + list(manhattan_seg_onehot.columns[:-1])
manhattan_seg_onehot = manhattan_seg_onehot[fixed_columns]

# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
manhattan_seg_group = manhattan_seg_onehot.groupby('Neighbourhood').mean().reset_index()

#Create a function to extract most common venues
def return_most_common_values(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0: num_top_venues]

columns = ['Neighbourhood']
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
for ind in np.arange(num_top_venues):
    # 1st, 2nd and 3rd get their own suffix; everything after uses 'th'
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
manhattan_venues_seg = pd.DataFrame(columns=columns)
manhattan_venues_seg['Neighbourhood'] = manhattan_seg_group['Neighbourhood']

for ind in np.arange(manhattan_seg_group.shape[0]):
    manhattan_venues_seg.iloc[ind, 1:] = return_most_common_values(manhattan_seg_group.iloc[ind, :], num_top_venues)

The code block above produces the result below:

tourism table

These are the most common venues in different neighbourhoods in Manhattan. 

Next we will use KMeans, a clustering algorithm, for our segmentation system.

We will set our K parameter to 5. You can experiment with different numbers or, even better, use the Elbow Method to determine the best K value.
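The Elbow Method mentioned above can be sketched as follows: fit KMeans for several values of K and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The toy data here is three synthetic blobs standing in for the venue-frequency table, so the elbow at K = 3 is a property of this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: three well-separated 2-D blobs (a stand-in for real features).
centers = np.array([[0, 0], [10, 10], [0, 10]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit one model per candidate K and record its inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting `inertias` against K and picking the bend in the curve is the usual way to read the result; on real venue data the bend is often less sharp than in this toy case.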


#Import library
from sklearn.cluster import KMeans
#keep only the numeric venue-frequency columns for clustering
man_cluster = manhattan_seg_group.drop(columns='Neighbourhood')
km = KMeans(n_clusters=5, random_state=0).fit(man_cluster)

#insert clusters back to dataframe
manhattan_venues_seg.insert(0, 'Cluster', km.labels_)
manhattan_merge = manhattan
manhattan_merge = manhattan_merge.join(manhattan_venues_seg.set_index('Neighbourhood'), on='Neighbourhood')

Model building is done. Now let's see some results from clusters 0, 1, and 2:







From our segmentation system, we can see that Cluster 0 consists of lively areas (restaurants, theaters, gyms, cafes, stores), Cluster 1 comprises mainly restaurants, and Cluster 2 consists of recreational and relaxation venues.

Now that we’re done building our segmentation system, it’s time to build our simple weighted average recommendation system.

Foursquare only allows 50 premium API calls per day for categories like venue ratings. Since we already have ratings data, we will just read it in. In the next phase we will preprocess the ratings data and join it to the venue data.

#add rating data to manhattan_venues dataframe
#take note of the number of ratings
#`rating` is the ratings table that was read in beforehand
manhattan_rec_data = pd.concat([manhattan_venues[0:298], manhattan_venues[2579:2679]], axis=0).reset_index(drop=True)
manhattan_rec_data['ratings'] = rating['rating']

#drop rows where venues were not rated
manhattan_rec_data = manhattan_rec_data[manhattan_rec_data['ratings'] != 'not rated']

# one hot encoding
recommender_onehot = pd.get_dummies(manhattan_rec_data[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
recommender_onehot['Neighbourhood'] = manhattan_rec_data['Neighbourhood']

# count the venues of each category per neighbourhood
recommender_onehot = recommender_onehot.groupby(['Neighbourhood'], sort=False).sum()

#reset index of encoded data
matrix = recommender_onehot.reset_index(drop=True)

#extract neighbourhood and rating
rec_rating = manhattan_rec_data[['Neighbourhood', 'ratings']].copy()
#change rating data type to float
rec_rating['ratings'] = rec_rating['ratings'].astype('float')

#average rating per neighbourhood, in the same order as matrix
rec_rating_grouped = rec_rating.groupby('Neighbourhood', sort=False)['ratings'].mean()

The result below shows the average rating for each neighbourhood:


Now let’s determine the rating weight of each venue category by taking the dot product between the category matrix and the average neighbourhood ratings. We’ll then create the final ratings table by multiplying each neighbourhood’s category counts by these weights and taking the weighted average.
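To make the arithmetic concrete, here is the same computation on a tiny invented example with three neighbourhoods and three categories. The counts and ratings are made up; `weights` plays the role of the category weights and `scores` the role of the final recommendation score.

```python
import numpy as np

neighbourhoods = ["A", "B", "C"]
categories = ["Cafe", "Park", "Spa"]

# Rows: neighbourhoods, columns: number of venues per category (invented).
counts = np.array([
    [2, 0, 1],
    [0, 3, 0],
    [1, 1, 0],
])
# Average venue rating per neighbourhood (invented).
avg_rating = np.array([9.0, 6.0, 8.0])

# Category weight = sum over neighbourhoods of (count * average rating).
weights = counts.T @ avg_rating

# Neighbourhood score = weighted sum of its counts, normalised by total weight.
scores = (counts @ weights) / weights.sum()

ranked = [neighbourhoods[i] for i in np.argsort(-scores)]
print(dict(zip(categories, weights)))
print(ranked)
```

Note that this score grows with the number of venues a neighbourhood has in heavily weighted categories, so a neighbourhood with many venues can outrank one with fewer but better-rated venues, which is one reason the tutorial calls this a simple model.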


#put the average ratings in a table whose row order matches matrix
final_rating = rec_rating_grouped.reset_index()
#perform dot product to get weights
user_rate = matrix.transpose().dot(final_rating['ratings'])
#The user profile rating
rec_df = ((recommender_onehot*user_rate).sum(axis=1))/(user_rate.sum())
rec_df = rec_df.sort_values(ascending=False)


Now the recommendation table is ready. Let’s see the top 4 most recommended neighbourhoods:


Since Civic Center is our most recommended neighbourhood to visit, let’s see the top activities:


So anytime this user is in Manhattan, they should visit the Civic Center, go to the spa, stop by the coffee shop, and order some French cuisine at the French restaurant.

Result and discussion

Our system recommends the best neighborhoods to visit when Humtourist clients are in Manhattan and, with the help of our segmentation system, shows them the best neighborhoods for different activities.

Our analysis shows that the majority of venues in Manhattan are restaurants. We also see that the coffee shop and the Chinese restaurant are the most recommended venues to visit. Chinatown was highly influential in the overall ratings of Chinese restaurants, and the Civic Center was highly influential in the overall ratings of coffee shops.

Our model also showed us different top activities to participate in once a user is in Manhattan.

This system can further be improved by obtaining more data, but it’s just a simple demonstration for now.


That’s it for this article. We have explored:

  • How machine learning can help the tourism industry,
  • How the modern-day tourism industry uses machine learning,
  • How to build simple recommendation and segmentation systems.

The tourism industry is in shambles at the moment, but once the COVID situation is resolved, it will most likely jump back to its high numbers. Machine learning and AI will be right there, helping companies stay afloat and generate growth.

Thank you for reading!
