
Plotly Python Tutorial for Machine Learning Specialists

Plotly is an open-source Python graphing library that is great for building beautiful and interactive visualizations. It is an awesome tool for discovering patterns in a dataset before delving into machine learning modeling. In this article, we will look at how to use it in an example-driven way. 

Some of the visualizations you can expect to see include:

  • line plots, 
  • scatter plots, 
  • bar charts, 
  • error bars, 
  • box plots, 
  • histograms, 
  • heatmaps, 
  • subplots, 
  • and bubble charts. 


Why you would choose Plotly

Now, the truth is that you can still get some of these visualizations using Matplotlib, Seaborn, or Bokeh. There are a couple of reasons why you would choose Plotly:

  • the visualizations are interactive, unlike those produced by Seaborn and Matplotlib;
  • it’s quite straightforward to generate complicated visuals using Plotly’s high-level Express API;
  • Plotly also provides a framework, Plotly Dash, that you can use to build web apps that host your visualizations as well as your machine learning projects;
  • you can export your visualizations as HTML and, if you like, embed them on your website.

That said, generating the visualizations requires a clean dataset. This step is crucial; otherwise, your visuals will convey the wrong information. In this article, we skip the cleaning and pre-processing steps to focus on the visualizations. The entire notebook is linked at the end of the tutorial.

It is also important to keep best practices in mind when creating visualizations, for example:

  • use colors that are easy on the eyes
  • ensure that the numbers add up, for example, the percentages in a pie chart should total 100%
  • use a color scale that makes it immediately clear to the viewer which color represents higher values and which represents lower ones
  • don’t put too much data in the same visual; for example, you can group the data and plot only the top items instead of plotting everything in the dataset
  • ensure that the plot is not too busy
  • always cite the source of your data, even when you collected it yourself. It builds credibility.

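We can interact with the Plotly API in two ways: through Plotly Express (`plotly.express`), a high-level API that builds a complete figure in a single call, and through Plotly Graph Objects (`plotly.graph_objects`), a lower-level API that gives you fine-grained control over each trace and layout element.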

In this piece, we’ll be using them interchangeably. 

Plotly histogram

A histogram represents the distribution of numerical data by grouping the values into bins and showing the count for each bin. In Plotly, the values in each bin can also be aggregated with functions such as sum or average, and the data being binned can even be categorical. Here’s an example:

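import plotly.express as px

# `views` is the DataFrame prepared in the tutorial notebook; plot the distribution of its 'views' column
fig = px.histogram(views, x="views")
fig.show()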

Plotly bar chart

A bar plot is a great visualization when you want to display a categorical column against a numerical one: it shows the value of the numerical column for every category. Plotly Express makes it very easy to plot one.

fig = px.bar(views_top, x='event', y='views')
fig.show()

You are not limited to vertical bar charts; you can also draw a horizontal one by setting `orientation='h'` and swapping the columns passed to x and y.

fig = px.bar(views_top, x='views', y='event', orientation='h')
fig.show()

Plotly pie chart

A pie chart is another way to show the number of items in every category. It lets the viewer quickly see the share that a particular item or value takes up in the whole dataset. This time, let’s plot one using Plotly’s Graph Objects.

import plotly.graph_objects as go

# `labels` (category names) and `values` (their sizes) are prepared in the tutorial notebook
fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values)
    ])
fig.show()

Plotly donut chart

You can turn the above visual into a donut chart by specifying the `hole` parameter, which sets the fraction of the radius (between 0 and 1) to cut out of the middle of the pie.

fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values, hole=0.2)
    ])
fig.show()

Plotly scatter plot

Scatterplots are great for determining whether there is a relationship or correlation between two numerical variables.

fig = px.scatter(df, x='comments', y='views')
fig.show()

Plotly line chart

A line chart is mainly used to show how a numerical value changes over time or over some other interval.

fig = px.line(talks, x="published_year", y="number_of_events")
fig.show()

Plotly annotations

Adding text labels and annotations is quite straightforward in Plotly. In a scatter plot, you can label each point by specifying the `text` parameter.

fig = px.scatter(df, x='comments', y='views', color='duration', text="published_day")
fig.show()

Plotly 3D scatter

In Plotly, a 3D scatterplot can be created with `px.scatter_3d` by passing the x, y, and z parameters.

fig = px.scatter_3d(df, x='comments', y='views', z='duration', color='views')
fig.show()

Plotly Write to HTML

Plotly also allows you to save any of your visualizations to an HTML file. This is surprisingly easy to do. 

fig.write_html("3d.html")

Plotly 3D surface

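Let’s now look at how to plot a 3D surface in Plotly. Unlike the 3D scatter, `go.Surface` only requires the z parameter, a 2D array of heights; the x and y coordinates are optional and default to the column and row indices of z. Below, z is the array of duration, views, and comments values, so the surface spans three positions along x and one row per talk along y.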

fig = go.Figure(data=[go.Surface(z=df[['duration','views','comments']].values)])

fig.update_layout(title='3D Surface', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))

fig.show()

Plotly bubble chart

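A Plotly bubble chart is very similar to a scatterplot. In fact, it is a scatterplot with one addition: the size of each bubble encodes a third variable. Below, `size_max` caps the size of the largest bubble, and `log_x=True` puts the x axis on a logarithmic scale.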

fig = px.scatter(df, x='comments', y='views', size='duration', color='num_speaker',
                 log_x=True, size_max=60)
fig.show()

Plotly table

Plotly can also display a data frame as a table using the Graph Objects `go.Table` trace. We pass it the header and the cells, and we can also specify the styling, as shown below:

fig = go.Figure(data=[go.Table(
    header=dict(values=list(views_top.columns),
                fill_color='yellow'),
    cells=dict(values=[views_top['event'], views_top['views']],
               fill_color='paleturquoise'),
)])
fig.show()

Plotly heatmap

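We can use a density heatmap to visualize the 2D distribution of an aggregate function. The data are binned along the x and y axes, and the aggregate function is applied to the values of the z variable in each cell. The function can be the sum (the default when z is given), the average, or even the count, selected with the `histfunc` parameter.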

fig = px.density_heatmap(df, x="published_year", y="views", z="comments")
fig.show()

Plotly animations

Plotly animations can be used to show how certain values change over time. To achieve that, you define the `animation_frame` column; in this case, it’s the publication year.

fig = px.scatter(df, x="duration", y="comments", animation_frame="published_year",
                 size="duration", color="published_day")
fig.show()

Plotly box plot

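A box plot summarizes data through its quartiles: the box spans the first quartile (Q1) to the third quartile (Q3) with a line at the median, and the whiskers extend to the most extreme points within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box. Points beyond the whiskers are drawn individually and represent potential outliers in your dataset.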

fig = px.box(df, x="published_day", y="duration")
fig.show()

Plotly maps

To work with Mapbox-based maps in Plotly, head over to Mapbox and grab your Mapbox access token. With the token at hand, you can visualize your data on a map using `px.scatter_mapbox`, passing the latitude and longitude columns.

px.set_mapbox_access_token('YOURTOKEN')
fig = px.scatter_mapbox(df, lat="lat", lon="lon",
                        color="region",
                        size="views",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=15)
fig.show()

Plotly subplots

With Plotly, we can also show multiple plots in the same figure. In Plotly Express, this is done with faceting: setting `facet_col` splits the figure into one subplot per unique value of that column. The example below combines faceting with an animation.

fig = px.scatter(df, x="duration", y="comments",
                 animation_frame="published_month", animation_group="event",
                 facet_col="published_day", width=1500, height=500,
                 size="views", color="published_day")
fig.show()

Plotly error bars

Error bars show the variability of data in a visualization. Generally, they indicate the estimated error or the precision of a measurement. The length of an error bar reveals the level of uncertainty: longer error bars indicate that the data points are more spread out and hence more uncertain. They can be added to charts such as line charts, bar charts, and scatterplots.

fig = go.Figure(
    data=[
        go.Bar(
            x=views_top['event'], y=views_top['views'],
            error_y=dict(type='data', array=views_top['error'].values),
        )
    ])
fig.show()

Final thoughts

Hopefully, this piece has shown you how to use Plotly in your next machine learning workflow. You can even use it to visualize the performance metrics of your machine learning models. Unlike static plotting libraries such as Matplotlib and Seaborn, it produces visuals that are both eye-catching and interactive.

The interactivity lets you zoom in and out of specific parts of a graph, so you can analyze it in more detail. Specifically, we have seen how to create popular graphs such as histograms, bar charts, and scatter plots in Plotly. We have also seen how to build multiple plots in the same figure and how to visualize data on a map.

The Notebook used can be found here

Happy plotting – no pun intended!

