Plotly is an open-source Python graphing library for building beautiful, interactive visualizations. It is a handy tool for discovering patterns in a dataset before delving into machine learning modeling. In this article, we will look at how to use it in an example-driven way.
Some of the visualizations you can expect to see include:
- line plots,
- scatter plots,
- bar charts,
- error bars,
- box plots,
- and bubble charts.
Why you would choose Plotly
- the visualizations are interactive, unlike the static charts that Seaborn and Matplotlib produce by default;
- it’s quite straightforward to generate complicated visuals using Plotly’s high-level Express API;
- Plotly also provides a framework known as Plotly Dash that you can use to host your visualizations as well as machine learning projects;
- you can generate HTML code for your visualizations and, if you like, embed it on your website.
That said, generating the visualizations requires a cleaned dataset. That’s crucial; otherwise, your visuals will deliver the wrong information. In this article, we skip the cleaning and pre-processing part to focus on the visualizations. We’ll provide the entire notebook used at the end of the tutorial.
It is important that you also keep in mind best practices when creating visualizations, for example:
- use colors that are eye-friendly
- ensure that the numbers add up, for example in a pie chart the percentages should total 100%
- use the right color scale so that it is automatically clear to the viewer which color represents the higher number and which one represents the lower
- don’t put too much data in the same visual, for example, you can group and plot the topmost items instead of plotting everything in the dataset
- ensure that the plot is not too busy
- always add the source of your data, even when you are the one who has collected it. It builds credibility.
We can interact with the Plotly API in two ways:
- Plotly Express, the high-level API;
- Plotly Graph Objects, the lower-level API.
In this piece, we’ll be using them interchangeably.
Plotly histogram
A histogram represents the distribution of numerical data by grouping it into bins and showing the count for each bin. In Plotly, each bin can also be aggregated with a function such as sum or average instead of a count, and the data to be binned can be categorical. Here’s an example:
import plotly.express as px

fig = px.histogram(views, x="views")
fig.show()
Plotly bar chart
A bar plot is a great visualization when you want to display a categorical column against a numerical column. It shows the value of the numerical column for every category. Plotly Express makes it very easy to plot one.
fig = px.bar(views_top, x='event', y='views')
fig.show()
You are not limited to vertical bar charts; you can also use a horizontal one. This is done by defining the `orientation` and swapping the x and y columns.
fig = px.bar(views_top, x='views', y='event', orientation='h')
fig.show()
Plotly pie chart
A pie chart is another visualization type for showing the number of items in every category. This type enables the user to quickly determine the share of a particular item or value in the whole dataset. Let’s show how one can be plotted, using Plotly’s Graph Objects this time.
import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values)
    ])
fig.show()
Plotly donut chart
You can change the above visual to a donut chart by specifying the `hole` parameter. This is the size of the hole you would like the donut chart to have, given as a fraction of the pie’s radius.
fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values, hole=0.2)
    ])
fig.show()
Plotly scatter plot
Scatterplots are great for determining whether there is a relationship or correlation between two numerical variables.
fig = px.scatter(df, x='comments', y='views')
fig.show()
Plotly line chart
A line chart is mainly used to show how a certain numerical value changes over time or over a certain interval.
fig = px.line(talks, x="published_year", y="number_of_events")
fig.show()
Plotly text labels and annotations
Adding text labels and annotations is quite straightforward in Plotly. In a scatter plot this can be done by specifying the `text` parameter.
fig = px.scatter(df, x='comments', y='views', color='duration', text="published_day")
fig.show()
Plotly 3D scatter
In Plotly, a 3D scatterplot can be created by passing the x, y, and z parameters.
fig = px.scatter_3d(df, x='comments', y='views', z='duration', color='views')
fig.show()
Plotly Write to HTML
Plotly also allows you to save any of your visualizations to an HTML file. This is surprisingly easy to do with the figure’s `write_html` method.
Plotly 3D surface
Let’s now look at how to plot a 3D surface in Plotly. Unlike the 3D scatter, a surface only requires the z parameter, given as a 2D array of heights; x and y are optional and default to the array’s column and row indices.
fig = go.Figure(data=[go.Surface(z=df[['duration', 'views', 'comments']].values)])
fig.update_layout(title='3D Surface', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
Plotly bubble chart
A Plotly bubble chart is very similar to a scatterplot. In fact, it is built from the scatterplot. The only item we add to it is the size of the bubble.
fig = px.scatter(df, x='comments', y='views', size='duration', color='num_speaker',
                 log_x=True, size_max=60)
fig.show()
Plotly table
Plotly can also be used to visualize a data frame as a table. We can use the Plotly Graph Objects `Table` to achieve this. We pass the header and the cells to the table. We can also specify the styling, as shown below:
fig = go.Figure(data=[go.Table(
    header=dict(values=views_top.columns,
                fill_color='yellow'),
    cells=dict(values=[views_top['event'], views_top['views']],
               fill_color='paleturquoise'))
])
fig.show()
Plotly heatmap
We can use a density heatmap to visualize the 2D distribution of an aggregate function. The aggregate function is applied to the variable on the z axis. The function can be the sum, the average, or even the count.
fig = px.density_heatmap(df, x="published_year", y="views", z="comments")
fig.show()
Plotly animations
Plotly animations can be used to animate the changes in certain values over time. To achieve that, one has to define the `animation_frame`. In this case, it’s the year.
px.scatter(df, x="duration", y="comments", animation_frame="published_year",
           size="duration", color="published_day")
Plotly box plot
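A box plot summarizes data through its quartiles: the box spans the first to the third quartile, with a line at the median. Points drawn beyond the whiskers, which by default reach at most 1.5 times the interquartile range from the box, represent the outliers in your dataset.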
fig = px.box(df, x="published_day", y="duration")
fig.show()
Plotly maps
In order to work with maps in Plotly, you will need to head over to Mapbox and grab your Mapbox API key (access token). With the key at hand, you can visualize your data on a map in Plotly. This is done using `scatter_mapbox` while passing the latitude and the longitude.
px.set_mapbox_access_token('YOURTOKEN')
fig = px.scatter_mapbox(df, lat="lat", lon="lon", color="region", size="views",
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15)
fig.show()
Plotly subplots
With Plotly, we can also visualize multiple plots on the same graph. This is done using Plotly Subplots. The plots are created by defining a `facet_col`. The graph will be broken into as many plots as there are unique values in the `facet_col` column.
px.scatter(df, x="duration", y="comments",
           animation_frame="published_month", animation_group="event",
           facet_col="published_day", width=1500, height=500,
           size="views", color="published_day",
           )
Plotly error bars
Error bars are used to show the variability of data in a visualization. Generally, they help in showing the estimated error or the precision of a certain measure. The length of the error bar reveals the level of uncertainty: longer error bars indicate that the data points are more spread out, and hence that the value is more uncertain. They can be applied to graphs such as line charts, bar graphs, and scatterplots.
fig = go.Figure(
    data=[
        go.Bar(
            x=views_top['event'], y=views_top['views'],
            error_y=dict(type='data', array=views_top['error'].values)
        )
    ])
fig.show()
Hopefully, this piece has shown you how you can use Plotly in your next machine learning workflow. You can even use it to visualize the performance metrics of your machine learning models. Unlike static plotting tools, its visuals are interactive as well as eye-catching.
The interactivity enables you to zoom in and out of specific parts of the graph. In this way, you can analyze your graph in more detail. Specifically, we have seen how you can use popular graphs such as histograms, bar charts, and scatter plots in Plotly. We have also seen that we can build multiple plots on the same graph as well as visualize data on a map.
The Notebook used can be found here.
Happy plotting – no pun intended!