Plotly is an open-source Python graphing library that is great for building beautiful and interactive visualizations. It is an awesome tool for discovering patterns in a dataset before delving into machine learning modeling. In this article, we will look at how to use it in an example-driven way.
Some of the visualizations you can expect to see include:
- line plots,
- scatter plots,
- bar charts,
- error bars,
- box plots,
- and bubble charts.
Why you would choose Plotly
- the visualizations are interactive, unlike the static images Seaborn and Matplotlib produce by default;
- it’s quite straightforward to generate complicated visuals using Plotly’s high-level Express API;
- Plotly also provides a framework known as Plotly Dash that you can use to host your visualizations as well as machine learning projects;
- you can export your visualizations as HTML and, if you like, embed them on your website.
That said, you need to clean your dataset before visualizing it. This step is crucial; otherwise, your visuals may deliver the wrong information. In this article, we skip the cleaning and pre-processing part to focus on the visualizations. The entire notebook used is linked at the end of the tutorial.
It is also important to keep best practices in mind when creating visualizations, for example:
- use eye-friendly colors
- ensure that the numbers add up; for example, the percentages in a pie chart should total 100%
- use the right color scale so that it is immediately clear to the viewer which color represents higher values and which represents lower ones
- don’t put too much data in the same visual; for example, group the data and plot only the top items instead of everything in the dataset
- ensure that the plot is not too busy
- always add the source of your data, even when you are the one who has collected it. It builds credibility.
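We can interact with the Plotly API in two ways: Plotly Express (`plotly.express`), a high-level interface that builds a whole figure in one call, and Plotly Graph Objects (`plotly.graph_objects`), a lower-level interface where you assemble the figure from individual traces. In this piece, we’ll be using them interchangeably.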
Plotly histogram
A histogram represents the distribution of numerical data by grouping the data into bins and showing the count for each bin. In Plotly, the values in each bin can also be aggregated with functions such as sum or average (the `histfunc` argument), and the data to be binned can also be categorical. Here’s an example:
import plotly.express as px

fig = px.histogram(views, x="views")
fig.show()
Plotly bar chart
A bar chart is a great visualization when you want to display a categorical column against a numerical one: it shows the value of the numerical column for every category. Plotly Express makes it very easy to plot one.
fig = px.bar(views_top, x='event', y='views')
fig.show()
You are not limited to vertical bar charts; you can also draw a horizontal one by setting `orientation='h'` and swapping the columns passed to x and y.
fig = px.bar(views_top, x='views', y='event', orientation='h')
fig.show()
Plotly pie chart
A pie chart is another way to show the number of items in every category. It lets the viewer quickly see the share of a particular item or value in the whole dataset. This time, let’s plot one using Plotly Graph Objects.
import plotly.graph_objects as go

fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.show()
Plotly donut chart
You can turn the pie chart above into a donut chart by specifying the `hole` parameter, which sets the size of the hole as a fraction of the pie’s radius (a value between 0 and 1).
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.2)])
fig.show()
Plotly scatter plot
Scatterplots are great for determining whether there is a relationship or correlation between two numerical variables.
fig = px.scatter(df, x='comments', y='views')
fig.show()
Plotly line chart
A line chart is mainly used to show how a numerical value changes over time or over some other interval.
fig = px.line(talks, x="published_year", y="number_of_events")
fig.show()
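Plotly text and annotations
Adding text labels and annotations is quite straightforward in Plotly. In a scatter plot, text labels can be added by specifying the `text` parameter, which names the column whose values are shown next to each point.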
fig = px.scatter(df, x='comments', y='views', color='duration', text="published_day")
fig.show()
Plotly 3D scatter
In Plotly, a 3D scatterplot can be created by passing the x, y, and z parameters.
fig = px.scatter_3d(df, x='comments', y='views', z='duration', color='views')
fig.show()
Plotly Write to HTML
Plotly also allows you to save any of your visualizations to an HTML file with the figure’s `write_html` method. This is surprisingly easy to do.
Plotly 3D surface
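Let’s now look at how to plot a 3D surface in Plotly. Unlike the 3D scatter, `go.Surface` mainly needs the `z` parameter: a 2D grid of heights. The `x` and `y` coordinates are optional and default to the column and row indices of that grid.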
fig = go.Figure(data=[go.Surface(z=df[['duration', 'views', 'comments']].values)])
fig.update_layout(title='3D Surface', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
Plotly bubble chart
A Plotly bubble chart is very similar to a scatter plot. In fact, it is a scatter plot with one addition: a third numerical variable sets the size of each bubble through the `size` argument, while `size_max` sets the maximum bubble size.
fig = px.scatter(df, x='comments', y='views', size='duration', color='num_speaker',
                 log_x=True, size_max=60)
fig.show()
Plotly table
Plotly can also display a data frame as a table using the Plotly Graph Objects `Table` trace. We pass the header and the cells to the table and can also specify their styling, as shown below:
fig = go.Figure(data=[go.Table(
    header=dict(values=views_top.columns, fill_color='yellow'),
    cells=dict(values=[views_top['event'], views_top['views']],
               fill_color='paleturquoise'))
])
fig.show()
Plotly heatmap
A density heatmap shows the 2D distribution of the data: the x and y values are grouped into bins, and each cell is colored by an aggregate function applied to the variable passed as `z`. The function, set with `histfunc`, can be the sum, the average, or even the count.
fig = px.density_heatmap(df, x="published_year", y="views", z="comments")
fig.show()
Plotly animations
Plotly animations can show how certain values change over time. To create one, define the `animation_frame` argument: the column whose values become the frames of the animation. In this case, it’s the year.
fig = px.scatter(df, x="duration", y="comments", animation_frame="published_year",
                 size="duration", color="published_day")
fig.show()
Plotly box plot
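A box plot summarizes data through its quartiles: the box spans the first to the third quartile with a line at the median, and by default Plotly extends the whiskers to the most extreme points within 1.5 times the interquartile range (IQR) of the box. Points beyond the whiskers are drawn individually as potential outliers in your dataset.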
fig = px.box(df, x="published_day", y="duration")
fig.show()
Plotly maps
To work with Mapbox maps in Plotly, head over to Mapbox and grab your Mapbox access token. With the token at hand, you can visualize your data on a map using `scatter_mapbox`, passing the latitude and longitude columns.
px.set_mapbox_access_token('YOURTOKEN')
fig = px.scatter_mapbox(df, lat="lat", lon="lon", color="region", size="views",
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15)
fig.show()
Plotly subplots
With Plotly, we can also show multiple plots in the same figure. This is done using Plotly subplots. In Plotly Express, the subplots are created by defining a `facet_col`: the figure is split into as many subplots as there are unique values in that column.
fig = px.scatter(df, x="duration", y="comments",
                 animation_frame="published_month", animation_group="event",
                 facet_col="published_day", width=1500, height=500,
                 size="views", color="published_day")
fig.show()
Plotly error bars
Error bars show the variability of data in a visualization. Generally, they indicate the estimated error or the precision of a measurement, and their length reveals the level of uncertainty: longer error bars indicate that the data points are more spread out and therefore more uncertain. They can be added to line charts, bar charts, and scatter plots.
fig = go.Figure(data=[go.Bar(
    x=views_top['event'],
    y=views_top['views'],
    error_y=dict(type='data', array=views_top['error'].values)
)])
fig.show()
Hopefully, this piece has shown you how you can use Plotly in your next machine learning workflow. You can even use it to visualize the performance metrics of your machine learning models. Unlike static plotting tools, Plotly produces visuals that are eye-catching as well as interactive.
The interactivity enables you to zoom in and out of specific parts in the graph. In this way, you can look a little deeper to analyze your graph in more detail. Specifically, we have seen how you can use popular graphs such as histograms, bar charts, and scatter plots in Plotly. We have also seen that we can build multiple plots on the same graph as well as visualize data on the map.
The Notebook used can be found here.
Happy plotting – no pun intended!