Plotly: Scatter Plots and Pie Charts
This is the 4th part of my Plotly series. If you haven’t read the previous ones, you can read them here.
In this blog, I am going to explain how we can render Scatter Plots and Pie Charts using Plotly. Let’s start with Scatter Plots.
Let’s say we want to compare iris flowers by the sizes of their sepals and petals. We will be using a built-in sample DataFrame again, this time the iris data that ships with Plotly Express.
The dataset contains the sepal and petal measurements of each flower, along with its species.
import plotly.express as px
df_iris = px.data.iris()
px.scatter(df_iris, x='sepal_width', y='sepal_length', color='species', size='petal_length', hover_data=['petal_width'])
In this snippet, the `plotly.express` module (imported as `px`) is used to create a scatter plot. Let’s break down the code step by step:
1. `df_iris = px.data.iris()`: This line loads the iris dataset from the `px.data` module and assigns it to the variable `df_iris`. The iris dataset is a popular dataset in machine learning and contains measurements of various iris flowers.
2. `px.scatter(df_iris, x='sepal_width', y='sepal_length', color='species', size='petal_length', hover_data=['petal_width'])`: This line creates a scatter plot using the `scatter` function from `plotly.express`. Here’s an explanation of the parameters used:
- `df_iris`: The first parameter is the DataFrame containing the data to be plotted, which is `df_iris` in this case.
- `x='sepal_width'` and `y='sepal_length'`: These parameters specify the columns from the DataFrame to be plotted on the x-axis and y-axis, respectively. In this case, the sepal width is plotted on the x-axis, and the sepal length is plotted on the y-axis.
- `color='species'`: This parameter determines how the data points are colored on the plot. The ‘species’ column from the DataFrame is used to assign different colors to different species of iris flowers. Each species will have a distinct color on the scatter plot.
- `size='petal_length'`: This parameter controls the size of the data points on the scatter plot. The ‘petal_length’ column from the DataFrame determines the size of each data point: larger values produce larger markers (Plotly Express scales the marker area with the value).
- `hover_data=['petal_width']`: This parameter specifies additional data to be shown when hovering over a data point on the scatter plot. In this case, when you hover over a point, it will display the value of the ‘petal_width’ column for that specific data point.
Overall, the code generates a scatter plot visualizing the relationship between sepal width and sepal length for different species of iris flowers. The color of the data points represents the species, the size represents the petal length, and hovering over a data point reveals the petal width.
Now, let’s say we want to create a more customized Scatter plot:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_iris.sepal_width, y=df_iris.sepal_length,
    mode='markers',
    marker_color=df_iris.sepal_width,
    text=df_iris.species, marker=dict(showscale=True)))
fig.update_traces(marker_line_width=2, marker_size=10)
In this snippet, the `plotly.graph_objects` module (imported as `go`) is used to build the scatter plot trace by trace. Let’s break down the code step by step:
1. `fig = go.Figure()`: This line creates a new empty figure object using the `Figure` class from `plotly.graph_objects`. The `fig` variable is used to build and modify the plot.
2. `fig.add_trace(go.Scatter(…))`: This line adds a scatter trace to the figure using the `Scatter` class from `plotly.graph_objects`. Here’s an explanation of the parameters used within the `go.Scatter()` function:
- `x=df_iris.sepal_width` and `y=df_iris.sepal_length`: These parameters specify the data to be plotted on the x-axis and y-axis, respectively. In this case, the sepal width is plotted on the x-axis, and the sepal length is plotted on the y-axis. The data is extracted from the `df_iris` DataFrame.
- `mode='markers'`: This parameter specifies that the plot should be represented as individual markers instead of connected lines.
- `marker_color=df_iris.sepal_width`: This parameter determines the color of each data point. The ‘sepal_width’ column from the `df_iris` DataFrame is used to assign a color to each point. In this case, the color varies based on the sepal width.
- `text=df_iris.species`: This parameter sets the text displayed when hovering over each data point. The ‘species’ column from the `df_iris` DataFrame is used to provide the text for each point. The species name will be displayed when hovering over a data point.
- `marker=dict(showscale=True)`: This parameter configures the marker’s appearance. Setting `showscale` to `True` displays a color scale alongside the plot, which indicates the range of values represented by the marker color.
3. `fig.update_traces(marker_line_width=2, marker_size=10)`: This line updates the appearance of the scatter trace added to the figure. The `update_traces()` method allows modifying various properties of the traces in the figure. Here, `marker_line_width=2` draws a 2-pixel outline around each marker, and `marker_size=10` makes the markers larger than the default.
Overall, the code generates a scatter plot using the `plotly.graph_objects` module. The plot displays sepal width on the x-axis, sepal length on the y-axis, and the color of each data point corresponds to the sepal width value. Hovering over a point reveals the species name. The marker size and line width are adjusted to enhance visibility and aesthetics.
With that, the Scatter Plot is styled. The color bar next to the plot acts as a key that maps each marker’s color to its sepal width.
Now let’s say we are working with a massive Scatter Plot. For large numbers of points, we can use `go.Scattergl`, which draws the markers with WebGL instead of SVG and stays responsive with far more points.
import numpy as np
import plotly.graph_objects as go
fig = go.Figure(data=go.Scattergl(
    x = np.random.randn(100000),
    y = np.random.randn(100000),
    mode = 'markers',
    marker = dict(
        color = np.random.randn(100000),
        colorscale = 'Viridis',
        line_width = 1)))
fig
In this snippet, the `plotly.graph_objects` module (imported as `go`) creates the scatter plot, and NumPy (imported as `np`) generates the data. Let’s break down the code step by step:
1. `fig = go.Figure(data=go.Scattergl(…))`: This line creates a new figure object using the `Figure` class from `plotly.graph_objects` and specifies the data to be plotted as a scatter plot using the `Scattergl` class. Here’s an explanation of the parameters used within the `go.Scattergl()` function:
- `x = np.random.randn(100000)` and `y = np.random.randn(100000)`: These parameters specify the randomly generated data points to be plotted on the x-axis and y-axis, respectively. `np.random.randn(100000)` generates an array of 100,000 random numbers following a standard normal distribution.
- `mode = 'markers'`: This parameter specifies that the plot should be represented as individual markers rather than connected lines.
- `marker = dict(…)`: This parameter configures the markers’ appearance. Here are the details of the properties within the `marker` dictionary:
- `color = np.random.randn(100000)`: This property determines the color of each data point. `np.random.randn(100000)` generates an array of 100,000 random numbers following a standard normal distribution, which will be used to assign colors to the markers.
- `colorscale = 'Viridis'`: This property specifies the color scale used to map the values in `color` to actual colors. Viridis runs from dark purple through blue and green to yellow.
- `line_width = 1`: This property sets the width of the marker outlines to 1, resulting in a thin line around each marker.
2. `fig`: This line simply displays the figure object, which will render the scatter plot in the output.
Overall, the code generates a scatter plot with 100,000 randomly generated data points. The x-axis and y-axis values are randomly generated using a standard normal distribution. Each data point is represented by a marker, and the marker colors are determined by another set of random numbers. The ‘Viridis’ color scale is used to map the colors. The markers have a thin outline with a line width of 1.
That’s all about Scatter Plots. Now, let’s do Pie Charts.
Let’s build a more detailed Pie Chart showing the populations of the countries in Asia in 2007:
import plotly.express as px
df_asia = px.data.gapminder().query("year == 2007").query("continent == 'Asia'")
px.pie(df_asia, values='pop', names='country', title='Population of Asian Continent', color_discrete_sequence=px.colors.sequential.RdBu)
In this snippet, the `plotly.express` module (imported as `px`) is used to create a pie chart. Let’s break down the code step by step:
1. `df_asia = px.data.gapminder().query("year == 2007").query("continent == 'Asia'")`: This line loads the gapminder dataset from the `px.data` module, filters it to include only the data for the year 2007, and further filters it to include only the data for the ‘Asia’ continent. The resulting filtered data is assigned to the variable `df_asia`.
2. `px.pie(df_asia, values='pop', names='country', title='Population of Asian Continent', color_discrete_sequence=px.colors.sequential.RdBu)`: This line creates a pie chart using the `pie` function from `plotly.express`. Here’s an explanation of the parameters used:
- `df_asia`: The first parameter is the DataFrame containing the data to be plotted, which is `df_asia` in this case.
- `values='pop'` and `names='country'`: These parameters specify the columns from the DataFrame to be used for the values and names of the pie chart slices, respectively. In this case, the ‘pop’ column is used as the values, representing the population, and the ‘country’ column is used as the names, representing the countries in Asia.
- `title='Population of Asian Continent'`: This parameter sets the title of the pie chart to ‘Population of Asian Continent’.
- `color_discrete_sequence=px.colors.sequential.RdBu`: This parameter sets the list of colors used for the pie chart slices. `px.colors.sequential.RdBu` runs from dark red through white to dark blue. Because the data contains more Asian countries than the sequence has colors, Plotly cycles through the list, so some slices share a color.
The full list of built-in color scales that Plotly provides is at https://plotly.com/python/builtin-colorscales/.
Overall, the code generates a pie chart representing the population of Asian countries in the year 2007. Each slice of the pie chart corresponds to a country, with the size of the slice representing the population of that country. The chart is titled ‘Population of Asian Continent’, and the colors of the slices vary using the red-to-blue color scale.
Next, let’s customize a brand-new Pie Chart. We’ll plot made-up counts of Pokémon types and pull a few slices out of the pie:
import plotly.graph_objects as go
colors = ['blue', 'green', 'black', 'purple', 'red', 'brown']
fig = go.Figure(data = [go.Pie(labels = ['Water', 'Grass', 'Normal', 'Psychic', 'Fire', 'Ground'], values = [110, 90, 80, 80, 70, 60])])
fig.update_traces(hoverinfo = 'label+percent', textfont_size = 20, textinfo = 'label+percent', pull = [0.1, 0, 0.2, 0, 0, 0], marker = dict(colors = colors, line = dict(color = '#FFFFFF', width = 2)))
fig
In this snippet, the `plotly.graph_objects` module (imported as `go`) is used to create a pie chart. Let’s break down the code step by step:
1. `colors = ['blue', 'green', 'black', 'purple', 'red', 'brown']`: This line defines a list of colors. The colors are matched to the slices by position, so the first color goes to the first label, the second to the second, and so on.
2. `fig = go.Figure(data=[go.Pie(labels=['Water', 'Grass', 'Normal', 'Psychic', 'Fire', 'Ground'], values=[110, 90, 80, 80, 70, 60])])`: This line creates a new figure object using the `Figure` class from `plotly.graph_objects` and specifies the data to be plotted as a pie chart using the `Pie` class. Here’s an explanation of the parameters used within the `go.Pie()` function:
- `labels=['Water', 'Grass', 'Normal', 'Psychic', 'Fire', 'Ground']`: This parameter specifies the labels or names of the pie chart slices. Each label corresponds to a category or type of data.
- `values=[110, 90, 80, 80, 70, 60]`: This parameter specifies the values or sizes of the pie chart slices. Each value represents the magnitude or proportion of each category.
3. `fig.update_traces(…)`: This line updates the appearance and behavior of the pie chart traces in the figure. The `update_traces()` method allows modifying various properties of the traces. Here’s an explanation of the properties being modified:
- `hoverinfo='label+percent'`: This property configures the information displayed when hovering over a pie slice. It shows the label and the percentage of the total represented by each slice.
- `textfont_size=20`: This property sets the font size of the text displayed inside the pie chart slices to 20.
- `textinfo='label+percent'`: This property determines the text displayed inside the pie chart slices. It shows the label and the percentage of the total represented by each slice.
- `pull=[0.1, 0, 0.2, 0, 0, 0]`: This property pulls slices away from the center of the pie. Each value is the fraction of the pie’s radius by which the corresponding slice is pulled out. Here the first slice (‘Water’) is pulled out slightly, the third slice (‘Normal’) is pulled out further, and the rest stay in place.
- `marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2))`: This property configures the appearance of the slices. Here are the details of the properties within the `marker` dictionary:
- `colors=colors`: This property sets the colors of the pie chart slices. The colors are defined in the `colors` list defined earlier.
- `line=dict(color='#FFFFFF', width=2)`: This property sets the color and width of the slice borders. The borders are white (‘#FFFFFF’) and 2 pixels wide.
4. `fig`: This line simply displays the figure object, which will render the pie chart in the output.
Overall, the code generates a pie chart with labeled slices representing different categories. The sizes of the slices are determined by the values provided. The appearance of the pie chart is customized by specifying colors for the slices, setting the font size and text information, adjusting the pull of certain slices, and defining the marker outline color and width.
You can find the complete notebook here.