An efficient way of extracting the descriptive statistics for a dataset from a matplotlib boxplot


A boxplot is a type of visualization that displays the five-number summary of a dataset: the minimum and maximum (excluding outliers), the median, and the first (Q1) and third (Q3) quartiles. In Python, boxplots can be created with various data visualization libraries, including the most basic one, matplotlib.
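To make these five statistics concrete, here is a minimal NumPy sketch that computes them for a one-dimensional sample. It assumes matplotlib's default whisker rule (`whis=1.5`): each whisker stops at the most extreme data point that lies within 1.5 IQR of the box, and anything beyond that is treated as an outlier. The function name `five_number_summary` is hypothetical and not part of any library.

```python
import numpy as np

def five_number_summary(x, whis=1.5):
    """Return (lower_whisker, q1, median, q3, upper_whisker, outliers).

    Hypothetical helper: follows the 1.5 * IQR convention that matplotlib
    uses by default (whis=1.5) for placing the whisker ends.
    """
    x = np.asarray(x, dtype=float)
    # Quartiles and median via linear interpolation (NumPy's default)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whisker ends: the most extreme points still inside the fences,
    # never drawn inside the box itself
    lower_whisker = min(x[x >= lo_fence].min(), q1)
    upper_whisker = max(x[x <= hi_fence].max(), q3)
    # Everything beyond the whisker ends counts as an outlier
    outliers = x[(x < lower_whisker) | (x > upper_whisker)]
    return (float(lower_whisker), float(q1), float(med),
            float(q3), float(upper_whisker), outliers)
```

For example, `five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])` places the whiskers at 1 and 8, the quartiles at 3 and 7, the median at 5, and flags 100 as the only outlier.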

While the main purpose of a boxplot is to visualize statistical information about a dataset, what if we also need to extract and print the exact values of those statistics? In this article, we'll discuss the easiest way of doing so in matplotlib.

To start with, let's create three dummy datasets and display their boxplots in matplotlib. To be able to extract the necessary values later, though, we have to assign the result of the plt.boxplot() function to a variable (bp):

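import matplotlib.pyplot as plt
import numpy as np

# Fix the random seed so the dummy data is reproducible
np.random.seed(1)
# Three normally distributed samples of 300 points each: normal(mean, std, size)
data_1 = np.random.normal(50, 30, 300)
data_2 = np.random.normal(100, 40, 300)
data_3 = np.random.normal(70, 10, 300)
data = [data_1, data_2, data_3]
# Keep the return value: plt.boxplot() returns a dict of the plotted artists
bp = plt.boxplot(data)
plt.show()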
Image by Author
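As a cross-check, and a different route from extracting the numbers from bp, matplotlib also exposes the statistics behind plt.boxplot() through `matplotlib.cbook.boxplot_stats()`, which computes them without drawing anything. The sketch below recreates the same dummy data and prints the values; it assumes the default whisker rule (`whis=1.5`), which is also what plt.boxplot() uses unless told otherwise.

```python
import numpy as np
from matplotlib import cbook

# Same dummy data as in the boxplot example
np.random.seed(1)
data_1 = np.random.normal(50, 30, 300)
data_2 = np.random.normal(100, 40, 300)
data_3 = np.random.normal(70, 10, 300)
data = [data_1, data_2, data_3]

# One dict per dataset with keys such as 'whislo', 'q1', 'med', 'q3',
# 'whishi' and 'fliers' (the outlier values)
stats = cbook.boxplot_stats(data, whis=1.5)

for i, s in enumerate(stats, start=1):
    print(f"data_{i}: lower whisker={s['whislo']:.2f}, Q1={s['q1']:.2f}, "
          f"median={s['med']:.2f}, Q3={s['q3']:.2f}, "
          f"upper whisker={s['whishi']:.2f}, outliers={len(s['fliers'])}")
```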

The resulting variable bp is a Python dictionary:

type(bp)

Output:
dict

with the following keys representing the main elements of a boxplot:

bp.keys()

Output:
dict_keys(['whiskers', 'caps', 'boxes', 'medians', 'fliers', 'means'])
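The number of artists stored under each key follows from how a boxplot is drawn: every box has a lower and an upper whisker and cap, but only one box outline, one median line and one set of flier markers. A short sketch that counts them for the three datasets (it switches to the non-interactive Agg backend, an assumption made only so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Same dummy data as in the example above
np.random.seed(1)
data = [np.random.normal(50, 30, 300),
        np.random.normal(100, 40, 300),
        np.random.normal(70, 10, 300)]
bp = plt.boxplot(data)

# Count the Line2D artists stored under each key
for key, artists in bp.items():
    print(key, len(artists))
# whiskers 6  (lower + upper whisker for each of the 3 boxes)
# caps 6      (lower + upper cap for each box)
# boxes 3
# medians 3
# fliers 3    (one marker line per box, even if it holds no outliers)
# means 0     (only filled when showmeans=True)
plt.close()
```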

Here is the dictionary itself:

bp

Output:
{'whiskers': [<matplotlib.lines.Line2D at 0x1eaf6131b50>,
<matplotlib.lines.Line2D at 0x1eaf6131eb0>,
<matplotlib.lines.Line2D at 0x1eaf61533a0>,
<matplotlib.lines.Line2D at 0x1eaf6153700>,
<matplotlib.lines.Line2D at 0x1eaf6162b80>,
<matplotlib.lines.Line2D at 0x1eaf6162ee0>],
'caps': [<matplotlib.lines.Line2D at 0x1eaf614a250>,
<matplotlib.lines.Line2D at 0x1eaf614a5b0>,
<matplotlib.lines.Line2D at 0x1eaf6153a60>,
<matplotlib.lines.Line2D at 0x1eaf6153dc0>,
<matplotlib.lines.Line2D at 0x1eaf616d280>,
<matplotlib.lines.Line2D at 0x1eaf616d5e0>],
'boxes': [<matplotlib.lines.Line2D at 0x1eaf61317f0>,
<matplotlib.lines.Line2D at 0x1eaf6153040>,
<matplotlib.lines.Line2D at 0x1eaf6162820>],
'medians': [<matplotlib.lines.Line2D at 0x1eaf614a910>,
<matplotlib.lines.Line2D at 0x1eaf6162160>,
<matplotlib.lines.Line2D at 0x1eaf616d940>],
'fliers': [<matplotlib.lines.Line2D at 0x1eaf614ac70>,
<matplotlib.lines.Line2D at 0x1eaf61624c0>,
<matplotlib.lines.Line2D at 0x1eaf616dca0>],
'means': []}
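Since each entry in this dictionary is a list of Line2D objects, one possible way to read the numbers back is `Line2D.get_ydata()`. The sketch below is not necessarily the method the full article uses: it assumes a default vertical, non-notched boxplot, and `boxplot_values` is a hypothetical helper name. It relies on how matplotlib draws these artists: whiskers come in (lower, upper) pairs per box, each running from a quartile to the whisker end, and the box outline's y-data is Q1, Q1, Q3, Q3, Q1.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the figure is never shown
import matplotlib.pyplot as plt
import numpy as np

# Same dummy data as in the example above
np.random.seed(1)
data = [np.random.normal(50, 30, 300),
        np.random.normal(100, 40, 300),
        np.random.normal(70, 10, 300)]
bp = plt.boxplot(data)

def boxplot_values(bp, i):
    """Hypothetical helper: read the statistics of box i (0-based) from bp.

    Assumes a vertical, non-notched boxplot, so every statistic lives in
    the y-data of its Line2D (a horizontal plot would need x-data).
    """
    box_y = bp["boxes"][i].get_ydata()  # [Q1, Q1, Q3, Q3, Q1]
    return {
        # Lower whisker runs from Q1 down to the lowest non-outlier point
        "lower_whisker": float(bp["whiskers"][2 * i].get_ydata()[1]),
        "q1": float(box_y[0]),
        "median": float(bp["medians"][i].get_ydata()[0]),
        "q3": float(box_y[2]),
        # Upper whisker runs from Q3 up to the highest non-outlier point
        "upper_whisker": float(bp["whiskers"][2 * i + 1].get_ydata()[1]),
        # Outliers are the y-values of the flier markers
        "outliers": np.asarray(bp["fliers"][i].get_ydata()).tolist(),
    }

for i in range(len(data)):
    v = boxplot_values(bp, i)
    print(f"data_{i + 1}: whiskers {v['lower_whisker']:.2f} to "
          f"{v['upper_whisker']:.2f}, Q1 {v['q1']:.2f}, "
          f"median {v['median']:.2f}, Q3 {v['q3']:.2f}, "
          f"{len(v['outliers'])} outlier(s)")
plt.close()
```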

We see that the values of the dictionary are…

Continue reading: https://towardsdatascience.com/how-to-fetch-the-exact-values-from-a-boxplot-python-8b8a648fc813?source=rss—-7f60cf5620c9—4

Source: towardsdatascience.com