Road map

  • Why do we need to Compare the Frequency Distribution?
  • Grouped Bar Plots
  • Kernel Density Estimate Plots
  • Strip Plots
  • Box Plots

Here our journey starts

Familiarize Yourself with the Dataset

Throughout this article, we use the wnba.csv dataset. The Women’s National Basketball Association (WNBA) is a professional basketball league in the USA. It is currently composed of twelve teams. Our dataset contains stats from all games of the 2016–2017 season. The dataset has 143 rows and 32 columns. An overview of the dataset is given below.

Prior knowledge of frequency distribution and visualization

To appreciate why comparing frequency distributions is necessary, you need prior knowledge of frequency distributions and their visualization. If you are not familiar with these topics, you may read my previous articles on frequency distribution and visualization.

Why do we need to Compare the Frequency Distribution?

For a clearer explanation, we will use the wnba.csv dataset so that you can learn from a real-world example.

First, we convert the experience column into a new Exper_ordianl column, whose values are measured on an ordinal scale. The table below describes the players' level of experience according to the following labeling convention:

[Table: labeling convention for the level of experience. Photo by author]

Now, we want to know how the ‘Pos’ (player position) variable is distributed across levels of experience. For example, we want to compare the positions of experienced, very experienced, and veteran players.

We used the code below to convert the players' experience according to the labeling convention above.
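The original listing and labeling table are not reproduced in this excerpt, so what follows is only a minimal sketch of such a conversion. The cut-off years, the "R" marker for rookies, the "Rookie" and "Little experience" labels, and the sample values are all assumptions made for illustration. It is written in plain Python so that it runs without extra libraries.

```python
# Hypothetical cut-offs: the article's labeling table is not available here,
# so these boundaries are assumptions for illustration only.
def label_experience(raw):
    """Map a raw experience value (years as text, or 'R' for rookie) to a label."""
    years = 0 if raw == "R" else int(raw)
    if years == 0:
        return "Rookie"
    if years <= 3:
        return "Little experience"
    if years <= 5:
        return "Experienced"
    if years <= 10:
        return "Very experienced"
    return "Veteran"

# Made-up sample values standing in for the experience column of wnba.csv.
sample_experience = ["R", "2", "5", "9", "12"]

# One ordinal label per player, in the same order as the input values.
exper_ordinal = [label_experience(value) for value in sample_experience]
print(exper_ordinal)
```

Because the labels have a natural order, the resulting column can be treated as ordinal when comparing groups.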

Output:

Next, we segment the dataset by level of experience and generate a frequency distribution for each segment. Finally, we compare the resulting frequency distributions.
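As a rough sketch of this segment-then-count step, the snippet below groups players by experience level and tallies positions within each group. The (level, position) pairs and the position codes "G", "F", and "C" are made up for illustration and are not real WNBA records.

```python
from collections import Counter, defaultdict

# Made-up (experience level, position) pairs standing in for the
# Exper_ordianl and Pos columns of the dataset.
players = [
    ("Experienced", "G"), ("Experienced", "F"), ("Experienced", "G"),
    ("Very experienced", "C"), ("Very experienced", "G"),
    ("Veteran", "F"), ("Veteran", "F"),
]

# Segment by experience level and count positions within each segment.
pos_counts = defaultdict(Counter)
for level, pos in players:
    pos_counts[level][pos] += 1

# Print one frequency table per experience level.
for level, counts in pos_counts.items():
    print(level, dict(counts))
```

Even with three small tables, comparing the counts by eye takes some effort, which is the difficulty the next paragraph describes.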

Output:

The example shows that comparing several frequency distributions from tables alone is a bit tricky. Sometimes you have to present data to a non-technical audience, and tables like the ones above are difficult for such an audience to interpret. A graphical representation is usually a clearer way to present our findings. In this article, we’ll discuss three kinds of graphs for comparing frequency distributions. The following graphs will help us get the job done:

(i) Grouped Bar Plots

(ii) Kernel Density Plot

(iii) Box Plot

Grouped Bar Plots

A grouped bar plot (aka clustered bar…

Continue reading: https://towardsdatascience.com/compare-multiple-frequency-distributions-to-extract-valuable-information-from-a-dataset-10cba801f07b?source=rss—-7f60cf5620c9—4

Source: towardsdatascience.com