I remember the good old college days when we spent weeks analyzing survey data in SPSS. It’s interesting to see how far we have come since then.
Today, we can do all of that and a lot more with a single command, before you can even blink.
That’s a remarkable improvement!
This short article will share three impressive Python libraries for exploratory data analysis (EDA). Not a Python pro? Don’t worry! You can benefit from these tools even if you know nothing about Python.
They could save you weeks of data exploration and improve the quality of your analysis. You will also have far fewer hair-pulling moments.
The first one is the most popular, the second is my favorite, and the last one is the most flexible. Even if you already know these libraries, the CLI wrapper I introduce in this post may help you use them at lightning speed.
With over 7.7k stars on GitHub, Pandas-Profiling is the most popular exploratory data analysis tool on our list. It’s easy to install, straightforward to use, and thorough in its results.
You can use either PyPI or Conda to install Pandas-Profiling.
pip install pandas-profiling
# conda install -c conda-forge pandas-profiling
The installation allows you to use the pandas-profiling CLI in your terminal window. Within seconds, it generates an HTML report with tons of analysis about your dataset.
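If you would rather trigger the report from a script, here is a minimal sketch. It assumes the console script is installed under the name pandas_profiling and accepts a --title option followed by the input and output paths, as the project README described; run pandas_profiling -h to confirm for your version. The helper names build_profile_command and run_profile, and the file names, are hypothetical and only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_profile_command(csv_path, html_path, title="Profiling Report"):
    """Return the argument list for the pandas_profiling console script.

    Assumed CLI shape: pandas_profiling --title TITLE INPUT.csv OUTPUT.html
    """
    return ["pandas_profiling", "--title", title, str(csv_path), str(html_path)]


def run_profile(csv_path, html_path, title="Profiling Report"):
    """Run the CLI if it is on PATH; return True when the report file exists."""
    command = build_profile_command(csv_path, html_path, title)
    if shutil.which(command[0]) is None:
        print("pandas_profiling is not on PATH; install it first.")
        return False
    subprocess.run(command, check=True)  # raises if the CLI exits with an error
    return Path(html_path).exists()


if __name__ == "__main__":
    # titanic.csv is a placeholder for a local copy of your dataset.
    print(build_profile_command("titanic.csv", "report.html", "Titanic Report"))
```

Keeping the command construction separate from the execution makes it easy to check what will run before anything touches your files.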
The blink moment: Here’s a demo that shows how it works. We analyze the popular Titanic survivor dataset and store the resulting report in an HTML file. We then open it in our favorite browser. Here is a live version you can play around with.
When you open the file or the live link above, it will look like the following.
The variables section is a comprehensive analysis of every variable in your dataset. For each variable, it includes descriptive statistics, a histogram, and the most common and most extreme values.
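To get a feel for what that section summarizes, here is a rough sketch in plain pandas for a single numeric column. The values are made up for illustration, and this is not how Pandas-Profiling computes its report internally; it only shows the same kinds of statistics.

```python
import pandas as pd

# Made-up sample standing in for one numeric column of a dataset.
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27, 14, 35], name="age")

summary = ages.describe()              # count, mean, std, min, quartiles, max
common = ages.value_counts().head(3)   # most frequent values
smallest = ages.nsmallest(3)           # extreme values at the low end
largest = ages.nlargest(3)             # extreme values at the high end

# Histogram as counts per bin: four equal-width bins over the value range.
bins = pd.cut(ages, bins=4).value_counts().sort_index()

for label, result in [
    ("Descriptive statistics", summary),
    ("Common values", common),
    ("Smallest values", smallest),
    ("Largest values", largest),
    ("Histogram bins", bins),
]:
    print(f"{label}:\n{result}\n")
```

The report does this for every column at once, which is where the time savings come from.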
In the interactions section, you can choose any two variables and create a scatterplot.
It’s a single-page dependency-free web app. You can host it with any static site hosting provider because the generated HTML is a self-contained application.
One of my favorites in this report is the correlation section. It creates a heatmap of the correlations between variables. You can choose the type of correlation to use in the heatmap.
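The numbers behind such a heatmap are just a correlation matrix. As a small sketch, assuming an all-numeric DataFrame with made-up values, pandas can compute one with a selectable method. The coefficients offered in the report itself may differ from the two shown here.

```python
import pandas as pd

# Made-up numeric data; column names are only for illustration.
df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, 54, 2],
        "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
        "siblings": [1, 1, 0, 1, 0, 3],
    }
)

# Pearson measures linear association; Spearman works on ranks instead.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

print("Pearson:\n", pearson.round(2), "\n")
print("Spearman:\n", spearman.round(2))
```

Each matrix is symmetric with ones on the diagonal, so a heatmap only needs to color the cells by value.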
Though it has only 1.7k stars on GitHub, Sweetviz fascinates me in many ways. The obvious magnet is the…
Continue reading: https://towardsdatascience.com/how-to-do-a-ton-of-analysis-in-the-blink-of-an-eye-16fa9affce06?source=rss—-7f60cf5620c9—4