Which Hypothesis Test should you use? Pearson or Spearman? T-Test or Z-Test? Chi-Square? No problem with easy-ht.

One of the main difficulties a new data scientist may encounter concerns the basics of statistics. In particular, it can be hard to understand which hypothesis test to use in a given situation: for example, when a chi-square test is appropriate, or what the difference is between the Pearson correlation coefficient and the Spearman rank correlation.
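To make the Pearson/Spearman distinction concrete, here is a minimal NumPy sketch that is independent of easy-ht. Pearson measures how well the data fit a straight line, while Spearman applies Pearson to the ranks of the data and therefore only asks whether the relationship is monotonic. The helper names are illustrative, and the `ranks` helper assumes there are no tied values (ties would need average ranks).

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient: strength of the *linear* relationship."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(a: np.ndarray) -> np.ndarray:
    """Rank of each element (0 = smallest). Assumes no tied values."""
    return np.argsort(np.argsort(a))


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = np.arange(1, 11, dtype=float)
y = x ** 5  # monotonic but strongly non-linear relationship

print(f"Pearson:  {pearson(x, y):.3f}")   # below 1: the points do not lie on a line
print(f"Spearman: {spearman(x, y):.3f}")  # 1.000: y always increases with x
```

Because `y` grows with `x` at every step, the ranks of the two arrays coincide and Spearman is exactly 1, whereas Pearson penalizes the curvature.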

For this reason, I have implemented a Python package called easy-ht, which lets you run several statistical tests, such as correlation, normality, randomness and mean comparisons, without worrying about which specific test to use.

The package configures itself automatically, based on the data provided.
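The article does not spell out easy-ht's internal selection rules, so the following is only a hypothetical sketch of how a package could pick a correlation test from the data: if both samples look normal according to a Jarque-Bera test, use Pearson; otherwise fall back to Spearman. The function `choose_correlation_test` and the 0.05 threshold are assumptions for illustration, not easy-ht's API.

```python
import math

import numpy as np


def jarque_bera_pvalue(sample) -> float:
    """p-value of the Jarque-Bera normality test.

    The JB statistic is asymptotically chi-square with 2 degrees of freedom,
    whose survival function is exp(-x / 2), so no SciPy call is needed.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)                      # biased second central moment
    skewness = np.mean(d ** 3) / m2 ** 1.5
    kurtosis = np.mean(d ** 4) / m2 ** 2      # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def choose_correlation_test(x, y, alpha: float = 0.05) -> str:
    """Hypothetical rule: Pearson if both samples look normal, else Spearman."""
    both_normal = jarque_bera_pvalue(x) > alpha and jarque_bera_pvalue(y) > alpha
    return "pearson" if both_normal else "spearman"


rng = np.random.default_rng(0)
normal_a = rng.normal(0.0, 1.0, 500)
normal_b = rng.normal(0.0, 1.0, 500)
skewed = rng.exponential(1.0, 500)

print(choose_correlation_test(normal_a, normal_b))  # usually "pearson"
print(choose_correlation_test(normal_a, skewed))    # "spearman": heavy right skew
```

A real package would likely use a more powerful normality test (for example Shapiro-Wilk) and could consider other properties of the data, such as sample size or ties.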

In this article, I describe the easy-ht Python package through the following steps:

  • overview of the package
  • usage example with a single input dataset
  • usage example with two input datasets

The easy-ht package can be installed through pip:

pip install easy-ht

Its full documentation is available online. The package can run the following hypothesis tests on one or two datasets:

  • normality: check whether the samples follow a normal distribution.
  • correlation: check whether two samples are correlated. Available only in two-sample tests.
  • randomness: check whether a sample was generated randomly.
  • means: in the one-sample test, compare the sample mean to an expected value; in the two-sample test, compare the means of the two samples.
  • distributions: in the one-sample test, compare the sample to a reference distribution; in the two-sample test, compare the distributions of the two samples.
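For the randomness check, a classic option is the Wald-Wolfowitz runs test. Whether easy-ht uses this particular test is not stated here, so treat the following as an independent sketch; the function name `runs_test` is hypothetical. Each value is labelled as above or below the median, each stretch of identical labels counts as one run, and the observed number of runs is compared with its expected value under randomness using a normal approximation.

```python
import math

import numpy as np


def runs_test(sample, alpha: float = 0.05) -> tuple[float, bool]:
    """Wald-Wolfowitz runs test around the median.

    Returns (two-sided p-value, True if randomness is NOT rejected at `alpha`).
    Uses the normal approximation, reasonable for moderately large samples.
    """
    x = np.asarray(sample, dtype=float)
    median = np.median(x)
    above = x[x != median] > median       # drop values equal to the median
    n1 = int(above.sum())                 # values above the median
    n2 = int(above.size - n1)             # values below the median
    if n1 < 2 or n2 < 2:
        raise ValueError("too few values on one side of the median")
    # A new run starts wherever the label changes.
    runs = 1 + int(np.count_nonzero(above[1:] != above[:-1]))
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return p_value, p_value > alpha


rng = np.random.default_rng(1)
print(runs_test(rng.normal(size=200)))         # i.i.d. noise: usually not rejected
print(runs_test(np.arange(200, dtype=float)))  # steady trend: only 2 runs, rejected
```

Too few runs suggest a trend or clustering; too many (for example strict alternation) suggest oscillation. Both cases lead to rejection because the p-value is two-sided.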

In this first example, I consider a single dataset: a randomly generated, normally distributed sample. The same procedure can also be applied to any other data.

First, I import the required libraries:

from easy_ht import HypothesisTest
import random
import numpy as np

Then, I generate normally distributed data:

# mean and standard deviation of the normal distribution
mu, sigma = 0, 0.1
# draw 100 samples from N(mu, sigma^2)
X = np.random.normal(mu, sigma, 100)
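Before running any test, it can help to check that the generated sample matches the parameters used to create it. This check is an addition to the original walkthrough. It uses NumPy's `default_rng` with an arbitrary seed (42, an assumption) so the sample is reproducible, whereas `np.random.normal` as called above draws a different sample on every run.

```python
import numpy as np

mu, sigma = 0, 0.1
rng = np.random.default_rng(42)  # seeded generator: same sample on every run
X = rng.normal(mu, sigma, 100)   # 100 draws from N(mu, sigma^2)

# Sample statistics should be close to, but not exactly equal to, mu and sigma.
print(f"sample mean: {X.mean():.4f} (expected {mu})")
print(f"sample std:  {X.std(ddof=1):.4f} (expected {sigma})")
# Standard error of the mean: sigma / sqrt(n) = 0.1 / 10 = 0.01.
print(f"standard error of the mean: {sigma / np.sqrt(X.size):.4f}")
```

With a standard error of 0.01, a sample mean within a few hundredths of 0 is exactly what sampling noise predicts.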

Now, I create a HypothesisTest object, which will be used for the subsequent analysis, passing the dataset X as the input parameter:

test = HypothesisTest(x=X)

Once the object is created, I can run tests without worrying about which specific test to use.

# compare the mean of X with an expected value of 50
value = 50
result = test.compare_means(value=value)
if result:
    print("Test is True: There is no difference")
else:
    print("Test is...
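The snippet above does not show which test `compare_means` runs internally. Purely as an illustration, here is a minimal one-sample mean comparison using a large-sample z approximation, with the sample standard deviation standing in for the unknown sigma. The function `no_mean_difference` is hypothetical; it returns True when the difference is not significant, mirroring how `result` is interpreted above.

```python
import math

import numpy as np


def no_mean_difference(sample, value: float, alpha: float = 0.05) -> bool:
    """Hypothetical one-sample test: True if the mean is NOT significantly
    different from `value` at level `alpha`.

    Large-sample z approximation; for small samples a one-sample t-test
    would be the usual choice.
    """
    x = np.asarray(sample, dtype=float)
    standard_error = x.std(ddof=1) / math.sqrt(x.size)
    z = (x.mean() - value) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return p_value > alpha


rng = np.random.default_rng(0)
X = rng.normal(0, 0.1, 100)

print(no_mean_difference(X, value=50))  # False: a mean near 0 is far from 50
print(no_mean_difference(X, value=0))   # usually True, since 0 is the true mean
```

Since the data are drawn around 0 with a standard error of about 0.01, comparing them against 50 gives an enormous z statistic, so a test of this kind would report a difference.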

Continue reading: https://towardsdatascience.com/hypothesis-testing-made-easy-through-the-easy-ht-python-package-2ee395b95fe2?source=rss—-7f60cf5620c9—4

Source: towardsdatascience.com