Bootstrapping is a resampling method that allows us to infer statistics about a population from a sample. It is also easy to perform and understand, which is what makes it so darn cool. Practitioners who use the bootstrap and appreciate its potential know that it can estimate many population statistics, yet nearly all the examples I could find online use it only to estimate the population’s mean. I think it’s time to change that.
In this short article, I will review the bootstrap method and how to execute it in Python. Then we’ll use it to estimate confidence intervals for the population’s standard deviation, which should clear up any confusion about how to bootstrap population statistics other than the mean. We’ll do a little visualization to better understand what we learned, and experiment with drawing a larger number of samples to see how this affects the outcome.
Let’s dive in.
If you prefer, you can follow along in the Jupyter Notebook here.
Start by importing all the packages we will need.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
Now, let’s generate a fictitious “population.” I made up the mean and standard deviation. You can make up your own if you wish.
# generate a fictitious population with 1 million values
pop_mean = 53.21
pop_std = 4.23
population = np.random.normal(pop_mean, pop_std, 10**6)
We’ve now created a “population” of one million values drawn from a normal distribution with a mean of 53.21 and a standard deviation of 4.23.
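Because the values are random draws, the population’s actual mean and standard deviation will be very close to the numbers we chose, but not exactly equal. Here is a quick sanity check. It is a minimal sketch that uses NumPy’s `default_rng` with a fixed seed (my assumption, added only so the result is reproducible) instead of the global `np.random` functions.

```python
import numpy as np

# The seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(seed=42)

pop_mean = 53.21
pop_std = 4.23
population = rng.normal(pop_mean, pop_std, 10**6)

# Compare the empirical parameters with the ones we asked for
print(f"empirical mean: {population.mean():.3f} (target {pop_mean})")
print(f"empirical std:  {population.std():.3f} (target {pop_std})")
```

With a million values, both numbers should agree with the targets to roughly two decimal places.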
Draw a sample
We want to draw a small sample from the population to use for bootstrapping an approximation of the population parameters. In practice, we would only have the sample.
# Draw 30 random values from the population
sample = np.random.choice(population, size=30, replace=False)
sample now contains 30 randomly drawn values from the population.
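Before bootstrapping, it can help to look at what the sample alone tells us. The sketch below rebuilds the population and sample so that it runs on its own (the seed is an assumption), then computes the sample mean and the sample standard deviation. Note `ddof=1`, which gives the usual sample estimate rather than NumPy’s default population formula.

```python
import numpy as np

# The seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(seed=0)
population = rng.normal(53.21, 4.23, 10**6)

# Draw 30 values without replacement, as in the article
sample = rng.choice(population, size=30, replace=False)

sample_mean = sample.mean()
sample_std = sample.std(ddof=1)  # ddof=1 divides by n - 1 instead of n

print(f"sample mean: {sample_mean:.2f}")
print(f"sample std:  {sample_std:.2f}")
```

These are point estimates only. They say nothing yet about how far off they might be, which is the gap the bootstrap fills.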
I’m going to go quickly here. If you want a more in-depth look at the bootstrap method, check out my previous article Estimating Future Online Event Donation Revenue for Musicians and Nonprofits — Bootstrap estimation of confidence intervals with python.
All of the magic with bootstrapping happens as a result of sampling with replacement. Replacement means that when we draw a sample, we record that number then return that number to the source so that it has…
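To make sampling with replacement concrete, here is a minimal sketch of bootstrapping a confidence interval for the standard deviation. It is not necessarily the implementation this article goes on to use: the number of resamples (`n_boot`), the seed, and the choice of a simple percentile interval are all my assumptions.

```python
import numpy as np

# The seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(seed=1)
population = rng.normal(53.21, 4.23, 10**6)
sample = rng.choice(population, size=30, replace=False)

n_boot = 10_000  # number of bootstrap resamples (assumed value)

# Each row is one resample: 30 values drawn from the sample WITH replacement,
# so the same original value can appear more than once in a row
resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)

# Compute the statistic of interest (here the standard deviation) per resample
boot_stds = resamples.std(axis=1, ddof=1)

# Percentile method: the middle 95% of the bootstrap distribution
ci_low, ci_high = np.percentile(boot_stds, [2.5, 97.5])
print(f"95% bootstrap CI for the std: ({ci_low:.2f}, {ci_high:.2f})")
```

Swapping `std` for another statistic, such as the median, is all it takes to bootstrap a different quantity, which is exactly why the method is not limited to the mean.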