A walkthrough of permutation tests and how they can be applied to time series data.
Permutation tests are non-parametric tests that require very few assumptions. So, when you don’t know much about your data-generating mechanism (the population), permutation tests are an effective way to determine statistical significance.
A recent paper published by researchers at Stanford extends the permutation testing framework to time series data, an area where permutation tests are often invalid. The method is math-heavy and brand new, so there’s little support and no Python/R libraries. However, it’s pretty efficient and can be implemented at scale.
In this post we will discuss the basics of permutation tests and briefly outline the time series method.
Let’s dive in.
Permutation tests are non-parametric tests that solely rely on the assumption of exchangeability.
To get a p-value, we randomly sample (without replacement) possible permutations of our variable of interest and recompute the test statistic on each one. The p-value is the proportion of sampled permutations whose test statistic is at least as extreme as that of our observed data.
Time series data is rarely exchangeable. To account for the lack of exchangeability, we divide our test statistic by an estimate of its standard error, thereby converting our test statistic to a t-statistic. This “studentization” process allows us to run autocorrelation tests on non-exchangeable data.
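To make the studentization idea concrete, here is a minimal sketch in Python. It is not the paper’s implementation: the function names are hypothetical, the test is restricted to lag-1 autocorrelation, and the standard error is a simplified plug-in estimate chosen for illustration. The paper’s estimator may differ.

```python
import numpy as np


def studentized_lag1_autocorr(x):
    """Lag-1 sample autocorrelation divided by an estimate of its standard error."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    var = np.mean(d ** 2)
    # Sample lag-1 autocorrelation.
    rho = np.sum(d[:-1] * d[1:]) / (n * var)
    # ASSUMPTION: simplified plug-in estimate of the variance of sqrt(n) * rho.
    tau2 = np.mean((d[:-1] * d[1:]) ** 2) / var ** 2
    return np.sqrt(n) * rho / np.sqrt(tau2)


def permutation_p_value(x, n_perm=999, seed=0):
    """Two-sided permutation p-value for the studentized lag-1 autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = abs(studentized_lag1_autocorr(x))
    perm_stats = np.array(
        [abs(studentized_lag1_autocorr(rng.permutation(x))) for _ in range(n_perm)]
    )
    # The add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)


# Simulated data for illustration: white noise and a strongly autocorrelated AR(1) series.
rng = np.random.default_rng(42)
noise = rng.standard_normal(300)
ar1 = np.zeros(300)
for t in range(1, 300):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

p_noise = permutation_p_value(noise)
p_ar1 = permutation_p_value(ar1)
print(f"white noise p-value: {p_noise:.3f}")
print(f"AR(1) p-value:       {p_ar1:.3f}")
```

Shuffling the series destroys any ordering, so the permuted statistics show what the studentized autocorrelation looks like when there is no serial dependence. The AR(1) series should land far in the tail of that distribution.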
Let’s slow down a bit and really understand permutation tests…
Permutation Tests 101
Permutation tests are very simple, but surprisingly powerful.
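The purpose of a permutation test is to estimate the population distribution, the distribution where our observations came from. More precisely, it estimates how our test statistic would be distributed if the observations were exchangeable. From there, we can determine how rare our observed value is relative to that distribution.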
In Figure 2, we see a graphical representation of a permutation test. There are 5 observations, one per row, and two columns of interest, Risk and Deaths.
First, we generate permutations of our variable of interest, labeled P1, P2, …, P120 (5 observations can be ordered in 5! = 120 ways). At the end of this step, we’ll have a large number of theoretical draws from our population. Those draws are then combined to estimate the population distribution.
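As a rough sketch of this step, the snippet below enumerates all 120 orderings of one column and recomputes a test statistic on each. The Risk and Deaths values are made up for illustration (they are not the values from Figure 2). Permuting Deaths and using the Pearson correlation as the test statistic are also assumptions made here.

```python
from itertools import permutations

import numpy as np

# Hypothetical values for illustration only; not the data from Figure 2.
risk = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
deaths = np.array([2.0, 1.0, 4.0, 3.0, 5.0])


def correlation(x, y):
    """Pearson correlation, used here as the test statistic (an assumption)."""
    return float(np.corrcoef(x, y)[0, 1])


observed = correlation(risk, deaths)

# Enumerate every ordering of Deaths: 5! = 120 permutations (P1, ..., P120).
perm_stats = np.array(
    [correlation(risk, np.array(p)) for p in permutations(deaths)]
)

# One-sided p-value: share of permutations at least as extreme as the observed value.
# The small tolerance guards against floating-point ties.
p_value = float(np.mean(perm_stats >= observed - 1e-12))

print(f"observed correlation: {observed:.2f}")
print(f"permutations: {len(perm_stats)}, p-value: {p_value:.3f}")
```

With only 5 observations we can afford to enumerate every permutation. For larger data sets the number of orderings explodes, which is why in practice we randomly sample permutations instead.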
Note that we will never see a…