Time series analysis has a wide range of applications. While it is easy to directly apply popular time series frameworks such as the ARIMA model, or even the Facebook Prophet model, it is always important to know what is going on behind the function calls. In this post, we focus on time series analysis with the statsmodels library and get to know the underlying math and concepts. Without further ado, let’s dive in!
The dataset we will use is US liquor store retail sales from 1992 to 2021, originally from Kaggle. One reason I chose this dataset is that it covers the Covid period, which makes it interesting to see whether there was a significant impact on retail sales.
Overview of the time series dataset
Before diving into the statsmodels functions that describe a time series, let’s plot the data first. When reading in time series data, it is generally a good idea to set parse_dates=True and use the date column as the index, since most time series function calls assume the data is indexed by datetime.
import pandas as pd

# Parse the DATE column as datetimes and use it as the index
df = pd.read_csv('Retail Sales.csv', parse_dates=True, index_col='DATE')
ax = df['Sales'].plot(figsize=(12, 6))
ax.autoscale(axis='both', tight=True)
ax.set(ylabel='Liquor Store Retail Sales (M)', xlabel='Dates', title='US Liquor Retail Sales Data (in Millions USD)');
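To see what parse_dates=True and index_col do, here is a minimal sketch on a made-up CSV string. The three rows and their values are purely illustrative and are not taken from the Kaggle file. It checks that the index comes back as a DatetimeIndex and then declares a month-start frequency, which is my assumption about how the real data is spaced.

```python
import io

import pandas as pd

# Hypothetical CSV with the same DATE/Sales layout; the values are made up.
csv_text = "DATE,Sales\n1992-01-01,100\n1992-02-01,110\n1992-03-01,120\n"

demo = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="DATE")

# With parse_dates=True, pandas tries to parse the index column as datetimes.
is_datetime_index = isinstance(demo.index, pd.DatetimeIndex)

# Declaring a month-start frequency ("MS") raises an error if the dates
# are not actually spaced that way, so it doubles as a sanity check.
demo.index.freq = "MS"

print(is_datetime_index, demo.index.freqstr)
```

Declaring the frequency is optional, but it makes an irregular or misparsed index fail early rather than later inside a model call.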
The plot shows a clear yearly pattern in this time series. Liquor sales generally peak at year-end, which is expected since Christmas and New Year are when people gather, so demand for liquor goes up. Another interesting observation is that in 2020 liquor sales started to rise in the first half of the year, much earlier than in previous years. This surprised me, since I expected Covid to hurt sales, but it turned out the other way around.
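One way to back up a visual impression like the year-end peak is to average the series by calendar month. The sketch below does this on a synthetic monthly series (a linear trend plus an artificial December bump), not on the real sales data. The names sales, monthly_mean and peak_month are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data for illustration only: 5 years, month-start dates.
idx = pd.date_range("2000-01-01", periods=60, freq="MS")
values = np.linspace(100, 159, 60)      # gentle upward trend
values[idx.month == 12] += 50           # artificial December bump
sales = pd.Series(values, index=idx, name="Sales")

# Average sales for each calendar month (1 = January, ..., 12 = December).
monthly_mean = sales.groupby(sales.index.month).mean()

# The month with the highest average is the candidate seasonal peak.
peak_month = int(monthly_mean.idxmax())
print(peak_month)
```

On a series with a strong trend, later months in each year are slightly favoured by this average, so it is a quick check rather than a substitute for a proper decomposition.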
As the name suggests, the ETS (Error, Trend, Seasonality) model describes a time series by decomposing it into three components: trend, seasonality, and error. The statsmodels library provides a handy function call to separate out these elements, giving a direct view of how…