When modelling a time series with a model such as ARIMA, we often pay careful attention to factors such as seasonality, trend, the appropriate time periods to use, among other factors.
However, when it comes to using a machine learning model such as XGBoost to forecast a time series — all common sense seems to go out the window. Rather, we simply load the data into the model in a black-box like fashion and expect it to magically give us accurate output.
A little known secret of time series analysis — not all time series can be forecast, no matter how good the model. Attempting to do so can often lead to spurious or misleading forecasts.
To illustrate this point, let us see how XGBoost (specifically XGBRegressor) varies when it comes to forecasting 1) electricity consumption patterns for the Dublin City Council Civic Offices, Ireland and 2) quarterly condo sales for the Manhattan Valley.
XGBRegressor uses a number of gradient boosted trees (referred to as n_estimators in the model) to predict the value of a dependent variable. This is done through combining decision trees (which individually are weak learners) to form a combined strong learner.
When forecasting a time series, the model uses what is known as a lookback period to forecast for a number of steps forward. For instance, if a lookback period of 1 is used, then the X_train (or independent variable) uses lagged values of the time series regressed against the time series at time t (Y_train) in order to forecast future values.
Forecasting Electricity Consumption
Let’s see how this works using the example of electricity consumption forecasting.
The dataset in question is available from data.gov.ie. From this graph, we can see that a possible short-term seasonal factor could be present in the data, given that we are seeing significant fluctuations in consumption trends on a regular basis.
Let’s use an autocorrelation function to investigate further.
From this autocorrelation function, it is apparent that there is a strong correlation every 7 lags. Intuitively, this makes sense because we would expect that for a commercial building, consumption would peak on a weekday (most likely Monday), with consumption dropping at the weekends.
When forecasting such a time series with XGBRegressor, this means that a value of 7 can be used as the lookback period.