Time series forecasting is a fascinating task. However, building a machine-learning algorithm to predict future data is trickier than expected. The hardest thing to handle is the temporal dependency present in the data. By their nature, time-series data are subject to shifts. These can produce temporal drifts of various kinds, which may make our algorithm inaccurate.

One of the best tips I can give when modeling a time series problem is to keep it simple. Most of the time, the simpler solutions are the best ones in terms of accuracy and adaptability. They are also easier to maintain or embed and more robust to possible data shifts. In this sense, the gold standard for time series modeling is the adoption of linear-based algorithms. They require few assumptions and simple data manipulations to produce satisfactory results.

In this post, we carry out a sales forecasting task. We produce forecasts using standard linear regression and an improved version of it: linear trees. They belong to the family of model trees. Like classical decision trees, they partition the data with splits, but instead of predicting a constant in each leaf they fit a simple linear model there. Training evaluates candidate partitions of the data by fitting linear models on each side of a split and keeping the best partitions. The final model is a tree-based structure with linear models in the leaves.
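To make the idea concrete, here is a minimal sketch of a depth-one linear tree on a single feature, written with NumPy only. It is an illustration of the principle, not the implementation used in this post: the function names (`fit_line`, `fit_linear_stump`, `predict_linear_stump`), the exhaustive threshold search, and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a*x + b. Returns (coefficients, sum of squared errors)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ coef - y) ** 2))
    return coef, sse

def fit_linear_stump(x, y, min_leaf=5):
    """Depth-one linear tree (hypothetical helper): try every threshold and
    keep the split whose two linear leaves give the lowest total SSE."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        left_coef, left_sse = fit_line(x[:i], y[:i])
        right_coef, right_sse = fit_line(x[i:], y[i:])
        total = left_sse + right_sse
        if best is None or total < best["sse"]:
            best = {
                "threshold": (x[i - 1] + x[i]) / 2,
                "left": left_coef,
                "right": right_coef,
                "sse": total,
            }
    return best

def predict_linear_stump(model, x):
    """Route each point to its leaf and apply that leaf's linear model."""
    x = np.asarray(x, dtype=float)
    left = model["left"][0] * x + model["left"][1]
    right = model["right"][0] * x + model["right"][1]
    return np.where(x <= model["threshold"], left, right)

# Toy data (assumed): a trend that reverses halfway through, plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.where(x < 5, 2 * x, 20 - 2 * x) + rng.normal(0, 0.1, size=x.size)

stump = fit_linear_stump(x, y)
_, single_line_sse = fit_line(x, y)
```

On data like this, a single regression line cannot follow the change of slope, while the two linear leaves can. A full linear tree applies the same search recursively and over several features.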

A Pythonic implementation of linear trees is available in a Python library for building model trees with linear models at the leaves. The package integrates fully with sklearn. It provides simple BaseEstimators that wrap every linear model present in sklearn.linear_model to build an optimal linear tree.

For our experiment, we simulate data that replicate store sales. We generate an artificial sales history for 400 stores, where sales are influenced by many factors, including promotions, holidays, and seasonality. Our task is to forecast future daily sales up to one year in advance.
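One possible way to generate data of this kind is sketched below. The exact simulation used for this post is not shown here, so everything in the block is an assumption: the function name `simulate_store_sales`, the multiplicative form, the effect sizes, and the promotion and holiday frequencies are chosen only for illustration.

```python
import numpy as np

def simulate_store_sales(n_stores=400, n_days=730, seed=0):
    """Hypothetical simulator: daily sales per store driven by weekly and
    yearly seasonality, random promotions, shared holidays, and noise."""
    rng = np.random.default_rng(seed)
    day = np.arange(n_days)

    # Seasonal multipliers shared by all stores (assumed amplitudes).
    weekly = 1.0 + 0.2 * np.sin(2 * np.pi * day / 7)
    yearly = 1.0 + 0.3 * np.sin(2 * np.pi * day / 365.25)

    # Promotions differ per store; holidays are the same for every store.
    promo = rng.random((n_stores, n_days)) < 0.10
    holiday = rng.random(n_days) < 0.03

    sales = np.empty((n_stores, n_days))
    for s in range(n_stores):
        base = rng.uniform(50, 200)           # store-specific sales level
        noise = rng.normal(0, 0.05, n_days)   # small multiplicative noise
        sales[s] = (
            base * weekly * yearly
            * (1 + 0.25 * promo[s])           # assumed uplift during promotions
            * (1 - 0.50 * holiday)            # assumed drop on holidays
            * (1 + noise)
        )
    return sales, promo, holiday

sales, promo, holiday = simulate_store_sales()
```

Each row of `sales` is one store's daily history, and `promo` and `holiday` are the matching indicator arrays that can later serve as regressors.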

We don’t use any past information when we produce our predictions. We engineer all our regressors so that they are accurate and available for any future time period. This enables us to provide long-horizon forecasts.
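Regressors of this kind depend only on the calendar, so they can be computed for any date, past or future. The sketch below shows one way to build them with the Python standard library. The feature set, the function name `calendar_features`, and the example dates are assumptions for illustration, not the features used in this post.

```python
from datetime import date, timedelta
import math

def calendar_features(d, holidays=frozenset()):
    """Hypothetical feature builder: everything here is known in advance
    for any date, so no lagged sales values are needed."""
    dow = d.weekday()                 # 0 = Monday ... 6 = Sunday
    doy = d.timetuple().tm_yday       # day of the year, starting at 1
    feats = {f"dow_{i}": int(dow == i) for i in range(7)}
    feats["month"] = d.month
    # Smooth encoding of the position within the year.
    feats["year_sin"] = math.sin(2 * math.pi * doy / 365.25)
    feats["year_cos"] = math.cos(2 * math.pi * doy / 365.25)
    feats["is_holiday"] = int(d in holidays)
    return feats

# Build regressors for a one-year horizon (example dates are assumed).
start = date(2024, 1, 1)
horizon = [start + timedelta(days=k) for k in range(365)]
future_X = [calendar_features(d, holidays={date(2024, 12, 25)}) for d in horizon]
```

Because none of these features depend on observed sales, the same function produces the training rows and the rows for dates a year ahead.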

We have 400 stores with historical sales from different years with visible daily seasonality.

Series of simulated store sales (image by the author)

We aim to predict sales up to a year ahead for all the stores at our disposal. To make this possible, we build a…

Continue reading: https://towardsdatascience.com/improve-linear-regression-for-time-series-forecasting-e36f3c3e3534?source=rss----7f60cf5620c9---4

Source: towardsdatascience.com