In time-series forecasting, backtesting refers to the process of assessing the accuracy of a forecasting method using existing historical data. The process is typically iterative, repeated over multiple dates present in the historical data. Backtesting is used to estimate the expected future accuracy of a forecasting method, which helps determine which of several forecasting models should be considered the most accurate.
How backtesting works
The backtesting process starts by selecting a list of threshold dates within the time span covered by the historical data. In the illustration below, the thresholds are denoted T1, T2, T3 and T4.
Then, for each threshold,
- the historical data is truncated at the threshold,
- the forecasting model is trained and applied on the truncated data,
- the forecasts are compared with the original untruncated data.
Finally, an average forecast error is established over all the thresholds.
This averaged error can be interpreted as an estimate of the error that the model will incur when producing true forecasts (about the future). Choosing the proper set of thresholds typically involves some know-how related to the problem at hand. As a rule of thumb, increasing the number of thresholds typically improves the resilience of the process against overfitting problems.
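The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy model (train_mean_model), the sample series, the index-based thresholds and the choice of mean absolute error as the metric are all hypothetical and were picked only to keep the example short.

```python
from statistics import mean

def train_mean_model(history):
    """Hypothetical toy model: forecasts the mean of the last 3 observations."""
    level = mean(history[-3:])
    return lambda horizon: [level] * horizon

def backtest(series, thresholds, horizon, train):
    """Return the forecast error averaged over all thresholds."""
    errors = []
    for t in thresholds:
        truncated = series[:t]           # truncate the history at the threshold
        model = train(truncated)         # train on the truncated data only
        forecast = model(horizon)        # forecast the periods after the threshold
        actual = series[t:t + horizon]   # compare with the untruncated data
        errors.append(mean(abs(f - a) for f, a in zip(forecast, actual)))
    return mean(errors)                  # average error over all thresholds

# Illustrative data; thresholds play the role of T1..T4, expressed as indices.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
thresholds = [4, 6, 8, 10]
average_error = backtest(series, thresholds, horizon=2, train=train_mean_model)
print(average_error)
```

Passing the training function as an argument makes it straightforward to run the same backtest over several candidate models and compare their averaged errors.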
A common mistake: learn once, forecast many
Backtesting is typically fairly intensive in terms of computing resources, as a new forecasting model has to be trained for each threshold. As a result, we routinely observe practitioners who train the forecasting model only once, typically leveraging the whole range of historical data, and then proceed with backtesting iterations. The perceived benefit of this approach is typically a massive speed-up of backtesting.
However, such a trick is misguided and leads to significant overfitting problems. Because future data is made available to the forecasting model, whatever variable estimation takes place during the learning phase will implicitly cause the model to embed some information about this future. As a result, the accuracy measured from the backtests will not reflect the generalization capabilities of the model, but rather its memory capabilities, that is, the capacity of the model to reproduce identical situations found in the training dataset.
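The effect can be made visible with a deliberately extreme toy model. The sketch below is an illustration only: the memorizing model (train_memorizer), the sample series and the mean absolute error metric are hypothetical choices. Real models leak future information in subtler ways, but the mechanism is the same.

```python
from statistics import mean

def train_memorizer(history):
    """Deliberately extreme toy model: remembers every value it was trained on
    and falls back to the last known value for time steps it has never seen."""
    memory = list(history)
    def predict(start, horizon):
        return [memory[i] if i < len(memory) else memory[-1]
                for i in range(start, start + horizon)]
    return predict

def mean_abs_error(forecast, actual):
    return mean(abs(f - a) for f, a in zip(forecast, actual))

def backtest_retrain(series, thresholds, horizon):
    """Proper backtest: a new model is trained on truncated data per threshold."""
    errors = []
    for t in thresholds:
        model = train_memorizer(series[:t])
        errors.append(mean_abs_error(model(t, horizon), series[t:t + horizon]))
    return mean(errors)

def backtest_learn_once(series, thresholds, horizon):
    """Flawed shortcut: one model trained on the whole history, future included."""
    model = train_memorizer(series)
    return mean(mean_abs_error(model(t, horizon), series[t:t + horizon])
                for t in thresholds)

series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
thresholds = [4, 6, 8, 10]
leaky_error = backtest_learn_once(series, thresholds, horizon=2)
honest_error = backtest_retrain(series, thresholds, horizon=2)
print(leaky_error, honest_error)
```

The learn-once variant reports a zero error because the model simply replays values it has already seen, while the retrained variant reports a non-zero error that is closer to what could be expected on genuinely unseen data.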
Lokad Gotcha
Backtesting is at the core of Lokad's forecasting technology. We use it for each time-series to select which model will be used to deliver the final forecast. However, the simple backtesting vision presented in this article is not suitable for all situations found in retail and manufacturing. For example, for newly launched products, the time-series might be too short to perform any significant backtesting. Promotions and product launches also require dedicated approaches.