Forecasting is a powerful way to guide decision making by seeing where metrics are trending into the future. Forecasting can be run at yearly, quarterly, monthly or daily levels of granularity. In the screenshot below you can see that it is easy to turn on a forecast for a chart and then see the results projected onto the chart in real time.
Storyboard forecasting currently applies to line charts where time is on the pivoted dimensions of the query. [Over time the forecasting capability will be extended to other visualizations.]
Storyboard designers can also set this to be a default behaviour for charts in the Storyboard design settings for a given chart. This means that whenever a particular Storyboard is opened this chart will automatically include a forecast.
One nice feature we added to forecast charts is a slightly different interaction for the forecast data points. As with regular data points on a chart, you can click a forecast data point, but you will see a slightly different pop-up window.
The pop-up window includes a brief description of the forecast data with definitions for the confidence interval (shaded area) etc. and at the bottom a hyperlink to this help document.
If the latest time period in your chart is incomplete (e.g. if your chart is at the yearly level and you are part way through the year), the forecast algorithm will automatically exclude the latest data point and generate a new forecasted data point for that period from the complete historical trend data. This improves forecast performance, because a partial period would otherwise look like a sudden drop.
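The exclusion step described above can be sketched in a few lines of Python. This is only an illustration, not the Storyboard implementation: the function name `drop_incomplete_latest`, the `(period_end_date, value)` layout, and the rule that a period is incomplete while its end date is still in the future are all assumptions made for the example.

```python
from datetime import date

def drop_incomplete_latest(points, today):
    """Return the points without the newest one if its period has not ended yet.

    points: list of (period_end_date, value), sorted oldest to newest (assumed layout).
    A period is treated as incomplete while its end date is later than `today`.
    """
    if points and points[-1][0] > today:
        return points[:-1]
    return list(points)

# Yearly chart viewed part way through 2024: the 2024 value is only a partial total.
yearly = [
    (date(2021, 12, 31), 100.0),
    (date(2022, 12, 31), 110.0),
    (date(2023, 12, 31), 121.0),
    (date(2024, 12, 31), 40.0),
]
complete = drop_incomplete_latest(yearly, today=date(2024, 5, 15))
```

Here `complete` keeps only the 2021 to 2023 points, and the forecast for 2024 would then be generated from those.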
Sometimes the forecast option won't be available, or a forecast won't run, because there is not enough data or the data structure isn't suitable for forecasting. For example, forecasting needs more than 3 data points, and the time periods need to be even and consistent, i.e. all months or all quarters, not a mix of both. You should also use a continuous time selection, e.g. all months for the last 2 years, not just a few months from one year and different months from another year.
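As a rough illustration of these rules, the check below tests whether a list of time periods has more than 3 points and evenly spaced, continuous periods. It is a sketch under stated assumptions, not the product's actual validation: the function name `is_forecastable`, the `(year, month)` representation of each period start, and the `step_months` parameter (1 for months, 3 for quarters, 12 for years) are all hypothetical.

```python
def is_forecastable(period_starts, step_months, min_points=4):
    """Check the data-shape rules: more than 3 points, even and continuous periods.

    period_starts: list of (year, month) tuples in chronological order (assumed layout).
    step_months: expected gap between consecutive periods, in months.
    """
    if len(period_starts) < min_points:
        return False
    # Convert each period start to a running month count so gaps are easy to compare.
    index = [year * 12 + (month - 1) for year, month in period_starts]
    return all(later - earlier == step_months
               for earlier, later in zip(index, index[1:]))

monthly = [(2023, 11), (2023, 12), (2024, 1), (2024, 2)]   # continuous months
gappy = [(2022, 1), (2022, 2), (2023, 6), (2023, 7)]       # months from different years
mixed = [(2023, 1), (2023, 4), (2023, 7), (2023, 8)]       # quarters mixed with a month
too_short = [(2024, 1), (2024, 2), (2024, 3)]              # only 3 points

results = (
    is_forecastable(monthly, step_months=1),
    is_forecastable(gappy, step_months=1),
    is_forecastable(mixed, step_months=3),
    is_forecastable(too_short, step_months=1),
)
```

Only the first selection passes; the other three fail on continuity, consistency, and length respectively.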
Forecasting has been developed for most standard time dimensions, but if you come across a dimension that doesn't work with forecasting then let us know.
The Forecast Algorithm
The embedded forecast within One Model Storyboards is based on an ARIMA Model.
ARIMA(p,d,q) forecasting equation
ARIMA models are, in theory, the most general class of models for forecasting a time series which can be made to be "stationary" by differencing (if necessary), perhaps in conjunction with nonlinear transformations such as logging or deflating (if necessary). A random variable that is a time series is stationary if its statistical properties are all constant over time. A stationary series has no trend, its variations around its mean have a constant amplitude, and it wiggles in a consistent fashion, i.e., its short-term random time patterns always look the same in a statistical sense. The latter condition means that its autocorrelations (correlations with its own prior deviations from the mean) remain constant over time, or equivalently, that its power spectrum remains constant over time. A random variable of this form can be viewed (as usual) as a combination of signal and noise, and the signal (if one is apparent) could be a pattern of fast or slow mean reversion, or sinusoidal oscillation, or rapid alternation in sign, and it could also have a seasonal component. An ARIMA model can be viewed as a "filter" that tries to separate the signal from the noise, and the signal is then extrapolated into the future to obtain forecasts.
The ARIMA forecasting equation for a stationary time series is a linear (i.e., regression-type) equation in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors. That is:
Predicted value of Y = a constant and/or a weighted sum of one or more recent values of Y and/or a weighted sum of one or more recent values of the errors.
If the predictors consist only of lagged values of Y, it is a pure autoregressive ("self-regressed") model, which is just a special case of a regression model and which could be fitted with standard regression software. For example, a first-order autoregressive ("AR(1)") model for Y is a simple regression model in which the independent variable is just Y lagged by one period (LAG(Y,1) in Statgraphics or Y_LAG1 in RegressIt). If some of the predictors are lags of the errors, an ARIMA model is NOT a linear regression model, because there is no way to specify "last period's error" as an independent variable: the errors must be computed on a period-to-period basis when the model is fitted to the data. From a technical standpoint, the problem with using lagged errors as predictors is that the model's predictions are not linear functions of the coefficients, even though they are linear functions of the past data. So, coefficients in ARIMA models that include lagged errors must be estimated by nonlinear optimization methods ("hill-climbing") rather than by just solving a system of equations.
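To make the AR(1) case concrete, here is a minimal sketch that fits it as a simple regression of Y on Y lagged by one period, using ordinary least squares written out by hand. The function name `fit_ar1` and the sample series are made up for illustration, and this is not how the Storyboard forecast is fitted; models with lagged errors would need a nonlinear optimizer instead.

```python
def fit_ar1(series):
    """Fit Y_t = intercept + slope * Y_{t-1} by ordinary least squares."""
    x = series[:-1]   # Y lagged by one period (the independent variable)
    y = series[1:]    # current Y (the dependent variable)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Noise-free series built from Y_t = 2 + 0.5 * Y_{t-1}, so the fit should recover 2 and 0.5.
series = [10.0]
for _ in range(4):
    series.append(2.0 + 0.5 * series[-1])

intercept, slope = fit_ar1(series)
next_value = intercept + slope * series[-1]   # one-step-ahead forecast
```

Because the example series has no noise, the regression recovers the generating coefficients up to floating-point rounding.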
The acronym ARIMA stands for Auto-Regressive Integrated Moving Average. Lags of the stationarized series in the forecasting equation are called "autoregressive" terms, lags of the forecast errors are called "moving average" terms, and a time series which needs to be differenced to be made stationary is said to be an "integrated" version of a stationary series. Random-walk and random-trend models, autoregressive models, and exponential smoothing models are all special cases of ARIMA models.
A nonseasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
p is the number of autoregressive terms,
d is the number of nonseasonal differences needed for stationarity, and
q is the number of lagged forecast errors in the prediction equation.
The forecasting equation is constructed as follows. First, let y denote the dth difference of Y, which means:
If d=0: y_t = Y_t
If d=1: y_t = Y_t - Y_{t-1}
If d=2: y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}
Note that the second difference of Y (the d=2 case) is not the difference from 2 periods ago. Rather, it is the first-difference-of-the-first difference, which is the discrete analog of a second derivative, i.e., the local acceleration of the series rather than its local trend.
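The distinction can be checked numerically. The sketch below (the helper name `difference` and the sample values are illustrative only) applies first differencing d times and compares the d=2 result with both the expanded formula and the difference from 2 periods ago.

```python
def difference(series, d):
    """Apply first differencing d times (d=0 returns a copy of the series)."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

Y = [1, 4, 9, 16, 25]

first = difference(Y, 1)    # local trend: change from one period to the next
second = difference(Y, 2)   # local acceleration: change in the change

# Expanded form of the d=2 case: Y_t - 2*Y_{t-1} + Y_{t-2}
expanded = [Y[t] - 2 * Y[t - 1] + Y[t - 2] for t in range(2, len(Y))]

# Difference from 2 periods ago, which is NOT the second difference
lag_two = [Y[t] - Y[t - 2] for t in range(2, len(Y))]
```

For this series `second` and `expanded` agree, while `lag_two` gives different values.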
In terms of y, the general forecasting equation is:
ŷ_t = μ + ϕ_1 y_{t-1} + … + ϕ_p y_{t-p} - θ_1 e_{t-1} - … - θ_q e_{t-q}
Here the moving average parameters (θ's) are defined so that their signs are negative in the equation, following the convention introduced by Box and Jenkins. Some authors and software (including the R programming language) define them so that they have plus signs instead. When actual numbers are plugged into the equation, there is no ambiguity, but it's important to know which convention your software uses when you are reading the output. Often the parameters are denoted there by AR(1), AR(2), …, and MA(1), MA(2), ….
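The following sketch evaluates the forecasting equation one period at a time, using the Box-Jenkins (minus sign) convention for the θ's. It illustrates why the errors have to be computed period by period: each forecast needs the errors of the forecasts before it. The function name, the coefficient values, and the choice to treat values before the start of the series as zero are assumptions for the example, not part of the Storyboard implementation.

```python
def one_step_forecasts(y, mu, phi, theta):
    """Return (forecasts, errors) for the stationarized series y.

    phi: AR coefficients [phi_1, ..., phi_p]
    theta: MA coefficients [theta_1, ..., theta_q], Box-Jenkins sign convention
    Values of y and e before the first observation are treated as zero (assumption).
    """
    forecasts, errors = [], []
    for t in range(len(y)):
        ar_part = sum(phi[i] * y[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * errors[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y_hat = mu + ar_part - ma_part      # MA terms enter with a minus sign
        forecasts.append(y_hat)
        errors.append(y[t] - y_hat)         # error is only known after forecasting
    return forecasts, errors

y = [1.0, 2.0, 1.5]
forecasts, errors = one_step_forecasts(y, mu=0.5, phi=[0.5], theta=[0.4])
```

Software that uses the plus-sign convention would report the same model with an MA(1) coefficient of -0.4 instead of 0.4.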
To identify the appropriate ARIMA model for Y, you begin by determining the order of differencing (d) needed to stationarize the series and remove the gross features of seasonality, perhaps in conjunction with a variance-stabilizing transformation such as logging or deflating. If you stop at this point and predict that the differenced series is constant, you have merely fitted a random walk or random trend model. However, the stationarized series may still have autocorrelated errors, suggesting that some number of AR terms (p ≥ 1) and/or some number of MA terms (q ≥ 1) are also needed in the forecasting equation.
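As an example of stopping after the differencing step, the sketch below handles the d=1 case: it predicts that the first difference stays constant at its historical average, which amounts to a random walk with drift. The function name and sample values are illustrative assumptions only.

```python
def random_walk_forecast(series, horizon):
    """Forecast by assuming the first difference stays at its historical mean."""
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    drift = sum(diffs) / len(diffs)   # the constant predicted for the differenced series
    return [series[-1] + drift * h for h in range(1, horizon + 1)]

history = [10.0, 12.0, 13.0, 16.0]
forecast = random_walk_forecast(history, horizon=3)
```

The forecast simply extends the last observed value by the average period-to-period change, with no AR or MA terms to capture any remaining autocorrelation.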
The process of determining the values of p, d, and q that are best for a given time series will be discussed in later sections of the notes (whose links are at the top of this page), but a preview of some of the types of nonseasonal ARIMA models that are commonly encountered is given below.