How to Do Time Series Analysis in Q

In this article, we explore ways to analyze time series data by smoothing line charts.

Requirements

Your table must use a variable that is stored in a Date format (i.e., a date question).

Method

The various smoothing options are all kept within the question, so if you smooth the data in a particular table and use the same questions to construct a time series chart, the same smoothing is automatically applied.

These options control how a data series is smoothed when charted in a time series, or when cross-tabulated with a date question.

All charts and tables using a particular question share the same smoothing parameters (i.e., whatever settings you make are remembered in the question as Value Attributes).

Smoothing different statistics

Smoothing is applied to any statistic that is used to describe a pattern in data, such as averages, percentages, and coefficients. It is not applied to sample size and population statistics (e.g., n, Column Population), nor to statistics used in significance testing (e.g., standard errors, p-values, or the arrows used to denote significance testing).

Significance testing with smoothers

Smoothing does not have any impact on significance tests. The significance testing is always conducted on the unsmoothed data. If a cell on a table is shown as being significant, but seems to contain no data, it is probably because the unsmoothed data is significant, but the smoothed data has not been displayed due to other options (such as End treatment set as Missing).

Buttons, options, and fields

Moving average:

No: Computes an average within each Aggregation period (see Setting Time Periods for Date Questions).

Lagged: Computes a moving average, also known as a rolling average. For example, if average satisfaction for September, October, and November is 6.401, 6.608, and 6.727 respectively, the average of these three values (the three-month moving average) is 6.579. When Lagged is selected, 6.579 is shown as the value for November.

Uniform (centered): Computes a moving average in the same way as Lagged, except that the value is reported for the center of the window (e.g., 6.579 would be reported as the average for October).

Tricubic (centered): A weighted average that gives a higher weight to observations in the middle of the "window" (an inverted U-shape). With a three-period moving average, the middle period receives a weight of 1, while the time periods on either side each receive a weight of 0.67. The chart below shows the tricubic weights that occur for 11, 7 and 3 week moving averages (see Cleveland, William S. 1993. Visualizing Data. Summit, New Jersey: Hobart Press).
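The three moving average options above can be sketched in a few lines of Python. This is only an illustration, not Q's own implementation: the function names are hypothetical, periods without a full window are returned as None (an assumption, since the behavior at the ends depends on other settings), and the tricube scaling is an assumption chosen because it reproduces the 1 and 0.67 weights quoted above.

```python
def lagged_moving_average(values, bandwidth):
    """Average of each period and the (bandwidth - 1) periods before it."""
    out = []
    for i in range(len(values)):
        if i + 1 < bandwidth:
            out.append(None)  # assumption: no value until a full window exists
        else:
            window = values[i + 1 - bandwidth : i + 1]
            out.append(sum(window) / bandwidth)
    return out


def centered_moving_average(values, bandwidth):
    """Same average, but reported at the middle period of the window."""
    if bandwidth % 2 == 0:
        raise ValueError("a centered window needs an odd bandwidth")
    half = bandwidth // 2
    out = []
    for i in range(len(values)):
        if i - half < 0 or i + half >= len(values):
            out.append(None)  # assumption: ends left empty in this sketch
        else:
            window = values[i - half : i + half + 1]
            out.append(sum(window) / bandwidth)
    return out


def tricube_weights(bandwidth):
    """Tricube weights (1 - |u|^3)^3 for each position in an odd window."""
    half = bandwidth // 2
    scale = half + 1  # assumption: the weight reaches 0 one period outside the window
    return [(1 - (abs(d) / scale) ** 3) ** 3 for d in range(-half, half + 1)]


def rounded(values, digits=3):
    return [None if v is None else round(v, digits) for v in values]


satisfaction = [6.401, 6.608, 6.727]  # September, October, November
print(rounded(lagged_moving_average(satisfaction, 3)))    # [None, None, 6.579]
print(rounded(centered_moving_average(satisfaction, 3)))  # [None, 6.579, None]
print(rounded(tricube_weights(3), 2))                     # [0.67, 1.0, 0.67]
```

The lagged and centered results are the same number; only the period it is reported against changes.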

Bandwidth: The width of the window used to construct the moving averages. When Moving average is set to Uniform (centered), the Bandwidth must be an odd number. Use Lagged to allow both odd and even bandwidths, with the value reported for the final period in the window.

A bandwidth of 1 with monthly data means that one month of data is used (i.e., no moving average is employed).

A bandwidth of 2 takes two months of data (i.e., there is a two-month window; this applies only to Lagged).

A bandwidth of 3 takes three months of data (when centered this means the month of the data point, and 1 month either side).

Care must be taken in setting the bandwidth if dates have not been recorded precisely (e.g., every observation is dated the first day of its month).

Degree: The degree of the polynomial used in the local polynomial regression. Where degree is 0, the estimate for a time period is the average of all of the data in the window (this is a weighted average if a Tricubic moving average is used). Where degree is 1, a linear regression is fitted to the data in the window, and the value that is charted is the predicted value for the time period from the regression. Where degree is 2, a quadratic regression is used; where degree is 3, a cubic regression is used, etc.
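To make the difference between degree 0 and degree 1 concrete, here is a minimal sketch of a weighted local fit. It is an illustration only, not Q's implementation: the function name local_fit and its arguments are hypothetical, and only degrees 0 and 1 are covered, using the standard closed-form weighted least squares formulas.

```python
def local_fit(xs, ys, weights, x0, degree):
    """Weighted local polynomial estimate at x0 (degree 0 or 1 only)."""
    total = sum(weights)
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    if degree == 0:
        # Degree 0: the (weighted) average of the data in the window.
        return mean_y
    if degree == 1:
        # Degree 1: fit a weighted straight line and predict at x0.
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, x, y in zip(weights, xs, ys))
        return mean_y + (sxy / sxx) * (x0 - mean_x)
    raise ValueError("this sketch only covers degree 0 and degree 1")


months = [0, 1, 2]                    # September, October, November
satisfaction = [6.401, 6.608, 6.727]
equal = [1.0, 1.0, 1.0]               # uniform weights

print(round(local_fit(months, satisfaction, equal, 1, 0), 3))  # 6.579
print(round(local_fit(months, satisfaction, equal, 1, 1), 3))  # 6.579
print(round(local_fit(months, satisfaction, equal, 2, 1), 3))  # 6.742
```

At the center of a symmetric, equally weighted window the two degrees agree. They differ when the estimate is taken away from the center of the window, where degree 1 follows the local trend rather than returning the flat average.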

Weight periods equally: When checked, each time period is given the same weight, irrespective of its sample size. When not checked, each time period is weighted in any computations according to its weighted sample size (i.e., Column Population). If a tricubic moving average is used, the two weights are multiplied together when computing the moving average.
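The effect of this setting on a single window can be sketched as follows. This is an illustration under stated assumptions rather than Q's code: the function name is hypothetical, the sample sizes of 100, 200 and 100 are made up for the example, and the kernel weights stand in for the tricubic weights of a three-period window.

```python
def window_average(values, sample_sizes, kernel_weights=None,
                   weight_periods_equally=False):
    """Average one window, optionally weighting each period by its sample size."""
    if kernel_weights is None:
        kernel_weights = [1.0] * len(values)  # uniform moving average
    if weight_periods_equally:
        weights = list(kernel_weights)
    else:
        # The kernel weight and the sample-size weight are multiplied together.
        weights = [k * n for k, n in zip(kernel_weights, sample_sizes)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


satisfaction = [6.401, 6.608, 6.727]
sizes = [100, 200, 100]               # hypothetical sample sizes
tricube = [0.669921875, 1.0, 0.669921875]  # three-period tricube weights

print(round(window_average(satisfaction, sizes, weight_periods_equally=True), 3))  # 6.579
print(round(window_average(satisfaction, sizes), 3))                               # 6.586
print(window_average(satisfaction, sizes, kernel_weights=tricube))
```

With these made-up sample sizes, October carries twice the weight of the other months when periods are not weighted equally, and adding the tricubic weights increases October's share of the total weight further.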

Minimum N: The smallest sample size to use when charting data. When Weight periods equally is checked, nothing is charted for time periods where the sample size is less than Minimum N. If you want to interpolate values in time periods containing no data, you will need to set Minimum N to 0. When Weight periods equally is not checked, the sample size of the window is used instead. Please note that this setting is ignored in versions of Q before Q 4.8.
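The rule can be written as a small check. This is a sketch of the logic as described above, not Q's code: the function and argument names are hypothetical, and "sample size of the window" is assumed to mean the combined sample size of the periods in the window.

```python
def is_charted(period_n, window_n, minimum_n, weight_periods_equally):
    """Return True if a time period has enough sample to be charted."""
    # Checked: judge each period on its own sample size.
    # Unchecked: judge it on the sample size of its whole window
    # (assumed here to be the total over the periods in the window).
    n = period_n if weight_periods_equally else window_n
    return n >= minimum_n


print(is_charted(20, 150, 30, weight_periods_equally=True))   # False
print(is_charted(20, 150, 30, weight_periods_equally=False))  # True
print(is_charted(0, 0, 0, weight_periods_equally=True))       # True
```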

End treatment:

This setting is not used when the Moving average setting is set to Lagged.
Stretch window: Ensures that the full Bandwidth is always used, which has the consequence of causing the end-points to be extrapolated to the end of the chart (as a horizontal line if Degree is 0). For example, in the table below, the September three month moving average is computed using the three months of September, October and November. With a moving average, this results in the September and October figures being the same and the March and April figures being the same; with more complex types of smoothing, which are discussed later in this chapter, this does not occur (Stretch window is best used with these more complex forms of smoothing).
Missing: Charts no data at the end periods (e.g., if a three period moving average is specified, NaN is shown whenever fewer than three periods are available).
Standard window: Uses the available data (e.g., if a three period moving average is specified, but only two periods are available, the moving average is computed with the data from the two periods). When using a standard moving average (i.e., with Degree of 0, such as in these examples), it is generally most appropriate to set End treatment as Missing, as the other options are biased (i.e., they do not produce the most accurate estimates available given the data).
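The three End treatment options can be compared on a small series. The sketch below is an illustration only, assuming a uniform centered moving average with Degree 0: the function name, the option strings and the five data values are hypothetical, and "stretch" is implemented by sliding the window inward so that it always holds a full bandwidth of periods, as the September example describes.

```python
def centered_average(values, bandwidth, end_treatment):
    """Uniform centered moving average (Degree 0) with three end treatments."""
    if end_treatment not in ("stretch", "missing", "standard"):
        raise ValueError("unknown end treatment")
    half = bandwidth // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        lo, hi = i - half, i + half
        if lo < 0 or hi > last:  # the window runs off one end of the data
            if end_treatment == "missing":
                out.append(float("nan"))
                continue
            if end_treatment == "stretch":
                # Slide the window inward, keeping the full bandwidth.
                if lo < 0:
                    lo, hi = 0, min(bandwidth - 1, last)
                else:
                    lo, hi = max(last - bandwidth + 1, 0), last
            else:
                # Standard window: use whatever periods are available.
                lo, hi = max(lo, 0), min(hi, last)
        window = values[lo : hi + 1]
        out.append(sum(window) / len(window))
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values for five periods
print(centered_average(series, 3, "stretch"))   # [2.0, 2.0, 3.0, 4.0, 4.0]
print(centered_average(series, 3, "missing"))   # [nan, 2.0, 3.0, 4.0, nan]
print(centered_average(series, 3, "standard"))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

On this steadily rising series, both Stretch window and Standard window pull the end values toward the middle of the data (2.0 and 1.5 against an actual first value of 1.0), which shows the kind of bias the note above refers to.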