This article describes how to interpret Column n on tables that contain missing data.
The bottom row of the table above shows Column n selected from Statistics - Below. At first glance, the numbers appear to be wrong. There is clearly data in April, but the table shows a Column n of 0, which at face value makes no sense.
The Column n statistic shown in the cells (i.e., from Statistics - Cells) reveals the cause of the problem. The first statement was not asked in April, so its base size (i.e., Column n) is 0. The second statement was asked of 522 people in April, and the third of 506 people. Thus, in this example, no single base size accurately reflects all of the data in the first column: each of 0, 522 and 506 is correct for one and only one row.
The fourth row shows the NET, with a Column % of NaN and a Column n of 0. The NET is computed from all of the people who have data in one or more of the rows above, excluding all people with missing data. As everybody has missing data for the first row, the base is 0, so it is not possible to compute a percentage. The Column n in the bottom row is different again. It is 0 in the same place as the NET row, but has a lower value in June. This is because, in this example, some of the rows of the table have been hidden, and the NET has been computed using only the non-hidden rows.
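The difference between the per-row base sizes and the NET base can be illustrated with a minimal Python sketch. It is not Q's own code: the respondent data, the function names `row_base_sizes` and `net_base_size`, and the reading that the NET base excludes anyone with missing data in any row are all assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical respondent-level data for one column (e.g. April).
# Each inner list is one respondent's answers to three statements;
# None marks missing data (the statement was not asked of that person).
Respondent = List[Optional[int]]


def row_base_sizes(respondents: List[Respondent]) -> List[int]:
    """Column n per row: how many respondents have data in that row."""
    n_rows = len(respondents[0]) if respondents else 0
    return [
        sum(1 for r in respondents if r[i] is not None)
        for i in range(n_rows)
    ]


def net_base_size(respondents: List[Respondent]) -> int:
    """NET base, assuming a respondent counts only if no row is missing."""
    return sum(1 for r in respondents if all(v is not None for v in r))


april: List[Respondent] = [
    [None, 1, 0],     # first statement not asked
    [None, 0, 1],
    [None, 1, None],  # third statement also missing
]

print(row_base_sizes(april))  # [0, 3, 2]
print(net_base_size(april))   # 0
```

Each row has its own base, but because the first row is missing for everyone, no respondent has complete data and the NET base is 0.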
There are a number of "remedies" to this problem:
- Do not show Column n in Statistics - Below on tables that contain missing values. This is the recommended approach, as Column n is misleading with such data.
- Select Automate > Browse Online Library > Modify Tables or Plots - Show Maximum Column n in Statistics Below.
- Create a Custom Rule using whatever logic is appropriate for computing the column n in your given situation.
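As a rough illustration of the second and third remedies, the sketch below picks a single summary Column n from the per-row values. It is written in Python rather than as a Q rule, and `summary_column_n` is a hypothetical helper, not part of Q. Taking the maximum is an assumption based on the name of the library item, and any other function can be passed in to stand for custom logic.

```python
from typing import Callable, Sequence


def summary_column_n(
    row_ns: Sequence[int],
    rule: Callable[[Sequence[int]], int] = max,
) -> int:
    """Reduce the per-row Column n values of one column to a single number.

    `rule` is the logic used to choose the summary value; it defaults to
    the maximum, but any function of the row values can be supplied.
    """
    if not row_ns:
        return 0  # a column with no rows has no base
    return rule(row_ns)


# Per-row Column n values for April, taken from the example above.
april_row_ns = [0, 522, 506]

print(summary_column_n(april_row_ns))            # 522
print(summary_column_n(april_row_ns, rule=min))  # 0
```

With the maximum, April reports 522 rather than 0, which matches the largest number of people asked any statement in that column.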
How to Recode Missing Values in Q
How to Replace Missing Values with their Average