Principal Components Analysis (PCA) is a technique for taking many variables and creating a new, smaller set of variables that aims to capture as much of the variation in the data as possible. Principal Components Analysis is a data reduction technique, often used as a preliminary to Regression or Cluster Analysis. This article describes how to run a Principal Components Analysis in Q.
- A data set containing several Numeric variables that you want to combine and reduce down to a smaller number of variables or components. For this example, we'll use a series of attitudinal questions about mobile device attributes on a 1-5 scale where 1="Strongly agree" and 5="Strongly disagree".
- Select Create > Dimension Reduction > Principal Components Analysis.
- In the Object Inspector on the right side of the screen, choose the variables that you want to analyze in the Variables box.
- Tick Automatic, which ensures the PCA will remain up to date when the data changes or when you change the settings.
- Click Calculate to run the PCA.
The output from the PCA is what is known as a “loadings table”. This table shows one row for each of my original mobile-phone statement variables (there are 23). Each of the 8 new variables identified by the PCA appears in the columns. The cells of the table show figures referred to as “loadings”.
The results are as follows:
Determining the number of components
In the analysis above, the PCA automatically generated 8 variables. It did this using a heuristic known as the “Kaiser rule”, an option in the Rule for selecting components section. The Kaiser rule keeps every component whose eigenvalue is greater than 1. This is a commonly used rule, but you can also choose one of two other methods:
- Number of components. Choose this option if you want to choose the number of components to keep.
- Eigenvalues over. Eigenvalues are numbers associated with each component, and these are listed at the top of each column. This setting lets you specify the cut-off value for components.
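To make the eigenvalue cut-off more concrete, here is a minimal Python sketch of the idea behind the Kaiser rule and the Eigenvalues over setting. It is not Q's own code. The synthetic data, the function name `select_components` and the `cutoff` parameter are assumptions made purely for illustration.

```python
import numpy as np

def select_components(data, cutoff=1.0):
    """Return the eigenvalues of the correlation matrix (largest first)
    and the number of components whose eigenvalue exceeds `cutoff`.
    A cutoff of 1.0 corresponds to the Kaiser rule."""
    corr = np.corrcoef(data, rowvar=False)        # variables are in columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    n_keep = int(np.sum(eigenvalues > cutoff))
    return eigenvalues, n_keep

# Hypothetical data: 6 variables driven by 2 underlying factors plus noise.
rng = np.random.default_rng(0)
n_respondents = 500
factors = rng.normal(size=(n_respondents, 2))
weights = np.array([[1, 0], [1, 0], [1, 0],
                    [0, 1], [0, 1], [0, 1]], dtype=float)
noise = 0.5 * rng.normal(size=(n_respondents, 6))
data = factors @ weights.T + noise

eigenvalues, n_keep = select_components(data)
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Components kept by the Kaiser rule:", n_keep)
```

Because the eigenvalues of a correlation matrix always sum to the number of variables, an eigenvalue above 1 means the component explains more variation than a single original variable would on its own. Raising or lowering `cutoff` plays the same role as changing the Eigenvalues over value.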
The analysis above used a technique called a Varimax rotation, Q’s default option in the Rotation method drop-down. The concept of the rotation can be a bit abstract to talk about without getting into the mathematics of the technique. Putting it simply, the PCA problem can have an infinite number of solutions which all capture the same amount of variation in the data. The rotation tries to find which of those many solutions is the easiest to interpret, by choosing the one in which as many loadings as possible are close to zero (or close to 1 in size).
If you have a favorite rotation method, the menu contains several other options. They are all described in mathematical terms, so discussing them here would not add much value if you don’t already have a preferred technique. In my experience, Varimax seems to be the most popular.
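For readers who want to see the rotation idea in action, the following Python sketch applies a widely published SVD-based Varimax iteration to a small, made-up loadings table. It is an illustration under stated assumptions, not a description of how Q computes its rotation: the loadings are hypothetical, and no Kaiser normalization is applied.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Rotate `loadings` (variables x components) with an orthogonal
    rotation chosen by the common SVD-based Varimax iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag(np.sum(rotated ** 2, axis=0))
        gradient = loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ column_ss)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        current = np.sum(s)
        # Stop once the objective no longer improves noticeably.
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings ** 2, axis=0)))

# Hypothetical "clean" solution: each variable loads on one component only.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])

# An equally valid but harder-to-read solution: the same table turned by 30 degrees.
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
mixed = simple @ turn

rotated, rotation = varimax(mixed)
print("Before rotation:\n", np.round(mixed, 2))
print("After Varimax:\n", np.round(rotated, 2))
```

Both tables explain exactly the same amount of variation for every variable (the row sums of squared loadings are unchanged), but after rotation each variable loads strongly on one component and close to zero on the other, which is what makes the rotated table easier to interpret. The sign and order of the rotated components are arbitrary.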
To use the results of the PCA in another analysis you need to save the variables into your data set. To do so:
- Have your PCA output selected on the screen.
- Click Create > Dimension Reduction > Save Variables. This will add the variables and show them in a table.
- (Optional) Right-click on the row labels in the table and Rename them, to make the components more recognizable.
Q will show a new table of your components. The table will be full of 0’s, indicating that the average score of each is zero. Don’t be alarmed! This occurs because the variables are standardized – with a mean of zero and a standard deviation of 1 – which is the standard technique. If you create a cross-tab with another question, then the variation in the component scores will become more apparent. For instance, I renamed my components and created a table with the Age groups from the study:
Rather unsurprisingly, the younger people have higher scores on the “Wanting technology” and “Cost-sensitivity” components, and a much lower score on the “Only used the basics” component.
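The zero averages described above, and the differences that appear once the scores are split by another question, can be reproduced in a small Python sketch. Everything here is hypothetical: the data are simulated, the two groups simply stand in for age groups, and the calculation is a generic way of computing standardized component scores rather than Q's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_respondents = 400

# Hypothetical grouping variable: 0 = younger, 1 = older.
group = rng.integers(0, 2, size=n_respondents)

# Four simulated attitude variables that share one underlying trait,
# on which the younger group scores higher on average.
trait = rng.normal(size=n_respondents) + 1.0 * (group == 0)
data = np.column_stack(
    [trait + 0.5 * rng.normal(size=n_respondents) for _ in range(4)]
)

# Standardize the inputs, then take the first principal component.
z = (data - data.mean(axis=0)) / data.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # sorted ascending
first = eigenvectors[:, -1]
if first.sum() < 0:          # the sign of a component is arbitrary
    first = -first

# Component scores, rescaled to a standard deviation of 1.
scores = z @ first
scores = scores / scores.std()

mean_younger = scores[group == 0].mean()
mean_older = scores[group == 1].mean()
print("Overall mean:", round(float(scores.mean()), 3))
print("Overall standard deviation:", round(float(scores.std()), 3))
print("Mean for younger group:", round(float(mean_younger), 2))
print("Mean for older group:", round(float(mean_older), 2))
```

The overall mean is zero by construction, which is why the summary table looks empty of information. The group means are what differ, and that is the pattern a cross-tab brings out.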
These new variables can be used just like any other in Q. Once you are happy with your new components, go back to the PCA output and untick the Automatic box. This will prevent any changes to the components. If you modify your PCA later on and change the number of components in the solution, you should delete the saved variables and run Create > Dimension Reduction > Save Variables again.