This article describes how to create a correlation matrix that visualizes the correlations between variables as a heat map.
Requirements
A data set containing two or more variables whose Structure is set to Numeric or Numeric - Multi. Binary categorical variables can also be used as inputs, but they are converted to numeric when the correlation coefficients are calculated.
Method
Creating a Correlation Matrix in Q
In Q, the correlation matrix is one of many built-in analytic functions. This function relies on an R library to calculate and draw the correlation matrix. R is a statistical programming language that is integrated with Q.
To create a correlation matrix in Q, from the menus, select Create > Correlation > Correlation Matrix. In the Report Tree you will see an object labeled correlation.matrix.
With the correlation.matrix object in the Report Tree selected, the Object Inspector for the correlation matrix will appear to the right. This is where you provide all inputs and options for the correlation matrix.
Data Preparation
In most cases, the variables supplied to the correlation matrix feature should belong to one or more Number or Number – Multi questions. This ensures that the analysis ‘sees’ the underlying numeric values in the data, rather than categories. To change the question type, use the Question Type drop-down menu in the Variables and Questions tab; see the documentation on changing the Question Type for more detail. If you need to recode the data before calculating correlations, see How to Recode Numeric Data.
Input Options
There are several built-in input options for the Correlation Matrix as well as various output options.
Input Type – select Variables, Questions or Table.
- Variables: When you select Variables as Input Type a drop-down menu will appear; select the Variables you want to use as inputs to the correlation matrix.
- Questions: When you select Questions as the Input Type a drop-down menu will appear; select the Questions to use as inputs to the correlation matrix. All variables included in a selected multi-variable Question will be used as inputs. You can select multiple questions.
- Table: When you select Table as the Input Type a drop-down menu will appear; select a single Q Table to use as the correlation matrix input.
Variable names: when you select Variables as the Input Type a checkbox will appear; if checked, Q will display the variable Name instead of the variable Label in the correlation matrix output.
Missing data – determines how the function deals with missing input values. If set to:
- Error if missing data, the function will return an error if there are any missing input values.
- Exclude cases with missing data, the function will run but will exclude any cases where input variables have missing data.
- Use partial data (default option), the function will run with whatever values are present and will ignore any missing input values.
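The three behaviours above resemble the `use` argument of base R's `cor()` function. The sketch below illustrates the difference on a small made-up data frame. It assumes the Q options behave like `"all.obs"`, `"complete.obs"` and `"pairwise.complete.obs"`; this article does not state how Q implements them, so treat the mapping as an illustration rather than a description of Q's internals.

```r
# Toy data with missing values (hypothetical; not the article's data file).
devices <- data.frame(
  iphone = c(1, 0, 1, 1, NA, 0),
  ipad   = c(1, 0, 1, 0, 1, NA),
  ipod   = c(1, 0, 0, 1, 1, 0)
)

# Analogue of "Error if missing data": cor() stops with an error if any value is NA.
strict <- tryCatch(cor(devices, use = "all.obs"), error = function(e) "error")

# Analogue of "Exclude cases with missing data": rows containing any NA are dropped.
listwise <- cor(devices, use = "complete.obs")

# Analogue of "Use partial data": each pair of variables uses every row
# in which both of those two variables are present.
pairwise <- cor(devices, use = "pairwise.complete.obs")
```

With partial data, different cells of the matrix can be based on different numbers of cases, which is worth keeping in mind when comparing coefficients.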
Spearman correlations – if checked, the correlation matrix function will calculate Spearman correlations instead of Pearson correlations, which are the default. Spearman correlation is based on ranks, so use it if your input data is ordinal, or if it is continuous but the relationships are monotonic rather than linear.
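To illustrate the difference between the two coefficients, here is a minimal base R sketch on made-up data. It uses `cor()` directly rather than the Q correlation matrix function, and the variables `x` and `y` are hypothetical.

```r
# Hypothetical data: y always increases with x, but not along a straight line.
x <- 1:10
y <- x^3

# Pearson measures linear association, so it falls short of 1 here.
pearson <- cor(x, y, method = "pearson")

# Spearman correlates the ranks, so a perfectly monotonic relationship gives 1.
spearman <- cor(x, y, method = "spearman")
```

Because the ranks of `x` and `y` agree exactly, the Spearman coefficient is 1 even though the relationship is curved.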
Show cell values – If set to Yes, Q will display the correlation coefficients in the matrix cells.
Show row/column labels – shows or hides the row and column labels.
Generating the Correlation Matrix
Once you have selected all the input parameters, click the Calculate button at the top of the Object Inspector to run the correlation matrix function with the provided inputs. You can also check the Automatic checkbox (next to the Calculate button) which will cause the correlation matrix to be rerun whenever any of the inputs are changed.
Using a sample Technology MaxDiff survey data file, the following shows the Pearson correlations between various owned devices. Note that by default, Q will display the correlation matrix as a blue-red scale Heatmap.
A quick inspection of the results suggests that the correlations are reasonable, as illustrated by the negative correlation between Other mobile phone ownership and Nokia mobile phone ownership. There is also a relatively high correlation between iPhone ownership and both iPad and iPod ownership.
Additional Properties
In Q, the correlation matrix function uses a library specifically designed to generate the Heatmap output. If, however, you prefer to have a table of correlation coefficients, you can create a separate R output that references the coefficient values stored in the correlation.matrix object.
First, create an R output by selecting Create > R Output. To generate an R data frame (table) of the correlation coefficients, enter the following code into the R CODE section of the Object Inspector and click Calculate.
correlation.matrix$cor
From our technology example above, the following output is generated.
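Once the coefficients are available as a table, they can be post-processed with ordinary R code. The sketch below is one possible approach, not part of Q itself: it rounds the coefficients and lists each pair of variables once, ordered by strength. So that it runs outside Q, it builds a stand-in `correlation.matrix` object from R's built-in `mtcars` data. It assumes that `correlation.matrix$cor` in Q is a numeric matrix with row and column names; `pair_table` is a hypothetical name.

```r
# Stand-in for the Q object so the sketch runs outside Q (assumption: in Q,
# correlation.matrix$cor is a numeric matrix of coefficients with dimnames).
correlation.matrix <- list(cor = cor(mtcars[, c("mpg", "hp", "wt")]))

coefs <- correlation.matrix$cor

# Round the coefficients for display.
rounded <- round(coefs, 2)

# Positions of the cells above the diagonal, so each pair appears only once.
idx <- which(upper.tri(coefs), arr.ind = TRUE)

# One row per pair of variables, strongest absolute correlation first.
pair_table <- data.frame(
  row         = rownames(coefs)[idx[, "row"]],
  column      = colnames(coefs)[idx[, "col"]],
  correlation = coefs[idx]
)
pair_table <- pair_table[order(-abs(pair_table$correlation)), ]
```

In a Q R Output, the stand-in line would be dropped so that `correlation.matrix` refers to the object in the Report Tree.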
Other correlations
Finally, tables of correlation coefficients can also be created in Q by selecting Number or Number – Multi questions in both the blue and brown drop-down menus on a table in the Outputs tab. This option has the advantage of being able to display correlations between two different questions.
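For readers who want to see what a correlation table between two different questions looks like in plain R, the following sketch passes two sets of variables to base R's `cor()`. The data frames `question_a` and `question_b` are hypothetical stand-ins for two Number – Multi questions, and whether Q computes its table this way is an assumption.

```r
# Hypothetical stand-ins for two Number - Multi questions (one column per variable).
set.seed(1)
question_a <- data.frame(a1 = rnorm(50), a2 = rnorm(50))
question_b <- data.frame(b1 = rnorm(50), b2 = rnorm(50), b3 = rnorm(50))

# Rows are the variables of the first question; columns are those of the second.
cross <- cor(question_a, question_b, use = "pairwise.complete.obs")

# The result has one row per variable in question_a and one column per variable in question_b.
dim(cross)
```

Unlike a square correlation matrix, this table has no diagonal of ones, because no variable is correlated with itself.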
Next
How to Create a Distance Matrix