This article describes how to troubleshoot issues in your data's metadata so that Q analyzes your data correctly. You can also identify some issues proactively by running our automation to Check for Errors in Data File Construction. Use this article to look into issues when:
- Variable labels and groupings appear off
- Variables are missing codes/possible responses
- Multiple response questions show incorrect percentages
- Other tables show percentages or averages that appear off
- Variable is given a Text structure on import
- Tables show percentages when they should be an average and vice versa
- Merged data has inconsistencies
Requirements
- An understanding of what metadata is.
- A Data Set loaded into your document that is the best possible data file you can get.
- If possible, a data file set up using the guidelines in either How to Format an SPSS File for Use in Q or How to Use Excel and CSV Files in Q.
Variable labels and groupings appear off
Q uses several pieces of information to group variables together and label them appropriately.
- Variable Names - If using a file with metadata, variables in the same set should have names that share a common stem and are numbered sequentially, such as Q1_1, Q1_2, Q1_3.
- Variable Labels - If using a CSV or Excel file, these are the headers in your first row. Variable labels should contain both the question (which will become the Question Name) and the option label (which will become the individual variable label and be used in tables). These are commonly separated by a special character such as - or :. For example: Q1. Flavor rating - Coke, Q1. Flavor rating - Pepsi, Q1. Flavor rating - RC Cola. The same approach extends to grid-style questions, which also require a consistent order: Glasses drank - Out - Coke, Glasses drank - Home - Coke, Glasses drank - Out - Pepsi, Glasses drank - Home - Pepsi. The part of the label that matches EXACTLY across all variables in the question will become the Question Name, and the remaining part that is unique will become the individual variable labels.
- Value Attributes - Possible codes/responses should ideally match across all variables in a question and contain the complete code frame for that question, regardless of whether respondents selected a response. If using a CSV or Excel file, you can add dummy respondents to fill in for responses that were shown but not selected in the survey, and then delete those dummy respondents after importing.
- Question Type - Applicable to files with metadata. All Multiple Response Sets created in SPSS are honored in Q, which may cause Extra Variables Not in the Raw Data if a variable is in more than one Multiple Response Set. If one variable is formatted as a numeric variable with a code frame but others are string variables, they will not be grouped together. These issues need to be fixed either in the file before importing into Q, or by manually reformatting/combining the variables within Q after import.
Any one of the above may affect how variables are grouped together after importing a data file. Even if variables aren't grouped or labeled correctly on import and you can't fix the issue in the raw data file, you can always Set Question (to combine), Rename, and change the Question Type of variables in Q. Some automations can help with this as well, such as How to Suggest Better Variable Names from Source Labels and How to Remove Truncated Text from Variable Labels.
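To see why the shared part of the labels must match exactly, the splitting idea can be sketched in a few lines of Python. This is only an illustration, not Q's actual import logic; the function name split_labels and the " - " separator are assumptions made for the example.

```python
import os

def split_labels(labels, separator=" - "):
    """Split labels into a shared question name and per-variable labels.

    Hypothetical helper for illustration only; Q's real import logic
    is not documented here and may differ.
    """
    # Longest run of characters that is identical at the start of every label.
    shared = os.path.commonprefix(labels)
    # Cut the shared text back to the last separator so option text stays whole.
    cut = shared.rfind(separator)
    if cut == -1:
        # No exactly matching question text: nothing can be grouped.
        return "", list(labels)
    start = cut + len(separator)
    return shared[:cut], [label[start:] for label in labels]

consistent = [
    "Q1. Flavor rating - Coke",
    "Q1. Flavor rating - Pepsi",
    "Q1. Flavor rating - RC Cola",
]
question, options = split_labels(consistent)
print(question)  # Q1. Flavor rating
print(options)   # ['Coke', 'Pepsi', 'RC Cola']

# One label is missing the period after "Q1", so the shared part no longer matches.
inconsistent = ["Q1. Flavor rating - Coke", "Q1 Flavor rating - Pepsi"]
print(split_labels(inconsistent))
```

In the second call the only shared text is "Q1", so no question name is found and the full labels are returned unchanged. This is the kind of small inconsistency worth checking for in your header row.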
Variables are missing codes/possible responses
This occurs most often when using a data file without metadata, as Q only knows the codes for categories or choices that were selected by respondents. If using a file with metadata, this should be fixed by your data provider or by using software that can edit the file before importing it into Q. For either file type, you can still add a code to the value attributes manually in Q after import, but doing so creates a new constructed variable, which can impact performance if done across 50+ variables. To fix this in Q, see How to Add Empty Categories to a Question or contact support for a quick way to add a category using back-coding.
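Before importing, it can help to compare the codes that actually occur in a column against the code frame from the questionnaire. The following is a minimal sketch in Python; the code frame, the responses, and the function name missing_codes are made up for illustration.

```python
def missing_codes(responses, code_frame):
    """Return the codes in the code frame that no respondent selected."""
    observed = set(responses)
    return [code for code in code_frame if code not in observed]

# Hypothetical 5-point scale from the questionnaire.
code_frame = {
    1: "Very dissatisfied",
    2: "Dissatisfied",
    3: "Neutral",
    4: "Satisfied",
    5: "Very satisfied",
}

# Hypothetical raw data: nobody chose "Neutral".
responses = [1, 2, 2, 4, 5, 4]

for code in missing_codes(responses, code_frame):
    print(code, code_frame[code])  # 3 Neutral
```

Any code reported here is one that a file without metadata cannot tell Q about, so it would need to be added using one of the approaches above.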
Multiple response questions show incorrect percentages
Make sure your Value Attributes are set up appropriately; see How to set Value Attributes for a Pick Any and Pick Any-Grid. If your NET is not 100%, some respondents didn't select any options, and you can Add a None of These Option to the question. If you'd rather base the percentages in the table on only those who selected an option in the question, see How to Rebase Questions. If those solutions don't resolve the incorrect percentages, work through the other checks in What To Do When a Table Looks Wrong.
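The effect of the base on multiple response percentages can be illustrated with a small amount of arithmetic. The Python sketch below uses made-up data and a hypothetical function name, and only mirrors the idea of the NET and of rebasing; it is not how Q computes its tables internally.

```python
def pick_any_percentages(rows, rebase=False):
    """Percent selecting each option, plus a NET of anyone selecting anything.

    rows: one dict per respondent, mapping option -> 1 (selected) or 0.
    rebase: if True, drop respondents who selected nothing from the base.
    """
    if rebase:
        rows = [row for row in rows if any(row.values())]
    if not rows:
        return {}
    base = len(rows)
    table = {
        option: round(100 * sum(row[option] for row in rows) / base, 1)
        for option in rows[0]
    }
    selected_any = sum(1 for row in rows if any(row.values()))
    table["NET"] = round(100 * selected_any / base, 1)
    return table

# Hypothetical data: the third respondent selected no options.
respondents = [
    {"Coke": 1, "Pepsi": 0, "RC Cola": 0},
    {"Coke": 1, "Pepsi": 1, "RC Cola": 0},
    {"Coke": 0, "Pepsi": 0, "RC Cola": 0},
    {"Coke": 0, "Pepsi": 1, "RC Cola": 0},
]

print(pick_any_percentages(respondents))
# {'Coke': 50.0, 'Pepsi': 50.0, 'RC Cola': 0.0, 'NET': 75.0}
print(pick_any_percentages(respondents, rebase=True))
# {'Coke': 66.7, 'Pepsi': 66.7, 'RC Cola': 0.0, 'NET': 100.0}
```

With all four respondents in the base the NET is 75%, which signals that one respondent selected nothing. Rebasing to the three who selected at least one option raises every percentage and brings the NET to 100%.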
Other tables show percentages or averages that appear off
Confirm the missing data settings and the underlying values used to compute the average in the Value Attributes. Then work through the other checks in What To Do When a Table Looks Wrong.
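A common cause of an average that looks wrong is a code such as 99 for "Don't know" being treated as a real value. The Python sketch below shows the arithmetic with made-up codes and values; the function name and the choice of 99 are assumptions for the example.

```python
def average(responses, values, missing=()):
    """Average the values behind each code, skipping codes marked as missing."""
    used = [values[code] for code in responses if code not in missing]
    if not used:
        return None
    return sum(used) / len(used)

# Hypothetical value attributes: each code maps to the value used in the average.
values = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 99: 99}  # 99 = "Don't know"

responses = [4, 5, 3, 99, 4]

print(average(responses, values))                # 23.0, inflated by the 99
print(average(responses, values, missing={99}))  # 4.0
```

If an average is far outside the range of the scale, check whether a code like this has been left with a large value or has not been set as missing.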
Variable is given a Text structure on import
This happens either because the metadata says the variable is in a String or Character format, or because you are using a data file without metadata and Q doesn't detect that the responses are categories or numbers rather than text. To analyze the variable correctly, change its Question Type. Keep in mind that when changing to a Number type, you should change the Question Type directly from the text variable and NOT from a Pick One version of the variable, because the Pick One version assigns values to the categories in sequential order rather than keeping the original numbers.
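The difference between converting text straight to numbers and going through a categorical version first can be shown with a short Python sketch. The data is made up, and the way sequential values are assigned here (1, 2, 3 in sorted order of the distinct responses) is an assumption for illustration; Q's exact ordering may differ.

```python
from statistics import mean

# Hypothetical text responses that are really numbers.
text_responses = ["10", "50", "20", "50"]

# Direct conversion: each text value becomes the number it represents.
direct = [float(text) for text in text_responses]

# Categorical route (assumed behavior): distinct responses become categories
# and are given sequential values 1, 2, 3, ...
categories = sorted(set(text_responses))
sequential_value = {label: i for i, label in enumerate(categories, start=1)}
via_categories = [sequential_value[text] for text in text_responses]

print(mean(direct))          # 32.5
print(mean(via_categories))  # 2.25
```

The second average is computed from the sequential values rather than the original numbers, which is why the conversion to Number should start from the text variable.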
Tables show percentages when they should be an average and vice versa
Tables using questions with a Number Question Type will show averages, while Pick One and Pick Any questions will show percentages. You can also Calculate an Average Value from Categorical Data in Q.
Merged data has inconsistencies
If you are working through metadata issues in merged data or data from a tracker, see the Fixing Errors section in Tracking Study Best Practices.
How to Use Scripts to Automate Data Checking and Cleaning