Introduction
Most data cleaning of surveys analyzed in Q is performed in one or more of:
- The data collection program.
- A text editor.
- Excel.
- SPSS.
- Q.
If working with Q, it is generally most efficient to perform the cleaning only in the data collection program and/or Q. That is, in the vast majority of instances, it is inadvisable to perform any cleaning in a text editor, Excel or SPSS. Where the data collection program has no internal tools for data cleaning, it is generally best to do all the data cleaning in Q.
Note that this issue is entirely about efficiency and quality control. From a technical perspective, there is no reason that you cannot perform the data cleaning in a text editor, Excel or SPSS.
Problems with performing data cleaning in text editors, Excel and SPSS
- It is time-consuming. Data cleaning operations in text editors, Excel and SPSS are performed manually. Even if using syntax or macros, the user still needs to manually modify the syntax/macros for specific projects.
- Difficulty of repeating. Where multiple data files need to be extracted from the same project, the time-consuming processes need to be repeated. Alternatively, users need to take the time to create syntax and macros, and then review them whenever the data collection process is modified.
- Voluntary documentation. Any documentation that exists needs to be manually created by whoever is performing the data cleaning. If the person performing the cleaning is in a rush, lazy, or error-prone, there will be inadequate documentation.
- Lack of transparency. When cleaning the data in text editors, Excel and SPSS, changes are made in the actual data itself, and it is impossible for whoever is using the data to review what has been done, without going back to the original data.
The net effect of all of these is that the data cleaning process is inevitably either error-prone or very time-consuming.
Benefits of performing data cleaning in Q
- When importing data files into Q, Q automatically examines the data file, attempts to identify the data collection program used to create the file, and automatically performs various rudimentary data cleaning tasks known to be applicable to that data collection program (e.g., fixing labels, identifying question types, fixing missing-value problems peculiar to specific data collection programs).
- Additional QScripts have been developed specifically for data cleaning purposes. For example, there are QScripts for identifying and removing outliers, creating tables showing don't knows, reversing scales, capping, and identifying flat-lining. See Data Cleaning QScripts.
- You can create your own QScripts, tailoring them specifically to your needs, either from scratch or by modifying an existing QScript (instructions for modifying QScripts can be found on each QScript).
- When updating data files, all data cleaning will automatically be reapplied to both existing and new respondents. See How to Update Projects with New or Revised Data.
- As all changes are stored within Q, you can always audit all the data cleaning and return any data to its original state. See Tools for Auditing Projects.
- In situations where you do not want users to review the cleaning, you can instead adopt a two-step process which retains all the other benefits except for ease of auditing. This works as follows:
  - You create one Q Project and do all the cleaning in it.
  - You create a new cleaned SPSS data file, which you provide to the end-users. See Converting Other Files Types into SPSS or CSV Data Files.
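The general idea behind several of the benefits above (cleaning that is reapplied when the data file is updated, that can be audited, and that leaves the original data recoverable) can be illustrated outside Q. The following is only a minimal Python sketch of that idea, not Q's implementation and not QScript code. Every name in it (`Rule`, `cap`, `reverse_scale`, `is_flat_liner`, `clean`) and all of the sample data are hypothetical. It assumes that capping means limiting a value to an upper bound, that reversing a scale maps a value `x` on a `low` to `high` scale to `low + high - x`, and that a flat-liner is a respondent who gives the same answer to every question in a grid.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]  # one respondent: column name -> value


@dataclass(frozen=True)
class Rule:
    """A named cleaning step. `apply` returns a new row and never mutates its input."""
    name: str
    apply: Callable[[Row], Row]


def cap(column: str, upper: float) -> Rule:
    """Hypothetical capping rule: limit `column` to at most `upper`."""
    def _apply(row: Row) -> Row:
        out = dict(row)
        out[column] = min(out[column], upper)
        return out
    return Rule(f"cap {column} at {upper}", _apply)


def reverse_scale(column: str, low: float, high: float) -> Rule:
    """Hypothetical scale reversal: x becomes low + high - x."""
    def _apply(row: Row) -> Row:
        out = dict(row)
        out[column] = low + high - out[column]
        return out
    return Rule(f"reverse {column} on a {low}-{high} scale", _apply)


def is_flat_liner(row: Row, columns: List[str]) -> bool:
    """True when the respondent gave the same answer to every listed column."""
    return len({row[c] for c in columns}) == 1


def clean(raw_rows: List[Row], rules: List[Rule],
          grid_columns: List[str]) -> Tuple[List[Row], List[str]]:
    """Apply every rule to every raw row; return cleaned rows plus an audit log."""
    cleaned: List[Row] = []
    audit: List[str] = []
    for i, raw in enumerate(raw_rows):
        if is_flat_liner(raw, grid_columns):
            audit.append(f"row {i}: flagged as flat-lining")
        row = raw
        for rule in rules:
            new_row = rule.apply(row)
            if new_row != row:  # log only the rules that changed something
                audit.append(f"row {i}: {rule.name}")
            row = new_row
        cleaned.append(row)
    return cleaned, audit


# The rules are written once and stored separately from the data.
RULES = [cap("spend", 500.0), reverse_scale("q3", 1, 5)]
GRID = ["q1", "q2", "q3"]

# Made-up sample data: a first extract, then an update with one new respondent.
wave1 = [
    {"spend": 120.0, "q1": 4, "q2": 2, "q3": 1},
    {"spend": 9000.0, "q1": 3, "q2": 3, "q3": 3},
]
wave2 = wave1 + [{"spend": 700.0, "q1": 5, "q2": 4, "q3": 2}]

cleaned1, audit1 = clean(wave1, RULES, GRID)
cleaned2, audit2 = clean(wave2, RULES, GRID)  # same rules, updated data
```

Because the rules live apart from the raw rows, running `clean` on the updated extract applies the same cleaning to existing and new respondents, the audit list records what was changed, and the raw rows are left untouched. Editing values directly in a text editor or spreadsheet provides none of these unless someone documents each change by hand.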
Next
Basic Workflow For Checking and Cleaning a Project