The steps described in Basic Workflow For Checking and Cleaning a Project and How to Construct Variables to Make Analysis Easier will usually be sufficient for preparing your data. However, where the data is to be used by groups that are relatively inexperienced in data analysis, or by those under extreme time pressure, it can be useful to tidy your data further.
There are a number of strategies you can use to tidy your data:
Which questions go where
Overall structure
Where the users are very familiar with the questionnaire and its structure, it is usually best to have the data file reflect the order of the questionnaire. In other situations, the following structures can be better:
- Ordering data according to how often it will be used, with the most regularly used data at the top (e.g., demographics and segments).
- General-to-specific. For instance, category questions could be listed first, then brand questions. In the case of trackers, you may choose to have questions that are current and consistently asked at the top, and historic or ad-hoc questions at the bottom.
- WHO, WHAT, WHERE, WHEN, and WHY.
- SITUATION (when, who with, and other aspects of context), BEHAVIOR (i.e., action), and PERSON (personality, values, demographics, etc.).
- INFORMATION SEARCH, AWARENESS, CONSIDERATION, TRIAL, USAGE FREQUENCY and SATISFACTION.
- DEMOGRAPHICS, MEDIA, ATTITUDES, CATEGORY BEHAVIOUR, BRAND BEHAVIOUR.
Data can be re-ordered on the Variables and Questions tab by dragging and dropping and via the Variables and Questions Tab Toolbar Buttons.
Standard analysis variables
Any standard analysis variables should be included in the data file or created. These will typically vary by client and industry. For example:
- In packaged goods and financial services studies "family lifestage" is usually relevant.
- In media studies, Age-by-Gender is often valuable.
- Medical studies typically create variables on patient attributes.
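As an illustration of what a standard analysis variable such as Age-by-Gender contains, the following is a minimal Python sketch of the logic, written outside of Q. The age bands, the field names, and the function name age_by_gender are all assumptions made for this example; in practice the bands would follow the client's or industry's conventions.

```python
# Hypothetical age bands: (lowest age, highest age or None for open-ended, label).
AGE_BANDS = [(18, 34, "18-34"), (35, 54, "35-54"), (55, None, "55+")]


def age_by_gender(age, gender):
    """Combine age and gender into one category label, e.g. 'Female 18-34'.

    Returns None when either value is missing or the age falls outside
    every band, so that such respondents stay as missing data.
    """
    if age is None or gender is None:
        return None
    for low, high, label in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return f"{gender} {label}"
    return None


# Illustrative respondents (made-up values).
respondents = [
    {"age": 29, "gender": "Female"},
    {"age": 61, "gender": "Male"},
    {"age": 16, "gender": "Male"},
]
combined = [age_by_gender(r["age"], r["gender"]) for r in respondents]
```

Here the third respondent is below the lowest band and is left unclassified rather than forced into a category.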
Hiding uninteresting data
Creating section headings in the data
Although Q does not allow you to create folders of variables, you can get a similar effect by inserting variables as section headings. Sections can be created in Q by inserting a new Binary Variable and giving it a distinctive design to separate blocks of questions in your data. For instance, you may make it indented and in capitals such as NEEDS AND WANTS, and show the proportion of people to complete the section, with the label showing a description of the sample.
Tidying questions
Names and labels
Often the names shown in the Question and Label fields of the Variables and Questions tab are messy, containing strange programming characters and truncated question wordings. It is generally a good idea to:
- Tidy them.
- Abbreviate them, so that when they appear in menus and exports they are easy to read.
- If the questionnaire has been ordered by question number, include the question number in the name of the question (e.g., Q1. Age). Note that you can include the full wording of the question in the footer (see below).
The following can be useful ways of quickly tidying up names and labels:
- Ensuring that they are created in a neat and organized way in the original data file (e.g., see How to Format an SPSS File for Use in Q).
- Modifying labels using Find/Replace, which supports wildcards (see How to Find and Replace in Q).
- Copying and pasting the Label column into Excel, modifying in Excel, and pasting back into Q again (by right-clicking the first variable and selecting Paste Labels).
- How to Suggest Better Question Names From Source Labels
- How to Remove Truncated Text from Variable Labels
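If labels are being cleaned outside of Q (for example, as part of the copy-and-paste round trip described above), the clean-up can be scripted. The following Python sketch is one possible approach, not a description of what Q's own tools do. The patterns it removes (HTML-style tags, piping placeholders of the form [%NAME%], and a trailing ellipsis left by truncation) are assumptions about what "messy" labels look like, and tidy_label is a hypothetical helper name.

```python
import re


def tidy_label(label):
    """Return a cleaned copy of a variable label.

    Assumed kinds of mess (adjust the patterns to suit your own data file):
    HTML-style tags, [%NAME%] piping placeholders, and a trailing ellipsis.
    """
    # Replace HTML-style tags such as <b> or </span> with a space.
    text = re.sub(r"<[^>]+>", " ", label)
    # Replace piping placeholders such as [%BRAND%] with a space.
    text = re.sub(r"\[%[^\]]*%\]", " ", text)
    # Remove a trailing ellipsis (three dots or the single character).
    text = re.sub(r"(\.{3}|…)\s*$", "", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


cleaned = tidy_label("<b>Q1.</b>  Age of   respondent...")
```

The result can then be pasted back into the Label column. It is worth reviewing the cleaned labels by eye, as automatic patterns can remove text that was meant to be kept.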
Sorting categories within a question
Sorting can be done manually by dragging and dropping (see How to Sort in Q), but there are also several options for automatic sorting. These can be found by typing the word sorting into the Search features and data box at the top of the Q window. The options are:
- Select Sort from Highest to Lowest (Does Not Update When Data Changes) to sort all questions in the project once according to the data in their SUMMARY tables.
- Select Sort Rows (Automatically Updates when Data Changes) to apply a table Rule to any selected tables which sorts them according to the results currently shown in the table.
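The practical difference between the two options is whether the order is fixed at the moment of sorting or re-derived whenever the data changes. The Python sketch below illustrates that distinction with made-up counts. It is only an analogy and does not represent how Q implements either option; the function names and the data are hypothetical.

```python
def sort_once(counts):
    """Return a category order based on the counts available right now."""
    return sorted(counts, key=counts.get, reverse=True)


def sorted_rows(counts):
    """Return (category, count) pairs ordered by whatever counts are passed in."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up counts for two waves of a tracker.
wave1 = {"Brand A": 40, "Brand B": 25, "Brand C": 35}
wave2 = {"Brand A": 30, "Brand B": 45, "Brand C": 25}

# One-off sort: the order is decided from wave 1 and then kept as-is.
fixed_order = sort_once(wave1)
one_off = [(category, wave2[category]) for category in fixed_order]

# Rule-style sort: the order is recomputed from the current data.
auto = sorted_rows(wave2)
```

With the wave 2 data, the one-off order still reflects wave 1 (Brand A first), whereas the rule-style order puts Brand B first.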
Merging small categories
Merge categories with small counts (e.g., collapsing age categories and brands with less than 2% market share). This is often best done by:
- Selecting Automate > Browse Online Library > Preliminary Project Setup > Create Tables for Data Checking. This creates tables containing data with small counts.
- Merging categories by dragging and dropping.
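To make the "less than 2%" rule of thumb concrete, the following Python sketch shows the underlying logic on a plain list of responses. It is an illustration only: in Q the merging itself is done by dragging and dropping, and the function name, the default threshold, and the "Other" label are assumptions made for this example.

```python
from collections import Counter


def merge_small_categories(responses, min_share=0.02, other_label="Other"):
    """Recode categories whose share of responses is below min_share.

    Categories at or above the threshold are left unchanged; the rest are
    combined under other_label.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    small = {category for category, n in counts.items() if n / total < min_share}
    return [other_label if r in small else r for r in responses]


# Made-up data: Brand C has a 1% share and falls below the 2% threshold.
responses = ["Brand A"] * 60 + ["Brand B"] * 39 + ["Brand C"] * 1
merged = merge_small_categories(responses)
merged_counts = Counter(merged)
```

The threshold is a judgment call. A category that is small but commercially important (for example, a client's own brand) may be better left unmerged.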
Removing irrelevant SUM and NET categories
Sometimes the SUM and NET categories are unhelpful (for example, when a Number - Multi question is used to represent rating scales). They can be removed by right-clicking on them and selecting Hide.
Creating a report "shell"
It is sometimes useful when setting up a project to create the "shell" of a report, which can be modified as per requirements by users.
Creating tables
The Report tree in Q is a useful way of setting out the most important findings in the data, or for providing an overview against key groups such as segments, countries, or targets. Depending on the user, this will either be:
- A set of summary tables or charts that gives the user a starting point for exploring the project.
- A set of crosstabs with all the tables crossed by a few standard questions.
Although there are lots of tools within Q for quickly creating a number of overview tables, the most straightforward approach may be to use one of the following scripts:
- Select Automate > Browse Online Library.
- Select and run whichever seems most appropriate of:
It is often useful to create Folders of tables in the Report tree to make it easier for users to navigate.
Folders
Tables and charts can be grouped into folders. The folders are created by right-clicking on Report and selecting Add Folder.
Filters
Quickly create lots of filters using Automate > Browse Online Library > Filtering > Filters from Selected Data. See Filters for a description of other ways to create filters.
Where a question has only been asked of a subset of the sample, it can be useful to create a relevant footer and apply it to the appropriate tables.
Footers can be customized. In most cases, this is best done using Table Options. However, if adding footnotes containing question wordings, this is best done using Automate > Browse Online Library > Modify Footers > Description of Selected Data (e.g., Question Name, Skips, Filtering).
Changing the Appearance of Charts and Tables
See Creating And Modifying Tables.
Sample size warnings and automated data hiding
There are a variety of QScripts and Rules that can be used to give appropriate warnings and to hide data. To apply these:
- Type the words sample size into the Search features and data box at the top of the Q window.
- Select the desired option from the QScripts and Rules section of the results. For example:
Transposing tables
Sometimes tables of Pick One - Multi and grid questions are easier to read if transposed. To transpose a table, right-click on a column or row and select Swap Rows and Columns.
Statistics
Statistics can be placed on multiple tables at one time by either:
- Multi-selecting lots of tables and using whichever is appropriate of Statistics - Cells, Statistics - Right, and Statistics - Below.
- Applying Rules (e.g., Modifying The Whole Table or Plot > Always Show Sample Size).
With tables involving Pick One - Multi and Pick One questions, it is often a good idea to use Statistics - Right and Statistics - Below when setting up the project, as many users will not discover these on their own.
Customizing the names of statistics
Statistics can be renamed (e.g., changing Average to Mean or Net Promoter Score) in any of the following ways:
- Edit > Project Options > Customize > Output Text which will rename the statistics for the entire project.
- Edit > Table Options > Output Text which will rename the statistics for the selected tables.
- Using Rules (e.g., Modify Headers > Automatically Rename Row Labels).
However, it is generally a bad idea to rename statistics. Renaming causes the version the users see to differ from the version that appears in all of Q's documentation, which can make it hard for them to understand how Q works.
Next
Basic Workflow For Checking and Cleaning a Project
Constructing Variables to Make Analysis Easier