This article describes how to automatically combine the values of a numeric variable into categories. For example, the number of SMS messages sent in a typical week can be combined into percentiles.
Requirements
A data set loaded into a Q project, containing at least one numeric variable
Method
There are four different methods available for combining numeric values in this tool:
- Tidy categories divides the range of values into tidy categories, which are intervals with widths of 2, 5, 10, 20, 50, and so on.
- Percentiles divides the range up into percentiles. This is useful when you want to create categories that each contain an equal proportion of the cases in your data.
- Equally spaced categories divides the range of values into categories that all have the same range. This is similar to Tidy categories, but has additional options for customization.
- Custom categories divides the range of values up into ranges of your choice. This is useful when you want to specify uneven ranges of values.
Automatically Combine Categories into Percentiles
Let's say you want to compute percentiles for the variable How many SMS sent in a typical week. Ensure that the variable you use is numeric.
- In the Variables and Questions tab, select the variable.
- Select Automate > Browse Online Library > Create New Variables > Automatically Combine Categories > By Value > Percentiles. A new variable with each case assigned to a percentile will automatically be added to your data set.
- Under Values, click the button to see what the new variable looks like after the categories have been combined.
- Rename the variable, if desired.
- Switch to the Outputs tab to view the table of the new variable. The table is automatically added to your Report Tree.
Note: When working with percentiles, if the distribution of the data is clumpy (many cases share the same value), the new categories will not always contain exactly the proportions requested.
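To see why clumpy data can prevent exact proportions, here is a minimal Python sketch of percentile categories. It is an illustration only, not Q's implementation: it assumes cut points are taken with the nearest-rank method, and the function name `percentile_categories` and its `percent_per_category` parameter are hypothetical stand-ins for the Percentages setting.

```python
import math


def percentile_categories(values, percent_per_category=10):
    """Assign each value a 1-based percentile category.

    Hypothetical sketch: cut points are taken every `percent_per_category`
    percent of the sorted data using the nearest-rank method. Q's own
    R code may compute percentiles differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    n_categories = round(100 / percent_per_category)
    # Upper cut point of every category except the last one.
    cuts = []
    for k in range(1, n_categories):
        rank = math.ceil(k * percent_per_category * n / 100)  # nearest rank
        cuts.append(ordered[rank - 1])
    # A value's category is 1 plus the number of cut points it exceeds,
    # so tied values always land in the same category.
    return [1 + sum(v > c for c in cuts) for v in values]


if __name__ == "__main__":
    clumpy = [0] * 6 + [1, 2, 3, 4]  # six of the ten cases share the value 0
    print(percentile_categories(clumpy, percent_per_category=20))
    # prints [1, 1, 1, 1, 1, 1, 4, 4, 5, 5]
```

Although 20% of the cases were requested per category, category 1 receives 60% and categories 2 and 3 are empty, because cases with the same value cannot be split across categories.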
Modifying the settings
- If the variable doesn't look right, you can change it. For instance, you may want the variable divided into 5 rather than 10 percentiles.
- To change how the categories are combined:
- Select the new variable or question in the Variables and Questions tab.
- Right-click and select Edit R Variable.
- Choose the desired options in the Inputs section on the right. In this example, we changed the Percentages setting from 10 to 20, so that each category contains about 20% of the cases (5 categories rather than 10).
- Click Update R Variable.
- Under Values, click the button to check your results.
The results are as follows:
Automatically Combine Categories into Equally Spaced Categories
Let's say you want to compute equally spaced categories for the variable How many SMS sent in a typical week. Ensure that the variable you use is numeric.
- In the Variables and Questions tab, select the variable.
- Select Automate > Browse Online Library > Create New Variables > Automatically Combine Categories > By Value > Equally Spaced Categories. A new variable will be added to your data set.
- Under Values, click the button to see what the new variable looks like after the categories have been combined.
The default number of categories is 2.
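The idea behind equally spaced categories can be sketched in a few lines of Python. This is a hypothetical illustration, not Q's code: it assumes the range from the minimum to the maximum is split into intervals of identical width, that each interval includes its lower bound, and that the maximum value goes into the last interval. The function name `equally_spaced_categories` is made up for this example.

```python
def equally_spaced_categories(values, n_categories=2):
    """Assign each value a 1-based category of equal width.

    Hypothetical sketch: the range [min, max] is split into
    `n_categories` intervals of the same width; each interval includes
    its lower bound, and the maximum goes into the last interval.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_categories
    if width == 0:  # every value is identical
        return [1] * len(values)
    return [min(int((v - lo) // width) + 1, n_categories) for v in values]


if __name__ == "__main__":
    sms = [0, 5, 10, 15, 20]
    print(equally_spaced_categories(sms))                  # prints [1, 1, 2, 2, 2]
    print(equally_spaced_categories(sms, n_categories=4))  # prints [1, 2, 3, 4, 4]
```

Raising `n_categories` from 2 to 4 halves the width of each interval (from 10 to 5 in this example), mirroring the change of the Number of categories setting in the steps.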
- To increase the number of categories from two to four:
- Select the new variable or question in the Variables and Questions tab.
- Right-click and select Edit R Variable.
- Choose the desired options in the Inputs section on the right. In this example, we changed the Number of categories setting to 4.
- Click Update R Variable.
The results are as follows:
Automatically Combine Categories into Tidy Categories
Let's say you want to compute tidy categories for the variable How many SMS sent in a typical week. With tidy categories, the range of values in each category is always 2, 5, or 10 (or one of these multiplied by a power of 10, such as 20, 50, or 100), and the same range is used for every new category. Ensure that the variable you use is numeric.
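One way to picture how a tidy width might be chosen is the Python sketch below. It rests on an assumed rule, not on Q's documented behaviour: pick the smallest width from the series 2, 5, 10, 20, 50, 100, ... such that the requested number of intervals spans the data, then align the first interval to a multiple of that width. Both function names are hypothetical.

```python
def tidy_width(lo, hi, n_categories):
    """Smallest tidy width (2, 5, 10, 20, 50, 100, ...) such that
    n_categories intervals of that width span hi - lo.

    Hypothetical rule for illustration; Q's own choice may differ.
    """
    raw = (hi - lo) / n_categories
    scale = 1
    while True:
        for multiplier in (2, 5, 10):
            if multiplier * scale >= raw:
                return multiplier * scale
        scale *= 10


def tidy_categories(values, n_categories=2):
    """Assign each value a 1-based tidy category."""
    lo, hi = min(values), max(values)
    width = tidy_width(lo, hi, n_categories)
    start = (lo // width) * width  # align the first interval to a multiple of width
    return [int((v - start) // width) + 1 for v in values]


if __name__ == "__main__":
    sms = [0, 3, 12, 19, 20, 35, 47]
    print(tidy_width(0, 47, 3))       # prints 20
    print(tidy_categories(sms, 3))    # prints [1, 1, 1, 1, 2, 2, 3]
```

Because the width is rounded to a tidy value and the intervals are aligned to multiples of it, the number of categories produced by a rule like this can differ from the number requested.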
- In the Variables and Questions tab, select the variable.
- Select Automate > Browse Online Library > Create New Variables > Automatically Combine Categories > By Value > Tidy Categories. A new variable will be added to your data set.
- Under Values, click the button to see what the new variable looks like after the categories have been combined.
The default number of categories is 2.
- To increase the number of categories from two to three:
- Select the new variable or question in the Variables and Questions tab.
- Right-click and select Edit R Variable.
- Choose the desired options in the Inputs section on the right. In this example, we changed the Number of categories setting to 3. We also changed the Label style to Inequality notation.
- Click Update R Variable.
The results are as follows:
Automatically Combine Categories into Custom Categories
The Custom categories method breaks the range of values into the ranges specified in the Cut points setting, giving you full control over how the categories are formed. This is particularly useful when the distribution of values is very uneven. In such cases the other methods are unsuitable because they tend to produce regular intervals, leaving some categories with too many or too few cases to be useful for analysis.
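A minimal Python sketch of cut-point categories follows, using the cut points 0,25,50,100 from the example in the steps. It is an assumption-laden illustration rather than Q's implementation: the hypothetical `boundary` parameter is assumed to mirror the Category boundary setting ("start" meaning a cut point begins a new category), and values outside the first and last cut points are left uncategorised (`None`), which Q may handle differently.

```python
from bisect import bisect_left, bisect_right


def custom_categories(values, cut_points, boundary="start"):
    """Assign each value a 1-based category defined by sorted cut points.

    boundary="start": a cut point starts a new category (0-24, 25-49, 50-100 for integers).
    boundary="end":   a cut point ends a category (0-25, 26-50, 51-100 for integers).
    Values outside the first and last cut points get None in this sketch.
    """
    cuts = sorted(cut_points)
    last = len(cuts) - 1  # number of categories
    result = []
    for v in values:
        if v < cuts[0] or v > cuts[-1]:
            result.append(None)
        elif boundary == "start":
            # Values equal to a cut point go to the category that starts there;
            # the final cut point is kept in the last category.
            result.append(min(bisect_right(cuts, v), last))
        else:
            # Values equal to a cut point go to the category that ends there.
            result.append(max(bisect_left(cuts, v), 1))
    return result


if __name__ == "__main__":
    cuts = [float(x) for x in "0,25,50,100".split(",")]  # comma separated, as typed
    sms = [0, 24, 25, 60, 100, 150]
    print(custom_categories(sms, cuts))                  # prints [1, 1, 2, 3, 3, None]
    print(custom_categories(sms, cuts, boundary="end"))  # prints [1, 1, 1, 3, 3, None]
```

The only difference between the two boundary options is which category receives a value that falls exactly on a cut point, such as 25 in this example.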
- In the Variables and Questions tab, select the variable.
- Select Automate > Browse Online Library > Create New Variables > Automatically Combine Categories > By Value > Custom Categories. A new variable will be added to your data set.
- To set custom cut-points, right-click the new variable and select Edit R Variable.
- In the Cut points box, specify the values you want. In this example, we changed the Cut points to 0,25,50,100 (comma separated) and the Category boundary to Start of range.
- Click Update R Variable.
The results are as follows:
Next
Automatically Combine Categories - By Value - Custom Categories
Automatically Combine Categories - By Value - Equally Spaced Categories
Automatically Combine Categories - By Value - Percentiles
Create New Variables - Automatically Combine Categories - By Value - Tidy Categories