Introduction
This article describes how to select a random sample in Q using JavaScript. If you're more familiar with R, please see How to Randomly Choose A Subset of Cases From a Data Set using R.
There are occasions when you have collected more records than necessary for a survey and want to randomly remove the surplus, or you simply want to select a random subset of records to work with. This article shows how to select a random sample of respondents in your data set based on a variable. The sample can be drawn from the whole data set or from a filtered group.
Method
You have completed fieldwork for an important survey and have gone over quota for males, but you don’t want to just delete the last records. Instead, you want to randomly select 30 male records to remove. You have opened the data set in Q but don’t know how to proceed. Don’t worry! The solution is to use some JavaScript to generate the random allocation.
Create a filter
First, we need to create a filter for males.
- Bring up a table for Gender in the Outputs tab
- Right-click the percentage field for the ‘Male’ row and click Create filter. The Question name for filter should default to the selected category (in this case “Male”), but you may want to alter it as needed.
- Deselect Apply to the current table.
- Click OK.
This has now created a variable at the top of the Variables and Questions tab. In the Name column, change the randomly allocated name to Males to make it easier to remember later.
Use a JavaScript formula
Now we want to add the code to randomly select respondents based on this filter. Right-click on any row in the Variables and Questions tab and select Insert Variable(s) > JavaScript Formula > Numeric.
You will be presented with an Expression box to paste JavaScript code into. Paste in the code below:
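var _filter = Males; // Enter the variable name of the filter variable here
var _sample_number = 30; // Enter the number of people to identify
// Build an array of case indices from 0 to N - 1
var _index = [];
for (var i = 0; i < N; i++) {
    _index.push(i);
}
// Pair each index with a random number; cases outside the filter get 0
var _rand = _index.map(function (_x) {
    return { _ind: _x, _val: Math.random() * _filter[_x] };
});
// Sort in descending order of the random number
_rand = _rand.sort(function (a, b) { return b._val - a._val; });
// Keep the indices of the first _sample_number cases
var _max_vals = _rand.map(function (_x) { return _x._ind; });
_max_vals = _max_vals.slice(0, _sample_number);
// Return 1 for the selected cases and 0 for the rest
_results = _index.map(function (x) { return _max_vals.indexOf(x) > -1 ? 1 : 0; });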
In the code:
- We specify the name of the filter variable and the random sample size
- We build _index, an array of the case indices from 0 to N - 1
- We create a second array of objects called _rand that pairs each index (_ind) with a random number (_val). The random number is multiplied by the filter value, so it is 0 for any case not included in the filter variable
- We sort _rand in descending order of _val and extract the indices, then use the slice function to keep the first 30
- Finally, we map these 30 records to _results, returning a 1 for the selected records and 0 for the rest
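The same idea can be tried outside Q as a plain JavaScript function. This is a minimal sketch rather than anything specific to Q: the function name selectRandomSample and the made-up males array are hypothetical, and the filter is assumed to be an array of 1s and 0s. As a small variation on the expression above, it draws random numbers only for cases that pass the filter, so it cannot pick a case outside the filter when fewer cases are available than requested.

```javascript
// Hypothetical helper: returns an array of 1/0 flags, one per case.
function selectRandomSample(filter, sampleNumber) {
    // Attach a random key to every case that passes the filter
    var keyed = [];
    for (var i = 0; i < filter.length; i++) {
        if (filter[i] === 1) {
            keyed.push({ ind: i, val: Math.random() });
        }
    }
    // Sort by key, largest first, and keep the first sampleNumber indices
    keyed.sort(function (a, b) { return b.val - a.val; });
    var chosen = keyed.slice(0, sampleNumber).map(function (x) { return x.ind; });
    // Flag the chosen cases with 1 and all others with 0
    return filter.map(function (_, i) { return chosen.indexOf(i) > -1 ? 1 : 0; });
}

// Made-up filter: 1 = male, 0 = not male
var males = [1, 0, 1, 1, 0, 1, 0, 1];
var flags = selectRandomSample(males, 3);
console.log(flags);
```

Every case that passes the filter receives an independent random key, so each one is equally likely to land among the first sampleNumber after sorting.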
Hard-code the random selection
Remember that this code is dynamic: the random selection is re-evaluated unless we fix the output. To fix it:
- Ensure your unique ID variable is selected in the Case IDs drop-down at the top of your Data tab.
- Return to the Variables and Questions tab, highlight the random sample variable and select Copy and Paste Variable(s) > As Values to hard-code the random selection so that it doesn’t re-calculate in the future.
- You can then change the Variable Type to Categorical and click the F in the Tags column for that row to turn this into a filter.
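To see why the hard-coding step matters, the sketch below runs a stand-in for the formula twice in plain JavaScript. The function drawSample and the sizes used are hypothetical and are not part of Q. Each evaluation draws fresh random numbers, so the flagged cases generally differ between runs, while a stored copy of the first run stays fixed, much like the pasted values.

```javascript
// Hypothetical stand-in for the formula: flags `size` random cases out of n
function drawSample(n, size) {
    var keyed = [];
    for (var i = 0; i < n; i++) {
        keyed.push({ ind: i, val: Math.random() });
    }
    keyed.sort(function (a, b) { return b.val - a.val; });
    var chosen = keyed.slice(0, size).map(function (x) { return x.ind; });
    var flags = [];
    for (var j = 0; j < n; j++) {
        flags.push(chosen.indexOf(j) > -1 ? 1 : 0);
    }
    return flags;
}

var firstRun = drawSample(100, 30);   // one evaluation of the formula
var hardCoded = firstRun.slice();     // stored copy, like pasting as values
var secondRun = drawSample(100, 30);  // a re-evaluation draws new random keys

// Count the cases whose flag differs between the two evaluations
var changed = 0;
for (var k = 0; k < 100; k++) {
    if (firstRun[k] !== secondRun[k]) changed++;
}
console.log("Cases flagged differently on re-evaluation: " + changed);
```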
[OPTIONAL] Now that we have this filter, we can remove these respondents by applying the filter to the Data tab. Then, right-click any row and select Delete Rows Matching Filter (Green).
Next
How to Randomly Select a Sub-Sample
How To Set The Weighted Sample Size