Introduction
There are occasions when you have collected more records than you need for a survey and want to randomly remove the surplus, or you simply want to select a random subset of records to work with. This article describes how to create a filter for a random sample of respondents in your data set. You can then apply this filter to tables or analyses, or use it to remove cases from the data if needed.
Requirements
- A Q project with a data set imported.
- For the second method, a variable that identifies the subgroup to sample from. In this example, the variable is called Males 25-29.
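Before sampling from a subgroup, it can help to confirm how the subgroup variable is coded and how many respondents it contains, since you cannot select more respondents than the subgroup holds. The following is a minimal sketch in plain R using made-up data; the name males_25_29 and the label "Not Selected" are assumptions for illustration, not values taken from a real Q project.

```r
# Made-up stand-in for the Males 25-29 filter variable (not real data)
males_25_29 <- factor(c("Selected", "Not Selected", "Selected",
                        NA, "Not Selected", "Selected"))

# Count each label, including missing values, to see how the variable is coded
counts <- table(males_25_29, useNA = "ifany")
counts

# The largest random sample that can be drawn from the subgroup
n_available <- sum(males_25_29 == "Selected", na.rm = TRUE)
n_available
```

In this toy example three respondents carry the "Selected" label, so the number to select in the second method would need to be 3 or fewer.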
Method - Random filter across all respondents
1. On the Variables and Questions tab, right-click and select Insert Variable(s) > R Variable.
2. Paste the code below into R CODE:
##code to modify
#specify any variable in the dataset (this is used to calculate how many respondents are in the data)
id = UniqueID
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#calculate total sample size
ss = length(id)
#select the random respondents/rows in the data
indices = sample.int(ss, select)
#create an empty filter
filter = rep(0, ss)
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Press the play button to run the code and get a preview of results.
4. Give your new variable a Question Name like Random sample.
5. Click Add R Variable.
6. In the Tags column click F to make the variable usable as a filter.
7. [OPTIONAL]: To delete the randomly selected cases using this filter, see How to Delete Cases (Observations).
8. [OPTIONAL]: To show only the random selection in a table or analysis, apply this filter using the Filter(s) dropdown.
9. [OPTIONAL]: If you want the filter to exclude the random selection from a table or analysis instead, replace the four lines from "#create an empty filter" through "filter[indices] = 1" with:
#create a filter including everyone
filter = rep(1, ss)
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
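The logic of this method can also be tried outside Q in plain R. The following is a minimal sketch, assuming a hypothetical data set of 50 respondents; the helper name make_random_filter and its arguments are illustrative and not part of Q.

```r
# Hypothetical helper (not part of Q): build a 0/1 filter that flags
# `select` randomly chosen rows out of `n` rows.
make_random_filter <- function(n, select, seed = 123) {
  stopifnot(select <= n)            # cannot sample more rows than exist
  set.seed(seed)                    # a fixed seed gives a repeatable selection
  indices <- sample.int(n, select)  # row numbers drawn without replacement
  filter <- rep(0, n)               # start with nobody selected
  filter[indices] <- 1              # flag the sampled rows
  filter
}

keep <- make_random_filter(n = 50, select = 10)  # filters in the sample
drop <- 1 - keep                                 # filters out the sample

sum(keep)  # number of rows flagged for the sample
sum(drop)  # number of rows remaining once the sample is excluded
```

Here sum(keep) is 10 and sum(drop) is 40, and calling the helper again with the same seed returns the same selection. Taking 1 - keep is an equivalent shortcut for the "filter out" variant described in step 9.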
Method - Random filter across a subgroup of respondents
1. On the Variables and Questions tab, right-click and select Insert Variable(s) > R Variable.
2. Paste the code below into R CODE:
##code to modify
#specify a filter variable of your subgroup
subgroup = `Males 25-29`
#specify label of those selected in subgroup variable
selected = "Selected"
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#get the list of rows of the subgroup in the data
subgroup_rows = which(subgroup == selected)
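#select the random respondents/rows from those rows
#(indexing with sample.int avoids sample() treating a single row number n as 1:n)
indices = subgroup_rows[sample.int(length(subgroup_rows), select)]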
#create an empty filter
filter = rep(0, length(subgroup))
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Press the play button to run the code and get a preview of results.
4. Give your new variable a Question Name like Random sample of Males 25-29.
5. Click Add R Variable.
6. In the Tags column click F to make the variable usable as a filter.
7. [OPTIONAL]: To delete the randomly selected cases using this filter, see How to Delete Cases (Observations).
8. [OPTIONAL]: To show only the random selection in a table or analysis, apply this filter using the Filter(s) dropdown.
9. [OPTIONAL]: If you want the filter to exclude the random selection from a table or analysis instead, replace the four lines from "#create an empty filter" through "filter[indices] = 1" with:
#create a filter including everyone
filter = rep(1, length(subgroup))
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
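The subgroup method can likewise be sketched in plain R. This is a minimal illustration with made-up data, not the code Q runs; the helper name make_subgroup_filter, the toy subgroup vector, and the "Not Selected" label are assumptions. It adds a check that the subgroup is large enough, and it indexes through sample.int() because R's sample(x, ...) treats a single number x as the range 1:x.

```r
# Hypothetical helper (not part of Q): flag `select` random rows among
# those where `subgroup` equals `selected`.
make_subgroup_filter <- function(subgroup, selected, select, seed = 123) {
  rows <- which(subgroup == selected)  # which() ignores missing values
  if (select > length(rows)) {
    stop("Only ", length(rows), " respondents in the subgroup; cannot select ", select)
  }
  set.seed(seed)                       # fixed seed for a repeatable selection
  # Draw positions within `rows`, then map them back to row numbers
  indices <- rows[sample.int(length(rows), select)]
  filter <- rep(0, length(subgroup))   # start with nobody selected
  filter[indices] <- 1                 # flag the sampled subgroup rows
  filter
}

# Made-up data: 30 respondents, every third one in the subgroup
subgroup <- rep(c("Not Selected", "Not Selected", "Selected"), times = 10)
f <- make_subgroup_filter(subgroup, "Selected", select = 4)

sum(f)                               # number of rows flagged
all(subgroup[f == 1] == "Selected")  # every flagged row is in the subgroup
```

With this toy data the filter flags 4 rows, all of them inside the subgroup. Asking for more than the 10 available subgroup members stops with an error rather than returning a misleading filter.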
See Also
How to Select a Random Sample in Q
How to Randomly Select a Sub-Sample
How to Delete Cases (Observations)