This article describes how to use a built-in QScript to check selected numeric data for outliers and create new copies of the data with the outliers removed. Outliers are defined as values that lie more than a specified number of standard deviations from the variable's mean. You can choose how many standard deviations are used; the default is 3. In the new copies, outlier values are replaced with missing values. Data that does not contain outliers is not copied.
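The outlier rule described above can be sketched in plain JavaScript. This is only an illustration, not the QScript's own code: the function name `removeOutliers` and the sample data are hypothetical, and the sketch assumes unweighted data and the sample standard deviation (n - 1 denominator), neither of which this article specifies.

```javascript
// Hypothetical helper: returns a copy of `values` with outliers replaced by NaN.
// Assumes unweighted data and the sample standard deviation (n - 1 denominator).
function removeOutliers(values, cutoff = 3) {
  // Ignore values that are already missing when computing the statistics.
  const valid = values.filter((v) => !Number.isNaN(v));
  const n = valid.length;
  if (n < 2) return values.slice(); // too few values to estimate a standard deviation
  const mean = valid.reduce((sum, v) => sum + v, 0) / n;
  const variance = valid.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  // A value is an outlier when it lies more than `cutoff` SDs from the mean.
  return values.map((v) => (Math.abs(v - mean) > cutoff * sd ? NaN : v));
}

// Illustrative data: fifteen plausible ages and one data-entry error.
const ages = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 500];
const cleaned = removeOutliers(ages);
// The last entry of `cleaned` is NaN; `ages` itself is left unchanged.
```

As in the script, the original data is not modified; the cleaned values are returned as a separate copy.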
Requirements
A data file loaded in Q
Method
To run the script:
- Select Automate > Browse Online Library > Create New Variables > Variable(s) with Outliers Removed.
- A list of questions with a Question Type of Number, Number - Multi and Number - Grid will appear. Select the questions from the list that you want to check for outliers and click OK.
- Enter a cut-off value: the number of standard deviations from the mean beyond which a case's value is treated as an outlier. The default value is 3.
A new folder will be created in the report tree that contains tables for the selected data and any new copies of data with the outliers removed.
The new copies of the variables use a JavaScript formula to assign a value of NaN to respondents with outlying values. The means and standard deviations are determined when this script is run. As a result, the definition of an outlier in the new variables will not be updated if the underlying data is changed or updated.
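To illustrate why the outlier definition stays fixed, here is a minimal sketch of a formula whose mean and standard deviation are stored as constants. The formula the script actually generates is not shown in this article, so the function name `makeOutlierFormula` and the numbers below are hypothetical.

```javascript
// Hypothetical sketch: the mean and SD are computed once, when the script runs,
// and are then baked into the formula as fixed bounds.
function makeOutlierFormula(mean, sd, cutoff) {
  const lower = mean - cutoff * sd;
  const upper = mean + cutoff * sd;
  // The returned function plays the role of the new variable's JavaScript formula.
  return (value) => (value < lower || value > upper ? NaN : value);
}

// Statistics captured at the time the script is run (illustrative numbers).
const formula = makeOutlierFormula(50, 10, 3); // keeps values from 20 to 80

const kept = formula(75);    // 75 is inside the stored bounds, so it is kept
const removed = formula(95); // 95 is outside the stored bounds, so it becomes NaN

// If the data is later updated so that the mean shifts to 70, the stored
// bounds are still 20 and 80, so 95 continues to be treated as an outlier.
```

Under this reading, running the script again after the data changes would be the way to obtain variables based on the current mean and standard deviation.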
Next
How to Use Scripts to Automate Data Checking and Cleaning
How to Check for Errors in Data File Construction
How to Identify Questions with Straight-Lining/Flat-Lining
How to Hide Uninteresting Data
How to Remove Truncated Text from Variable Labels
How to Reverse Scales in Questions
How to Suggest Better Question Names from Source Labels