This article describes how to use a built-in QScript to check selected numeric data for outliers and create new copies of the data with the outliers removed. An outlier is a value that lies more than a chosen number of standard deviations from the variable's mean. You choose that number when you run the script; the default is 3 standard deviations. In the new copies, outlier values are replaced with missing values. Data that does not contain outliers is not copied.
A data file loaded in Q
To run the script:
- Select Automate > Browse Online Library > Create New Variables > Variable(s) with Outliers Removed.
- A list of questions with a Question Type of Number, Number - Multi and Number - Grid will appear. Select the questions from the list that you want to check for outliers and click OK.
- Enter the cut-off value, expressed as a number of standard deviations. Values that are further from the variable mean than this cut-off are treated as outliers. The default value is 3.
A new folder will be created in the report tree that contains tables for the selected data and any new copies of data with the outliers removed.
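The rule the script applies can be illustrated outside Q with a minimal Python sketch. This is not the QScript's own code: the function name `remove_outliers` and the parameter `cutoff` are hypothetical, `None` stands in for a missing value, and the use of the sample standard deviation is an assumption, since the article does not say which form Q uses.

```python
from statistics import mean, stdev


def remove_outliers(values, cutoff=3.0):
    """Return a copy of `values` with outliers set to None (missing).

    Returns None when no outliers are found, mirroring the article's
    statement that data without outliers is not copied.
    Hypothetical helper for illustration only.
    """
    # Existing missing values are ignored when computing the statistics.
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None  # a standard deviation needs at least two values

    centre = mean(observed)
    spread = stdev(observed)  # assumption: sample standard deviation
    if spread == 0:
        return None  # all values identical, so nothing can be an outlier

    # A value is an outlier when it is more than `cutoff` standard
    # deviations away from the mean.
    cleaned = [
        None if v is not None and abs(v - centre) > cutoff * spread else v
        for v in values
    ]
    return cleaned if cleaned != list(values) else None
```

Calling `remove_outliers(ratings)` uses the default cut-off of 3, while `remove_outliers(ratings, cutoff=2)` applies a stricter rule. With very small samples a 3 standard deviation cut-off may never flag anything, because a single extreme value also inflates the standard deviation it is measured against.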
How to Use Scripts to Automate Data Checking and Cleaning
How to Check for Errors in Data File Construction
How to Identify Questions with Straight-Lining/Flat-Lining
How to Hide Uninteresting Data
How to Remove Truncated Text from Variable Labels
How to Reverse Scales in Questions
How to Suggest Better Question Names from Source Labels
How to Create Tables for Data Checking