In some situations, your survey data file may contain duplicate values in its record identifier (ID) field. This can happen, for example, when some respondents have data in more than one row of the file. You may still need a unique identifier for each of these cases. At other times, the file may contain duplicate records that need to be removed.
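Outside of Q, one simple way to give every row a distinct identifier is to append an occurrence counter to the original ID. This is a minimal sketch only: the function name `make_unique_ids` and the `<id>_<n>` format are hypothetical choices for illustration, not something Q produces. Within Q, the steps below instead use the Use Case Number setting.

```python
from collections import defaultdict


def make_unique_ids(ids):
    """Hypothetical helper: append an occurrence counter so repeated IDs
    become distinct, e.g. a respondent with two rows gets "<id>_1" and "<id>_2"."""
    seen = defaultdict(int)  # how many times each ID has appeared so far
    unique = []
    for case_id in ids:
        seen[case_id] += 1
        unique.append(f"{case_id}_{seen[case_id]}")
    return unique


print(make_unique_ids(["101", "102", "102", "103"]))
# ['101_1', '102_1', '102_2', '103_1']
```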
To delete duplicate cases in an SPSS .sav file, take these steps:
- Save your project.
- Select File > New Project.
- Select File > Data Sets > Add to Project > From File.
- In the Data Import window:
  - Select Use original data file structure.
  - Untick Tidy Up Variable Labels and Strip HTML from Labels.
  - Click OK.
- On the Data tab, set Case IDs to Use Case Number.
- Delete any duplicate rows on the Data tab (right-click the row numbers to see the delete options).
  - If you are not sure which rows are duplicates, create a Pick One question from the ID variable, create a SUMMARY table from it on the Outputs tab, and sort the percentages from highest to lowest. IDs that appear in more than one row will be at the top.
  - You can also remove duplicates automatically using a filter; see How to Identify Duplicates in Q Using Code.
- Select Tools > Save Data as SPSS/CSV File and save the file to a location you can find again.
- Open your existing project.
- Select File > Data Sets > Update.
- Select the data file you saved earlier and press Open. Read any notifications and, if they look correct, press Accept.
- Go to the Data tab.
- Right-click on any row number and select Revert Deleted Rows.
  - Choose Check/uncheck all (at the bottom).
  - Press Revert.
- On the Data tab, set Case IDs to the variable you want to use as the case identifier.
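If you want to check an exported file outside of Q (for example, a CSV saved with Tools > Save Data as SPSS/CSV File), the same idea as the SUMMARY-table check can be sketched in Python. This is a minimal sketch, assuming the ID column is named `id` and the file is read with the standard `csv` module; the function names are hypothetical. It counts how often each ID occurs, lists repeated IDs from most to least frequent, and drops rows that are exact copies of an earlier row, keeping the first.

```python
import csv
import io
from collections import Counter


def duplicate_id_counts(rows, id_field):
    """Return (id, count) pairs for IDs that appear more than once,
    sorted from most to least frequent (like sorting a SUMMARY table)."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(
        ((case_id, n) for case_id, n in counts.items() if n > 1),
        key=lambda pair: (-pair[1], pair[0]),
    )


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable snapshot of the whole row
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical export with an "id" column: 102 appears twice with identical
# data, while 103 appears twice with different answers.
SAMPLE = """id,q1
101,5
102,3
102,3
103,4
103,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicate_id_counts(rows, "id"))      # [('102', 2), ('103', 2)]
deduped = drop_exact_duplicates(rows)
print(len(deduped))                         # 4
print(duplicate_id_counts(deduped, "id"))   # [('103', 2)]
```

In the sample, ID 102 is an exact duplicate and is removed automatically, while ID 103 has two rows with different answers. Cases like 103 need a decision (delete one row, or give each row a unique identifier) rather than automatic removal.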