This article describes how to link two data sets by performing data fusion. Data fusion, also known as statistical matching, combines the data from two data files whose samples do not overlap (that is, no respondent appears in both files). For example, if one study looks at customer satisfaction and a completely separate study looks at brand attitudes, data fusion can be used to combine the data.
Requirements
- Two data sets loaded in Q via File > Data Sets > Add to Project > From File.
- In each of the files that you wish to fuse, you need a micro-segment variable with the following properties:
  - It is a Nominal or Ordinal variable.
  - It has the same Variable Name, Question Name, and Question Type in each data file.
  - It has the same unique values in each file. For example, if the values in one file are 1, 2, 3, ..., 100, then the other file must contain exactly the same values. In particular, no value may appear in one file but not the other.
  - The unique values represent small segments. The analysis assumes that:
    - Respondents in a given segment in one data file are broadly similar to respondents with the same value in the other data file.
    - The segments explain the differences between people in both data files. For example, if brand attitudes are fused with customer satisfaction data, and age is the key determinant of both brand attitudes and customer satisfaction, then you could use age as the micro-segment variable. More commonly, it is appropriate to create an index that combines multiple variables.
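If the micro-segment variable has to be built from several variables, one approach is to code each combination of categories as a single integer and then confirm that both files contain exactly the same codes. The Python sketch below is illustrative only: it prepares data outside Q, and the `age_band` and `gender` fields, the category lists, and the function names are hypothetical. The code mapping is fixed in advance from the category lists rather than derived from either file, so the same combination receives the same code in both files.

```python
from itertools import product

# Hypothetical category lists; in practice these come from your own variables.
AGE_BANDS = ["18-34", "35-54", "55+"]
GENDERS = ["Male", "Female"]

# Fixed mapping from (age band, gender) to an integer code 1..6, defined once
# so that both data files are coded identically.
SEGMENT_CODES = {
    combo: code
    for code, combo in enumerate(product(AGE_BANDS, GENDERS), start=1)
}


def add_segment(records):
    """Add a 'segment' code to each respondent record (a dict), in place."""
    for r in records:
        r["segment"] = SEGMENT_CODES[(r["age_band"], r["gender"])]
    return records


def check_segments(file_a, file_b):
    """Return (values only in file_a, values only in file_b).

    Both sets must be empty for the variable to meet the requirement that
    each file has the same unique values.
    """
    values_a = {r["segment"] for r in file_a}
    values_b = {r["segment"] for r in file_b}
    return values_a - values_b, values_b - values_a


satisfaction = add_segment([
    {"age_band": "18-34", "gender": "Male"},
    {"age_band": "55+", "gender": "Female"},
])
attitudes = add_segment([
    {"age_band": "18-34", "gender": "Male"},
    {"age_band": "35-54", "gender": "Female"},
])
only_in_satisfaction, only_in_attitudes = check_segments(satisfaction, attitudes)
print(only_in_satisfaction, only_in_attitudes)  # {6} {4}
```

Here the check reports that segment 6 appears only in the satisfaction data and segment 4 only in the attitudes data, so these two files would not yet meet the requirement that every value appears in both.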
Method
- Select File > Data Sets > Edit Relationships.
- Click New Relationship and then OK.
- Select the names of each data set to link in the File dropdowns.
- In the Variable dropdowns, select the micro-segment variable to match on; it must appear in both data sets.
- Set the Relationship type to Many to many.
- Choose what to do When a value is not found in the other data file.
- Select which data file is the Recipient.
- Press OK to save the relationship.
Please note the following:
- The sample size of the combined data will be that of the Recipient data file specified in Edit Data File Relationships.
- All of the respondents in the recipient sample are kept and used in analyses.
- The respondents in the other data file are probabilistically selected so that, for each value of the micro-segment variable, the number selected equals the number of respondents with that value in the recipient. For example, if the micro-segment variable is "Gender" and there are 10 Males in the recipient data file and 20 Males in the other data file, 10 of the 20 Males are probabilistically selected from the other data file to be used in analyses.
- The other data file's weights, if any, are an input to probabilistically selecting its respondents.
- For filters to work on tables that use variables from both files in a many-to-many relationship, there must be a filter variable in each file, and these filter variables must have identical variable names.
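The selection described in these notes can be illustrated with a short Python sketch. This article does not describe Q's internal algorithm, so treat the code as one plausible approach rather than what Q actually does. It assumes each data file is a list of respondent dicts with a `segment` key and an optional `weight` key (both hypothetical), draws donors within each segment by weighted sampling without replacement (the Efraimidis-Spirakis method), and, as a further assumption, samples with replacement when a segment has fewer donors than recipients.

```python
import random
from collections import defaultdict


def weighted_sample(pool, k, rng):
    """Pick k donors from pool, giving higher-weight donors a better chance.

    Uses Efraimidis-Spirakis keys (u ** (1 / w)) for weighted sampling without
    replacement. If k exceeds the pool size, every donor is used once and the
    remainder is drawn with replacement (an assumption of this sketch).
    """
    weights = [d.get("weight", 1.0) for d in pool]  # hypothetical 'weight' key; must be > 0
    keys = [rng.random() ** (1.0 / w) for w in weights]
    order = sorted(range(len(pool)), key=lambda i: keys[i], reverse=True)
    chosen = [pool[i] for i in order[:k]]
    if k > len(pool):
        chosen += rng.choices(pool, weights=weights, k=k - len(pool))
    return chosen


def fuse(recipient, donor, seed=None):
    """Pair every recipient respondent with a donor from the same segment.

    recipient, donor: lists of dicts with a 'segment' key (hypothetical layout).
    Returns one (recipient_record, donor_record) pair per recipient respondent.
    """
    rng = random.Random(seed)
    donors_by_segment = defaultdict(list)
    for d in donor:
        donors_by_segment[d["segment"]].append(d)
    recipients_by_segment = defaultdict(list)
    for r in recipient:
        recipients_by_segment[r["segment"]].append(r)

    pairs = []
    for segment, recs in recipients_by_segment.items():
        pool = donors_by_segment.get(segment)
        if not pool:
            # Q offers a setting for this case; this sketch simply raises.
            raise ValueError(f"segment {segment!r} has no donor respondents")
        pairs.extend(zip(recs, weighted_sample(pool, len(recs), rng)))
    return pairs


# The Gender example: 10 recipient Males, 20 donor Males.
recipient = [{"id": i, "segment": "Male"} for i in range(10)]
donor = [{"id": 100 + i, "segment": "Male"} for i in range(20)]
pairs = fuse(recipient, donor, seed=42)
print(len(pairs))  # 10: the fused sample keeps the recipient's size
```

Because every recipient respondent receives exactly one donor, the fused sample always has the recipient's sample size, and the donor weights only change which donors are likely to be picked.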