When an analysis is conducted using variables in multiple data files, Q needs to work out how the observations in the data files relate to each other. This allows users to crosstab questions from two different data files, provided those files have a data file relationship that tells Q how the observations relate to each other. The link connects common respondents/observations in both data sets together so if you want to analyze them, the appropriate value(s) from each data set lines up for the corresponding observations.
Linking data sets is commonly done when you have unstacked data and want to analyze it against stacked data from the same survey, or when you have related data from two different data sources. However, this should not be used as a substitution for merging new variables into your data, see Method 2 Merging in new variables, nor will it work for analyzing wave-on-wave data, see Method 1 Merging in new cases.
There are also some limitations to working with variables across data sets:
- They cannot be used in the same banner, and
- filters created can only apply to an output that uses data from the same data set as the filter.
Requirements
- Two different data files loaded in Q.
- There must be a variable in each file that contains the same type of data (text, categorical, dates, etc.) and has some values that match.
Method
To establish a relationship between two files:
- Data file relationships can be edited by selecting File > Data Sets > Edit Relationships.
- This window shows a list of any existing data file relationships. In order to crosstab questions from two different data files, those two data files must have a relationship that appears in this list.
To add a new relationship:
- Click New Relationship....
- Use the two File dropdowns to select the two files, and the two Variable dropdowns to choose the common variable you are using to match the files, e.g., Week. Note, that both variables must contain the same type of data (text, categorical, date, etc.).
Selecting the relationship type:
- Use the Relationship menu to specify how the data should be matched across the data sets (in this example, the match is One to one):
- One to one: Each single value from the first data set's variable matches exactly to a single value in the second data set's variable.
- One to many: A single value from the first data set's variable matches multiple values in the second data set's variable.
- Many to one: Multiple values from the first data set's variable match a single value in the second data set's variable. This is the same type of relationship as One to many, with first and second data sets swapped.
- Many to many: Multiple values from the first data set's variable match multiple values in the second data set's variable, resulting in Data Fusion.
- Use the When a value is not found in the other data set menu to specify how you want to treat values that exist in one file but not the other. The choices are:
- Show a warning message (default) - When a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), a warning is shown and you will not be able to proceed with the crosstab until you either fix the data or come back to this screen and select another option.
- Insert a missing values into the matched data - If a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), the respondent is included in the sample as missing data (NaN) rather than their actual response data.
- Exclude cases from the matched data - If a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), the respondent is excluded from the sample.
- Use the Match dates that fall in the same: menu for ways to treat dates for when a case in the first data set's variable falls in the same year (or month, week or day) as the date for a case in the second data set's variable.
- Use the Recipient menu to define the recipient when the relationship type between the two data sets is Many to many.
- Click OK.
Note: if you receive a warning message, the message will tell you how to fix the problem before proceeding. For example, this message indicates that I need to either update the file, or click File > Data Sets > Edit relationships and choose another option for when a value is found in one file but not the other.
To fix this particular problem, I will update the Relationship to Many to many and set Match dates that fall into the same to Week and then click OK.
The results are as follows:
Diagnostics button
- Use the Diagnostics button in the Data File Relationship window, if you want to be warned in advance of problems you should fix prior to matching the files.
Editing or viewing existing data file relationships
To see more detail about an existing relationship, go to File > Data Sets > Edit Relationships, select the relationship in the list, and then click Edit to view the Data Set Relationship details:
You can also remove an existing relationship by selecting it from the list and then clicking Delete.
Performing an analysis between variables in different files
Now that the files are linked, I can use variables from both files in the same table. In this example, the Brand Awareness variable comes from the Awareness file, and the Week variable comes from the Ad Spend file.
Using Weights and Filters
When you have two data files with no relationship between them, Q expects the filter or weight variable to have identical names/labels and will allow users to choose a filter and it automatically works out which data set’s filter to use.
However, if you have a data file relationship, Q expects the weight/filter to only appear in one of the files. This is because if they appear in both, you end up with unresolvable logical problems.
If you wish to filter the table, and the relationship is Many-to-many, there must be a filter variable with the same Name and Label in both data files. See How to Create Variables Across Data Files Using JavaScript.
Charting a Time Series With Multiple Data Sources
Once you have a project with two data files, you have the ability to chart a time series that shows data from both data files.
The following prerequisites must be met:
- There must be a Date question in both data files with:
- The same Variable Name
- The same Question Name
- The same Question Type (Date)
- If you wish to filter the Time Series chart, there must be a variable with the same Name and Label in both data files. The
tag must be turned on.
- If you wish to weight the Time Series chart, there must be a variable with the same Name and Label in both data files. The
tag must be turned on.
With the above prerequisites met, you may create a Time Series chart with multiple data sources by following these steps:
- Click Create > Charts > Visualization > Time Series > Time Series with Dynamic Window.
- From the object inspector, go to Inputs > DATA SOURCE, and select the type of data source you wish to use to create the Time Series Graph.
- If you wish to use an existing table go to Output and select the desired table from the drop-down menu.
- If you wish to use variables go to Variables and first select the Date/Time structured variable, followed by the Numeric variable. Alternatively, you can drag and drop the variables from the Data Sets tree into the menu itself.
- To input the data manually, select Paste or type table. A new dialog box window will open where you can paste or type in your data. Note that the table must meet the requirements as described above.