Introduction
This article describes how to use simulated data to check choice model experimental designs using Q.
Running a survey can be expensive and time-consuming. Luckily, you can use simulated data to check and compare your survey designs before any fieldwork, saving you time and money.
Requirements
To fit a model with simulated data, you will first need to create a Choice Modeling – Latent Class Analysis output by clicking Automate > Browse Online Library > Choice Modeling > Latent Class Analysis. You will also need to supply the choice model design in any of the following formats:
- an Experiment Question
- a choice model experiment design output generated from Choice Modeling – Experimental Design
- a Sawtooth CHO file
- a Sawtooth dual format design file
- a JMP design file
For the example in this article, I will be using a choice model experiment design with egg attributes (Weight, Quality, Price) generated using the Efficient algorithm.
Method
Inputs: priors
Once you have supplied the design:
- Choose Simulated choices from priors in the Data source menu.
This tells Q to use simulated data instead of real survey data. A red button called Enter priors will appear.
- Click Enter priors to bring up a spreadsheet data editor.
- Enter the priors here, in the format I’ve shown below, where Weight, Quality, and Price are the attribute names, and the cells below these names are the attribute levels.
Make sure that the columns to the right of the attributes are titled either "mean" or "sd", and contain the means and standard deviations of the levels. Note that the first level of each attribute must be zero due to the coding of the design. You do not need to include every attribute, or indeed any attributes at all; attributes that are not present are given prior means and standard deviations of zero.
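If you are curious about what simulating choices from priors involves, the sketch below illustrates the general idea under a multinomial logit model. It is written in Python purely for illustration; the priors, the toy design, and the simulate_respondent function are hypothetical constructions for this article, not Q's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical priors laid out like the spreadsheet: the first level of
# each attribute is fixed at zero by the coding of the design.
priors = {
    "Weight":  {"mean": [0.0, 0.5, 1.0],   "sd": [0.0, 0.2, 0.2]},
    "Quality": {"mean": [0.0, 0.8],        "sd": [0.0, 0.3]},
    "Price":   {"mean": [0.0, -0.6, -1.2], "sd": [0.0, 0.2, 0.4]},
}
means = np.concatenate([p["mean"] for p in priors.values()])
sds = np.concatenate([p["sd"] for p in priors.values()])

def simulate_respondent(design, means, sds, rng):
    """Simulate one respondent's choices under a multinomial logit model.

    design: (n_tasks, n_alternatives, n_parameters) array of dummy-coded
            attribute levels for each alternative in each task.
    """
    beta = rng.normal(means, sds)                  # respondent-level coefficients
    utilities = design @ beta                      # (n_tasks, n_alternatives)
    utilities += rng.gumbel(size=utilities.shape)  # extreme-value error term
    return utilities.argmax(axis=1)                # chosen alternative per task

# A toy random design with 10 tasks and 3 alternatives per task; a real
# design would come from the experimental design output.
design = rng.integers(0, 2, size=(10, 3, means.size)).astype(float)
print(simulate_respondent(design, means, sds, rng))
```

Repeating this for every simulated respondent produces a data set whose true parameters are known, which is exactly what makes checking the design possible.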
Alternatively, if you have supplied a choice model experiment design output, an additional Prior source control appears, which has the option Use priors from design. To use it:
- Under EXPERIMENTAL DESIGN, from the Design source menu, select Experimental design R output.
- Under RESPONDENT DATA, from the Prior source menu, select Use priors from design.
As the name suggests, Q will use any priors provided in the design output to generate the simulated data. If no priors are present, then Q will use priors of zero. This will cause the simulated choices to be random and independent of attributes.
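To see why zero priors lead to random choices, recall the multinomial logit choice probability (a standard result, not specific to Q):

$$P(\text{alternative } j) = \frac{e^{u_j}}{\sum_{k=1}^{J} e^{u_k}}, \qquad u_j = x_j^\top \beta .$$

With $\beta = 0$, every utility is zero, so each of the $J$ alternatives is chosen with probability $1/J$, regardless of its attributes.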
For the purpose of checking the design, I would recommend specifying non-zero priors for all attribute levels in the design, so that we can see how the design performs for each level.
Inputs: simulated sample size
Next, you will need to choose the Simulated sample size, which is the number of respondents to simulate. The default is 300, but generally this should be larger if there are many parameters in the model, especially if you find that the estimated parameters do not closely match the priors.
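As a rough way to see how the simulated sample size affects precision, the snippet below computes asymptotic standard errors for a multinomial logit fit from the design's Fisher information matrix. This is again a hypothetical Python illustration; mnl_standard_errors and the toy design are constructions for this article, not part of Q.

```python
import numpy as np

def mnl_standard_errors(design, beta, n_respondents):
    """Asymptotic standard errors of multinomial logit estimates.

    design: (n_tasks, n_alternatives, n_parameters) dummy-coded array.
    beta:   assumed true coefficients, e.g. the prior means.
    Assumes every respondent answers every task once.
    """
    info = np.zeros((beta.size, beta.size))
    for X in design:                  # X: (n_alternatives, n_parameters)
        u = X @ beta
        p = np.exp(u - u.max())
        p /= p.sum()                  # choice probabilities for this task
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    cov = np.linalg.inv(n_respondents * info)
    return np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
design = rng.integers(0, 2, size=(10, 3, 5)).astype(float)
beta = np.array([0.5, 1.0, 0.8, -0.6, -1.2])
print(mnl_standard_errors(design, beta, 300))
print(mnl_standard_errors(design, beta, 1200))  # roughly half the size
```

Standard errors shrink in proportion to the square root of the sample size, so quadrupling the number of simulated respondents roughly halves them.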
Inputs: others
For the purpose of checking the design, I would generally first run a single-class latent class analysis without any questions left out, which is the default, but you may choose different settings or even run Hierarchical Bayes instead. Once you are done:
- Click on the Calculate button to run the analysis with simulated data.
Model output
The next table shows the output from a Hierarchical Bayes analysis on the same data. The means are much closer to the priors than with latent class analysis, although the standard deviations are smaller than the priors and the respondent distributions are skewed. This is a limitation of the model rather than an issue with the design. The prediction accuracies are higher, due to the greater flexibility of the Hierarchical Bayes model. Overall, there do not seem to be any issues with the design. I would be concerned if the model had failed to converge, if parameter estimates were vastly different from the priors, or if prediction accuracies were low.
Standard errors
The last table displays parameter statistics from the latent class analysis above. It was created by:
- Selecting the model output and clicking Automate > Browse Online Library > Choice Modeling > Diagnostic > Parameter Statistics.
The parameters for which a prior was specified all have small standard errors relative to the coefficients, and hence a high level of significance. It would be worth investigating the design if any of these parameters were not significant, since one potential issue is that the design fails to adequately cover some levels. You can also compare different designs with the same specifications and settings against each other, with lower standard errors indicating a better design.
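If you would like to compare candidate designs outside of Q, the same information-matrix idea yields the D-error, a standard single-number efficiency criterion from the choice modeling literature (lower is better). The sketch below is again an illustrative Python construction, not Q's implementation:

```python
import numpy as np

def information_matrix(design, beta):
    """Fisher information of a multinomial logit model for one respondent."""
    info = np.zeros((beta.size, beta.size))
    for X in design:                  # X: (n_alternatives, n_parameters)
        u = X @ beta
        p = np.exp(u - u.max())
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return info

def d_error(design, beta):
    """D-error of a design: det(information)**(-1/k); lower is better."""
    k = beta.size
    return np.linalg.det(information_matrix(design, beta)) ** (-1.0 / k)

rng = np.random.default_rng(1)
beta = np.array([0.5, 1.0, 0.8, -0.6, -1.2])
design_a = rng.integers(0, 2, size=(10, 3, 5)).astype(float)
design_b = rng.integers(0, 2, size=(10, 3, 5)).astype(float)
print("Design A D-error:", d_error(design_a, beta))
print("Design B D-error:", d_error(design_b, beta))
```

A design with a lower D-error will generally also give smaller standard errors for the same simulated sample size, so the two comparisons usually agree.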
See Also
How to Do Choice Modeling in Q