Introduction
This article describes how to create a customized bubble chart that displays the results of a correspondence analysis.
Requirements
A crosstab that is suitable for correspondence analysis.
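For readers who want to experiment outside Q, the two inputs the script expects (a table of percentages and a matching table of z-scores) can be sketched in plain R. This is a minimal sketch under stated assumptions: the counts are made up, the table is percentaged by column (an assumption about your Q table), and the adjusted standardized residuals from `chisq.test()` stand in for Q's z-Statistic, which may be computed differently.

```r
# Hypothetical counts (made-up data): rows are countries, columns are concerns.
counts <- matrix(
  c(30, 12,  8,
    10, 25, 15,
     5,  9, 36),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Egypt", "France", "Japan"),
                  c("Safety", "Cost", "Language"))
)

# Column percentages (an assumption about how the Q table is percentaged).
col.pct <- 100 * prop.table(counts, margin = 2)

# Adjusted standardized residuals from Pearson's chi-square test, used here
# as a stand-in for Q's z-Statistic, which may be computed differently.
z.scores <- chisq.test(counts)$stdres

# One row of residuals, analogous to z[row.to.use, ] in the script below.
egypt.z <- z.scores["Egypt", ]
print(round(col.pct, 1))
print(round(egypt.z, 2))
```

A positive residual means the cell count is higher than expected under independence; a negative one means it is lower.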
Method
The code below creates a correspondence analysis, and then presents this using a bubble chart. To reproduce a similar visualization with your own data:
- Create a table in Q that contains the data you want to analyze. This is no different from what you would normally do for a correspondence analysis.
- Right-click on the table in the Report tree, select Reference name, and copy its contents. When I did this, the reference name of my table was table.Q9.
- Duplicate the table by pressing + Duplicate.
- Right-click on the duplicated table, select Statistics – Cells, and choose z-Statistic. Repeat this process to remove any other statistics from the table (or hold down Ctrl on your keyboard to do this in a single step).
- Right-click on the table in the Report tree, select Reference name, and change the reference name to table.zScores (or anything else you want).
- From the Create menu, select R Output and paste in the code below, modifying the first 12 lines as needed. In the first line, replace table.Q9 with the reference name of your table (see step 2). In the third line, replace Egypt with the name of the row that contains the standardized residuals you wish to use, and fill in the remaining lines with the labels you want to appear on the final visualization.
x = table.Q9
z = table.zScores
row.to.use = "Egypt"
row.label = "Country"
column.label = "Concern"
title = "Traveler's concerns about different countries (bubbles relate to Egypt)"
legend.title = "Strength of relationship"
# Removing rows and columns to be ignored
remove = c("NET", "Total")
x = x[!rownames(x) %in% remove, !colnames(x) %in% remove]
z = z[row.to.use, !colnames(z) %in% remove]
colnames(x) = paste0(colnames(x), ": ", round(x[row.to.use,]), "%")
# Significance testing: set z-scores that are not significant at the 5% level to 0
z[abs(z) <= 1.96] <- 0 # Comment out this line to turn off the significance testing
default.size = 0.1 # Minimum circle size
my.ca = ca::ca(x)
coords = flipDimensionReduction::CANormalization(my.ca, "Principal")
n.rows = nrow(coords$row.coordinates)
n.columns = nrow(coords$column.coordinates)
coords = rbind(coords$row.coordinates, coords$column.coordinates)
# Creating the 'group' variable
n = n.rows + n.columns
groups <- rep("No association", n.columns)
groups[z > 0] = paste0("Weakness of ", row.to.use)
groups[z < 0] = paste0("Strength of ", row.to.use)
groups <- c(rep(row.label, n.rows), groups)
# Setting bubble size
bubble.size <- c(rep(default.size, n.rows), abs(z))
# Labeling the dimensions
inertias <- round(my.ca$sv^2, 6) # squared singular values (principal inertias)
variance.explained <- paste(as.character(round(100 * prop.table(inertias), 1)), "%", sep = "")[c(1, 2)]
column.labels <- paste("Dimension", c(1, 2), paste0("(", variance.explained, ")"))
bubble.size[bubble.size < default.size] <- default.size
rhtmlLabeledScatter::LabeledScatter(X = coords[, 1],
    Y = coords[, 2],
    Z = bubble.size,
    label = rownames(coords),
    label.alt = rownames(coords),
    group = groups,
    colors = c("Black", "Purple", "#FA614B", "#3E7DCC"),
    fixed.aspect = TRUE,
    title = title,
    x.title = column.labels[1],
    y.title = column.labels[2],
    z.title = legend.title,
    axis.font.size = 10,
    labels.font.size = 14,
    title.font.size = 20,
    legend.font.size = 15,
    y.title.font.size = 16,
    x.title.font.size = 16)
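The axis titles in the script come from the share of total inertia explained by the first two dimensions. As a rough check of what the `ca::ca()` call and the `variance.explained` and `column.labels` lines compute, the same numbers can be sketched in base R. This assumes that `my.ca$sv` holds the singular values of the standardized residual matrix, which is the standard correspondence analysis definition; the counts are made up.

```r
# Hypothetical counts (made-up data) standing in for the table x.
x <- matrix(c(30, 12,  8,
              10, 25, 15,
               5,  9, 36),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Egypt", "France", "Japan"),
                            c("Safety", "Cost", "Language")))

P <- x / sum(x)                # correspondence matrix
row.mass <- rowSums(P)
col.mass <- colSums(P)

# Standardized residuals D_r^(-1/2) (P - r c^T) D_c^(-1/2); the CA singular
# values are the singular values of this matrix.
S <- diag(1 / sqrt(row.mass)) %*% (P - outer(row.mass, col.mass)) %*%
  diag(1 / sqrt(col.mass))
sv <- svd(S)$d
sv <- sv[sv > 1e-12]           # drop the trivial (numerically zero) value

# Squared singular values are the principal inertias; their shares of the
# total give the "variance explained" percentages used in the axis titles.
inertias <- sv^2
variance.explained <- paste0(round(100 * prop.table(inertias), 1), "%")[1:2]
column.labels <- paste("Dimension", 1:2, paste0("(", variance.explained, ")"))
print(column.labels)
```

The total inertia equals the chi-square statistic divided by the grand total, so the percentages describe how much of the table's departure from independence each dimension displays.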
Optional
Turning off the significance testing
The visualization below is the same as the one above, except that the significance testing has been turned off. This was achieved by:
- Commenting out line 14 (i.e., typing a # at the very beginning of the line, which prevents that line of code from being run).
- Removing , "Purple" from line 40 and swapping the order of the last two colors ("#3E7DCC", "#FA614B"). This is where you customize the colors: you can type in a color code or a color name, such as "Red" or "Blue".
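To see why turning off the test changes the chart, the sketch below applies the significance line `z[abs(z) <= 1.96] <- 0` to a handful of made-up z-scores and compares the resulting bubble sizes and groups with and without it. The helper `assign.groups()` is hypothetical; it simply mirrors the script's group assignment for the column points. With the test off, no column is assigned to "No association", which is presumably why one color can be dropped.

```r
# Hypothetical z-scores for one row (made-up values), named by column.
z <- c(Safety = 4.2, Cost = -2.5, Language = 0.8, Food = -1.1)
default.size <- 0.1

# 1.96 is (to two decimals) the two-sided 5% critical value of the normal.
critical <- qnorm(0.975)

# Hypothetical helper mirroring the script's group assignment.
assign.groups <- function(z, row.to.use = "Egypt") {
  groups <- rep("No association", length(z))
  groups[z > 0] <- paste0("Weakness of ", row.to.use)
  groups[z < 0] <- paste0("Strength of ", row.to.use)
  groups
}

# With the significance line: non-significant z-scores are set to 0.
z.tested <- z
z.tested[abs(z.tested) <= 1.96] <- 0

# Bubble sizes, raised to the minimum size as in the script.
size.tested <- pmax(abs(z.tested), default.size)
size.untested <- pmax(abs(z), default.size)

print(table(assign.groups(z.tested)))
print(table(assign.groups(z)))
```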
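Showing only the positive residuals
The next plot shows only the positive residuals (i.e., the concerns about Egypt that have the strongest positive relationship). It was created by:
- Removing abs from line 28, so that negative z-scores fall below the minimum bubble size and are drawn at default.size.
- Commenting out line 25, so that negative z-scores are no longer given their own "Strength of" group.
- In line 40, replacing #3E7DCC with Purple.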
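The effect of these two code changes can be sketched on made-up z-scores (values assumed to be after the significance line has run). Without `abs()`, negative z-scores drop below `default.size` and the script's existing clamping line raises them to the minimum; with the "Strength of" line commented out, they stay in the "No association" group.

```r
# Hypothetical z-scores after the significance line has run (made-up values).
z <- c(Safety = 4.2, Cost = -2.5, Language = 0, Food = 3.1)
default.size <- 0.1
row.to.use <- "Egypt"

# Bubble size without abs(): negative z-scores fall below the minimum...
bubble.size <- z
# ...and are then raised to default.size, as the script's clamping line does.
bubble.size[bubble.size < default.size] <- default.size

# Group assignment with the "Strength of" line commented out.
groups <- rep("No association", length(z))
groups[z > 0] <- paste0("Weakness of ", row.to.use)
# groups[z < 0] <- paste0("Strength of ", row.to.use)

print(data.frame(z = z, size = bubble.size, group = groups))
```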
Taking the data values off the chart
Lastly, to remove the percentages from the visualization, comment out line 12, which leaves us with the visualization below.
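The line being commented out, `colnames(x) = paste0(colnames(x), ": ", round(x[row.to.use,]), "%")`, only rewrites the column labels; the correspondence analysis itself is unaffected. A minimal sketch on made-up percentages shows what it produces:

```r
# Hypothetical percentages (made-up values) for two countries and two concerns.
x <- matrix(c(62.4, 15.0,
              40.6, 33.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Egypt", "France"), c("Safety", "Cost")))
row.to.use <- "Egypt"

# Append the rounded percentage for row.to.use to each column label.
labelled <- paste0(colnames(x), ": ", round(x[row.to.use, ]), "%")
print(labelled)
```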
More advanced customizations
If you hover your mouse over the word LabeledScatter in Properties > R CODE (line 34), a tooltip shows the definitions of all of this function's parameters, which you can use for further customization.
Next
How to Do Traditional Correspondence Analysis
How to Do a Multiple Correspondence Analysis