This post describes the basic process for creating such a Word Cloud in Q. Please read How to Show Sentiment in Word Clouds for a more general discussion of the logic behind the code below.
Requirements
A data file that contains a variable with the phrases you wish to use to create the Word Cloud.
This article uses data from tweets by President Trump.
Method
Step 1: Importing the data
This post assumes that you have already imported a data file containing a variable with the phrases that you wish to use to create the Word Cloud. If your data is in some other format, instead use Create > R Output with the code and instructions described in How to Show Sentiment in Word Clouds using R.
If you want to reproduce the Word Cloud from above, you can do so using File > Data Sets > Add to Project > From R, and:
- Set the Name to trumpTweats
- Enter the code below.
- Press the play button (the blue triangle).
- Press Add data set and OK.
load(url("http://varianceexplained.org/files/trump_tweets_df.rda"))
trump_tweets_df$text = gsub("http.*", "", trump_tweets_df$text, useBytes = TRUE)
trump_tweets_df
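To see what the gsub() line in the import code does, here is a minimal sketch on three made-up tweets (the tweets are hypothetical and are not taken from the real data set). It applies the same pattern, "http.*", which matches "http" and everything after it.

```r
# Hypothetical example tweets, used only to illustrate the pattern
tweets <- c("Great rally tonight! https://t.co/abc123",
            "Read this http://example.com/story and share it",
            "No link in this one")

# Same call as in the import code: remove "http" and everything that follows it
cleaned <- gsub("http.*", "", tweets, useBytes = TRUE)
print(cleaned)
```

Note that in the second tweet the words after the link are removed as well, because the pattern runs to the end of the text. For tweets, where links usually come last, this is rarely a problem.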
Step 2: Extracting the words
- Create > Text Analysis > Advanced > Setup Text Analysis
- Select the Text Variable as text (this is the name of the variable containing the tweets)
- Check the Automatic option at the top
Step 3: Sentiment for the phrases (tweets)
- Go to the Variables and Questions tab
- Select the first variable (it is called text)
- Create > Text Analysis > Sentiment
Step 4: Sentiment for each word
- Create > R Output
- Paste in the code below
- Press Calculate and you will have the Word Cloud!
As discussed in How to Show Sentiment in Word Clouds, your Word Cloud may look a bit different, and you should check that no long words are missing. Also, if you have tried these steps a few times in the same project, you will need to update the variable, R Output, and question names in the code to make everything work.
# Sentiment analysis of the phrases
phrase.sentiment = `Sentiment scores from text.analysis.setup`
phrase.sentiment[phrase.sentiment >= 1] = 1
phrase.sentiment[phrase.sentiment <= -1] = -1
# Sentiment analysis of the words
final.tokens = text.analysis.setup$final.tokens
td = t(vapply(flipTextAnalysis:::decodeNumericText(text.analysis.setup$transformed.tokenized), function(x) {
as.integer(final.tokens %in% x)
}, integer(length(final.tokens))))
counts = text.analysis.setup$final.counts
phrase.word.sentiment = sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] = NA # Treat phrases that do not contain the word as missing
word.mean = apply(phrase.word.sentiment, 2, FUN = mean, na.rm = TRUE)
word.sd = apply(phrase.word.sentiment, 2, FUN = sd, na.rm = TRUE)
word.n = apply(!is.na(phrase.word.sentiment), 2, FUN = sum, na.rm = TRUE)
word.se = word.sd / sqrt(word.n)
word.z = word.mean / word.se
word.z[word.n <= 3 | is.na(word.se)] = 0
words = text.analysis.setup$final.tokens
x = data.frame(word = words,
               freq = counts,
               "Sentiment" = word.mean,
               "Z-Score" = word.z,
               Length = nchar(words))
word.data = x[order(counts, decreasing = TRUE), ]
# Working out the colors
n = nrow(word.data)
colors = rep("grey", n)
colors[word.data$Z.Score < -1.96] = "Red"
colors[word.data$Z.Score > 1.96] = "Green"
# Creating the word cloud
library(wordcloud2)
wordcloud2(data = word.data[, -3], color = colors, size = 0.4)
The results are as follows:
Technical Notes
Note that the code above uses the function wordcloud2() to create the Word Cloud. If you are plotting a lot of words in a small space, some words may be left out so that the cloud fits the area. No warning is given when words are removed, and the words removed are not necessarily the ones with the smallest frequencies. When this happens, you may notice that the colors in your Word Cloud are wrong. Take care to set the font size in the code (the size argument), or resize the output, so that all of the words you would like to show will fit.
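One way to make it more likely that every word fits is to plot fewer words. The sketch below keeps only the most frequent words and subsets the color vector in the same way so that the two stay aligned. The toy word.data and the name max.words are hypothetical stand-ins. In your project you would apply the same subsetting to the word.data and colors objects created in Step 4, which are already sorted by frequency.

```r
# Toy stand-ins for word.data and colors from Step 4 (hypothetical values)
word.data <- data.frame(word = c("great", "people", "crooked", "thank", "media"),
                        freq = c(50, 40, 30, 20, 10),
                        Z.Score = c(3.1, 0.4, -2.8, 2.2, -2.5))
colors <- rep("grey", nrow(word.data))
colors[word.data$Z.Score < -1.96] <- "Red"
colors[word.data$Z.Score > 1.96] <- "Green"

# Keep only the most frequent words; max.words is a hypothetical setting
max.words <- 3
keep <- seq_len(min(max.words, nrow(word.data)))
top.words <- word.data[keep, ]
top.colors <- colors[keep]  # subset the colors identically so they stay aligned

print(cbind(top.words, color = top.colors))
# Plotting call as in Step 4, left commented so the sketch runs without the package:
# wordcloud2(data = top.words, color = top.colors, size = 0.4)
```

Reducing the number of words does not guarantee that nothing is dropped, so it is still worth checking the output against the list of words you expect to see.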
An alternative to the wordcloud2() function is the wordcloud() function. There are fewer customizations (such as custom shapes, backgrounds, custom rotations, font family, and some foreign languages), but you will be warned if words are left out of the cloud. You can also set a maximum number of words to show within the function (rather than manipulating the data beforehand). To use this function, replace all of the code underneath the # Creating the word cloud section with the code below:
# Load the R package with the wordcloud function
library(wordcloud)
# Create the word cloud
wordcloud(words = word.data$word, freq = word.data$freq, # provide the words and counts
          random.order = FALSE,   # plot words in decreasing order of frequency
          scale = c(7, .5),       # largest and smallest font sizes
          rot.per = .1,           # proportion of words that are rotated 90 degrees
          min.freq = 0,           # minimum count for a word to be plotted
          max.words = Inf,        # maximum number of words to plot
          colors = colors,        # vector of colors, one per word
          ordered.colors = TRUE)  # TRUE means colors are matched to the words in the order provided
Documentation and more examples are available for wordcloud2() and wordcloud() online.
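As a further technical note, the toy example below may help to show how the word-level statistics in Step 4 behave. It repeats the same calculations on a small made-up data set. The ten phrase sentiments and the four words are hypothetical, and the term-document matrix is typed in by hand rather than taken from a text analysis output.

```r
# Hypothetical phrase sentiments, already capped to -1, 0, or 1
phrase.sentiment <- c(1, 1, 1, 1, 1, 0, -1, -1, -1, -1)

# Hypothetical term-document matrix: rows are phrases, columns are words,
# and 1 means the word appears in the phrase
td <- cbind(great   = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0),
            crooked = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
            the     = c(1, 1, 0, 0, 0, 1, 1, 1, 0, 0),
            sad     = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1))

# Give each word the sentiment of every phrase in which it appears
phrase.word.sentiment <- sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] <- NA  # ignore phrases that do not contain the word

# Mean, standard error, and z-score of sentiment for each word
word.mean <- apply(phrase.word.sentiment, 2, mean, na.rm = TRUE)
word.sd <- apply(phrase.word.sentiment, 2, sd, na.rm = TRUE)
word.n <- colSums(!is.na(phrase.word.sentiment))
word.se <- word.sd / sqrt(word.n)
word.z <- word.mean / word.se
word.z[word.n <= 3 | is.na(word.se)] <- 0  # too few phrases to judge

# Same coloring rule as in Step 4
colors <- rep("grey", length(word.z))
colors[word.z < -1.96] <- "Red"
colors[word.z > 1.96] <- "Green"

print(data.frame(n = word.n, mean = word.mean, z = word.z, color = colors))
```

In this toy data, "great" appears mostly in positive phrases and is colored green, "crooked" appears mostly in negative phrases and is colored red, "the" appears equally in positive and negative phrases and stays grey, and "sad" stays grey because it appears in only two phrases, which is below the threshold used in the code.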