What The Candidates Said
Interactive visualization of the words most frequently spoken by the US presidential candidates in their 2016 campaign rally speeches

The visualization above shows the words used most frequently by Hillary Clinton, Donald Trump and Bernie Sanders in their 2016 presidential campaigns. Words outside the top 50 most used were excluded to save space.
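A minimal sketch of how such a top-50 list could be produced in Python. The tokenizer, the function name top_words and the sample sentence are assumptions made for illustration; the original processing script is not shown here and may handle punctuation or common words differently.

```python
import re
from collections import Counter


def top_words(text, n=50):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Crude tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


# Hypothetical sample text, not taken from the data source.
sample = "We will win. We will win big, and we will keep winning."
print(top_words(sample, n=3))
```

Any word that does not make it into the returned list would simply be left out of the visualization.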

It is important to note that the total number of words in the data source is very different for each candidate:

Hillary Clinton - 97,457

Ted Cruz - 72,243

Bernie Sanders - 166,292

Donald Trump - 318,134

Because of this, each circle's radius is based on the word's share of the total number of words spoken by that candidate, expressed as a percentage, rather than on the raw count. This keeps the visualization from being skewed by the differing sizes of the data sources and gives a better picture of the data as a whole.
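The normalization can be sketched in Python as follows. The totals are the ones listed above; the function name share_percent and the raw count of 1,000 are hypothetical, and how the resulting percentage is mapped to a radius in pixels is not specified in this write-up.

```python
# Total word counts per candidate, as listed above.
TOTAL_WORDS = {
    "Hillary Clinton": 97_457,
    "Ted Cruz": 72_243,
    "Bernie Sanders": 166_292,
    "Donald Trump": 318_134,
}


def share_percent(word_count, candidate):
    """Percentage of a candidate's total words accounted for by one word."""
    return 100.0 * word_count / TOTAL_WORDS[candidate]


# Hypothetical example: the same raw count of 1,000 uses
# corresponds to very different shares for different candidates.
for name in ("Ted Cruz", "Donald Trump"):
    print(f"{name}: {share_percent(1000, name):.2f}%")
# Ted Cruz: 1.38%
# Donald Trump: 0.31%
```

Sizing circles by raw count would make a word look more than four times as prominent for the candidate with the larger data source, even if both candidates used it equally often relative to everything else they said.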

When analysing large amounts of data, it is necessary to look beyond static visualization to bring out the key facts. Interactivity significantly broadens our toolset and the ways in which we can show data.

In this case, to make the visualization visually appealing and meaningful to viewers, it was necessary to find a more compact representation of the source data. Some words were used far more often than others, so their circles were disproportionately large when word frequency was mapped directly to circle size. However, compacting the data in this way can be misleading: even though the data stays proportionally consistent because the same operation is performed on each data point, the most used words appear less important than they are. Similarly, words which are used far less often appear more significant.

To counter this, interactive data visualization technologies like D3.js let us use features such as tooltips, which appear only on user action and so show only what the viewer needs at a given moment. In our case, tooltips let viewers “calibrate” the value they attribute to circle size and give them the additional information they need to make an informed decision about how to read the visualization and what conclusions to draw from it.
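To illustrate the trade-off, here is a small Python sketch. The text does not say which transformation was actually applied, so the square-root scale, the function names, the maximum radius and the example words and counts are all assumptions. The sketch shows how a compact scale narrows the visual gap between frequent and rare words, and what exact figures a tooltip could carry to compensate.

```python
import math


def compact_radius(share_percent, max_radius=60.0, max_share=4.0):
    """Map a share (in percent) to a radius using a square-root scale (assumed)."""
    return max_radius * math.sqrt(share_percent / max_share)


def tooltip_record(word, count, total):
    """Bundle the exact figures a tooltip could display alongside the circle."""
    share = 100.0 * count / total
    return {
        "word": word,
        "count": count,
        "share_percent": round(share, 3),
        "radius": round(compact_radius(share), 1),
    }


# Hypothetical words and counts, out of a hypothetical total of 100,000 words.
frequent = tooltip_record("people", 4000, 100_000)  # 4% share
rare = tooltip_record("jobs", 250, 100_000)         # 0.25% share

# The shares differ by a factor of 16, but the radii only by a factor of 4.
print(frequent["share_percent"] / rare["share_percent"])  # 16.0
print(frequent["radius"] / rare["radius"])                # 4.0
```

Records like these could be exported for the frontend, so that hovering over a circle reveals the underlying count and share rather than leaving viewers to judge them from circle size alone.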

As Mike Bostock rightly puts it in his keynote from CSVConf 2017: “Data visualization is about insight…” Interactive data visualization gives us the necessary tools to obtain this insight into our data.

Data taken from: https://github.com/kanarinka/What-Do-The-2016-Candidates-Say

Backend word counting and data processing done in Python

Frontend visualization done in D3.js, following an example by Chris Tuft

Candidate icons borrowed from the Wall Street Journal.