Big data analysis of the US presidential election

Matthew Leung
2 min read · Oct 31, 2020

--

I would like to use big data to analyse the US presidential election, in particular how the recent email-gate scandal has affected public opinion.

Since I have just got a Twitter developer account, I am starting with Twitter data. I searched the tweets from the last 2 weeks. Due to the limitations of the Twitter API, I could only download the top 100 tweets per day containing the keyword “Biden”. I compared them with those containing the keyword “trump”.

By running a machine learning sentiment analysis model on the truncated content of each tweet, I can determine whether the comment is positive or negative. I then multiply the sentiment value (1 or -1) by the retweet count to get the final impression score.
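The scoring step described above can be sketched roughly as follows. This is only an illustration: it assumes the sentiment labels (1 or -1) have already been produced by some model, that each tweet's score is summed per keyword and per day, and that the tweets are stored as dictionaries. The field names (`keyword`, `date`, `sentiment`, `retweets`), the function name `impression_scores`, and the sample numbers are all made up for the example and are not real data.

```python
from collections import defaultdict


def impression_scores(tweets):
    """Sum sentiment * retweet count per (keyword, day).

    Assumes each tweet is a dict with hypothetical fields:
    keyword, date, sentiment (1 or -1) and retweets.
    """
    scores = defaultdict(int)
    for tweet in tweets:
        if tweet["sentiment"] not in (1, -1):
            raise ValueError("sentiment must be 1 or -1")
        # A positive tweet adds its retweets, a negative one subtracts them.
        scores[(tweet["keyword"], tweet["date"])] += (
            tweet["sentiment"] * tweet["retweets"]
        )
    return dict(scores)


# Made-up sample records, for illustration only.
sample = [
    {"keyword": "Biden", "date": "2020-10-14", "sentiment": 1, "retweets": 120},
    {"keyword": "Biden", "date": "2020-10-14", "sentiment": -1, "retweets": 300},
    {"keyword": "trump", "date": "2020-10-14", "sentiment": -1, "retweets": 50},
]

daily = impression_scores(sample)
for (keyword, date), score in sorted(daily.items()):
    print(keyword, date, score)
```

Summing per day is one possible way to turn the per-tweet scores into a time series; an average per tweet would be less sensitive to a single viral tweet.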

The more positive tweets with many retweets, the higher the final score. Similarly, the more negative tweets with high retweet counts, the lower (more negative) the score. Below is the graph of the impression scores of Biden and Trump over the last 2 weeks.

The graph shows that Biden’s score is generally higher than Trump’s. But starting from Oct 14, when the email-gate scandal broke, Biden’s score dropped sharply. It later recovered, possibly because Twitter prohibited people from retweeting the scandal news.

Turning to Google Trends, the graph below shows the search trend for the keyword “Hunter Biden emails”. After the peak on 14-Oct, the search count dropped rapidly and stayed at a low level. Looking at the distribution across regions, I found that the attention was concentrated in only a few states. Taken together, these two analyses suggest that the impact of the scandal may not be as bad as people expect.
