Democratic Primary Debate Analysis

MCIT 591 Final Project by: Joanne Crean, Juan Goleniowski, and Federica Pelzel.

This analysis covers over 200k tweets posted between November 13th and November 27th and scores them for sentiment by candidate and state. Sentiment is scored from 0 to 4, with 0 being very negative and 4 very positive.

The Candidates

We decided to look at the four top-polling candidates.


Joe Biden

Total Tweets: 37,235

Avg Sentiment: 1.53 / 4.0

Sentiment split:
48% positive / 51% negative

Top Positive Words:
Personal, True, Full, Funny, High.

Top Negative Words:
Corrupt, Guilty, Fake, Obvious, False.


Elizabeth Warren

Total Tweets: 62,391

Avg Sentiment: 1.47 / 4.0

Sentiment split:
43% positive / 56% negative

Top Positive Words:
True, Rich, Free, Full, Good.

Top Negative Words:
Fake, Poor, Racist, Sad, Obvious.


Bernie Sanders

Total Tweets: 44,067

Avg Sentiment: 1.60 / 4.0

Sentiment split:
52% positive / 47% negative

Top Positive Words:
True, Free, Open, Full, Good.

Top Negative Words:
Corrupt, Questionable, Racist, Poor, Negative.


Pete Buttigieg

Total Tweets: 37,669

Avg Sentiment: 1.56 / 4.0

Sentiment split:
48% positive / 51% negative

Top Positive Words:
High, True, Strong, Smart, Interesting.

Top Negative Words:
Racist, Older, Fake, Terrible, Poor.

Analysis of Sentiment by State

Click the drop icon in each candidate widget to color the map according to their sentiment analysis results

The map was created from the "TweetsByState" CSV generated by the static debate analysis.

About the Project

This project was created to analyze tweets mentioning the top Democratic primary candidates and compute an average sentiment for each of them around the 5th Democratic Primary Debate, which took place on November 20th, 2019. The program can also be used in real time to run a sentiment analysis on any keyword.

Sentiment Analysis

The sentiment analysis was performed using the Stanford CoreNLP toolkit. The Stanford NLP Group makes some of its Natural Language Processing software available for anyone to use.

Sentiment is computed using a deep learning model that builds up a representation of whole sentences based on sentence structure, instead of looking at individual words in isolation. The model was trained on a dataset of 11,855 sentences taken from movie reviews. This includes 215,154 unique phrases, each annotated by 3 different people.

The sentiment scores returned are: 0 = very negative, 1 = negative, 2 = neutral, 3 = positive, 4 = very positive.
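
As an illustration of how the 0 to 4 scale feeds the per-candidate averages, here is a minimal sketch in plain Java. The class and method names (SentimentSummary, label, average) are hypothetical and not taken from the project, and the scores are assumed to have already been produced by CoreNLP.

```java
import java.util.List;

/** Hypothetical helper for working with 0-4 sentiment scores; not the project's code. */
final class SentimentSummary {
    // Index i holds the label for score i, matching the scale described above.
    private static final String[] LABELS = {
        "very negative", "negative", "neutral", "positive", "very positive"
    };

    /** Maps a 0-4 score to its label; rejects anything outside that range. */
    static String label(int score) {
        if (score < 0 || score >= LABELS.length) {
            throw new IllegalArgumentException("score must be 0-4, got " + score);
        }
        return LABELS[score];
    }

    /** Arithmetic mean of the scores, or NaN when the list is empty. */
    static double average(List<Integer> scores) {
        return scores.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
    }
}
```

With this sketch, a candidate whose tweets scored 1 and 2 would get an average of 1.5 out of 4.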

More information can be found on the Stanford NLP Group website.

Twitter4J API

The Twitter search uses the Twitter4J API. To execute the search, we opened a developer account with Twitter and obtained keys to integrate our Java application with the Twitter service.

As the search term, we used each candidate's last name and limited the results to dates before and after the Nov 20th debate and to texts in English. Once the search results were retrieved, the code filters out retweets and tweets that don't contain the candidate's name in the main text. For tweets saved in the text file, the code also checks the id of the latest retrieved tweet to ensure tweets are not saved twice in the file.
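
The filtering rules described above can be sketched in plain Java, without Twitter4J. Everything here is an assumption made for illustration: RawTweet is a hypothetical stand-in for a retrieved tweet, the name match is case-insensitive, and duplicates are detected with a set of already-saved ids, which is a simplification of the latest-id check the project uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the post-search filter; not the project's code. */
final class TweetFilter {
    /** Minimal stand-in for a retrieved tweet. */
    static final class RawTweet {
        final long id;
        final String text;
        final boolean retweet;

        RawTweet(long id, String text, boolean retweet) {
            this.id = id;
            this.text = text;
            this.retweet = retweet;
        }
    }

    /**
     * Keeps tweets that are not retweets, mention the candidate in their text,
     * and have an id not yet present in savedIds. Ids of kept tweets are added
     * to savedIds so later batches skip them.
     */
    static List<RawTweet> filter(List<RawTweet> results, String candidate, Set<Long> savedIds) {
        String needle = candidate.toLowerCase(Locale.ROOT);
        List<RawTweet> kept = new ArrayList<>();
        for (RawTweet t : results) {
            if (t.retweet) {
                continue; // drop retweets
            }
            if (!t.text.toLowerCase(Locale.ROOT).contains(needle)) {
                continue; // candidate name missing from the main text
            }
            if (!savedIds.add(t.id)) {
                continue; // add() returns false when the id was already saved
            }
            kept.add(t);
        }
        return kept;
    }
}
```

Passing the same savedIds set across batches is what prevents a tweet from being written to the archive twice in this sketch.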

More details on the Twitter4J API can be found at twitter4j.org
More details on Twitter developer accounts can be found at developer.twitter.com

Modalities

Static Analysis of Debate Tweets

The static analysis was done on tweets from Nov 13th - 20th and Nov 22nd - 30th. Batches of tweets were collected and added to a TweetArchive text file. The analysis consisted of reading tweets from the file and generating a Tweet object for each, creating an ArrayList of tweets for each candidate, performing natural language location assignment, and analyzing sentiment at the state and candidate level.

Two CSVs (TweetsByState and TweetsByCandidate) and a .txt report are generated.
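
A state-level aggregation of this kind could look roughly like the sketch below. The class name, the parallel-list input, and the CSV column layout (state, tweets, avg_sentiment) are all assumptions for illustration; the actual columns of TweetsByState are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a per-state sentiment summary; not the project's code. */
final class StateSentimentCsv {
    /**
     * Builds CSV lines from parallel lists: states.get(i) is the state assigned
     * to the tweet whose 0-4 sentiment score is scores.get(i).
     */
    static List<String> toCsvLines(List<String> states, List<Integer> scores) {
        if (states.size() != scores.size()) {
            throw new IllegalArgumentException("states and scores must have the same length");
        }
        // TreeMap keeps states in alphabetical order; each value is {score sum, tweet count}.
        Map<String, int[]> totals = new TreeMap<>();
        for (int i = 0; i < states.size(); i++) {
            int[] t = totals.computeIfAbsent(states.get(i), k -> new int[2]);
            t[0] += scores.get(i);
            t[1]++;
        }
        List<String> lines = new ArrayList<>();
        lines.add("state,tweets,avg_sentiment"); // assumed header
        for (Map.Entry<String, int[]> e : totals.entrySet()) {
            int[] t = e.getValue();
            double avg = (double) t[0] / t[1];
            lines.add(String.format(Locale.ROOT, "%s,%d,%.2f", e.getKey(), t[1], avg));
        }
        return lines;
    }
}
```

The returned lines can be written to disk with java.nio.file.Files.write; the same grouping keyed by candidate name instead of state would give a candidate-level summary.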

Real-time user flow

Takes a keyword as user input. Returns an ArrayList of Tweet objects, up to 18,000 tweets or going back 7 days before the query date (whichever limit is hit first). Each tweet's text is pre-processed for sentiment analysis by removing URLs, hashtags, and user mentions. The sentiment for each tweet is then computed using Stanford CoreNLP. The resulting ArrayList is run through the data analyzer, similarly to the static modality.

Returns console data analysis and .txt files containing tweets.
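
The pre-processing step in the real-time flow can be approximated with a few regular expressions. This is a sketch under assumptions: the class name and patterns are hypothetical, hashtags are removed together with their word (rather than only the # sign), and only http and https links are treated as URLs.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of tweet pre-processing; not the project's code. */
final class TweetCleaner {
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern TAG_OR_MENTION = Pattern.compile("[#@]\\w+");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Removes URLs, hashtags, and user mentions, then collapses leftover whitespace. */
    static String clean(String text) {
        String out = URL.matcher(text).replaceAll(" ");
        out = TAG_OR_MENTION.matcher(out).replaceAll(" ");
        return SPACES.matcher(out).replaceAll(" ").trim();
    }
}
```

URLs are stripped first so that a fragment such as "#section" inside a link is removed with the link instead of being treated as a hashtag.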

Visit the project on GitHub