A.I. & Optimization

Advanced Machine Learning, Data Mining, and Online Advertising Services

Predicting US Presidential Election using Sentiment Analysis



Project Contributors: Kazem Jahanbakhsh, Yumi Moon, Ajay Promodh Sridharan, Tim Song, and Priyanka Gupta.

In this post, we describe our research over the last two years in opinion mining, aimed at predicting socio-economic events such as political elections.

Sentiment Analysis of Tweets

In September 2012, we attended the Amazon hackathon, where we worked on the Twheat Map app. The basic idea was to collect tweets in real time, use machine learning to detect the sentiment of each tweet (i.e. a happy or sad mood), and finally visualize the moods of US cities in real time on a heatmap. You can learn more about the app here: Twheat Map App.
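
The aggregation step between classifying tweets and drawing the heatmap can be sketched as follows. This is a minimal illustration, not the Twheat Map code: the function name `city_mood`, the +1/-1 label encoding, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def city_mood(labeled_tweets):
    """Return the mean sentiment per city, a value in [-1, 1].

    labeled_tweets is an iterable of (city, sentiment) pairs, where
    sentiment is +1 (happy) or -1 (sad). The encoding is hypothetical;
    the labels would come from an upstream sentiment classifier.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for city, sentiment in labeled_tweets:
        totals[city] += sentiment
        counts[city] += 1
    return {city: totals[city] / counts[city] for city in totals}

# Made-up sample data, for illustration only.
sample = [("Seattle", 1), ("Seattle", 1), ("Seattle", -1), ("Boston", -1)]
moods = city_mood(sample)
```

Each city's score could then be mapped to a color on the heatmap, with values near +1 shown as happy and values near -1 as sad.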

You can also watch our app demo at the Amazon hackathon below:



Obama and Romney 2012 US election
Fig 1. - US 2012 Election (source: becomingitalianwordbyword.typepad.com)

After the hackathon, we became interested in the problem of "predicting the US 2012 presidential election" by analyzing political tweets posted on Twitter. We continued collecting political tweets and wrote an article about our initial findings on the candidates' popularity trends on Twitter. Initially, our technical work focused on running sentiment analysis algorithms to compute each candidate's popularity. You can read our initial analysis results here: Obama or Romney: that's the question!.

The Predictive Power of Social Media

In July 2014, we submitted a technical paper to a data mining conference on predicting the US 2012 presidential election by analyzing 40M tweets. The abstract of the paper follows:

Twitter as a new form of social media potentially contains useful information that opens new opportunities for content analysis on tweets. This paper examines the predictive power of Twitter regarding the US presidential election of 2012. For this study, we analyzed 32 million tweets regarding the US presidential election by employing a combination of machine learning techniques.

We devised an advanced classifier for sentiment analysis in order to increase the accuracy of Twitter content analysis. We carried out our analysis by comparing Twitter results with traditional opinion polls. In addition, we used the Latent Dirichlet Allocation model to extract the underlying topical structure from the selected tweets. Our results show that we can determine the popularity of candidates by running sentiment analysis. We can also uncover candidates popularities in the US states by running the sentiment analysis algorithm on geo-tagged tweets.

To the best of our knowledge, no previous work in the field has presented a systematic analysis of a considerable number of tweets employing a combination of analysis techniques by which we conducted this study. Thus, our results aptly suggest that Twitter as a well-known social medium is a valid source in predicting future events such as elections. This implies that understanding public opinions and trends via social media in turn allows us to propose a cost- and time-effective way not only for spreading and sharing information, but also for predicting future events.

You can read the full paper on arXiv: The Predictive Power of Social Media: On the Predictability of U.S. Presidential Elections using Twitter.

ML/NLP library

We have also open-sourced our Java ML/NLP library, which we used for mining tweets and building predictive models. The predictive models are used for analyzing elections and mining public opinion from social media. Below is a list of the algorithms that we have implemented:

  1. LDA Algorithm: an implementation of the Latent Dirichlet Allocation algorithm, used for topic modeling.
  2. Advanced Naive Bayes Classifier: a customized version of the Naive Bayes classifier for running sentiment analysis on tweets.
  3. TextAnalysis: a class for performing various text analysis tasks, such as computing word frequencies.
  4. TweetsStatistics: provides functions for computing basic statistics from tweets.
You can pull the code from GitHub: Twitter Mining.
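
To give a feel for the second item in the list, here is a minimal sketch of a plain multinomial Naive Bayes sentiment classifier with Laplace smoothing. It is written in Python for brevity and is not the library's Java code, nor its customized "advanced" variant. The class name `NaiveBayesSentiment`, the whitespace tokenizer, and the toy training data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Textbook multinomial Naive Bayes (hypothetical, illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.class_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Simplistic tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in self.tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, made up for this example.
texts = ["love this great speech", "great debate tonight",
         "terrible awful plan", "awful speech"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(texts, labels)
```

Calling `model.predict("great speech")` scores the tweet under each class and returns the label with the higher log probability. Working in log space avoids numeric underflow when tweets contain many words.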

Our Twitter Dataset is Publicly Available for Researchers

We have released our dataset of 40M political tweets from the 2012 presidential election for academic and industrial researchers. Each tweet record includes the tweet content, the anonymized author ID, the tweet time, the tweet location, and source device parameters. To request access to this dataset, contact us at info@AIOptify.com.

If you use our dataset in your work, we encourage you to cite the following paper:


@article{DBLP:journals/corr/JahanbakhshM14,
  author    = {Kazem Jahanbakhsh and
               Yumi Moon},
  title     = {The Predictive Power of Social Media: On the Predictability of {U.S.}
               Presidential Elections using Twitter},
  journal   = {CoRR},
  volume    = {abs/1407.0622},
  year      = {2014},
  url       = {http://arxiv.org/abs/1407.0622},
  timestamp = {Fri, 01 Aug 2014 13:50:01 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/JahanbakhshM14},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}
  


The prediction paper has been featured on Forbes

In April 2016, Nelson Granados published an article in Forbes on the predictive power of social media in live competitions, featuring the English Premier League.

Nelson cited our study as an example of Twitter’s predictive power in politics. You can read the Forbes article here:

Social Media Predicts Leicester Cinderella Story In English Premier League


Machine Learning Meetup 2013, Vancouver

In fall 2013, Kazem Jahanbakhsh gave a talk on "Using Machine Learning to Predict the US Election" at the Hootsuite office in Vancouver. You can watch the talk below: