Using Jetstream to Enable Large-Scale Text Analysis of Tweets
Time: Tuesday, July 24, 6:30pm - 8:30pm
Description: Large-scale text analysis is becoming increasingly common across a wide variety of research domains, from computer science to the humanities to psychology. Its aim is to reveal patterns that a human reader alone would not readily discern. One increasingly popular text analysis tool is sentiment analysis, in which scores (either positive or negative) are assigned to a piece of text automatically, without a "human in the loop". Three general sentiment detection methods have been studied: dictionary-based methods, supervised learning methods, and unsupervised/deep learning methods. In this work, we focus on comparing several dictionary-based methods (VADER, Sent140Lex, Hu/Liu) on Twitter data. To do this, we use more than 2,000,000 tweets from IU's OSoMe Twitter collection (a 10% random sample of public tweets going back to August 1, 2016). We store the data on the Jetstream cloud computing system and process the tweets with each sentiment dictionary. We then compare bi-grams, stream graphs, word counts, and similarity measures to assess the robustness of each dictionary.
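
As a rough illustration of the dictionary-based approach described above, the following minimal Python sketch scores text by summing per-word values from a lexicon, then measures how often two lexicons assign the same label. The two lexicons here are tiny invented stand-ins, not VADER, Sent140Lex, or Hu/Liu, and the function names, tokenizer, and agreement measure are assumptions made for this example rather than the method used in this work.

```python
import re
from collections import Counter

# Toy lexicons invented for illustration only; real dictionaries such as
# VADER, Sent140Lex, or Hu/Liu are far larger and use their own scales.
LEXICON_A = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
LEXICON_B = {"good": 1.0, "love": 2.0, "bad": -1.0, "hate": -2.0}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, lexicon):
    """Sum the lexicon values of all tokens; unknown words count as 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))


def label(text, lexicon):
    """Map the summed score to a coarse sentiment label."""
    s = score(text, lexicon)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def agreement(texts, lex_a, lex_b):
    """Fraction of texts that receive the same label from both lexicons."""
    same = sum(label(t, lex_a) == label(t, lex_b) for t in texts)
    return same / len(texts)


def bigrams(text):
    """Count adjacent token pairs (bi-grams) in a piece of text."""
    toks = tokenize(text)
    return Counter(zip(toks, toks[1:]))


if __name__ == "__main__":
    tweets = [
        "What a great day",
        "I hate this awful weather",
        "I love it",
        "nothing to see here",
    ]
    for t in tweets:
        print(t, "|", label(t, LEXICON_A), "|", label(t, LEXICON_B))
    print("label agreement:", agreement(tweets, LEXICON_A, LEXICON_B))
```

Even on four short texts, the two toy lexicons disagree whenever a sentiment-bearing word appears in only one of them, which is the kind of coverage difference a dictionary comparison needs to surface.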