Assessing the Coverage of Data Collection Campaigns on Twitter: A Case Study

Vassilis Plachouras, Yannis Stavrakas, Athanasios Andreou
Proceedings of the International Workshop on Social Media Semantics (SMS 2013), OTM 2013 Workshops, LNCS 8186, pp. 598-607
Abstract. Online social networks provide a unique opportunity to access and analyze the reactions of people as real-world events unfold. The quality of any analysis task, however, depends on the appropriateness and quality of the collected data. Hence, given the spontaneous nature of user-generated content, as well as the high speed and large volume of data, it is important to carefully define a data collection campaign about a topic or an event, in order to maximize its coverage (recall). Motivated by the development of a social-network data management platform, in this work we evaluate the coverage of data collection campaigns on Twitter. Using an adaptive language model, we estimate the coverage of a campaign with respect to the total number of relevant tweets. Our findings support the development of adaptive methods to account for unexpected real-world developments, and hence, to increase the recall of data collection processes.