Discovering geographical topics from social media

University of Alabama Libraries

Traditional query-based search engines such as Google often cannot surface real-time, contextual information such as traffic accidents or severe weather. As an alternative, social media can often provide relevant information about important events occurring in a user's environment. However, to obtain this knowledge, a user may have to wade through a large amount of irrelevant data. In this dissertation, we describe our research goals for providing relevant contextual information to a user by mining social media. We describe the implementation of our system, GeoContext, which includes a geotopical clustering system that discovers topics appearing in a social media stream and analyzes where those topics are centered geographically. GeoContext also includes a method for filtering a social media stream by keywords and location coordinates to provide more specific topics. To find the geographical location of topics, GeoContext must also predict the location of each social media post. However, due to privacy concerns, many social media users do not share their exact geographical coordinates. For this reason, GeoContext includes a technique that predicts the locations of posts that are not associated with explicit coordinates, a process called geolocation. Existing research has used the content of a post, as well as the post author’s social media relationships with other users, to estimate location. Our research provides a novel approach to geolocation by combining multiple techniques and adding a new one: estimating location by clustering social media posts on similar topics that are centered in a geographical area. We evaluate the geotopical clustering portion of GeoContext against Latent Dirichlet Allocation, a topic modeling algorithm commonly used in geotopical clustering. We also evaluate the parameters and threshold values implemented within GeoContext. In addition, we evaluate the geolocation portion of GeoContext by collecting geotagged social media posts (posts explicitly tagged with geographical coordinates) and comparing the location predicted by GeoContext against the actual coordinates.
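
The geolocation evaluation described above, comparing predicted locations against the coordinates of geotagged posts, could be sketched as follows. This is a minimal illustration and not GeoContext's actual code: the abstract does not say which error metric is used, so the great-circle (haversine) distance, the 100 km accuracy radius, and the function names haversine_km, geolocation_errors, and summarize are all assumptions made for the example.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geolocation_errors(predicted, actual):
    """Per-post error distances; both arguments are lists of (lat, lon) tuples."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [haversine_km(p[0], p[1], a[0], a[1])
            for p, a in zip(predicted, actual)]


def summarize(errors, radius_km=100.0):
    """Mean and median error, plus the share of posts within radius_km.

    radius_km is a hypothetical threshold chosen for this example only.
    """
    if not errors:
        raise ValueError("errors must not be empty")
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        "within_radius": sum(e <= radius_km for e in errors) / len(errors),
    }
```

Given a list of predicted coordinates and the matching geotagged coordinates, summarize(geolocation_errors(predicted, actual)) would report how far predictions fall from the true locations. The median is included because a few very distant mispredictions can dominate the mean.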

Electronic Thesis or Dissertation
Computer science