Sparse regression of textual analysis

Carter, Phylisicia N.


Files

file_1.pdf (742.06 KB)

Date

2018

Authors

Carter, Phylisicia N.

Publisher

University of Alabama Libraries

Abstract

We consider sparse regression techniques as tools for classifying sentiment in Twitter posts. Analysis of Twitter usage poses several unique challenges. For example, the 140-character limit severely restricts the amount of information contained in each post; as a result, most tweets contain only an extremely small subset of the dictionary, which presents challenges for learning schemes based on dictionary usage. To remedy this undersampling issue, we propose the use of penalized regression. Here, we employ regularized logistic regression to avoid any degeneracy caused by the sparse usage of the dictionary in each tweet, while simultaneously learning which terms are most associated with each sentiment. Accelerated sparse discriminant analysis is also used to combat degeneracy and overfitting of the training data while providing dimension reduction. As illustrative examples, we employ sparse logistic regression to classify tweets based on the users' perception of a connection between vaccination and autism, and we examine Twitter users' sentiment toward the use of autonomous cars.
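This record does not state the exact penalty, solver, or features used in the thesis. As a minimal sketch, assuming an ℓ1 (lasso) penalty on a bag-of-words logistic model fitted by proximal gradient descent (ISTA), sparse logistic regression for tweet classification can be illustrated in plain Python. The tweets, labels, function names (`fit_l1_logistic`, `featurize`, and so on), and parameter values (`lam`, `step`, `iters`) are all hypothetical; this is not the author's implementation.

```python
import math
import re

def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(docs):
    """Map each distinct token to a column index (the 'dictionary')."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(doc, vocab):
    """Sparse bag-of-words {column index: count}; a tweet touches only a few columns.
    Out-of-vocabulary tokens are dropped."""
    feats = {}
    for tok in tokenize(doc):
        j = vocab.get(tok)
        if j is not None:
            feats[j] = feats.get(j, 0) + 1
    return feats

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks toward zero and sets small values exactly to zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, p, lam=0.02, step=0.5, iters=1000):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA).
    The intercept b is not penalised. Hypothetical helper for illustration."""
    n = len(X)
    w = [0.0] * p
    b = 0.0
    for _ in range(iters):
        grad = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual of the logistic loss for this tweet.
            r = sigmoid(b + sum(w[j] * v for j, v in xi.items())) - yi
            for j, v in xi.items():
                grad[j] += r * v / n
            grad_b += r / n
        b -= step * grad_b  # plain gradient step for the unpenalised intercept
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted probability of class 1 is at least 0.5, else 0."""
    return 1 if sigmoid(b + sum(w[j] * v for j, v in x.items())) >= 0.5 else 0

# Hypothetical toy tweets (1 = asserts a vaccine-autism link, 0 = does not).
tweets = [
    "vaccines cause autism",
    "shots cause autism",
    "vaccines cause autism in kids",
    "vaccines are safe",
    "shots are safe",
    "vaccines are safe for kids",
]
labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(tweets)
X = [featurize(t, vocab) for t in tweets]
w, b = fit_l1_logistic(X, labels, len(vocab))

# Print learned term weights, largest magnitude first.
for word, j in sorted(vocab.items(), key=lambda kv: -abs(w[kv[1]])):
    print(f"{word:10s} {w[j]:+.3f}")
```

In this toy setup, words that occur equally often in both classes (`vaccines`, `shots`, `kids`) receive gradients too small to survive soft-thresholding and stay exactly zero, while class-specific words such as `cause` and `safe` get nonzero weights of opposite sign. This is the term-selection behavior the abstract describes. In practice a library solver such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would replace the hand-written loop.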

Description

Electronic Thesis or Dissertation

Keywords

Applied mathematics

URI

http://ir.ua.edu/handle/123456789/5276

Collections

Theses and Dissertations
Theses and Dissertations - Department of Mathematics
