American Sign Language Recognition Using Adversarial Learning in a Multi-Frequency RF Sensor Network

University of Alabama Libraries

Technologies based on human-computer interaction (HCI), such as Alexa or Siri, have become increasingly prevalent in the daily lives of American citizens. However, access to these technologies is gated by the need to communicate with spoken commands, which precludes the Deaf community from benefiting from the quality-of-life improvements they provide. Current approaches to providing HCI technologies or ASL conversion largely rely on video and image processing techniques, haptic gloves, and Wi-Fi-based systems. However, wearables restrict users from engaging in their normal daily activities, while video raises privacy concerns. To enable ASL-compatible HCI technology, we propose a multi-frequency RF sensing network for the recognition of a basic lexicon of signs. When validated on a daily activities dataset, the network performed well, with classification accuracies of around 90% or higher. For ASL data, we have two datasets: native and imitation. The native dataset is small and was collected from Deaf individuals. The imitation dataset is larger and was collected from hearing individuals prompted by copysigning videos. We show that imitation data cannot be used in lieu of native ASL data for training and benchmarking classifiers because the two datasets possess disparate distributions in feature space. As an alternative, we investigate adversarial learning as a means of mitigating the challenge of insufficient training data. Cross-frequency training is one option for augmenting the training dataset, but it suffers from severe performance degradation when data from one frequency is used to pre-train a network for classification of data at another frequency. We show that data synthesized using Generative Adversarial Networks (GANs) can be used to reduce, but not completely eliminate, cross-frequency training degradation.
An auxiliary conditional generative adversarial network (ACGAN) with kinematic sifting is used to augment and classify human activity data and to recognize ASL signs. While the proposed network performed well on daily activities, its performance could not be adequately validated on ASL data due to the sparsity of native ASL data and the statistical inconsistencies of the imitation signing data. Future directions for overcoming these challenges and extending the proposed techniques to ASL recognition are discussed.

Electronic Thesis or Dissertation