Laughter and Filler Detection in Naturalistic Audio

Title: Laughter and Filler Detection in Naturalistic Audio
Author(s):
Kaushik, Lakshmish (UT Dallas);
Sangwan, Abhijeet (UT Dallas);
Hansen, John H. L. (UT Dallas)
Item Type: article
Keywords: Neural networks (Computer science)--Convolutional
Neural networks (Computer science)--Deep
Transmutation (Linguistics)
Nonverbal communication
Abstract: Laughter and fillers are common phenomena in speech and play an important role in communication. In this study, we present Deep Neural Network (DNN) and Convolutional Neural Network (CNN) based systems to classify non-verbal cues (laughter and fillers) from verbal speech in naturalistic audio. We propose improvements over a deep learning system proposed in [1]. In particular, we propose a simple method to combine spectral features with pitch information to capture prosodic and spectral cues for fillers and laughter. Additionally, we propose using a wider time context for feature extraction so that the time evolution of the spectral and prosodic structure can also be exploited for classification. Furthermore, we propose using a CNN for classification. The new method is evaluated on conversational telephony speech (CTS, drawn from Switchboard and Fisher) data and the UT-Opinion corpus. Our results show that the new system improves the AUC (area under the curve) metric by 8.15% and 11.9% absolute for laughter, and 4.85% and 6.01% absolute for fillers, over the baseline system, for CTS and UT-Opinion data, respectively. Finally, we analyze the results to explain the difference in performance between traditional CTS data and naturalistic audio (UT-Opinion), and identify challenges that need to be addressed to make systems perform better on practical data.
Publisher: International Speech Communication Association
ISSN: 2308-457X
Persistent Link: http://hdl.handle.net/10735.1/5058
Bibliographic Citation: Kaushik, L., A. Sangwan, and J. H. L. Hansen. 2015. "Laughter and filler detection in naturalistic audio." Interspeech 2015 (16th Annual Conference of the International Speech Communication Association), pp. 2509-2513.
Terms of Use: ©2015 ISCA. All rights reserved.
Sponsors: Partially supported by AFRL (contract # FA8750-12-0188) and NSF (grant # 1218159)

Files in this item

JECS-3626-4639.10.pdf (Article; PDF, 696.5 KB)
