Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users
Document Type
Conference Proceeding
Publication Date
4-19-2023
Abstract
Speech is expressive in ways that caption text does not capture, leaving emotion and emphasis unconveyed. We interviewed eight Deaf and Hard-of-Hearing (DHH) individuals to understand if and how captions' inexpressiveness impacts them in online meetings with hearing peers. We found that automatically captioned speech lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand to be an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, or a mix of both. In an empirical study, 16 DHH participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.
Identifier
85160009689 (Scopus)
ISBN
9781450394215
Publication Title
Conference on Human Factors in Computing Systems Proceedings
External Full Text Location
https://doi.org/10.1145/3544548.3581511
Grant
1954284
Fund Ref
National Science Foundation
Recommended Citation
De Lacerda Pataca, Caluã; Watkins, Matthew; Peiris, Roshan; Lee, Sooyeon; and Huenerfauth, Matt, "Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users" (2023). Faculty Publications. 1776.
https://digitalcommons.njit.edu/fac_pubs/1776