SAFARI: Speech-Associated Facial Authentication for AR/VR Settings via Robust VIbration Signatures
Document Type
Conference Proceeding
Publication Date
12-9-2024
Abstract
In AR/VR devices, the voice interface, serving as one of the primary AR/VR control mechanisms, enables users to interact naturally using speech (voice commands) to access data, control applications, and engage in remote communication/meetings. Voice authentication can be adopted to protect against unauthorized speech inputs. However, existing voice authentication mechanisms are usually susceptible to voice spoofing attacks and are unreliable under variations in phonetic content. In this work, we propose SAFARI, a spoofing-resistant and text-independent speech authentication system that can be seamlessly integrated into AR/VR voice interfaces. The key idea is to elicit phonetic-invariant biometrics from the facial muscle vibrations induced upon the headset. During speech production, a user’s facial muscles deform to articulate phoneme sounds. The facial deformations associated with the phonemes are referred to as visemes. They carry rich biometrics of the wearer’s muscles, tissues, and bones, which can propagate through the head and vibrate the headset. SAFARI aims to derive reliable facial biometrics from the viseme-associated facial vibrations captured by the AR/VR motion sensors. In particular, it identifies the vibration data segments that contain rich viseme patterns (prominent visemes) and are less susceptible to phonetic variations. Based on the prominent visemes, SAFARI learns the correlations among facial vibrations of different frequencies to extract biometric representations invariant to the phonetic context. The key advantages of SAFARI are that it is suitable for commodity AR/VR headsets (no additional sensors) and is resistant to voice spoofing attacks, as the conductive property of the facial vibrations prevents biometric disclosure via the air medium or the audio channel. To mitigate the impacts of body motions in AR/VR scenarios, we also design a generative diffusion model trained to reconstruct viseme patterns from data distorted by motion artifacts. We conduct extensive experiments with two representative AR/VR headsets and 35 users under various usage and attack settings. We demonstrate that SAFARI can achieve over a 96% true positive rate when verifying legitimate users while successfully rejecting different kinds of spoofing attacks with true negative rates of over 97%.
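To make the pipeline described in the abstract concrete, the minimal Python sketch below illustrates the general flavor of the approach: compute a spectrogram of headset IMU data, keep the high-energy frames as a stand-in for prominent-viseme detection, and use cross-frequency-band correlations as a phonetic-invariant feature. All names, the 1 kHz sampling rate, the energy percentile, the cosine-similarity matcher, and the acceptance threshold are assumptions for illustration only; they are not the paper's actual parameters or learned model, and the diffusion-based motion-artifact reconstruction is omitted.

import numpy as np
from scipy.signal import stft

FS = 1000          # assumed IMU sampling rate (Hz); headset-dependent
WIN = 256          # STFT window length (assumption)
ENERGY_PCTL = 75   # keep frames above this spectral-energy percentile (assumption)

def spectrogram(accel, fs=FS, win=WIN):
    """Magnitude spectrogram of one accelerometer axis."""
    _, _, Z = stft(accel, fs=fs, nperseg=win)
    return np.abs(Z)  # shape: (freq_bins, time_frames)

def prominent_viseme_frames(spec, pctl=ENERGY_PCTL):
    """Keep time frames with high broadband vibration energy,
    a simple stand-in for SAFARI's prominent-viseme selection."""
    energy = spec.sum(axis=0)
    return spec[:, energy >= np.percentile(energy, pctl)]

def cross_band_correlation(spec):
    """Correlate frequency bands over the selected frames; the
    upper triangle serves as an illustrative phonetic-invariant
    feature vector (not the paper's learned representation)."""
    corr = np.corrcoef(spec)                    # (bins, bins)
    iu = np.triu_indices_from(corr, k=1)
    return np.nan_to_num(corr[iu])

def similarity(feat_a, feat_b):
    """Cosine similarity between enrollment and probe features."""
    num = float(np.dot(feat_a, feat_b))
    den = np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-9
    return num / den

# Usage example with synthetic data standing in for IMU capture.
rng = np.random.default_rng(0)
enroll = cross_band_correlation(
    prominent_viseme_frames(spectrogram(rng.standard_normal(FS * 5))))
probe = cross_band_correlation(
    prominent_viseme_frames(spectrogram(rng.standard_normal(FS * 5))))
accept = similarity(enroll, probe) > 0.9  # threshold is an assumption

Fixed statistics are used here only to keep the sketch self-contained; per the abstract, SAFARI instead learns the cross-frequency correlations, which is what makes the extracted representation robust to phonetic context.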
Identifier
85211791756 (Scopus)
ISBN
9798400706363
Publication Title
CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
External Full Text Location
https://doi.org/10.1145/3658644.3670358
First Page
153
Last Page
167
Grant
IIS2311596
Fund Ref
National Science Foundation
Recommended Citation
Zhang, Tianfang; Ji, Qiufan; Ye, Zhengkun; Akanda, Md Mojibur Rahman Redoy; Mahdad, Ahmed Tanvir; Shi, Cong; Wang, Yan; Saxena, Nitesh; and Chen, Yingying, "SAFARI: Speech-Associated Facial Authentication for AR/VR Settings via Robust VIbration Signatures" (2024). Faculty Publications. 12.
https://digitalcommons.njit.edu/fac_pubs/12