Reconstructing room scales with a single sound for augmented reality displays

Document Type

Article

Publication Date

1-1-2023

Abstract

Perceiving and reconstructing our 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged for displaying 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments and require time-consuming, repetitive capture procedures to obtain a full 3D picture. For instance, current AR devices require users to scan an entire room to obtain its geometric dimensions. This optical process is tedious and inapplicable when the space is occluded or inaccessible. Unlike light, audio waves propagate through space by bouncing off different surfaces and are not 'occluded' by a single object such as a wall. In this research, we ask the question 'can one hear the size of a room?'. To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the spatial information implicitly carried by a single speaker-and-microphone system. Through a series of experiments and studies, we demonstrate our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block for multi-modal layout reconstruction.
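
The abstract does not specify the network architecture, so the following is only a minimal illustrative sketch (in PyTorch) of the general idea it describes: a learned model that maps a single microphone recording of a played probe sound to estimated room dimensions. The class name, layer sizes, 16 kHz sample rate, and the choice of a 1D-CNN regressor are assumptions made for illustration, not the authors' actual method.

import torch
import torch.nn as nn

class RoomSizeRegressor(nn.Module):
    """Hypothetical 1D-CNN that maps a mono audio recording to (width, depth, height)."""
    def __init__(self, n_outputs: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis to one feature vector
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_outputs),  # regressed room dimensions, e.g. in metres
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, n_samples), the microphone recording of the probe sound
        return self.head(self.encoder(waveform))

if __name__ == "__main__":
    model = RoomSizeRegressor()
    # Random one-second, 16 kHz clips stand in for real recordings of the played sound.
    recording = torch.randn(8, 1, 16000)
    predicted_dims = model(recording)        # shape: (8, 3)
    target_dims = torch.rand(8, 3) * 10.0    # placeholder ground-truth room sizes
    loss = nn.functional.mse_loss(predicted_dims, target_dims)
    loss.backward()
    print(predicted_dims.shape, float(loss))

In the actual system, the training targets would presumably come from measured room layouts, and the recording would capture the room's acoustic response (reverberation and echoes) to the known probe signal, which is the spatial information the network is meant to decode.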

Identifier

85142302052 (Scopus)

Publication Title

Journal of Information Display

External Full Text Location

https://doi.org/10.1080/15980316.2022.2145377

e-ISSN

2158-1606

ISSN

1598-0316

First Page

1

Last Page

12

Issue

1

Volume

24

Grant

2225861

Fund Ref

National Science Foundation
