KAIST College of Engineering, 카이스트 공과대학 (전산학부, 기계공학과, 항공우주공학과, 전기및전자공학부, 건설및환경공학과, 바이오및뇌공학과, 산업디자인학과, 산업및시스템공학과, 생명화학공학과, 신소재공학과, 원자력및양자공학과, 조천식녹색교통대학원, EEWS대학원, AI대학원)

Improving speech intelligibility with Privacy-Preserving Augmented Reality

Post Date 2022-02-08

Privacy-Preserving AR system can augment the speaker's speech with real-life subtitles to overcome the loss of contextual cues caused by mask-wearing and social distancing during the COVID-19 pandemic.

Degraded speech intelligibility induces face-to-face conversation participants to speak louder and more distinctively, exposing the content to potential eavesdroppers. Similarly, people with face masks deteriorate their speech intelligibility, especially during the post-covid-19 crisis. Augmented Reality (AR) can serve as an effective tool to visualise people conversations and promote speech intelligibility, known as speech augmentation. However, visualised conversations without proper privacy management can expose AR users to privacy risks.

An international research team of Prof. Lik-Hang LEE in the Department of Industrial and Systems Engineering at KAIST and Prof. Pan HUI in Computational Media and Arts at Hong Kong University of Science and Technology employed a conversation-oriented Contextual Integrity (CI) principle to develop a privacy-preserving AR framework for speech augmentation. At its core, the framework, namely Theophany, establishes ad-hoc social networks between relevant conversation participants to exchange contextual information and improve speech intelligibility in real-time.

< Figure 1: A real-life subtitle application with AR headsets >

Theophany has been implemented as a real-life subtitle application in AR to improve speech intelligibility in daily conversations (Figure 1). This implementation leverages a multi-modal channel, such as eye-tracking, camera, and audio. Theophany transforms the user's speech into text and estimates the intended recipients through gaze detection. The CI Enforcer module evaluates the sentences' sensitivity. If the sensitivity meets the speaker's privacy threshold, the sentence is transmitted to the appropriate recipients (Figure 2).

< Figure 2: Multi-modal Contextual Integrity Channel >

Based on the principles of Contextual Integrity (CI), parameters of privacy perception are designed for privacy-preserving face-to-face conversations, such as topic, location, and participants. Accordingly, Theophany operation depends on the topic and session. Figure 3 demonstrates several illustrative conversation sessions: (a) the topic is not sensitive and transmitted to everybody in the user's gaze. (b) the topic is work-sensitive and only transmitted to the coworker. (c) the topic is sensitive and only transmitted to the friend in the user's gaze. A new friend entering the user's gaze only gets the textual transcription once a new session (topic) starts (d). (e) the topic is highly sensitive, and nobody gets the textual transcription.

< Figure 3: Speech Augmentation in five illustrative sessions >

Theophany within a prototypical AR system augments the speaker's speech with real-life subtitles to overcome the loss of contextual cues caused by mask-wearing and social distancing during the COVID-19 pandemic. The research was published in ACM Multimedia under the title of ＇Theophany: Multi-modal Speech Augmentation in Instantaneous Privacy Channels' (DOI: 10.1145/3474085.3475507), being selected as one of the best paper award candidates (Top 5). Note that the first author is an alumnus from the Industrial and Systems Engineering Department at KAIST.

Short Bio:

Lik-Hang Lee received a PhD degree from SyMLab, Hong Kong University of Science and Technology, and the Bachelor's and M.Phil. degrees from the University of Hong Kong. He is currently an assistant professor (tenure-track) with the Korea Advanced Institute of Science and Technology (KAIST), South Korea, and the head of the Augmented Reality and Media Laboratory, KAIST. He has built and designed various human-centric computing specializing in augmented and virtual realities (AR/VR). In recent years, he has published more than 30 research papers on AR/VR at prestigious conferences such as ACM WWW, ACM IMWUT, ACM Multimedia, ACM CSUR, IEEE Percom, and so on. He also serves the research community, as TPCs, PCs and workshop organizers, at some prestigious venues, such as AAAI, IJCAI, IEEE PERCOM, ACM CHI, ACM Multimedia, ACM IMWUT, IEEE VR, etc.

Photo:

LIST

Kaist

News

Improving speech intelligibility with Privacy-Preserving Augmented Reality