When:
November 25, 2025 @ 10:45 am – 12:00 pm
Where:
Malone Hall 107, Johns Hopkins University

Title: Socially Pertinent Robots in Gerontological Healthcare: The Audio Pipeline

Abstract: Social robots are autonomous machines designed to interact with humans using social cues such as voice, gestures, and facial expressions. They have remained uncommon in public spaces, despite expectations raised more than 25 years ago. Apart from basic robotic skills like safe navigation and obstacle avoidance, they need to facilitate natural verbal and nonverbal communication with multiple people in dynamic, noisy environments. Deployments in museums, airports, libraries, shopping malls, bars, and hospitals have reported positive results. However, many studies relied on a Wizard-of-Oz setup where researchers controlled the navigation and dialogue. Systems that operated independently were often stationary, limited to closed domains, or tuned for single-user interaction. These limitations did not align with the demands of public spaces.

The EU H2020 SPRING project addressed these challenges by developing software that allows social robots to operate effectively in complex, unstructured public environments. The team comprised eight groups, including a gerontology hospital with research facilities. The software, including the audio pipeline, was installed on the humanoid ARI robot from PAL Robotics, which stands 1.65 meters tall.

In this talk, I will discuss the audio pipeline developed in SPRING and its role in supporting operations in public spaces. The pipeline includes noise reduction, direction-of-arrival estimation, multi-microphone beamforming, single-channel concurrent speaker separation, speaker extraction with enrollment, speaker identification, audio-visual concurrent speaker detection, emotion recognition from audio and video, and voice generation from video. I will conclude the talk with results from real deployments, demonstrating perceived usefulness and acceptance when robustness and adaptability are maintained. Project page: https://spring-h2020.eu/

Bio: Sharon Gannot is a Full Professor and Vice Dean in the Faculty of Engineering at Bar-Ilan University, where he heads the Data Science Program. He received the B.Sc. (summa cum laude) from the Technion and the M.Sc. (cum laude) and Ph.D. from Tel-Aviv University, followed by a postdoctoral fellowship at KU Leuven. His research focuses on statistical signal processing and machine learning for speech and audio, and he has authored more than 350 peer-reviewed publications on these topics. Among his editorial roles, he is Editor-in-Chief of Speech Communication, serves on the Senior Editorial Board of IEEE Signal Processing Magazine, is an Associate Editor for the IEEE-SPS Education Center, and has served as Senior Area Chair for IEEE/ACM TASLP (2013–2017; 2020–2025). Among his leadership roles, he chaired the IEEE-SPS Audio and Acoustic Signal Processing Technical Committee (2017–2018) and has led the SPS Data Science Initiative since 2022; he also served as General Co-Chair of IWAENC 2010, WASPAA 2013, and Interspeech 2024. His recognitions include 13 best-paper awards, BIU teaching and research prizes, the 2018 Rector Innovation Award, the 2022 EURASIP Group Technical Achievement Award, and elevation to IEEE Fellow.

Zoom: https://wse.zoom.us/j/94133581517?pwd=pP3O0IlX4O90PyMKBaH3Egq2vADjso.1
Meeting ID: 941 3358 1517
Passcode: 065467