Patents by Inventor Michael IUZZOLINO

Michael IUZZOLINO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Audio-visual speech enhancement

Patent number: 11244696

Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask configured based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and the instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram, and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.
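
As a rough illustration of the fusion and masking steps this abstract describes, here is a minimal pure-Python sketch. It is not the patented implementation. The function names (`se_fusion`, `apply_mask`), the time-averaged pooling of both feature streams, the single hidden layer, and all shapes are hypothetical. The spatio-temporal residual network, the autoencoder, and waveform reconstruction are omitted.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = channels (or frequency bins), columns = time frames


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def se_fusion(audio: Matrix, visual: Matrix, w1: Matrix, b1: List[float],
              w2: Matrix, b2: List[float]) -> Matrix:
    """Hypothetical squeeze-excitation fusion of audio and visual features.

    Squeeze: average every audio and visual channel over time and concatenate.
    Excite: a ReLU dense layer, then a sigmoid dense layer, yield one gate in
    (0, 1) per audio channel.
    Scale: multiply each audio channel by its gate.
    """
    squeezed = [sum(ch) / len(ch) for ch in audio] + [sum(ch) / len(ch) for ch in visual]
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]
    return [[gate * v for v in ch] for gate, ch in zip(gates, audio)]


def apply_mask(magnitude: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of a mask with a magnitude spectrogram."""
    return [[m * s for m, s in zip(mrow, srow)] for mrow, srow in zip(mask, magnitude)]


if __name__ == "__main__":
    random.seed(0)
    c_audio, c_visual, frames, hidden_units = 4, 3, 6, 2  # illustrative sizes only
    audio = [[random.random() for _ in range(frames)] for _ in range(c_audio)]
    visual = [[random.random() for _ in range(frames)] for _ in range(c_visual)]
    w1 = [[random.uniform(-1, 1) for _ in range(c_audio + c_visual)] for _ in range(hidden_units)]
    w2 = [[random.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(c_audio)]
    fused = se_fusion(audio, visual, w1, [0.0] * hidden_units, w2, [0.0] * c_audio)
    print(len(fused), len(fused[0]))  # 4 6
```

Each gate lies in (0, 1), so in this sketch the visual stream can only attenuate or pass audio channels, emphasizing those that track the target speaker. In a full system the mask would come from the decoder and be applied to the noisy magnitude spectrogram. The abstract does not say how the waveform is then reconstructed. A common choice, assumed here and not confirmed by the source, is an inverse STFT that reuses the noisy phase.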

Type: Grant

Filed: February 5, 2020

Date of Patent: February 8, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kazuhito Koishida, Michael Iuzzolino

AUDIO-VISUAL SPEECH ENHANCEMENT

Publication number: 20210134312

Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask configured based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and the instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram, and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.

Type: Application

Filed: February 5, 2020

Publication date: May 6, 2021

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito KOISHIDA, Michael IUZZOLINO
