Patents by Inventor Joseph Caroselli

Joseph Caroselli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Context aware beamforming of audio data

Patent number: 11798533

Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user, determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.

Type: Grant

Filed: April 2, 2021

Date of Patent: October 24, 2023

Assignee: GOOGLE LLC

Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
Microphone Array Configuration Invariant, Streaming, Multichannel Neural Enhancement Frontend for Automatic Speech Recognition

Publication number: 20230298612

Abstract: A multichannel neural frontend speech enhancement model for speech recognition includes a speech cleaner, a stack of self-attention blocks each having a multi-headed self attention mechanism, and a masking layer. The speech cleaner receives, as input, a multichannel noisy input signal and a multichannel contextual noise signal, and generates, as output, a single channel cleaned input signal. The stack of self-attention blocks receives, as input, at an initial block of the stack of self-attention blocks, a stacked input including the single channel cleaned input signal and a single channel noisy input signal, and generates, as output, from a final block of the stack of self-attention blocks, an un-masked output. The masking layer receives, as input, the single channel noisy input signal and the un-masked output, and generates, as output, enhanced input speech features corresponding to a target utterance.

Type: Application

Filed: February 20, 2023

Publication date: September 21, 2023

Applicant: Google LLC

Inventors: Joseph Caroselli, Arun Narayanan, Tom O'malley
Adaptive multichannel dereverberation for automatic speech recognition

Patent number: 11699453

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

Type: Grant

Filed: August 28, 2020

Date of Patent: July 11, 2023

Assignee: GOOGLE LLC

Inventors: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
CONTEXT AWARE BEAMFORMING OF AUDIO DATA

Publication number: 20220319498

Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user, determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.

Type: Application

Filed: April 2, 2021

Publication date: October 6, 2022

Inventors: Joseph Caroselli, JR., Yiteng Huang, Arun Narayanan
ADAPTIVE MULTICHANNEL DEREVERBERATION FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20200395029

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

Type: Application

Filed: August 28, 2020

Publication date: December 17, 2020

Inventors: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
Adaptive multichannel dereverberation for automatic speech recognition

Patent number: 10762914

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

Type: Grant

Filed: July 11, 2018

Date of Patent: September 1, 2020

Assignee: GOOGLE LLC

Inventors: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
ADAPTIVE MULTICHANNEL DEREVERBERATION FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20190272840

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

Type: Application

Filed: July 11, 2018

Publication date: September 5, 2019

Inventors: Joseph Caroselli, Arun Narayanan, Izhak Shafran, Richard Rose
Serial data link using decision feedback equalization

Patent number: 7440497

Abstract: A multi-phase adaptive decision feedback equalizer minimizes post-cursor inter-symbol interference in a current data bit based on values of subsequent data bits in a data communication system. In one form, the receiver includes a plurality of modules each having a respective adaptive decision feedback equalizer. A processor responsive to output signals from each of the plurality of modules generates a plurality of coefficient values. The adaptive decision feedback equalizer has a plurality of taps receiving a respective output signal from one of the modules and a respective coefficient value to generate a respective correction signal. The correction signals are summed with the data signal and processed to recover the data. Pre-calculation of coefficients permits rapid selection of data. Multi-phase operation permits higher data frequencies.

Type: Grant

Filed: November 1, 2004

Date of Patent: October 21, 2008

Assignee: LSI Corporation

Inventors: Vishnu Balan, Joseph Caroselli, Jr., Ye Liu, Chintan M. Desai, Jenn-Gang Chern
Serial data link using decision feedback equalization

Publication number: 20060093028

Abstract: A multi-phase adaptive decision feedback equalizer minimizes post-cursor inter-symbol interference in a current data bit based on values of subsequent data bits in a data communication system. In one form, the receiver includes a plurality of modules each having a respective adaptive decision feedback equalizer. A processor responsive to output signals from each of the plurality of modules generates a plurality of coefficient values. The adaptive decision feedback equalizer has a plurality of taps receiving a respective output signal from one of the modules and a respective coefficient value to generate a respective correction signal. The correction signals are summed with the data signal and processed to recover the data. Pre-calculation of coefficients permits rapid selection of data. Multi-phase operation permits higher data frequencies.

Type: Application

Filed: November 1, 2004

Publication date: May 4, 2006

Applicant: LSI Logic Corporation

Inventors: Vishnu Balan, Joseph Caroselli, Ye Liu, Chintan Desai, Jenn-Gang Chern