Patents by Inventor Yiteng Huang
Yiteng Huang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Patent number: 11798533
  Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
  Type: Grant
  Filed: April 2, 2021
  Date of Patent: October 24, 2023
  Assignee: GOOGLE LLC
  Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
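The initialization step described above can be sketched numerically. This is a minimal illustration, not the patented implementation: it assumes frequency-domain snapshots for a single frequency bin, a hypothetical all-ones steering vector, and an MVDR-style weight computation (one common way to turn a spatial correlation matrix into beamformer weights):

```python
import numpy as np

def estimate_spatial_correlation(snapshots: np.ndarray) -> np.ndarray:
    """Average outer products of multi-channel snapshots (channels x frames)."""
    return snapshots @ snapshots.conj().T / snapshots.shape[1]

def init_mvdr_beamformer(corr: np.ndarray, steering: np.ndarray,
                         diag_load: float = 1e-6) -> np.ndarray:
    """MVDR weights from a spatial correlation matrix; diagonal loading
    keeps the matrix inverse well conditioned."""
    r_inv = np.linalg.inv(corr + diag_load * np.eye(corr.shape[0]))
    num = r_inv @ steering
    return num / (steering.conj() @ num)

# Pool the segment containing the detected phrase with the preceding segment.
rng = np.random.default_rng(0)
preceding = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
first_segment = rng.standard_normal((4, 30)) + 1j * rng.standard_normal((4, 30))
R = estimate_spatial_correlation(np.concatenate([preceding, first_segment], axis=1))

d = np.ones(4, dtype=complex)   # hypothetical steering vector (broadside array)
w = init_mvdr_beamformer(R, d)  # initialized weights, ready for later segments
```

The initialized `w` would then filter the second and later audio segments; the on-device versus server-side variants in the abstract only change where these two functions run.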
- Patent number: 11741934
  Abstract: A multi-microphone device that can perform acoustic echo cancellation (AEC) without an external reference signal. The device uses the audio data from one of its microphones as a reference for purposes of AEC and acoustic noise cancellation (ANC). The device determines filter coefficients for an adaptive filter for ANC when cancelling one microphone signal from another microphone's signal. Those filter coefficients are buffered and delayed and then used for AEC operations cancelling one microphone signal from another microphone's signal. When desired audio (such as a wakeword, speech, or the like) is detected, the device may freeze the coefficients for purposes of performing AEC until the desired audio is complete. The device may then continue adapting and using the coefficients.
  Type: Grant
  Filed: March 30, 2022
  Date of Patent: August 29, 2023
  Assignee: Amazon Technologies, Inc.
  Inventors: Tao Zhang, Yiteng Huang
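A minimal sketch of the adapt-then-freeze idea, assuming a simple NLMS adaptive filter (the patent does not specify the adaptation algorithm) and treating one microphone signal as the reference to cancel from the other:

```python
import numpy as np

def nlms_cancel(reference, target, taps=8, mu=0.5, eps=1e-8, freeze=None):
    """Cancel `reference` from `target` with an NLMS adaptive filter.
    Where `freeze` is True (e.g. while desired audio such as a wakeword
    is present), the coefficients are held fixed instead of adapting."""
    w = np.zeros(taps)
    out = np.zeros_like(target)
    for n in range(len(target)):
        x = reference[max(0, n - taps + 1):n + 1][::-1]  # newest sample first
        x = np.pad(x, (0, taps - len(x)))
        e = target[n] - w @ x            # residual after cancellation
        out[n] = e
        if freeze is None or not freeze[n]:
            w += mu * e * x / (x @ x + eps)   # normalized LMS update
    return out, w

rng = np.random.default_rng(1)
echo_path = np.array([0.6, -0.3, 0.1])      # toy acoustic coupling between mics
ref = rng.standard_normal(4000)             # "reference" microphone
tgt = np.convolve(ref, echo_path)[:4000]    # other mic hears a filtered copy
residual, w = nlms_cancel(ref, tgt)         # residual decays as w converges
```

Passing a boolean `freeze` array covering the wakeword region reproduces the freeze-then-resume behavior the abstract describes.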
- Publication number: 20230097197
  Abstract: A method (400) includes receiving, at a first processor (110) of a user device (102), streaming multi-channel audio (118) captured by an array of microphones (107), each channel (119) including respective audio features. For each channel, the method also includes processing, by the first processor, using a first stage hotword detector (210), the respective audio features to determine whether a hotword is detected. When the first stage hotword detector detects the hotword, the method also includes the first processor providing chomped raw audio data (212) to a second processor that processes, using a first noise cleaning algorithm (250), the chomped raw audio data to generate a clean monophonic audio chomp (260). The method also includes processing, by the second processor using a second stage hotword detector (220), the clean monophonic audio chomp to detect the hotword.
  Type: Application
  Filed: April 8, 2020
  Publication date: March 30, 2023
  Applicant: Google LLC
  Inventors: Yiteng Huang, Alexander H. Gruenstein
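The cascade can be sketched as follows. The detectors and the noise cleaning here are crude energy-based stand-ins for the real models, so only the control flow (run the expensive second stage only after the cheap first stage fires) reflects the abstract:

```python
import numpy as np

def first_stage_detect(channel: np.ndarray, threshold: float = 0.5) -> bool:
    """Cheap per-channel first stage (energy stands in for a small hotword model)."""
    return float(np.mean(channel ** 2)) > threshold

def clean_to_mono(channels: np.ndarray) -> np.ndarray:
    """Stand-in noise cleaning: collapse the channels to one mono stream."""
    return channels.mean(axis=0)

def second_stage_detect(mono: np.ndarray, threshold: float = 0.5) -> bool:
    """Costlier second stage, run on the cleaned mono audio only."""
    return float(np.mean(mono ** 2)) > threshold

def two_stage_hotword(channels: np.ndarray) -> bool:
    """Run the second stage only if some channel passes the first stage."""
    if not any(first_stage_detect(ch) for ch in channels):
        return False
    return second_stage_detect(clean_to_mono(channels))
```

In the patent the two stages run on different processors, which this single-process sketch does not model.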
- Publication number: 20230022800
  Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
  Type: Application
  Filed: January 15, 2020
  Publication date: January 26, 2023
  Applicant: Google LLC
  Inventors: Jilong Wu, Yiteng Huang
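A single SVDF node factors a two-dimensional filter into a per-frame feature filter and a short time filter over the node's memory. A sketch of that rank-1 computation (the layer sizes, stacking, and 3D multi-channel arrangement of the patent are omitted):

```python
import numpy as np

def svdf_node(frames: np.ndarray, feature_filter: np.ndarray,
              time_filter: np.ndarray) -> np.ndarray:
    """One rank-1 SVDF node: project each frame's features onto a feature
    filter, then filter the resulting scalar sequence over the node's
    memory with a short time filter."""
    projected = frames @ feature_filter           # one scalar per input frame
    mem = len(time_filter)
    out = np.zeros(len(projected))
    for t in range(len(projected)):
        window = projected[max(0, t - mem + 1):t + 1][::-1]  # newest first
        out[t] = window @ time_filter[:len(window)]
    return out

rng = np.random.default_rng(2)
frames = rng.standard_normal((6, 3))          # 6 frames, 3 features each
v = rng.standard_normal(3)                    # feature filter
out = svdf_node(frames, v, np.array([1.0]))   # memory of 1: pure projection
```

A full detector would stack many such nodes into layers and feed the final layer's outputs through a sigmoid to get the probability score.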
- Publication number: 20220392441
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: August 12, 2022
  Publication date: December 8, 2022
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
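The selective-adaptation idea can be sketched with a magnitude-domain spectral-subtraction stand-in: the noise estimate is updated only on frames a detector has flagged as noise-only, so detected features (e.g. an invocation phrase) never contaminate it. The smoothing factor and thresholding below are illustrative, not from the patent family:

```python
import numpy as np

def selective_noise_reduction(mag_frames, speech_flags, alpha=0.9):
    """Spectral subtraction over magnitude spectra where the noise
    estimate adapts only on frames flagged as noise-only."""
    noise = np.zeros(mag_frames.shape[1])
    out = np.empty_like(mag_frames)
    for i, frame in enumerate(mag_frames):
        if not speech_flags[i]:                        # adapt on noise only
            noise = alpha * noise + (1 - alpha) * frame
        out[i] = np.maximum(frame - noise, 0.0)        # subtract, floor at zero
    return out

# 30 noise-only frames, then one frame flagged as containing speech.
frames = np.vstack([np.full((30, 4), 1.0), np.full((1, 4), 3.0)])
flags = [False] * 30 + [True]
cleaned = selective_noise_reduction(frames, flags)
```

Freezing adaptation during the flagged frame is what keeps the speech energy out of the noise estimate, which is the "selective" part of the technique.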
- Publication number: 20220319498
  Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
  Type: Application
  Filed: April 2, 2021
  Publication date: October 6, 2022
  Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
- Patent number: 11417324
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Grant
  Filed: May 28, 2020
  Date of Patent: August 16, 2022
  Assignee: GOOGLE LLC
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Publication number: 20200294496
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: May 28, 2020
  Publication date: September 17, 2020
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Patent number: 10706842
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Grant
  Filed: January 14, 2019
  Date of Patent: July 7, 2020
  Assignee: GOOGLE LLC
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Publication number: 20200066263
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: January 14, 2019
  Publication date: February 27, 2020
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Patent number: 10045137
  Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
  Type: Grant
  Filed: June 30, 2017
  Date of Patent: August 7, 2018
  Assignee: Google LLC
  Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
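The bi-magnitude switch can be sketched directly from the abstract: per output sample, use the linear filter when the loudspeaker sample is below the threshold and the nonlinearly preprocessed filter otherwise. The tanh nonlinearity and the filter lengths are illustrative assumptions:

```python
import numpy as np

def bi_magnitude_echo(x, h_small, h_large, threshold, nonlinear=np.tanh):
    """Per-sample echo estimate: when |x[n]| is below the threshold, the
    loudspeaker signal is convolved with h_small; above it, a nonlinearity
    (tanh here, as an assumption) is applied before convolving with h_large.
    Both impulse responses are assumed to share the same length."""
    taps = len(h_small)
    y = np.zeros_like(x)
    for n in range(len(x)):
        window = x[max(0, n - taps + 1):n + 1][::-1]     # newest sample first
        window = np.pad(window, (0, taps - len(window)))
        if abs(x[n]) < threshold:
            y[n] = h_small @ window
        else:
            y[n] = h_large @ nonlinear(window)
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
h_small = np.array([0.5, 0.25])
h_large = np.array([1.0, 0.0])
linear_only = bi_magnitude_echo(x, h_small, h_large, threshold=100.0)
```

With a threshold larger than every sample, the operation reduces to an ordinary convolution with `h_small`, which is a useful sanity check on the switching logic.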
- Publication number: 20180007482
  Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
  Type: Application
  Filed: June 30, 2017
  Publication date: January 4, 2018
  Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
- Publication number: 20170221502
  Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over all microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
  Type: Application
  Filed: February 3, 2016
  Publication date: August 3, 2017
  Applicant: Google Inc.
  Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
- Patent number: 9721582
  Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over all microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
  Type: Grant
  Filed: February 3, 2016
  Date of Patent: August 1, 2017
  Assignee: GOOGLE INC.
  Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
- Patent number: 8583429
  Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
  Type: Grant
  Filed: February 1, 2011
  Date of Patent: November 12, 2013
  Assignee: Wevoice Inc.
  Inventors: Jacob Benesty, Yiteng Huang
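Given the two sets of second-order statistics, the filter construction in step (c) can be sketched in its classical Wiener form, h = R_y^{-1}(R_y - R_v)e_1, where e_1 selects the current coefficient of the extended observation vector. The patent may use a different trade-off filter; this is the standard special case:

```python
import numpy as np

def wiener_from_stats(R_y, R_v):
    """Noise-reduction filter from the second-order statistics of the
    extended (L-frame) observation vector (R_y) and of the noise (R_v):
    h = R_y^{-1} (R_y - R_v) e1, applied as x_hat = h . y."""
    e1 = np.zeros(R_y.shape[0])
    e1[0] = 1.0
    return np.linalg.solve(R_y, (R_y - R_v) @ e1)

R_y = np.array([[2.0, 0.5],
                [0.5, 1.0]])                         # toy L=2 observation stats
h_clean = wiener_from_stats(R_y, np.zeros((2, 2)))   # no noise -> pass-through
h_noise = wiener_from_stats(R_y, R_y)                # all noise -> zero filter
```

The two limiting cases make the behavior concrete: with no noise the filter passes the current coefficient through unchanged, and with pure noise it suppresses everything.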
- Publication number: 20120197636
  Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
  Type: Application
  Filed: February 1, 2011
  Publication date: August 2, 2012
  Inventors: Jacob Benesty, Yiteng Huang
- Publication number: 20050286664
  Abstract: An apparatus for mixing audio signals in a voice-over-IP teleconferencing environment comprises a preprocessor, a mixing controller, and a mixing processor. The preprocessor is divided into a media parameter estimator and a media preprocessor. The media parameter estimator estimates signal parameters such as signal-to-noise ratios, energy levels, and voice activity (i.e., the presence or absence of voice in the signal), which are used to control how different channels are mixed. The media preprocessor employs signal processing algorithms such as silence suppression, automatic gain control, and noise reduction, so that the quality of the incoming voice streams is optimized. Based on a function of the estimated signal parameters, the mixing controller specifies a particular mixing strategy and the mixing processor mixes the preprocessed voice streams according to the strategy provided by the controller.
  Type: Application
  Filed: June 24, 2004
  Publication date: December 29, 2005
  Inventors: Jingdong Chen, Yiteng Huang, Thomas Woo
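A toy version of the controller-driven mixing loop, using only energy as the estimated parameter: channels below a voice-activity threshold are suppressed and the active ones are averaged. Real AGC, noise reduction, and SNR estimation are omitted, and the threshold is an illustrative assumption:

```python
import numpy as np

def mix_streams(streams, vad_threshold=1e-3):
    """Energy-gated mixing: channels whose mean energy falls below the
    voice-activity threshold are dropped (silence suppression), and the
    active channels are averaged as a crude gain control."""
    active = [s for s in streams if float(np.mean(np.square(s))) > vad_threshold]
    if not active:
        return np.zeros_like(streams[0])
    return np.mean(active, axis=0)

# Two talkers plus one silent channel: only the talkers are mixed.
mixed = mix_streams([np.ones(8), 3.0 * np.ones(8), np.zeros(8)])
```

In the described architecture the parameter estimator and preprocessor would run before this step, and the controller would pick a richer strategy than simple averaging.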
- Method and apparatus for passive acoustic source localization for video camera steering applications
  Patent number: 6826284
  Abstract: A real-time passive acoustic source localization system for video camera steering advantageously determines the relative delay between the direct paths of two estimated channel impulse responses. The illustrative system employs an approach referred to herein as the "adaptive eigenvalue decomposition algorithm" (AEDA) to make such a determination, and then advantageously employs a "one-step least-squares algorithm" (OSLS) for purposes of acoustic source localization, providing the desired features of robustness, portability, and accuracy in a reverberant environment. The AEDA technique directly estimates the (direct path) impulse response from the sound source to each of a pair of microphones, and then uses these estimated impulse responses to determine the time delay of arrival (TDOA) between the two microphones by measuring the distance between the first peaks thereof (i.e., the first significant taps of the corresponding transfer functions).
  Type: Grant
  Filed: February 4, 2000
  Date of Patent: November 30, 2004
  Assignee: Agere Systems Inc.
  Inventors: Jacob Benesty, Gary Wayne Elko, Yiteng Huang
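For context, the delay-estimation step can be illustrated with a plain cross-correlation TDOA baseline; the patented AEDA instead adaptively estimates the two channel impulse responses and compares their first significant taps, which is far more robust under reverberation:

```python
import numpy as np

def tdoa_cross_correlation(mic1, mic2, max_lag):
    """Delay (in samples) of mic2 relative to mic1, chosen as the lag
    that maximizes the cross-correlation over [-max_lag, max_lag]."""
    def corr(lag):
        if lag >= 0:
            return float(np.dot(mic2[lag:], mic1[:len(mic1) - lag]))
        return float(np.dot(mic1[-lag:], mic2[:len(mic2) + lag]))
    return max(range(-max_lag, max_lag + 1), key=corr)

rng = np.random.default_rng(3)
x = rng.standard_normal(500)
mic1 = x
mic2 = np.concatenate([np.zeros(3), x[:-3]])  # second mic hears the source later
delay = tdoa_cross_correlation(mic1, mic2, max_lag=10)
```

Once TDOAs are available from several microphone pairs, a least-squares step such as the patent's OSLS converts them into a source position for steering the camera.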