Patents by Inventor Yiteng Huang

Yiteng Huang has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11798533
    Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: October 24, 2023
    Assignee: GOOGLE LLC
    Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
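The abstract above (patent 11,798,533) does not name a specific beamformer design, so the following is only a minimal sketch of the general idea: estimate spatial covariance matrices from the segment containing the particular words and from the preceding (noise-dominated) segment, initialize a beamformer from them, and apply it to later audio. The MVDR formulation, function names, and toy data are illustrative assumptions, not the patented method.

```python
import numpy as np

def estimate_spatial_covariance(stft_frames):
    """Average outer products of multi-channel STFT frames.

    stft_frames: complex array of shape (frames, channels) for one frequency bin.
    Returns a (channels, channels) spatial covariance estimate.
    """
    return stft_frames.conj().T @ stft_frames / len(stft_frames)

def init_mvdr_weights(noise_cov, speech_cov, eps=1e-6):
    """Initialize MVDR beamformer weights for one frequency bin.

    The steering direction is taken as the principal eigenvector of the speech
    covariance; the noise covariance is regularized before inversion.
    """
    channels = noise_cov.shape[0]
    noise_cov = noise_cov + eps * np.eye(channels)
    _, eigvecs = np.linalg.eigh(speech_cov)
    steering = eigvecs[:, -1]                       # principal eigenvector
    numerator = np.linalg.solve(noise_cov, steering)
    return numerator / (steering.conj() @ numerator)

# Toy example: 4-channel STFT frames at a single frequency bin.
rng = np.random.default_rng(0)
preceding_segment = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))  # noise-dominated
hotword_segment = rng.standard_normal((30, 4)) + 1j * rng.standard_normal((30, 4))    # contains the phrase

noise_cov = estimate_spatial_covariance(preceding_segment)
speech_cov = estimate_spatial_covariance(hotword_segment)
w = init_mvdr_weights(noise_cov, speech_cov)

# Use the initialized beamformer on a later ("second") audio segment.
second_segment = rng.standard_normal((40, 4)) + 1j * rng.standard_normal((40, 4))
beamformed = second_segment @ w.conj()
print(beamformed.shape)  # (40,)
```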
  • Patent number: 11741934
    Abstract: A multi-microphone device that can perform acoustic echo cancellation (AEC) without an external reference signal. The device uses the audio data from one of its microphones as a reference for purposes of AEC and acoustic noise cancellation (ANC). The device determines filter coefficients for an adaptive filter for ANC when cancelling one microphone signal from another microphone's signal. Those filter coefficients are buffered and delayed and then used for AEC operations cancelling one microphone signal from another microphone's signal. When desired audio (such as a wakeword, speech, or the like) is detected, the device may freeze the coefficients for purposes of performing AEC until the desired audio is complete. The device may then continue adapting and using the coefficients.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: August 29, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Tao Zhang, Yiteng Huang
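As a rough illustration of the reference-free cancellation described in patent 11,741,934, the sketch below cancels one microphone's signal from another's with an NLMS adaptive filter and freezes adaptation while desired audio is flagged. NLMS, the parameter values, and the freeze mask are assumptions for illustration; the patent's coefficient buffering and delay scheme between the ANC and AEC stages is not reproduced here.

```python
import numpy as np

def nlms_cancel(reference_mic, target_mic, num_taps=64, step=0.5, eps=1e-8, freeze=None):
    """Cancel the reference microphone's signal from the target microphone's signal
    with an NLMS adaptive filter.

    freeze: optional boolean array; where True (e.g. while desired speech such as a
            wakeword is present), adaptation pauses and the current coefficients
            are reused unchanged.
    Returns the residual (target with the estimated echo/noise removed).
    """
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)
    out = np.zeros_like(target_mic)
    for n in range(len(target_mic)):
        buf = np.roll(buf, 1)
        buf[0] = reference_mic[n]
        error = target_mic[n] - w @ buf
        out[n] = error
        if freeze is None or not freeze[n]:
            w += step * error * buf / (buf @ buf + eps)
    return out

# Toy example: mic1 acts as the reference; mic0 contains a delayed, scaled copy of it.
rng = np.random.default_rng(1)
mic1 = rng.standard_normal(4000)
mic0 = 0.6 * np.concatenate([np.zeros(5), mic1[:-5]]) + 0.01 * rng.standard_normal(4000)

# Pretend desired audio is detected over the last quarter: freeze adaptation there.
freeze_mask = np.zeros(4000, dtype=bool)
freeze_mask[3000:] = True

cleaned = nlms_cancel(mic1, mic0, freeze=freeze_mask)
print(np.mean(cleaned[1000:3000] ** 2) < np.mean(mic0[1000:3000] ** 2))  # True: residual energy drops
```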
  • Publication number: 20230097197
    Abstract: A method (400) includes receiving, at a first processor (110) of a user device (102), streaming multi-channel audio (118) captured by an array of microphones (107), each channel (119) including respective audio features. For each channel, the method also includes processing, by the first processor, using a first stage hotword detector (210), the respective audio features to determine whether a hotword is detected. When the first stage hotword detector detects the hotword, the method also includes the first processor providing chomped raw audio data (212) to a second processor that processes, using a first noise cleaning algorithm (250), the chomped raw audio data to generate a clean monophonic audio chomp (260). The method also includes processing, by the second processor using a second stage hotword detector (220), the clean monophonic audio chomp to detect the hotword.
    Type: Application
    Filed: April 8, 2020
    Publication date: March 30, 2023
    Applicant: Google LLC
    Inventors: Yiteng Huang, Alexander H. Gruenstein
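A minimal control-flow sketch of the two-stage, two-processor hotword pipeline in publication 20230097197: a cheap first-stage check per channel, then noise cleaning down to a single channel and a second-stage check. All of the detector and cleaning functions below are hypothetical stand-ins, not the trained models or enhancement algorithm the application describes.

```python
import numpy as np

# Hypothetical stand-ins for the two hotword models and the noise-cleaning step;
# a real system would use trained detectors and a multi-channel enhancement
# algorithm in their place.
def first_stage_detect(channel_features, threshold=0.5):
    """Cheap per-channel check run on the first (always-on) processor."""
    return float(np.mean(np.abs(channel_features))) > threshold

def noise_clean(multi_channel_audio):
    """Reduce multi-channel audio to one cleaned channel (here: a simple average)."""
    return multi_channel_audio.mean(axis=0)

def second_stage_detect(mono_audio, threshold=0.5):
    """More accurate check run on the second processor over the cleaned audio."""
    return float(np.sqrt(np.mean(mono_audio ** 2))) > threshold

def process_segment(multi_channel_audio, features_per_channel):
    """Two-stage pipeline: wake the second stage only if any channel's
    first-stage detector fires on its respective features."""
    if any(first_stage_detect(f) for f in features_per_channel):
        clean_mono = noise_clean(multi_channel_audio)       # hand off raw audio, clean it
        return second_stage_detect(clean_mono)              # confirm on the clean channel
    return False

# Toy example: 3 microphone channels of 1600 samples with crude per-channel "features".
rng = np.random.default_rng(2)
audio = rng.standard_normal((3, 1600))
features = [np.abs(channel[:40]) for channel in audio]
print(process_segment(audio, features))
```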
  • Publication number: 20230022800
    Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
    Type: Application
    Filed: January 15, 2020
    Publication date: January 26, 2023
    Applicant: Google LLC
    Inventors: Jilong Wu, Yiteng Huang
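The sketch below illustrates the streaming SVDF idea behind publication 20230022800: per-channel SVDF processing in parallel, concatenation into a multi-channel representation, and sequentially stacked SVDF layers producing a score that is compared against a threshold. Layer sizes, random weights, and the sigmoid squashing are illustrative assumptions; a real detector would use trained parameters.

```python
import numpy as np

class SVDFLayer:
    """Minimal rank-1 SVDF layer: a per-frame feature filter followed by a filter
    over a memory of the last `memory` per-node outputs."""
    def __init__(self, input_dim, num_nodes, memory, rng):
        self.feature_filters = 0.1 * rng.standard_normal((num_nodes, input_dim))
        self.time_filters = 0.1 * rng.standard_normal((num_nodes, memory))
        self.state = np.zeros((num_nodes, memory))

    def step(self, frame):
        projected = self.feature_filters @ frame            # stage 1: feature filtering
        self.state = np.roll(self.state, -1, axis=1)        # stage 2: push into memory
        self.state[:, -1] = projected
        return np.maximum((self.state * self.time_filters).sum(axis=1), 0.0)  # time filtering + ReLU

rng = np.random.default_rng(3)
num_channels, feats_per_channel = 2, 40

# Input layer: one SVDF per channel processing its features in parallel, with the
# per-channel outputs concatenated into a multi-channel representation.
input_layers = [SVDFLayer(feats_per_channel, 16, memory=8, rng=rng) for _ in range(num_channels)]
stacked = [SVDFLayer(16 * num_channels, 16, memory=8, rng=rng),
           SVDFLayer(16, 1, memory=8, rng=rng)]

threshold, scores = 0.5, []
for _ in range(20):                                         # stream of input frames
    channel_feats = rng.standard_normal((num_channels, feats_per_channel))
    h = np.concatenate([layer.step(f) for layer, f in zip(input_layers, channel_feats)])
    for layer in stacked:                                   # sequentially stacked SVDF layers
        h = layer.step(h)
    scores.append(1.0 / (1.0 + np.exp(-h[0])))              # squash to a probability-like score
print("max score:", round(max(scores), 3), "- initiate wake-up if it exceeds", threshold)
```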
  • Publication number: 20220392441
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Application
    Filed: August 12, 2022
    Publication date: December 8, 2022
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
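Several entries in this listing share the abstract above (selectively adapting and/or selectively utilizing a noise reduction technique), so a single sketch is given here. It gates both the adaptation and the application of a simple spectral-subtraction noise reducer; the specific reducer, the thresholds, and the energy-based voice detector are assumptions for illustration only, not the techniques claimed in the filings.

```python
import numpy as np

def selectively_denoise(frames, voice_detector, alpha=0.9, use_threshold=2.0):
    """Illustrative spectral-subtraction noise reduction that is (a) adapted only on
    frames the hypothetical voice_detector marks as non-speech and (b) applied only
    when a frame does not stand clearly above the current noise estimate.

    frames: list of equal-length 1-D float arrays (time-domain audio frames).
    voice_detector: callable(frame) -> bool, True when desired speech is present.
    Returns the list of frames passed on to the downstream feature detector.
    """
    noise_mag = None
    output = []
    for frame in frames:
        spectrum = np.fft.rfft(frame)
        mag, phase = np.abs(spectrum), np.angle(spectrum)
        if noise_mag is None:
            noise_mag = mag.copy()
        if not voice_detector(frame):
            # Selectively ADAPT: update the noise spectrum only on non-speech frames.
            noise_mag = alpha * noise_mag + (1 - alpha) * mag
        if np.mean(mag) < use_threshold * np.mean(noise_mag):
            # Selectively UTILIZE: subtract the noise estimate only on noisy frames.
            cleaned = np.maximum(mag - noise_mag, 0.0)
            frame = np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))
        output.append(frame)
    return output

# Toy usage with a crude energy-based stand-in for the speech/hotword detector.
rng = np.random.default_rng(4)
frames = [0.05 * rng.standard_normal(160) for _ in range(50)]
denoised = selectively_denoise(frames, lambda f: float(np.mean(f ** 2)) > 0.01)
print(len(denoised), denoised[0].shape)
```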
  • Publication number: 20220319498
    Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
    Type: Application
    Filed: April 2, 2021
    Publication date: October 6, 2022
    Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
  • Patent number: 11417324
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: August 16, 2022
    Assignee: GOOGLE LLC
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Publication number: 20200294496
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Application
    Filed: May 28, 2020
    Publication date: September 17, 2020
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Patent number: 10706842
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Grant
    Filed: January 14, 2019
    Date of Patent: July 7, 2020
    Assignee: GOOGLE LLC
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Publication number: 20200066263
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Application
    Filed: January 14, 2019
    Publication date: February 27, 2020
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Patent number: 10045137
    Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: August 7, 2018
    Assignee: Google LLC
    Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
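A minimal sketch of the bi-magnitude filtering idea in patent 10,045,137: for each sample, the echo estimate uses one impulse response on the raw far-end window when the incoming magnitude is below a threshold, and a second impulse response on a nonlinear transform of the window otherwise; the estimate is then subtracted from the microphone signal. The sketch applies already-known impulse responses; estimating them from the microphone input, which the patent also covers, is omitted, and the tanh nonlinearity and threshold value are assumptions.

```python
import numpy as np

def bi_magnitude_aec(far_end, mic, h_low, h_high, threshold=0.5, nonlinearity=np.tanh):
    """Bi-magnitude echo cancellation: choose between two filtering operations per
    output sample based on the magnitude of the incoming (far-end) audio signal.

    h_low, h_high: impulse responses (1-D arrays of equal length).
    Returns mic with the estimated echo removed.
    """
    taps = len(h_low)
    padded = np.concatenate([np.zeros(taps - 1), far_end])
    out = np.empty_like(mic)
    for n in range(len(mic)):
        window = padded[n:n + taps][::-1]          # most recent far-end sample first
        if abs(far_end[n]) < threshold:
            echo = h_low @ window                  # linear regime: raw signal
        else:
            echo = h_high @ nonlinearity(window)   # loud regime: nonlinear function of signal
        out[n] = mic[n] - echo
    return out

# Toy example: the loudspeaker path is linear for small samples and saturates
# (tanh) for large ones, matching the two-regime model.
rng = np.random.default_rng(5)
far_end = 1.5 * rng.standard_normal(2000)
true_ir = np.array([0.8, 0.3, 0.1])
echo = np.where(np.abs(far_end) < 0.5,
                np.convolve(far_end, true_ir)[:2000],
                np.convolve(np.tanh(far_end), true_ir)[:2000])
mic = echo + 0.01 * rng.standard_normal(2000)

residual = bi_magnitude_aec(far_end, mic, h_low=true_ir, h_high=true_ir)
print(np.mean(residual ** 2) < np.mean(mic ** 2))  # True: echo largely cancelled
```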
  • Publication number: 20180007482
    Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 4, 2018
    Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
  • Publication number: 20170221502
    Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time and average over all microphone pairs, yielding a suboptimal solution. The method described here provides a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. It also implements a globally optimized least-squares approach across all microphones in the array, providing a solution closer to the optimum than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
    Type: Application
    Filed: February 3, 2016
    Publication date: August 3, 2017
    Applicant: Google Inc.
    Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
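The following sketch illustrates the globally optimized least-squares post-filter idea from publication 20170221502 at a single frequency bin: fit desired-speech, diffuse-noise, and white-noise powers to the observed spatial covariance of all microphone pairs in one least-squares problem, then form a Wiener-style gain. Point interferers, which the method also models, are omitted here, and the array geometry and signal levels are toy assumptions.

```python
import numpy as np

def diffuse_coherence(mic_positions, freq, c=343.0):
    """Spherically diffuse noise coherence matrix (sinc of inter-microphone distance)."""
    dists = np.linalg.norm(mic_positions[:, None, :] - mic_positions[None, :, :], axis=-1)
    return np.sinc(2 * freq * dists / c)

def global_ls_postfilter_gain(obs_cov, steering, diffuse_cov):
    """Fit desired-signal, diffuse-noise, and white-noise powers to the observed
    spatial covariance of ALL microphone pairs at once (one least-squares problem),
    then return the single-channel Wiener post-filter gain.

    obs_cov: (M, M) observed covariance at one frequency bin.
    steering: (M,) steering vector of the desired source.
    diffuse_cov: (M, M) diffuse-noise coherence matrix.
    """
    m = obs_cov.shape[0]
    basis = [np.outer(steering, steering.conj()),   # desired speech model
             diffuse_cov.astype(complex),           # diffuse noise model
             np.eye(m, dtype=complex)]              # white sensor noise model
    # Stack every covariance entry (i.e. every microphone pair) into one LS system.
    A = np.column_stack([b.ravel() for b in basis])
    powers, *_ = np.linalg.lstsq(A, obs_cov.ravel(), rcond=None)
    phi_s, phi_d, phi_w = np.maximum(powers.real, 0.0)
    return phi_s / max(phi_s + phi_d + phi_w, 1e-12)

# Toy example: 4-mic linear array, desired source plus diffuse and white noise.
rng = np.random.default_rng(6)
mics = np.column_stack([np.arange(4) * 0.05, np.zeros(4), np.zeros(4)])
freq = 1000.0
steering = np.ones(4, dtype=complex)            # source broadside to the array
gamma = diffuse_coherence(mics, freq)

true_cov = 1.0 * np.outer(steering, steering.conj()) + 0.3 * gamma + 0.1 * np.eye(4)
gain = global_ls_postfilter_gain(true_cov, steering, gamma)
print(round(gain, 3))  # close to 1.0 / (1.0 + 0.3 + 0.1)
```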
  • Patent number: 9721582
    Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time and average over all microphone pairs, yielding a suboptimal solution. The method described here provides a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. It also implements a globally optimized least-squares approach across all microphones in the array, providing a solution closer to the optimum than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
    Type: Grant
    Filed: February 3, 2016
    Date of Patent: August 1, 2017
    Assignee: GOOGLE INC.
    Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
  • Patent number: 8583429
    Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
    Type: Grant
    Filed: February 1, 2011
    Date of Patent: November 12, 2013
    Assignee: Wevoice Inc.
    Inventors: Jacob Benesty, Yiteng Huang
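A small sketch of the multi-frame, second-order-statistics filter described in patent 8,583,429, at a single time-frequency bin: build the extended observation vector over L frames, estimate its covariance and the noise covariance, and solve for a Wiener-style filter that estimates the clean coefficient of the current frame. The synthetic real-valued coefficients stand in for actual time-frequency transform outputs and are an assumption for brevity.

```python
import numpy as np

def multiframe_wiener_filter(R_y, R_v):
    """Per-bin noise reduction filter built from second-order statistics of the
    extended (L-frame) observation vector: h = R_y^{-1} (R_y - R_v) e_1, whose
    output estimates the clean coefficient of the current frame.
    """
    L = R_y.shape[0]
    e1 = np.zeros(L)
    e1[0] = 1.0
    return np.linalg.solve(R_y, (R_y - R_v) @ e1)

# Toy example at a single time-frequency bin with L = 4 consecutive frames.
rng = np.random.default_rng(7)
L, num_obs = 4, 500

# Correlated "speech" coefficients across frames plus white noise.
speech = 0.5 * np.cumsum(rng.standard_normal((num_obs, L)), axis=1)
noise = rng.standard_normal((num_obs, L))
observed = speech + noise

R_y = observed.T @ observed / num_obs      # second-order stats of the extended vector
R_v = noise.T @ noise / num_obs            # second-order stats of the noise

h = multiframe_wiener_filter(R_y, R_v)
estimate = observed @ h                    # per-observation estimate of speech[:, 0]
print(np.mean((estimate - speech[:, 0]) ** 2) < np.mean((observed[:, 0] - speech[:, 0]) ** 2))
```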
  • Publication number: 20120197636
    Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
    Type: Application
    Filed: February 1, 2011
    Publication date: August 2, 2012
    Inventors: Jacob Benesty, Yiteng Huang
  • Publication number: 20050286664
    Abstract: An apparatus for mixing audio signals in a voice-over-IP teleconferencing environment comprises a preprocessor, a mixing controller, and a mixing processor. The preprocessor is divided into a media parameter estimator and a media preprocessor. The media parameter estimator estimates signal parameters such as signal-to-noise ratios, energy levels, and voice activity (i.e., the presence or absence of voice in the signal), which are used to control how different channels are mixed. The media preprocessor employs signal processing algorithms such as silence suppression, automatic gain control, and noise reduction, so that the quality of the incoming voice streams is optimized. Based on a function of the estimated signal parameters, the mixing controller specifies a particular mixing strategy and the mixing processor mixes the preprocessed voice streams according to the strategy provided by the controller.
    Type: Application
    Filed: June 24, 2004
    Publication date: December 29, 2005
    Inventors: Jingdong Chen, Yiteng Huang, Thomas Woo
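A toy sketch of the mixing architecture in publication 20050286664: a parameter estimator per incoming stream, a mixing controller that selects channels and gains from those parameters, and a mixing processor that applies the chosen strategy. The energy-based activity test, the loudest-N strategy, and the equal gains are illustrative assumptions, not the specific algorithms of the application.

```python
import numpy as np

def estimate_parameters(channel):
    """Crude stand-ins for the media parameter estimator: energy and voice activity."""
    energy = float(np.mean(channel ** 2))
    return {"energy": energy, "active": energy > 1e-4}

def choose_strategy(params, max_active=3):
    """Mixing controller: pick which channels to mix and with what gains,
    here simply the loudest `max_active` active channels at equal gain."""
    active = [i for i, p in enumerate(params) if p["active"]]
    active.sort(key=lambda i: params[i]["energy"], reverse=True)
    selected = active[:max_active]
    gain = 1.0 / max(len(selected), 1)
    return {i: gain for i in selected}

def mix(channels, strategy):
    """Mixing processor: weighted sum of the selected (preprocessed) channels."""
    out = np.zeros_like(channels[0])
    for i, gain in strategy.items():
        out += gain * channels[i]
    return out

# Toy conference frame: 4 incoming streams, one of which is near-silent.
rng = np.random.default_rng(8)
channels = [0.2 * rng.standard_normal(160),
            0.1 * rng.standard_normal(160),
            0.001 * rng.standard_normal(160),   # silence-suppressed participant
            0.3 * rng.standard_normal(160)]
params = [estimate_parameters(c) for c in channels]
strategy = choose_strategy(params)
mixed = mix(channels, strategy)
print(mixed.shape, sorted(strategy))  # (160,) and the indices of the mixed channels
```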
  • Patent number: 6826284
    Abstract: A real-time passive acoustic source localization system for video camera steering advantageously determines the relative delay between the direct paths of two estimated channel impulse responses. The illustrative system employs an approach referred to herein as the “adaptive eigenvalue decomposition algorithm” (AEDA) to make such a determination, and then advantageously employs a “one-step least-squares algorithm” (OSLS) for purposes of acoustic source localization, providing the desired features of robustness, portability, and accuracy in a reverberant environment. The AEDA technique directly estimates the (direct path) impulse response from the sound source to each of a pair of microphones, and then uses these estimated impulse responses to determine the time delay of arrival (TDOA) between the two microphones by measuring the distance between the first peaks thereof (i.e., the first significant taps of the corresponding transfer functions).
    Type: Grant
    Filed: February 4, 2000
    Date of Patent: November 30, 2004
    Assignee: Agere Systems Inc.
    Inventors: Jacob Benesty, Gary Wayne Elko, Yiteng Huang
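To illustrate the core of patent 6,826,284, the sketch below replaces the adaptive eigenvalue decomposition with a batch eigen-decomposition of the stacked two-microphone covariance matrix: its smallest-eigenvalue eigenvector approximates the pair of channel impulse responses (up to sign and scale), and the TDOA is read off as the offset between their first significant taps. The subsequent one-step least-squares localization is not shown, and the noiseless pure-delay example is an assumption for clarity.

```python
import numpy as np

def aeda_style_tdoa(x1, x2, filter_len=32, peak_fraction=0.3):
    """Batch eigen-decomposition stand-in for the AEDA idea: the eigenvector of the
    stacked two-channel covariance matrix associated with its smallest eigenvalue
    approximates [h2, -h1]; the TDOA is the index difference between the first
    significant taps of the two halves (positive: mic 2's direct path arrives later).
    """
    L = filter_len
    frames = np.array([np.concatenate([x1[n - L + 1:n + 1][::-1],    # x1(n), x1(n-1), ...
                                       x2[n - L + 1:n + 1][::-1]])   # x2(n), x2(n-1), ...
                       for n in range(L - 1, len(x1))])
    cov = frames.T @ frames / len(frames)
    eigvals, eigvecs = np.linalg.eigh(cov)
    u = eigvecs[:, 0]                       # eigenvector of the smallest eigenvalue
    u1, u2 = np.abs(u[:L]), np.abs(u[L:])   # ~|h2| and ~|h1| (up to sign/scale)
    first_tap = lambda h: int(np.argmax(h > peak_fraction * h.max()))
    return first_tap(u1) - first_tap(u2)

# Toy example: a source reaching microphone 2 five samples later than microphone 1.
rng = np.random.default_rng(9)
source = rng.standard_normal(4000)
delay = 5
x1 = source
x2 = np.concatenate([np.zeros(delay), source[:-delay]])
print(aeda_style_tdoa(x1, x2))  # 5
```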