Patents by Inventor Yiteng Huang
Yiteng Huang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Patent number: 11798533
  Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
  Type: Grant
  Filed: April 2, 2021
  Date of Patent: October 24, 2023
  Assignee: GOOGLE LLC
  Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
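The initialization step described above can be sketched numerically. This is a minimal illustration, not the patented implementation: it assumes frequency-domain snapshots for a single frequency bin, a hypothetical all-ones steering vector, and an MVDR-style weight computation (one common way to turn a spatial correlation matrix into beamformer weights):

```python
import numpy as np

def estimate_spatial_correlation(snapshots: np.ndarray) -> np.ndarray:
    """Average outer products of multi-channel snapshots (channels x frames)."""
    return snapshots @ snapshots.conj().T / snapshots.shape[1]

def init_mvdr_beamformer(corr: np.ndarray, steering: np.ndarray,
                         diag_load: float = 1e-6) -> np.ndarray:
    """MVDR weights from a spatial correlation matrix; diagonal loading
    keeps the matrix inverse well conditioned."""
    r_inv = np.linalg.inv(corr + diag_load * np.eye(corr.shape[0]))
    num = r_inv @ steering
    return num / (steering.conj() @ num)

# Pool the segment containing the detected phrase with the preceding segment.
rng = np.random.default_rng(0)
preceding = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
first_segment = rng.standard_normal((4, 30)) + 1j * rng.standard_normal((4, 30))
R = estimate_spatial_correlation(np.concatenate([preceding, first_segment], axis=1))

d = np.ones(4, dtype=complex)   # hypothetical steering vector (broadside array)
w = init_mvdr_beamformer(R, d)  # initialized weights, ready for later segments
```

The initialized `w` would then filter the second and later audio segments; the on-device versus server-side variants in the abstract only change where these two functions run.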
- Patent number: 11741934
  Abstract: A multi-microphone device that can perform acoustic echo cancellation (AEC) without an external reference signal. The device uses the audio data from one of its microphones as a reference for purposes of AEC and acoustic noise cancellation (ANC). The device determines filter coefficients for an adaptive filter for ANC when cancelling one microphone signal from another microphone's signal. Those filter coefficients are buffered and delayed and then used for AEC operations cancelling one microphone signal from another microphone's signal. When desired audio (such as a wakeword, speech, or the like) is detected, the device may freeze the coefficients for purposes of performing AEC until the desired audio is complete. The device may then continue adapting and using the coefficients.
  Type: Grant
  Filed: March 30, 2022
  Date of Patent: August 29, 2023
  Assignee: Amazon Technologies, Inc.
  Inventors: Tao Zhang, Yiteng Huang
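A minimal sketch of the adapt-then-freeze idea, assuming a simple NLMS adaptive filter (the patent does not specify the adaptation algorithm) and treating one microphone signal as the reference to cancel from the other:

```python
import numpy as np

def nlms_cancel(reference, target, taps=8, mu=0.5, eps=1e-8, freeze=None):
    """Cancel `reference` from `target` with an NLMS adaptive filter.
    Where `freeze` is True (e.g. while desired audio such as a wakeword
    is present), the coefficients are held fixed instead of adapting."""
    w = np.zeros(taps)
    out = np.zeros_like(target)
    for n in range(len(target)):
        x = reference[max(0, n - taps + 1):n + 1][::-1]  # newest sample first
        x = np.pad(x, (0, taps - len(x)))
        e = target[n] - w @ x            # residual after cancellation
        out[n] = e
        if freeze is None or not freeze[n]:
            w += mu * e * x / (x @ x + eps)   # normalized LMS update
    return out, w

rng = np.random.default_rng(1)
echo_path = np.array([0.6, -0.3, 0.1])      # toy acoustic coupling between mics
ref = rng.standard_normal(4000)             # "reference" microphone
tgt = np.convolve(ref, echo_path)[:4000]    # other mic hears a filtered copy
residual, w = nlms_cancel(ref, tgt)         # residual decays as w converges
```

Passing a boolean `freeze` array covering the wakeword region reproduces the freeze-then-resume behavior the abstract describes.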
- Publication number: 20230097197
  Abstract: A method (400) includes receiving, at a first processor (110) of a user device (102), streaming multi-channel audio (118) captured by an array of microphones (107), each channel (119) including respective audio features. For each channel, the method also includes processing, by the first processor, using a first stage hotword detector (210), the respective audio features to determine whether a hotword is detected. When the first stage hotword detector detects the hotword, the method also includes the first processor providing chomped raw audio data (212) to a second processor that processes, using a first noise cleaning algorithm (250), the chomped raw audio data to generate a clean monophonic audio chomp (260). The method also includes processing, by the second processor using a second stage hotword detector (220), the clean monophonic audio chomp to detect the hotword.
  Type: Application
  Filed: April 8, 2020
  Publication date: March 30, 2023
  Applicant: Google LLC
  Inventors: Yiteng Huang, Alexander H. Gruenstein
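The cascade can be sketched as follows. The detectors and the noise cleaning here are crude energy-based stand-ins for the real models, so only the control flow (run the expensive second stage only after the cheap first stage fires) reflects the abstract:

```python
import numpy as np

def first_stage_detect(channel: np.ndarray, threshold: float = 0.5) -> bool:
    """Cheap per-channel first stage (energy stands in for a small hotword model)."""
    return float(np.mean(channel ** 2)) > threshold

def clean_to_mono(channels: np.ndarray) -> np.ndarray:
    """Stand-in noise cleaning: collapse the channels to one mono stream."""
    return channels.mean(axis=0)

def second_stage_detect(mono: np.ndarray, threshold: float = 0.5) -> bool:
    """Costlier second stage, run on the cleaned mono audio only."""
    return float(np.mean(mono ** 2)) > threshold

def two_stage_hotword(channels: np.ndarray) -> bool:
    """Run the second stage only if some channel passes the first stage."""
    if not any(first_stage_detect(ch) for ch in channels):
        return False
    return second_stage_detect(clean_to_mono(channels))
```

In the patent the two stages run on different processors, which this single-process sketch does not model.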
- Publication number: 20230022800
  Abstract: A method (800) to detect a hotword in a spoken utterance (120) includes receiving a sequence of input frames (210) characterizing streaming multi-channel audio (118). Each channel (119) of the streaming multi-channel audio includes respective audio features (510) captured by a separate dedicated microphone (107). For each input frame, the method includes processing, using a three-dimensional (3D) single value decomposition filter (SVDF) input layer (302) of a memorized neural network (300), the respective audio features of each channel in parallel and generating a corresponding multi-channel audio feature representation (420) based on a concatenation of the respective audio features (344). The method also includes generating, using sequentially-stacked SVDF layers (350), a probability score (360) indicating a presence of a hotword in the audio. The method also includes determining whether the probability score satisfies a threshold and, when satisfied, initiating a wake-up process on a user device (102).
  Type: Application
  Filed: January 15, 2020
  Publication date: January 26, 2023
  Applicant: Google LLC
  Inventors: Jilong Wu, Yiteng Huang
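A single SVDF node factors a two-dimensional filter into a per-frame feature filter and a short time filter over the node's memory. A sketch of that rank-1 computation (the layer sizes, stacking, and 3D multi-channel arrangement of the patent are omitted):

```python
import numpy as np

def svdf_node(frames: np.ndarray, feature_filter: np.ndarray,
              time_filter: np.ndarray) -> np.ndarray:
    """One rank-1 SVDF node: project each frame's features onto a feature
    filter, then filter the resulting scalar sequence over the node's
    memory with a short time filter."""
    projected = frames @ feature_filter           # one scalar per input frame
    mem = len(time_filter)
    out = np.zeros(len(projected))
    for t in range(len(projected)):
        window = projected[max(0, t - mem + 1):t + 1][::-1]  # newest first
        out[t] = window @ time_filter[:len(window)]
    return out

rng = np.random.default_rng(2)
frames = rng.standard_normal((6, 3))          # 6 frames, 3 features each
v = rng.standard_normal(3)                    # feature filter
out = svdf_node(frames, v, np.array([1.0]))   # memory of 1: pure projection
```

A full detector would stack many such nodes into layers and feed the final layer's outputs through a sigmoid to get the probability score.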
- Publication number: 20220392441
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: August 12, 2022
  Publication date: December 8, 2022
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
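The selective-adaptation idea can be sketched with a magnitude-domain spectral-subtraction stand-in: the noise estimate is updated only on frames a detector has flagged as noise-only, so detected features (e.g. an invocation phrase) never contaminate it. The smoothing factor and thresholding below are illustrative, not from the patent family:

```python
import numpy as np

def selective_noise_reduction(mag_frames, speech_flags, alpha=0.9):
    """Spectral subtraction over magnitude spectra where the noise
    estimate adapts only on frames flagged as noise-only."""
    noise = np.zeros(mag_frames.shape[1])
    out = np.empty_like(mag_frames)
    for i, frame in enumerate(mag_frames):
        if not speech_flags[i]:                        # adapt on noise only
            noise = alpha * noise + (1 - alpha) * frame
        out[i] = np.maximum(frame - noise, 0.0)        # subtract, floor at zero
    return out

# 30 noise-only frames, then one frame flagged as containing speech.
frames = np.vstack([np.full((30, 4), 1.0), np.full((1, 4), 3.0)])
flags = [False] * 30 + [True]
cleaned = selective_noise_reduction(frames, flags)
```

Freezing adaptation during the flagged frame is what keeps the speech energy out of the noise estimate, which is the "selective" part of the technique.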
- Publication number: 20220319498
  Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
  Type: Application
  Filed: April 2, 2021
  Publication date: October 6, 2022
  Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
- Patent number: 11417324
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Grant
  Filed: May 28, 2020
  Date of Patent: August 16, 2022
  Assignee: GOOGLE LLC
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Publication number: 20200294496
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: May 28, 2020
  Publication date: September 17, 2020
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Patent number: 10706842
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Grant
  Filed: January 14, 2019
  Date of Patent: July 7, 2020
  Assignee: GOOGLE LLC
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Publication number: 20200066263
  Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
  Type: Application
  Filed: January 14, 2019
  Publication date: February 27, 2020
  Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
- Patent number: 10045137
  Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
  Type: Grant
  Filed: June 30, 2017
  Date of Patent: August 7, 2018
  Assignee: Google LLC
  Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
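The bi-magnitude switch can be sketched directly from the abstract: per output sample, use the linear filter when the loudspeaker sample is below the threshold and the nonlinearly preprocessed filter otherwise. The tanh nonlinearity and the filter lengths are illustrative assumptions:

```python
import numpy as np

def bi_magnitude_echo(x, h_small, h_large, threshold, nonlinear=np.tanh):
    """Per-sample echo estimate: when |x[n]| is below the threshold, the
    loudspeaker signal is convolved with h_small; above it, a nonlinearity
    (tanh here, as an assumption) is applied before convolving with h_large.
    Both impulse responses are assumed to share the same length."""
    taps = len(h_small)
    y = np.zeros_like(x)
    for n in range(len(x)):
        window = x[max(0, n - taps + 1):n + 1][::-1]     # newest sample first
        window = np.pad(window, (0, taps - len(window)))
        if abs(x[n]) < threshold:
            y[n] = h_small @ window
        else:
            y[n] = h_large @ nonlinear(window)
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
h_small = np.array([0.5, 0.25])
h_large = np.array([1.0, 0.0])
linear_only = bi_magnitude_echo(x, h_small, h_large, threshold=100.0)
```

With a threshold larger than every sample, the operation reduces to an ordinary convolution with `h_small`, which is a useful sanity check on the switching logic.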
- Publication number: 20180007482
  Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
  Type: Application
  Filed: June 30, 2017
  Publication date: January 4, 2018
  Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
- Publication number: 20170221502
  Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over all microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
  Type: Application
  Filed: February 3, 2016
  Publication date: August 3, 2017
  Applicant: Google Inc.
  Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
- Patent number: 9721582
  Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over all microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
  Type: Grant
  Filed: February 3, 2016
  Date of Patent: August 1, 2017
  Assignee: GOOGLE INC.
  Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
- Patent number: 8583429
  Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
  Type: Grant
  Filed: February 1, 2011
  Date of Patent: November 12, 2013
  Assignee: Wevoice Inc.
  Inventors: Jacob Benesty, Yiteng Huang
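Given the two sets of second-order statistics, the filter construction in step (c) can be sketched in its classical Wiener form, h = R_y^{-1}(R_y - R_v)e_1, where e_1 selects the current coefficient of the extended observation vector. The patent may use a different trade-off filter; this is the standard special case:

```python
import numpy as np

def wiener_from_stats(R_y, R_v):
    """Noise-reduction filter from the second-order statistics of the
    extended (L-frame) observation vector (R_y) and of the noise (R_v):
    h = R_y^{-1} (R_y - R_v) e1, applied as x_hat = h . y."""
    e1 = np.zeros(R_y.shape[0])
    e1[0] = 1.0
    return np.linalg.solve(R_y, (R_y - R_v) @ e1)

R_y = np.array([[2.0, 0.5],
                [0.5, 1.0]])                         # toy L=2 observation stats
h_clean = wiener_from_stats(R_y, np.zeros((2, 2)))   # no noise -> pass-through
h_noise = wiener_from_stats(R_y, R_y)                # all noise -> zero filter
```

The two limiting cases make the behavior concrete: with no noise the filter passes the current coefficient through unchanged, and with pure noise it suppresses everything.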
- Publication number: 20120197636
  Abstract: A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
  Type: Application
  Filed: February 1, 2011
  Publication date: August 2, 2012
  Inventors: Jacob Benesty, Yiteng Huang
- Publication number: 20050286664
  Abstract: An apparatus for mixing audio signals in a voice-over-IP teleconferencing environment comprises a preprocessor, a mixing controller, and a mixing processor. The preprocessor is divided into a media parameter estimator and a media preprocessor. The media parameter estimator estimates signal parameters such as signal-to-noise ratios, energy levels, and voice activity (i.e., the presence or absence of voice in the signal), which are used to control how different channels are mixed. The media preprocessor employs signal processing algorithms such as silence suppression, automatic gain control, and noise reduction, so that the quality of the incoming voice streams is optimized. Based on a function of the estimated signal parameters, the mixing controller specifies a particular mixing strategy and the mixing processor mixes the preprocessed voice streams according to the strategy provided by the controller.
  Type: Application
  Filed: June 24, 2004
  Publication date: December 29, 2005
  Inventors: Jingdong Chen, Yiteng Huang, Thomas Woo
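A toy version of the controller-driven mixing loop, using only energy as the estimated parameter: channels below a voice-activity threshold are suppressed and the active ones are averaged. Real AGC, noise reduction, and SNR estimation are omitted, and the threshold is an illustrative assumption:

```python
import numpy as np

def mix_streams(streams, vad_threshold=1e-3):
    """Energy-gated mixing: channels whose mean energy falls below the
    voice-activity threshold are dropped (silence suppression), and the
    active channels are averaged as a crude gain control."""
    active = [s for s in streams if float(np.mean(np.square(s))) > vad_threshold]
    if not active:
        return np.zeros_like(streams[0])
    return np.mean(active, axis=0)

# Two talkers plus one silent channel: only the talkers are mixed.
mixed = mix_streams([np.ones(8), 3.0 * np.ones(8), np.zeros(8)])
```

In the described architecture the parameter estimator and preprocessor would run before this step, and the controller would pick a richer strategy than simple averaging.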
- Method and apparatus for passive acoustic source localization for video camera steering applications
  Patent number: 6826284
  Abstract: A real-time passive acoustic source localization system for video camera steering advantageously determines the relative delay between the direct paths of two estimated channel impulse responses. The illustrative system employs an approach referred to herein as the "adaptive eigenvalue decomposition algorithm" (AEDA) to make such a determination, and then advantageously employs a "one-step least-squares algorithm" (OSLS) for purposes of acoustic source localization, providing the desired features of robustness, portability, and accuracy in a reverberant environment. The AEDA technique directly estimates the (direct path) impulse response from the sound source to each of a pair of microphones, and then uses these estimated impulse responses to determine the time delay of arrival (TDOA) between the two microphones by measuring the distance between the first peaks thereof (i.e., the first significant taps of the corresponding transfer functions).
  Type: Grant
  Filed: February 4, 2000
  Date of Patent: November 30, 2004
  Assignee: Agere Systems Inc.
  Inventors: Jacob Benesty, Gary Wayne Elko, Yiteng Huang
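For context, the delay-estimation step can be illustrated with a plain cross-correlation TDOA baseline; the patented AEDA instead adaptively estimates the two channel impulse responses and compares their first significant taps, which is far more robust under reverberation:

```python
import numpy as np

def tdoa_cross_correlation(mic1, mic2, max_lag):
    """Delay (in samples) of mic2 relative to mic1, chosen as the lag
    that maximizes the cross-correlation over [-max_lag, max_lag]."""
    def corr(lag):
        if lag >= 0:
            return float(np.dot(mic2[lag:], mic1[:len(mic1) - lag]))
        return float(np.dot(mic1[-lag:], mic2[:len(mic2) + lag]))
    return max(range(-max_lag, max_lag + 1), key=corr)

rng = np.random.default_rng(3)
x = rng.standard_normal(500)
mic1 = x
mic2 = np.concatenate([np.zeros(3), x[:-3]])  # second mic hears the source later
delay = tdoa_cross_correlation(mic1, mic2, max_lag=10)
```

Once TDOAs are available from several microphone pairs, a least-squares step such as the patent's OSLS converts them into a source position for steering the camera.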