Patents by Inventor Yongtao Sha

Yongtao Sha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Concurrent multi-path processing of audio signals for automatic speech recognition systems

Patent number: 12080274

Abstract: A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).

Type: Grant

Filed: February 28, 2019

Date of Patent: September 3, 2024

Assignee: Beijing DiDi Infinity Technology and Development Co., Ltd.

Inventors: Yi Zhang, Hui Song, Yongtao Sha, Chengyun Deng
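
The two-path arrangement in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the demixing parameter is assumed to be a small matrix, the estimator is a caller-supplied placeholder (the abstract does not say how the value is determined), and the two paths are interleaved frame by frame rather than run on separate threads. The names apply_demixing and run_two_paths are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]
Frame = List[List[float]]  # frame[channel][sample]


def apply_demixing(w: Matrix, frame: Frame) -> Frame:
    """Second path: apply the current demixing matrix to each sample of a frame."""
    n_samples = len(frame[0])
    return [
        [sum(w[src][ch] * frame[ch][t] for ch in range(len(frame)))
         for t in range(n_samples)]
        for src in range(len(w))
    ]


def run_two_paths(
    frames: List[Frame],
    estimate: Callable[[Frame, Matrix], Matrix],  # hypothetical estimator (first path)
    initial: Matrix,
) -> List[Frame]:
    """Feed every frame to both paths.

    The second path never waits for the estimator: it always uses the most
    recently published demixing matrix, which the first path refreshes
    after looking at the same frame.
    """
    w = initial
    outputs = []
    for frame in frames:
        outputs.append(apply_demixing(w, frame))  # second path: apply
        w = estimate(frame, w)                    # first path: update for later frames
    return outputs
```

In this sketch the first frame is demixed with the initial matrix, and every later frame benefits from the estimates made on earlier frames.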

METHOD AND SYSTEM FOR ACOUSTIC ECHO CANCELLATION

Publication number: 20230094630

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for acoustic echo cancellation and suppression are provided. An exemplary method comprises receiving a far-end acoustic signal and a corrupted near-end acoustic signal, wherein the corrupted near-end acoustic signal is generated based on (1) an echo of the far-end acoustic signal and (2) a near-end acoustic signal; feeding the far-end acoustic signal and the corrupted near-end acoustic signal into a neural network as an input to output a time-frequency (TF) mask that suppresses the echo and retains the near-end acoustic signal, and generating an enhanced version of the corrupted near-end acoustic signal by applying the obtained TF mask to the corrupted near-end acoustic signal.

Type: Application

Filed: December 6, 2022

Publication date: March 30, 2023

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi ZHANG, Chengyun DENG, Shiqian MA, Yongtao SHA, Hui SONG
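
The final step of the method above, applying a time-frequency mask to the corrupted near-end signal, can be sketched as an element-wise product. This is only an illustration: the mask would come from the neural network described in the abstract, which is not reproduced here, and the spectrogram layout and the name apply_tf_mask are assumptions.

```python
from typing import List

Spectrogram = List[List[complex]]  # spec[frame][frequency_bin]


def apply_tf_mask(corrupted: Spectrogram, mask: List[List[float]]) -> Spectrogram:
    """Scale each time-frequency bin of the corrupted signal by its mask value.

    Mask values near 0 suppress a bin (assumed echo-dominated); values near 1
    keep it (assumed near-end-dominated).
    """
    if len(corrupted) != len(mask):
        raise ValueError("mask and spectrogram must have the same number of frames")
    enhanced = []
    for spec_row, mask_row in zip(corrupted, mask):
        if len(spec_row) != len(mask_row):
            raise ValueError("mask and spectrogram must have the same number of bins")
        enhanced.append([m * x for m, x in zip(mask_row, spec_row)])
    return enhanced
```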

Multi-task deep network for echo path delay estimation and echo cancellation

Patent number: 11508351

Abstract: A method of echo path delay estimation and echo cancellation is described in this disclosure. The method includes: obtaining a reference signal, a microphone signal, and a trained multi-task deep neural network, wherein the multi-task deep neural network comprises a first neural network and a second neural network; generating, using the first neural network of the multi-task deep neural network, an estimated echo path delay based on the reference signal and the microphone signal; updating the reference signal based on the estimated echo path delay; and generating, using the second neural network of the multi-task deep neural network, an enhanced microphone signal based on the microphone signal and the updated reference signal.

Type: Grant

Filed: March 1, 2021

Date of Patent: November 22, 2022

Assignee: Beijing DiDi Infinity Technology and Development Co., Ltd.

Inventors: Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song
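
The "estimate the delay, then update the reference" steps can be illustrated without a neural network. The sketch below swaps in plain cross-correlation for the first network, purely to show what an echo path delay is and how the reference signal might be shifted by it. The function names and the max_delay parameter are hypothetical.

```python
from typing import List


def estimate_delay(reference: List[float], microphone: List[float], max_delay: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation.

    This classical estimator stands in for the first neural network;
    it is not the method described in the abstract.
    """
    best_delay, best_score = 0, float("-inf")
    for d in range(max_delay + 1):
        score = sum(
            reference[n - d] * microphone[n]
            for n in range(d, len(microphone))
            if n - d < len(reference)
        )
        if score > best_score:
            best_delay, best_score = d, score
    return best_delay


def delay_reference(reference: List[float], delay: int) -> List[float]:
    """Shift the reference by `delay` samples, keeping its original length."""
    delay = min(delay, len(reference))
    return [0.0] * delay + reference[: len(reference) - delay]
```

The shifted reference would then be passed, together with the microphone signal, to whatever echo canceller follows.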

MULTI-TASK DEEP NETWORK FOR ECHO PATH DELAY ESTIMATION AND ECHO CANCELLATION

Publication number: 20220277721

Abstract: A method of echo path delay estimation and echo cancellation is described in this disclosure. The method includes: obtaining a reference signal, a microphone signal, and a trained multi-task deep neural network, wherein the multi-task deep neural network comprises a first neural network and a second neural network; generating, using the first neural network of the multi-task deep neural network, an estimated echo path delay based on the reference signal and the microphone signal; updating the reference signal based on the estimated echo path delay; and generating, using the second neural network of the multi-task deep neural network, an enhanced microphone signal based on the microphone signal and the updated reference signal.

Type: Application

Filed: March 1, 2021

Publication date: September 1, 2022

Inventors: Yi ZHANG, Chengyun DENG, Shiqian MA, Yongtao SHA, Hui SONG

Systems and methods for enhancing audio signals

Patent number: 11393488

Abstract: Embodiments of the disclosure provide systems and methods for enhancing audio signals. The system may include a communication interface configured to receive multi-channel audio signals acquired from a common signal source. The system may further include at least one processor. The at least one processor may be configured to separate the multi-channel audio signals into a first audio signal and a second audio signal in a time domain. The at least one processor may be further configured to decompose the first audio signal and the second audio signal in a frequency domain to obtain a first decomposition data and a second decomposition data, respectively. The at least one processor may be also configured to estimate a noise component in the frequency domain based on the first decomposition data and the second decomposition data. The at least one processor may be additionally configured to enhance the first audio signal based on the estimated noise component.

Type: Grant

Filed: April 24, 2020

Date of Patent: July 19, 2022

Assignee: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi Zhang, Hui Song, Chengyun Deng, Yongtao Sha

CONCURRENT MULTI-PATH PROCESSING OF AUDIO SIGNALS FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS

Publication number: 20220139368

Abstract: A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).

Type: Application

Filed: February 28, 2019

Publication date: May 5, 2022

Inventors: Yi ZHANG, Hui SONG, Yongtao SHA, Chengyun DENG

Systems and methods for audio signal processing using spectral-spatial mask estimation

Patent number: 11289109

Abstract: Embodiments of the disclosure provide systems and methods for audio signal processing. An exemplary system may include a communication interface configured to receive a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel. The system may also include at least one processor coupled to the communication interface. The at least one processor may be configured to determine channel features based on the first audio signal and the second audio signal individually and determine a cross-channel feature based on the first audio signal and the second audio signal collectively. The at least one processor may further be configured to concatenate the channel features and the cross-channel feature and estimate spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature.

Type: Grant

Filed: April 24, 2020

Date of Patent: March 29, 2022

Assignee: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Chengyun Deng, Hui Song, Yi Zhang, Yongtao Sha
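
The feature construction described above (per-channel features plus a cross-channel feature, concatenated) can be sketched for a single frame. The abstract does not say which features are used, so the choices below are assumptions: log magnitude as the per-channel feature and inter-channel phase difference as the cross-channel feature. The name frame_features is hypothetical, and the mask estimator that would consume the vector is not shown.

```python
import cmath
import math
from typing import List


def frame_features(ch1: List[complex], ch2: List[complex], eps: float = 1e-8) -> List[float]:
    """Build one feature vector from two channels' spectra for a single frame.

    Assumed features (not stated in the abstract):
      - per channel: log magnitude of each frequency bin
      - cross channel: phase difference between the channels in each bin
    """
    if len(ch1) != len(ch2):
        raise ValueError("both channels must have the same number of bins")
    log_mag_1 = [math.log(abs(x) + eps) for x in ch1]
    log_mag_2 = [math.log(abs(x) + eps) for x in ch2]
    phase_diff = [cmath.phase(a * b.conjugate()) for a, b in zip(ch1, ch2)]
    # Concatenate the individual-channel and cross-channel features.
    return log_mag_1 + log_mag_2 + phase_diff
```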

Speech communication system and method for improving speech intelligibility

Patent number: 11227622

Abstract: A speech communication system for improving speech intelligibility may comprise one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining a cutoff frequency based on an estimation of a spectrum of noise, wherein the cutoff frequency defines a noise dominant region of frequency; lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech increases by the cutoff frequency; and applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.

Type: Grant

Filed: December 6, 2018

Date of Patent: January 18, 2022

Assignee: Beijing DiDi Infinity Technology and Development Co., Ltd.

Inventors: Yi Zhang, Hui Song, Yongtao Sha, Si Qin
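
One possible reading of the first two steps above is sketched below. The abstract does not give the rule for choosing the cutoff, so the sketch assumes it is the lowest frequency bin below which a chosen fraction of the estimated noise energy lies, and it models "lifting" as shifting the speech spectrum upward by that many bins. The names cutoff_bin and lift_spectrum and the energy_fraction parameter are hypothetical.

```python
from typing import List


def cutoff_bin(noise_power: List[float], energy_fraction: float = 0.9) -> int:
    """Return how many low-frequency bins form the noise dominant region.

    Assumed rule: the region ends at the first bin where the cumulative
    noise energy reaches `energy_fraction` of the total.
    """
    total = sum(noise_power)
    if total <= 0:
        return 0
    running = 0.0
    for k, power in enumerate(noise_power):
        running += power
        if running >= energy_fraction * total:
            return k + 1
    return len(noise_power)


def lift_spectrum(speech_spectrum: List[float], shift_bins: int) -> List[float]:
    """Shift the speech spectrum up by `shift_bins`, leaving the low bins empty.

    The occupied frequency range grows by the size of the shift.
    """
    return [0.0] * shift_bins + list(speech_spectrum)
```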

SPEECH COMMUNICATION SYSTEM AND METHOD FOR IMPROVING SPEECH INTELLIGIBILITY

Publication number: 20210225388

Abstract: A speech communication system for improving speech intelligibility may comprise one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining a cutoff frequency based on an estimation of a spectrum of noise, wherein the cutoff frequency defines a noise dominant region of frequency; lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech increases by the cutoff frequency; and applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.

Type: Application

Filed: December 6, 2018

Publication date: July 22, 2021

Inventors: Yi ZHANG, Hui SONG, Yongtao SHA, Si QIN

SYSTEMS AND METHODS FOR AUDIO SIGNAL PROCESSING USING SPECTRAL-SPATIAL MASK ESTIMATION

Publication number: 20200342891

Abstract: Embodiments of the disclosure provide systems and methods for audio signal processing. An exemplary system may include a communication interface configured to receive a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel. The system may also include at least one processor coupled to the communication interface. The at least one processor may be configured to determine channel features based on the first audio signal and the second audio signal individually and determine a cross-channel feature based on the first audio signal and the second audio signal collectively. The at least one processor may further be configured to concatenate the channel features and the cross-channel feature and estimate spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature.

Type: Application

Filed: April 24, 2020

Publication date: October 29, 2020

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Chengyun Deng, Hui Song, Yi Zhang, Yongtao Sha

SYSTEMS AND METHODS FOR ENHANCING AUDIO SIGNALS

Publication number: 20200342889

Abstract: Embodiments of the disclosure provide systems and methods for enhancing audio signals. The system may include a communication interface configured to receive multi-channel audio signals acquired from a common signal source. The system may further include at least one processor. The at least one processor may be configured to separate the multi-channel audio signals into a first audio signal and a second audio signal in a time domain. The at least one processor may be further configured to decompose the first audio signal and the second audio signal in a frequency domain to obtain a first decomposition data and a second decomposition data, respectively. The at least one processor may be also configured to estimate a noise component in the frequency domain based on the first decomposition data and the second decomposition data. The at least one processor may be additionally configured to enhance the first audio signal based on the estimated noise component.

Type: Application

Filed: April 24, 2020

Publication date: October 29, 2020

Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.

Inventors: Yi Zhang, Hui Song, Chengyun Deng, Yongtao Sha