Patents by Inventor Hakan Erdogan

Hakan Erdogan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Speaker awareness using speaker dependent speech model(s)

Patent number: 11854533

Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.

Type: Grant

Filed: January 28, 2022

Date of Patent: December 26, 2023

Assignee: Google LLC

Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
Low-latency speech separation

Patent number: 11445295

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

Type: Grant

Filed: November 17, 2020

Date of Patent: September 13, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
SPEAKER AWARENESS USING SPEAKER DEPENDENT SPEECH MODEL(S)

Publication number: 20220157298

Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.

Type: Application

Filed: January 28, 2022

Publication date: May 19, 2022

Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
Speaker awareness using speaker dependent speech model(s)

Patent number: 11238847

Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.

Type: Grant

Filed: December 4, 2019

Date of Patent: February 1, 2022

Assignee: Google LLC

Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
SPEAKER AWARENESS USING SPEAKER DEPENDENT SPEECH MODEL(S)

Publication number: 20210312907

Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.

Type: Application

Filed: December 4, 2019

Publication date: October 7, 2021

Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
Multi-microphone speech separation

Patent number: 10957337

Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.

Type: Grant

Filed: May 29, 2018

Date of Patent: March 23, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
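
The masking step described in the abstract above can be illustrated with a minimal sketch. It assumes the masks are already available (in the patent they are obtained from a neural network, which is not modeled here) and applies them to a single mixed spectrogram. The function name apply_masks, the nested-list layout (frames by frequency bins), and the toy values are hypothetical and chosen only for illustration.

```python
def apply_masks(mixture, masks):
    """Return one masked spectrogram per speaker.

    mixture: list of frames, each a list of complex STFT bins.
    masks:   one real-valued mask per speaker, same shape as mixture.
    """
    separated = []
    for mask in masks:
        # Element-wise product of the mask and the mixed spectrogram.
        separated.append([
            [m * x for m, x in zip(mask_frame, mix_frame)]
            for mask_frame, mix_frame in zip(mask, mixture)
        ])
    return separated


# Toy example (assumed values): 2 frames x 2 bins, two speakers.
mixture = [[1 + 1j, 2 + 0j], [0 + 3j, 4 - 1j]]
mask_a = [[1.0, 0.0], [0.5, 0.0]]
mask_b = [[0.0, 1.0], [0.5, 1.0]]
speaker_a, speaker_b = apply_masks(mixture, [mask_a, mask_b])
```

Because the two toy masks sum to one in every bin, adding the two outputs bin by bin recovers the mixture. Real masks need not have this property.
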
LOW-LATENCY SPEECH SEPARATION

Publication number: 20210076129

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

Type: Application

Filed: November 17, 2020

Publication date: March 11, 2021

Inventors: Zhuo CHEN, Changliang LIU, Takuya YOSHIOKA, Xiong XIAO, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS
Low-latency speech separation

Patent number: 10856076

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

Type: Grant

Filed: April 5, 2019

Date of Patent: December 1, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
SPEECH EXTRACTION USING ATTENTION NETWORK

Publication number: 20200335119

Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker, determination of a multi-dimensional vector representing a speech signal of two or more speakers, determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors, and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.

Type: Application

Filed: June 7, 2019

Publication date: October 22, 2020

Inventors: Xiong XIAO, Zhuo CHEN, Takuya YOSHIOKA, Changliang LIU, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS, Yifan GONG, James Garnet Droppo, III
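
The weighting step in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: dot-product similarity and a softmax are assumed choices (the abstract says only that the weighted vector is based on similarities), and the function name attention_weighted_vector and the toy vectors are hypothetical. The final extraction of the target speech from the mixture is not shown.

```python
import math


def attention_weighted_vector(target_vectors, mixture_vector):
    """Weight the target-speaker vectors by their similarity to the mixture."""
    # Dot-product similarity (an assumed choice of similarity measure).
    scores = [sum(t * m for t, m in zip(vec, mixture_vector))
              for vec in target_vectors]
    # Softmax turns the similarities into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the target-speaker vectors.
    dim = len(target_vectors[0])
    weighted = [sum(w * vec[d] for w, vec in zip(weights, target_vectors))
                for d in range(dim)]
    return weights, weighted


# Toy example (assumed values): three target-speaker vectors, one mixture vector.
target_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixture_vector = [2.0, 0.0]
weights, weighted = attention_weighted_vector(target_vectors, mixture_vector)
```

Here the first and third target vectors are equally similar to the mixture vector, so they receive equal weights, and the second receives a smaller one.
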
LOW-LATENCY SPEECH SEPARATION

Publication number: 20200322722

Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.

Type: Application

Filed: April 5, 2019

Publication date: October 8, 2020

Inventors: Zhuo CHEN, Changliang LIU, Takuya YOSHIOKA, Xiong XIAO, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS
MULTI-MICROPHONE SPEECH SEPARATION

Publication number: 20190318757

Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.

Type: Application

Filed: May 29, 2018

Publication date: October 17, 2019

Applicant: Microsoft Technology Licensing, LLC

Inventors: Zhuo CHEN, Hakan ERDOGAN, Takuya YOSHIOKA, Fileno A. ALLEVA, Xiong XIAO
Method for enhancing audio signal using phase information

Patent number: 9881631

Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.

Type: Grant

Filed: February 12, 2015

Date of Patent: January 30, 2018

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
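
One plausible way to combine a magnitude mask with a phase estimate, as the abstract above describes at a high level, is sketched below. The combination rule (scale the noisy magnitude by the mask, then attach the estimated phase) is an assumption, as are the function name enhance_bins and the toy values. The enhancement network that would produce the mask and the phase is not modeled.

```python
import cmath


def enhance_bins(noisy_bins, magnitude_mask, phase_estimate):
    """Combine a magnitude mask and a phase estimate into enhanced STFT bins."""
    enhanced = []
    for x, m, phi in zip(noisy_bins, magnitude_mask, phase_estimate):
        # Scale the noisy magnitude by the mask, then attach the estimated phase.
        enhanced.append(cmath.rect(m * abs(x), phi))
    return enhanced


# Toy example (assumed values): two complex STFT bins from one frame.
noisy_bins = [3 + 4j, 0 + 2j]
magnitude_mask = [0.5, 1.0]
phase_estimate = [0.0, cmath.pi / 2]
enhanced = enhance_bins(noisy_bins, magnitude_mask, phase_estimate)
```

The first bin has noisy magnitude 5, so the mask value 0.5 yields an enhanced magnitude of 2.5 at phase 0. An inverse short-time Fourier transform would then be needed to return to a time-domain signal.
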
Method for Enhancing Audio Signal using Phase Information

Publication number: 20160111108

Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.

Type: Application

Filed: February 12, 2015

Publication date: April 21, 2016

Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
Method for Enhancing Noisy Speech using Features from an Automatic Speech Recognition System

Publication number: 20160111107

Abstract: A method transforms a noisy speech signal to an enhanced speech signal, by first acquiring the noisy speech signal from an environment. The noisy speech signal is processed by an automatic speech recognition system (ASR) to produce ASR features. The ASR features and noisy speech spectral features are processed using an enhancement network having network parameters to produce a mask. Then, the mask is applied to the noisy speech signal to obtain the enhanced speech signal.

Type: Application

Filed: February 12, 2015

Publication date: April 21, 2016

Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
Semantic language modeling and confidence measurement

Patent number: 7475015

Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.

Type: Grant

Filed: September 5, 2003

Date of Patent: January 6, 2009

Assignee: International Business Machines Corporation

Inventors: Mark E. Epstein, Hakan Erdogan, Yuqing Gao, Michael A. Picheny, Ruhi Sarikaya
Methods and apparatus for conversational name dialing systems

Patent number: 6925154

Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.

Type: Grant

Filed: May 3, 2002

Date of Patent: August 2, 2005

Assignee: International Business Machines Corporation

Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
Semantic language modeling and confidence measurement

Publication number: 20050055209

Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.

Type: Application

Filed: September 5, 2003

Publication date: March 10, 2005

Inventors: Mark Epstein, Hakan Erdogan, Yuqing Gao, Michael Picheny, Ruhi Sarikaya
Weighted pair-wise scatter to improve linear discriminant analysis

Patent number: 6567771

Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.

Type: Grant

Filed: February 16, 2001

Date of Patent: May 20, 2003

Assignee: International Business Machines Corporation

Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li
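
The pair-weighting idea in the abstract above can be sketched as a weighted between-class scatter matrix. This is a minimal illustration, not the patented method: Euclidean distance between class means and the weight function distance ** (-power) are assumed examples of "a monotonically decreasing function of distance", and the function name weighted_pairwise_scatter, the priors, and the toy means are hypothetical.

```python
import math


def weighted_pairwise_scatter(means, priors, power=2.0):
    """Between-class scatter with pair weights that shrink as distance grows.

    Assumes all class means are distinct (distance > 0 for every pair).
    """
    dim = len(means[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = [a - b for a, b in zip(means[i], means[j])]
            dist = math.sqrt(sum(d * d for d in diff))
            # Assumed weight: monotonically decreasing in distance.
            weight = dist ** (-power)
            scale = priors[i] * priors[j] * weight
            # Accumulate the weighted outer product of the mean difference.
            for r in range(dim):
                for c in range(dim):
                    scatter[r][c] += scale * diff[r] * diff[c]
    return scatter


# Toy example (assumed values): two nearby classes and one distant class.
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 10.0]]
priors = [1 / 3, 1 / 3, 1 / 3]
scatter = weighted_pairwise_scatter(means, priors)
```

With power=2.0 each pair contributes only through the direction of its mean difference, so the distant class no longer dominates the scatter matrix as it would in unweighted LDA. The projection itself would still come from the usual eigenvalue problem involving the within-class scatter, which is not shown.
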
Methods and apparatus for conversational name dialing systems

Publication number: 20020196911

Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.

Type: Application

Filed: May 3, 2002

Publication date: December 26, 2002

Applicant: International Business Machines Corporation

Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
Weighted pair-wise scatter to improve linear discriminant analysis

Publication number: 20020049568

Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.

Type: Application

Filed: February 16, 2001

Publication date: April 25, 2002

Applicant: International Business Machines Corporation

Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li