Patents by Inventor Hakan Erdogan

Hakan Erdogan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11854533
    Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: December 26, 2023
    Assignee: Google LLC
    Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
  • Patent number: 11445295
    Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
    Type: Grant
    Filed: November 17, 2020
    Date of Patent: September 13, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
  • Publication number: 20220157298
    Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
    Type: Application
    Filed: January 28, 2022
    Publication date: May 19, 2022
    Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
  • Patent number: 11238847
    Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
  • Publication number: 20210312907
    Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
    Type: Application
    Filed: December 4, 2019
    Publication date: October 7, 2021
    Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
  • Patent number: 10957337
    Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: March 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
  • Publication number: 20210076129
    Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
    Type: Application
    Filed: November 17, 2020
    Publication date: March 11, 2021
    Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
  • Patent number: 10856076
    Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: December 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
  • Publication number: 20200335119
    Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker, determination of a multi-dimensional vector representing a speech signal of two or more speakers, determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors, and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.
    Type: Application
    Filed: June 7, 2019
    Publication date: October 22, 2020
    Inventors: Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Changliang Liu, Hakan Erdogan, Dimitrios Basile Dimitriadis, Yifan Gong, James Garnet Droppo, III
  • Publication number: 20200322722
    Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
    Type: Application
    Filed: April 5, 2019
    Publication date: October 8, 2020
    Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
  • Publication number: 20190318757
    Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
    Type: Application
    Filed: May 29, 2018
    Publication date: October 17, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
  • Patent number: 9881631
    Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.
    Type: Grant
    Filed: February 12, 2015
    Date of Patent: January 30, 2018
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
  • Publication number: 20160111108
    Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.
    Type: Application
    Filed: February 12, 2015
    Publication date: April 21, 2016
    Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
  • Publication number: 20160111107
    Abstract: A method transforms a noisy speech signal to an enhanced speech signal, by first acquiring the noisy speech signal from an environment. The noisy speech signal is processed by an automatic speech recognition system (ASR) to produce ASR features. The ASR features and noisy speech spectral features are processed using an enhancement network having network parameters to produce a mask. Then, the mask is applied to the noisy speech signal to obtain the enhanced speech signal.
    Type: Application
    Filed: February 12, 2015
    Publication date: April 21, 2016
    Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
  • Patent number: 7475015
    Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
    Type: Grant
    Filed: September 5, 2003
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Mark E. Epstein, Hakan Erdogan, Yuqing Gao, Michael A. Picheny, Ruhi Sarikaya
  • Patent number: 6925154
    Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.
    Type: Grant
    Filed: May 3, 2002
    Date of Patent: August 2, 2005
    Assignee: International Business Machines Corporation
    Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
  • Publication number: 20050055209
    Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
    Type: Application
    Filed: September 5, 2003
    Publication date: March 10, 2005
    Inventors: Mark Epstein, Hakan Erdogan, Yuqing Gao, Michael Picheny, Ruhi Sarikaya
  • Patent number: 6567771
    Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.
    Type: Grant
    Filed: February 16, 2001
    Date of Patent: May 20, 2003
    Assignee: International Business Machines Corporation
    Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li
  • Publication number: 20020196911
    Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.
    Type: Application
    Filed: May 3, 2002
    Publication date: December 26, 2002
    Applicant: International Business Machines Corporation
    Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
  • Publication number: 20020049568
    Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.
    Type: Application
    Filed: February 16, 2001
    Publication date: April 25, 2002
    Applicant: International Business Machines Corporation
    Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li
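
The weighted Linear Discriminant Analysis idea in the abstracts for patent 6567771 and publication 20020049568 can be illustrated with a minimal sketch. This is not the patented method itself: the pairwise form of the between-class scatter, the Euclidean distance between class means, the function name `weighted_lda`, the `weight_fn` parameter, and the example weight `d ** -4` are all assumptions chosen for illustration. The only property taken from the abstracts is that class pairs separated by larger distances receive lower weights.

```python
import numpy as np

def weighted_lda(X, y, n_components, weight_fn=None):
    """Return a (dim, n_components) projection matrix.

    weight_fn maps the distance between two class means to a pair weight.
    Passing None gives every class pair weight 1 (unweighted pairwise LDA).
    All names and the pairwise formulation are illustrative assumptions.
    """
    classes = np.unique(y)
    n, dim = X.shape
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])

    # Within-class scatter, pooled over all classes.
    S_w = np.zeros((dim, dim))
    for c, mu in zip(classes, means):
        centered = X[y == c] - mu
        S_w += centered.T @ centered / n

    # Between-class scatter as a weighted sum over class pairs.
    # A decreasing weight_fn emphasizes close (more confusable) pairs.
    # Note: classes with identical means would need a guard for weights
    # such as d ** -4; this sketch does not handle that case.
    S_b = np.zeros((dim, dim))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            dist = np.linalg.norm(diff)
            w = 1.0 if weight_fn is None else weight_fn(dist)
            S_b += priors[i] * priors[j] * w * np.outer(diff, diff)

    # Leading eigenvectors of S_w^{-1} S_b span the reduced space.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Toy data: two classes one unit apart on the y-axis, a third far away on x.
offsets = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
centers = np.array([[0, 0], [0, 1], [10, 0]], dtype=float)
X = np.vstack([c + offsets for c in centers])
y = np.repeat([0, 1, 2], 4)

plain = weighted_lda(X, y, 1)
weighted = weighted_lda(X, y, 1, weight_fn=lambda d: d ** -4.0)
```

On this toy data the unweighted projection is dominated by the distant third class and points mostly along the x-axis, where the two nearby classes overlap. With the decreasing weight, the projection aligns mostly with the y-axis, which is the direction that separates the two nearby classes. How steep the weight function should be is a design choice that this sketch leaves open.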