Patents by Inventor Hakan Erdogan
Hakan Erdogan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11854533
Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
Type: Grant
Filed: January 28, 2022
Date of Patent: December 26, 2023
Assignee: GOOGLE LLC
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
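Illustrative sketch: the abstract describes personalizing one shared model by feeding it a speaker embedding together with the audio. One common way to do this, assumed here and not confirmed by the abstract, is to append the embedding to every audio feature frame. The function name `condition_on_speaker` and all values are hypothetical.

```python
from typing import List


def condition_on_speaker(frames: List[List[float]],
                         speaker_embedding: List[float]) -> List[List[float]]:
    """Append the same speaker embedding to every audio feature frame.

    Concatenation is an assumption; the abstract only says the embedding is
    processed along with the audio data.
    """
    return [frame + speaker_embedding for frame in frames]


# Hypothetical utterance: 2 frames with 3 features each.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Hypothetical embeddings for two users of the same client device.
user_a = [1.0, 0.0]
user_b = [0.0, 1.0]

# The same model input pipeline is personalized by swapping the embedding.
inputs_a = condition_on_speaker(frames, user_a)
inputs_b = condition_on_speaker(frames, user_b)
```

Swapping `user_a` for `user_b` changes the model input without retraining, which is the sense in which a single model could serve any user.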
-
Patent number: 11445295
Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Grant
Filed: November 17, 2020
Date of Patent: September 13, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
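Illustrative sketch: the abstract says a beamformer direction is determined from a first time-frequency (TF) mask and a second TF mask is then applied to the beamformed signal for that direction. The selection rule below (pick the direction with the most mask-weighted energy) is an assumption, and `second_mask` is a placeholder for the output of the second mask estimator, which is not modeled here. All names and values are hypothetical.

```python
from typing import List

Spectrogram = List[List[float]]  # magnitudes indexed as [time][frequency]


def masked_energy(beam: Spectrogram, mask: Spectrogram) -> float:
    """Energy of a beamformed signal after weighting each TF bin by the mask."""
    return sum((m * x) ** 2
               for beam_row, mask_row in zip(beam, mask)
               for x, m in zip(beam_row, mask_row))


def pick_direction(beams: List[Spectrogram], mask: Spectrogram) -> int:
    """Assumed rule: choose the direction whose beam keeps the most masked energy."""
    return max(range(len(beams)), key=lambda d: masked_energy(beams[d], mask))


def apply_mask(beam: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise product of a mask and a beamformed spectrogram."""
    return [[m * x for x, m in zip(beam_row, mask_row)]
            for beam_row, mask_row in zip(beam, mask)]


# Two hypothetical beamformer outputs (2 frames x 2 frequency bins).
beams = [
    [[1.0, 0.1], [1.0, 0.1]],  # direction 0: energy mostly in bin 0
    [[0.1, 1.0], [0.1, 1.0]],  # direction 1: energy mostly in bin 1
]
first_mask = [[0.0, 1.0], [0.0, 1.0]]   # target source dominates bin 1
second_mask = [[0.0, 1.0], [0.0, 1.0]]  # placeholder for the refined mask

direction = pick_direction(beams, first_mask)
enhanced = apply_mask(beams[direction], second_mask)
```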
-
Publication number: 20220157298
Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
Type: Application
Filed: January 28, 2022
Publication date: May 19, 2022
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
-
Patent number: 11238847
Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
Type: Grant
Filed: December 4, 2019
Date of Patent: February 1, 2022
Assignee: Google LLC
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
-
Publication number: 20210312907
Abstract: Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
Type: Application
Filed: December 4, 2019
Publication date: October 7, 2021
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Li Wan, Alexander Gruenstein, Hakan Erdogan
-
Patent number: 10957337
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
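Illustrative sketch: the mask-application step in the abstract can be pictured as an element-wise product between each speaker's mask and a mixed-speech magnitude spectrogram. The masks below are hard-coded stand-ins for neural network outputs, and the function name `separate` and all values are hypothetical.

```python
from typing import List

Spectrogram = List[List[float]]  # magnitudes indexed as [time][frequency]


def separate(mixture: Spectrogram, masks: List[Spectrogram]) -> List[Spectrogram]:
    """Apply one mask per speaker to the mixture, bin by bin."""
    return [[[m * x for m, x in zip(mask_row, mix_row)]
             for mask_row, mix_row in zip(mask, mixture)]
            for mask in masks]


# Hypothetical mixture: 2 frames x 2 frequency bins.
mixture = [[2.0, 4.0], [6.0, 8.0]]

# Stand-ins for network outputs; here the two masks sum to 1 in every bin.
masks = [
    [[0.75, 0.25], [0.5, 0.0]],  # speaker 0
    [[0.25, 0.75], [0.5, 1.0]],  # speaker 1
]

speaker_0, speaker_1 = separate(mixture, masks)
```

Because these example masks sum to 1 in each bin, adding the two separated spectrograms recovers the mixture; real estimated masks need not have that property.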
-
Publication number: 20210076129
Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: November 17, 2020
Publication date: March 11, 2021
Inventors: Zhuo CHEN, Changliang LIU, Takuya YOSHIOKA, Xiong XIAO, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS
-
Patent number: 10856076
Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Grant
Filed: April 5, 2019
Date of Patent: December 1, 2020
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
-
Publication number: 20200335119
Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker, determination of a multi-dimensional vector representing a speech signal of two or more speakers, determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors, and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.
Type: Application
Filed: June 7, 2019
Publication date: October 22, 2020
Inventors: Xiong XIAO, Zhuo CHEN, Takuya YOSHIOKA, Changliang LIU, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS, Yifan GONG, James Garnet Droppo, III
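Illustrative sketch: the weighted vector in the abstract combines several target-speaker vectors according to their similarity to a vector for the mixed signal. The sketch below assumes dot-product similarity normalized with a softmax, which the abstract does not specify; the extraction step itself is not modeled. All names and values are hypothetical.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_speaker_vector(profile_vectors: List[List[float]],
                            mixture_vector: List[float]
                            ) -> Tuple[List[float], List[float]]:
    """Similarity-weighted average of the target speaker's vectors.

    Dot-product similarity and softmax weighting are assumptions.
    """
    weights = softmax([dot(p, mixture_vector) for p in profile_vectors])
    dim = len(profile_vectors[0])
    combined = [sum(w * p[i] for w, p in zip(weights, profile_vectors))
                for i in range(dim)]
    return combined, weights


# Hypothetical vectors for the target speaker and for the mixed signal.
profiles = [[1.0, 0.0], [0.0, 1.0]]
mixture = [2.0, 0.0]

vector, weights = weighted_speaker_vector(profiles, mixture)
```

The profile vector most similar to the mixture receives the largest weight, so the combined vector leans toward the aspects of the target speaker that are present in the current signal.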
-
Publication number: 20200322722
Abstract: A system and method include reception of a first plurality of audio signals, generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions, generation of a first TF mask for a first output channel based on the first plurality of audio signals, determination of a first beamformer direction associated with a first target sound source based on the first TF mask, generation of first features based on the first beamformer direction and the first plurality of audio signals, determination of a second TF mask based on the first features, and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: April 5, 2019
Publication date: October 8, 2020
Inventors: Zhuo CHEN, Changliang LIU, Takuya YOSHIOKA, Xiong XIAO, Hakan ERDOGAN, Dimitrios Basile DIMITRIADIS
-
Publication number: 20190318757
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Application
Filed: May 29, 2018
Publication date: October 17, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhuo CHEN, Hakan ERDOGAN, Takuya YOSHIOKA, Fileno A. ALLEVA, Xiong XIAO
-
Patent number: 9881631
Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.
Type: Grant
Filed: February 12, 2015
Date of Patent: January 30, 2018
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
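Illustrative sketch: one plausible way to use a magnitude mask together with a phase estimate is to scale the noisy magnitude by the mask and attach the estimated phase in each spectral bin. The abstract does not state this formula, so it is an assumption, and the mask and phase values below are stand-ins for network outputs. All names are hypothetical.

```python
import cmath
import math
from typing import List


def enhance(noisy_spectrum: List[complex],
            magnitude_mask: List[float],
            phase_estimate: List[float]) -> List[complex]:
    """Build an enhanced spectrum from a mask and an estimated phase.

    Each bin becomes mask * |noisy| at the estimated phase angle (radians).
    """
    return [cmath.rect(m * abs(x), phi)
            for x, m, phi in zip(noisy_spectrum, magnitude_mask, phase_estimate)]


# Hypothetical single frame with two frequency bins.
noisy = [complex(3.0, 4.0), complex(0.0, 2.0)]  # magnitudes 5.0 and 2.0
mask = [0.5, 1.0]                               # stand-in network output
phase = [0.0, math.pi / 2]                      # stand-in network output

enhanced = enhance(noisy, mask, phase)
```

In the first bin the noisy magnitude of 5.0 is halved and rotated to phase 0, giving a value close to 2.5 on the real axis.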
-
Publication number: 20160111108
Abstract: A method transforms a noisy audio signal to an enhanced audio signal, by first acquiring the noisy audio signal from an environment. The noisy audio signal is processed by an enhancement network having network parameters to jointly produce a magnitude mask and a phase estimate. Then, the magnitude mask and the phase estimate are used to obtain the enhanced audio signal.
Type: Application
Filed: February 12, 2015
Publication date: April 21, 2016
Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
-
Publication number: 20160111107
Abstract: A method transforms a noisy speech signal to an enhanced speech signal, by first acquiring the noisy speech signal from an environment. The noisy speech signal is processed by an automatic speech recognition system (ASR) to produce ASR features. The ASR features and noisy speech spectral features are processed using an enhancement network having network parameters to produce a mask. Then, the mask is applied to the noisy speech signal to obtain the enhanced speech signal.
Type: Application
Filed: February 12, 2015
Publication date: April 21, 2016
Inventors: Hakan Erdogan, John Hershey, Shinji Watanabe, Jonathan Le Roux
-
Patent number: 7475015
Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
Type: Grant
Filed: September 5, 2003
Date of Patent: January 6, 2009
Assignee: International Business Machines Corporation
Inventors: Mark E. Epstein, Hakan Erdogan, Yuqing Gao, Michael A. Picheny, Ruhi Sarikaya
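Illustrative sketch: rescoring a set of likely hypotheses can be pictured as re-ranking an N-best list with a second score. Here the semantic score is a hard-coded placeholder for whatever a semantic structured language model would assign, and the linear combination with `semantic_weight` is an assumption. All names and numbers are hypothetical.

```python
from typing import List, Tuple

# (text, first-pass recognizer log score, semantic model log score)
Hypothesis = Tuple[str, float, float]


def rescore(hypotheses: List[Hypothesis],
            semantic_weight: float = 0.5) -> List[Hypothesis]:
    """Re-rank hypotheses by recognizer score plus weighted semantic score."""
    return sorted(hypotheses,
                  key=lambda h: h[1] + semantic_weight * h[2],
                  reverse=True)


# Hypothetical N-best list; higher (less negative) scores are better.
n_best = [
    ("call jon's myth", -9.5, -8.0),    # best first-pass score, poor semantics
    ("call john smith", -10.0, -2.0),   # slightly worse first pass, good semantics
]

reranked = rescore(n_best)
best_sentence = reranked[0][0]
```

With these numbers the combined scores are -13.5 and -11.0, so the semantically sensible hypothesis overtakes the one that led after the first pass.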
-
Patent number: 6925154
Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.
Type: Grant
Filed: May 3, 2002
Date of Patent: August 2, 2005
Assignee: International Business Machines Corporation
Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
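Illustrative sketch: the confidence-driven dialog in the abstract can be pictured as a simple decision rule. If the best candidate's confidence clears a threshold, the call is placed; otherwise the system asks the user for more information. The threshold value, the action labels, and the function name `next_action` are all hypothetical.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (name from the vocabulary, confidence in [0, 1])


def next_action(candidates: List[Candidate],
                threshold: float = 0.8) -> Tuple[str, Optional[str]]:
    """Decide whether to dial the best candidate or ask for more information."""
    if not candidates:
        return ("ask", None)
    name, confidence = max(candidates, key=lambda c: c[1])
    if confidence >= threshold:
        return ("dial", name)
    # Confidence too low: request additional information about this candidate.
    return ("ask", name)


# Hypothetical recognition results for two different utterances.
clear_request = [("Michael Picheny", 0.93), ("Michael Pichler", 0.41)]
ambiguous_request = [("Yuqing Gao", 0.55), ("Yuqing Guo", 0.52)]

clear_result = next_action(clear_request)
ambiguous_result = next_action(ambiguous_request)
```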
-
Publication number: 20050055209
Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
Type: Application
Filed: September 5, 2003
Publication date: March 10, 2005
Inventors: Mark Epstein, Hakan Erdogan, Yuqing Gao, Michael Picheny, Ruhi Sarikaya
-
Patent number: 6567771
Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.
Type: Grant
Filed: February 16, 2001
Date of Patent: May 20, 2003
Assignee: International Business Machines Corporation
Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li
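Illustrative sketch: pairwise weighting can be pictured as a between-class scatter matrix in which each pair of class means contributes in proportion to a weight that shrinks as the pair gets farther apart. The abstract only requires a monotonically decreasing function of distance; the specific choice of Euclidean distance between class means and an inverse power weight is an assumption, as are all names and values.

```python
import math
from typing import List


def pair_weight(distance: float, power: float = 2.0) -> float:
    """Monotonically decreasing weight: closer (more confusable) pairs weigh more."""
    return 1.0 / (distance ** power)


def weighted_between_class_scatter(means: List[List[float]],
                                   priors: List[float]) -> List[List[float]]:
    """Sum of weighted outer products of class-mean differences over all pairs.

    Assumes all class means are distinct so every pairwise distance is positive.
    """
    dim = len(means[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = [a - b for a, b in zip(means[i], means[j])]
            distance = math.sqrt(sum(d * d for d in diff))
            w = priors[i] * priors[j] * pair_weight(distance)
            for r in range(dim):
                for c in range(dim):
                    scatter[r][c] += w * diff[r] * diff[c]
    return scatter


# Hypothetical class means in 2-D with equal priors.
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 4.0]]
priors = [1.0 / 3.0] * 3

scatter = weighted_between_class_scatter(means, priors)
```

In standard LDA this matrix would replace the unweighted between-class scatter before solving for the projection, so that directions separating nearby classes are emphasized; that solving step is omitted here.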
-
Publication number: 20020196911
Abstract: Techniques for providing an automated conversational name dialing system for placing a call in response to an input by a user. One technique begins with the step of analyzing an input from a user, wherein the input includes information directed to identifying an intended recipient of a telephone call from the user. At least one candidate for the intended recipient is identified in response to the input, wherein the at least one candidate represents at least one potential match between the intended recipient and a predetermined vocabulary. A confidence measure indicative of a likelihood that the at least one candidate is the intended recipient is determined, and additional information is obtained from the user to increase the likelihood that the at least one candidate is the intended recipient, based on the determined confidence measure.
Type: Application
Filed: May 3, 2002
Publication date: December 26, 2002
Applicant: International Business Machines Corporation
Inventors: Yuqing Gao, Bhuvana Ramabhadran, Chengjun Julian Chen, Hakan Erdogan, Michael A. Picheny
-
Publication number: 20020049568
Abstract: In general, the present invention determines and applies weights for class pairs. The weights are selected to better separate, in reduced-dimensional class space, the classes that are confusable in normal-dimensional class space. During the dimension-reducing process, higher weights are preferably assigned to more confusable class pairs while lower weights are assigned to less confusable class pairs. As compared to unweighted Linear Discriminant Analysis (LDA), the present invention will result in decreased confusability of class pairs in reduced-dimensional class space. The weights can be assigned through a monotonically decreasing function of distance, which assigns lower weights to class pairs that are separated by larger distances. Additionally, weights may also be assigned through a monotonically increasing function of confusability, in which higher weights would be assigned to class pairs that are more confusable.
Type: Application
Filed: February 16, 2001
Publication date: April 25, 2002
Applicant: International Business Machines Corporation
Inventors: Hakan Erdogan, Yuqing Gao, Yongxin Li