Patents by Inventor Fileno A. Alleva
Fileno A. Alleva has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10957337
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
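The masking step this abstract describes can be sketched in a few lines. This is a minimal illustration, not the patented method: the masks below are fabricated stand-ins for a trained neural network's output, and the spectrogram shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a mixed-speech spectrogram: (frames, freq_bins).
mixture = rng.random((100, 257)).astype(np.float32)

# A real system would obtain these masks from a trained neural network;
# here we fabricate two soft masks that sum to one per time-frequency bin.
logits = rng.random((2, 100, 257)).astype(np.float32)
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Apply each mask to the mixture to estimate speaker-specific signals.
speaker_specs = masks * mixture  # shape (2, 100, 257)
```

Because the two soft masks sum to one in every time-frequency bin, the separated spectrograms add back up to the original mixture.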
-
Publication number: 20190318757
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Application
Filed: May 29, 2018
Publication date: October 17, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
-
Patent number: 10127901
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Grant
Filed: June 13, 2014
Date of Patent: November 13, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
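The modular structure described here — several specialist modules feeding one combining module — can be sketched abstractly. Everything below is a hypothetical stand-in: each "module" is an ordinary function rather than an RNN, and the feature strings are invented for illustration.

```python
# Stand-in "modules": each maps input text to a per-word feature sequence.
# In the described design these would be separately trained RNNs.
def pos_module(text):      return [f"POS({w})" for w in text.split()]
def lts_module(text):      return [f"PH({w})" for w in text.split()]
def prosody_module(text):  return [f"PR({w})" for w in text.split()]

def hyper_structure(text, modules):
    """Combine per-module outputs word by word; a hyper-structure RNN
    would consume these jointly to emit the final generation sequence."""
    outputs = [m(text) for m in modules]
    return list(zip(*outputs))

seq = hyper_structure("hello world", [pos_module, lts_module, prosody_module])
print(seq[0])  # → ('POS(hello)', 'PH(hello)', 'PR(hello)')
```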
-
Patent number: 9263030
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g., 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Type: Grant
Filed: January 23, 2013
Date of Patent: February 16, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
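The convergence loop in this abstract — adjust the warping factor with a step size until it converges or an adaptation limit is hit — can be sketched as follows. This is a simplified illustration under stated assumptions: `score_fn` is a hypothetical stand-in for the acoustic likelihood of the speech window under a given warping factor, not the patented estimator.

```python
def estimate_warp(score_fn, alpha=1.0, step=0.02, max_iters=10, tol=1e-3):
    """Adaptively refine a VTLN warping factor.

    score_fn(alpha) stands in for the acoustic likelihood of the current
    speech window under warping factor alpha. Iteration stops when the
    factor converges or the maximum number of adaptations is reached.
    """
    for _ in range(max_iters):
        # Probe one step below, at, and one step above the current factor.
        candidates = [alpha - step, alpha, alpha + step]
        best = max(candidates, key=score_fn)
        if abs(best - alpha) < tol:
            break
        alpha = best
    return alpha

# Toy likelihood peaked at alpha = 0.94 (a hypothetical speaker).
score = lambda a: -(a - 0.94) ** 2
print(round(estimate_warp(score), 2))  # → 0.94
```

The abstract's per-group step sizes map naturally onto the `step` parameter: a coarser step for one speaker group, a finer one for another.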
-
Publication number: 20150364128
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
-
Publication number: 20150364127
Abstract: The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze the letters of the text and the letters surrounding the text being analyzed. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text. The letter-to-sound conversion may then also be based on the contextual information that is received. The determined phonemes may be utilized to generate synthesized speech from the input text.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Mei-Yuh Hwang, Sheng Zhao, Bo Yan, Geoffrey Zweig, Fileno A. Alleva
-
Patent number: 9098494
Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
Type: Grant
Filed: May 10, 2012
Date of Patent: August 4, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
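The pipeline in this abstract — identify the language, translate into the anchor language, run the existing anchor-language components, translate the response back — can be sketched end to end. All components below are toy stand-ins: the dictionaries replace real machine translation, and `identify_language` and `understand` are hypothetical placeholders for a language-ID classifier and the existing anchor-language linguistic component.

```python
# Toy stand-ins for machine translation to/from the anchor language
# (English here); a real system would call MT components instead.
MT_TO_ANCHOR = {"fr": lambda s: {"bonjour": "hello"}[s]}
MT_FROM_ANCHOR = {"fr": lambda s: {"hi there": "salut"}[s]}

def identify_language(text):
    # Trivial identifier; a real system would use a language-ID classifier.
    return "fr" if text in ("bonjour",) else "en"

def understand(anchor_text):
    # Stand-in for the existing anchor-language linguistic component.
    return "hi there" if anchor_text == "hello" else "?"

def process(text):
    lang = identify_language(text)
    anchor_in = MT_TO_ANCHOR[lang](text) if lang != "en" else text
    anchor_out = understand(anchor_in)
    return MT_FROM_ANCHOR[lang](anchor_out) if lang != "en" else anchor_out

print(process("bonjour"))  # → salut
```

Note that anchor-language input bypasses both translation steps, matching the abstract's point that the existing components are re-used unchanged.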
-
Publication number: 20140207448
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g., 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Type: Application
Filed: January 23, 2013
Publication date: July 24, 2014
Applicant: Microsoft Corporation
Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
-
Publication number: 20130304451
Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
Type: Application
Filed: May 10, 2012
Publication date: November 14, 2013
Applicant: Microsoft Corporation
Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
-
Patent number: 8352270
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
Type: Grant
Filed: June 9, 2009
Date of Patent: January 8, 2013
Assignee: Microsoft Corporation
Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
-
Publication number: 20100312565
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
Type: Application
Filed: June 9, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
-
Patent number: 7676365
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Grant
Filed: April 20, 2005
Date of Patent: March 9, 2010
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
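To make the idea of units "larger than a phoneme but smaller than a word" concrete, here is a greedy longest-match segmentation of a phoneme string into syllable-like units. The SLU inventory below is invented for illustration; the patent derives its units from data and pairs them with an SLU-level language model rather than using a fixed table.

```python
# Hypothetical SLU inventory. Real SLUs would be learned from data and
# scored by an SLU language model during decoding.
SLU_INVENTORY = {
    "k ah m", "p y uw", "t er",            # syllable-like units
    "k", "ah", "m", "p", "y", "uw", "t", "er",  # single-phoneme fallbacks
}

def segment_into_slus(phonemes):
    """Greedy longest-match segmentation of a phoneme sequence into SLUs."""
    slus, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate span first, shrinking until one matches.
        for j in range(len(phonemes), i, -1):
            unit = " ".join(phonemes[i:j])
            if unit in SLU_INVENTORY:
                slus.append(unit)
                i = j
                break
        else:
            raise ValueError(f"no SLU covers {phonemes[i]!r}")
    return slus

# "computer" as phonemes, segmented into syllable-like units:
print(segment_into_slus("k ah m p y uw t er".split()))
# → ['k ah m', 'p y uw', 't er']
```

The single-phoneme fallbacks ensure any pronunciation is coverable, while the longer units carry the extra acoustic and lexical context the abstract credits for the accuracy gain.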
-
Patent number: 7251600
Abstract: A language model for a language processing system such as a speech recognition system is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Grant
Filed: March 29, 2005
Date of Patent: July 31, 2007
Assignee: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno Alleva
-
Patent number: 7162423
Abstract: The present invention is directed to generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.
Type: Grant
Filed: November 23, 2004
Date of Patent: January 9, 2007
Assignee: Microsoft Corporation
Inventors: Chris Thrasher, Fileno A. Alleva
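The boundary-condition lookup this abstract describes can be sketched simply: alternates must span exactly the same start and end bounds as the selected reference subpath. The hypothesis entries and scores below are invented for illustration, and real hypothesis stores are lattices rather than flat lists.

```python
# Hypothetical hypothesis store: (start, end, words, score) entries that
# a recognizer produced for the same input speech.
hypotheses = [
    (10, 25, ("recognize", "speech"), -4.0),
    (10, 25, ("wreck", "a", "nice", "beach"), -5.5),
    (10, 25, ("recognized", "speech"), -6.0),
    (0, 25, ("they", "recognize", "speech"), -9.0),
]

def alternates(store, start, end, k=3):
    """Return up to k alternate subpaths whose bounds match the selected
    portion of the reference path, best score first."""
    matching = [h for h in store if h[0] == start and h[1] == end]
    return [h[2] for h in sorted(matching, key=lambda h: -h[3])[:k]]

# The user selected the words spanning frames 10..25 of the reference path.
print(alternates(hypotheses, 10, 25))
# → [('recognize', 'speech'), ('wreck', 'a', 'nice', 'beach'), ('recognized', 'speech')]
```

Filtering on the bounds is what guarantees each alternate can be swapped into the reference path without disturbing the words around the selection.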
-
Patent number: 7016838
Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
Type: Grant
Filed: November 12, 2004
Date of Patent: March 21, 2006
Assignee: Microsoft Corporation
Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
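The storage trick in this abstract — keep only per-state dimension sums and counts, then discard the audio — can be sketched as follows. The state IDs and feature values are invented for illustration, and mean re-estimation stands in for the full adaptive-training step.

```python
import numpy as np

# Per-state running sums and frame counts: the only statistics retained
# once the speech signal and alignment units are deleted.
sums, counts = {}, {}

def accumulate(state_ids, feature_vectors):
    """Add each aligned frame's feature vector to its state's sums.

    After accumulation the frames themselves can be discarded, which is
    what reduces the storage cost of adaptation.
    """
    for state, vec in zip(state_ids, feature_vectors):
        sums[state] = sums.get(state, 0.0) + vec
        counts[state] = counts.get(state, 0) + 1

def adapted_mean(state):
    # Simplified adaptation: re-estimate the state's mean from the sums.
    return sums[state] / counts[state]

feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
accumulate(["s1", "s1", "s2"], feats)
print(adapted_mean("s1"))  # → [2. 3.]
```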
-
Patent number: 6973427
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Grant
Filed: December 26, 2000
Date of Patent: December 6, 2005
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
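The selection step here is a simple argmax over two candidate pronunciations. In this sketch the candidate phoneme strings and the score table are invented; in the patent the scores come from acoustic correspondence with the user's utterance.

```python
def pick_pronunciation(candidates, score_fn):
    """Choose the phonetic description that best matches the user's
    spoken example; score_fn is a stand-in for an acoustic score."""
    return max(candidates, key=score_fn)

# Hypothetical candidates for the word "route":
text_derived = "r uw t"    # from the word's letters (letter-to-sound)
audio_derived = "r aw t"   # from decoding the user's utterance

# Toy scorer favoring the pronunciation the user actually produced.
scores = {"r uw t": 0.3, "r aw t": 0.9}
best = pick_pronunciation([text_derived, audio_derived], scores.get)
print(best)  # → r aw t
```

Keeping both candidate sources means a user's idiosyncratic pronunciation wins when it fits the audio better, while regular words fall back to the letter-derived form.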
-
Publication number: 20050187769
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Application
Filed: April 20, 2005
Publication date: August 25, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno Alleva, Rebecca Weiss
-
Patent number: 6934683
Abstract: A language model for a language processing system such as a speech recognition system is formed as a function of associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Grant
Filed: January 31, 2001
Date of Patent: August 23, 2005
Assignee: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno A. Alleva
-
Publication number: 20050171761
Abstract: A language model for a language processing system such as a speech recognition system is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Application
Filed: March 29, 2005
Publication date: August 4, 2005
Applicant: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno Alleva
-
Patent number: 6917918
Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
Type: Grant
Filed: December 22, 2000
Date of Patent: July 12, 2005
Assignee: Microsoft Corporation
Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva