Patents by Inventor Fileno A. Alleva
Fileno A. Alleva has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10957337
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
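The masking step this abstract describes can be sketched in a few lines. This is a minimal illustration, not the patented method: the masks below are fabricated stand-ins for a trained neural network's output, and the spectrogram shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a mixed-speech spectrogram: (frames, freq_bins).
mixture = rng.random((100, 257)).astype(np.float32)

# A real system would obtain these masks from a trained neural network;
# here we fabricate two soft masks that sum to one per time-frequency bin.
logits = rng.random((2, 100, 257)).astype(np.float32)
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Apply each mask to the mixture to estimate speaker-specific signals.
speaker_specs = masks * mixture  # shape (2, 100, 257)
```

Because the two soft masks sum to one in every time-frequency bin, the separated spectrograms add back up to the original mixture.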
-
Publication number: 20190318757
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Application
Filed: May 29, 2018
Publication date: October 17, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
-
Patent number: 10127901
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Grant
Filed: June 13, 2014
Date of Patent: November 13, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
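The modular structure described here — several specialist modules feeding one combining module — can be sketched abstractly. Everything below is a hypothetical stand-in: each "module" is an ordinary function rather than an RNN, and the feature strings are invented for illustration.

```python
# Stand-in "modules": each maps input text to a per-word feature sequence.
# In the described design these would be separately trained RNNs.
def pos_module(text):      return [f"POS({w})" for w in text.split()]
def lts_module(text):      return [f"PH({w})" for w in text.split()]
def prosody_module(text):  return [f"PR({w})" for w in text.split()]

def hyper_structure(text, modules):
    """Combine per-module outputs word by word; a hyper-structure RNN
    would consume these jointly to emit the final generation sequence."""
    outputs = [m(text) for m in modules]
    return list(zip(*outputs))

seq = hyper_structure("hello world", [pos_module, lts_module, prosody_module])
print(seq[0])  # → ('POS(hello)', 'PH(hello)', 'PR(hello)')
```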
-
Patent number: 9263030
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g., 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Type: Grant
Filed: January 23, 2013
Date of Patent: February 16, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
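The convergence loop in this abstract — adjust the warping factor with a step size until it converges or an adaptation limit is hit — can be sketched as follows. This is a simplified illustration under stated assumptions: `score_fn` is a hypothetical stand-in for the acoustic likelihood of the speech window under a given warping factor, not the patented estimator.

```python
def estimate_warp(score_fn, alpha=1.0, step=0.02, max_iters=10, tol=1e-3):
    """Adaptively refine a VTLN warping factor.

    score_fn(alpha) stands in for the acoustic likelihood of the current
    speech window under warping factor alpha. Iteration stops when the
    factor converges or the maximum number of adaptations is reached.
    """
    for _ in range(max_iters):
        # Probe one step below, at, and one step above the current factor.
        candidates = [alpha - step, alpha, alpha + step]
        best = max(candidates, key=score_fn)
        if abs(best - alpha) < tol:
            break
        alpha = best
    return alpha

# Toy likelihood peaked at alpha = 0.94 (a hypothetical speaker).
score = lambda a: -(a - 0.94) ** 2
print(round(estimate_warp(score), 2))  # → 0.94
```

The abstract's per-group step sizes map naturally onto the `step` parameter: a coarser step for one speaker group, a finer one for another.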
-
Publication number: 20150364128
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
-
Publication number: 20150364127
Abstract: The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze the letters of the text and the letters surrounding the text being analyzed. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text. The letter-to-sound conversion may then also be based on the contextual information that is received. The determined phonemes may be utilized to generate synthesized speech from the input text.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Mei-Yuh Hwang, Sheng Zhao, Bo Yan, Geoffrey Zweig, Fileno A. Alleva
-
Patent number: 9098494
Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
Type: Grant
Filed: May 10, 2012
Date of Patent: August 4, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
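The pipeline in this abstract — identify the language, translate into the anchor language, run the existing anchor-language components, translate the response back — can be sketched end to end. All components below are toy stand-ins: the dictionaries replace real machine translation, and `identify_language` and `understand` are hypothetical placeholders for a language-ID classifier and the existing anchor-language linguistic component.

```python
# Toy stand-ins for machine translation to/from the anchor language
# (English here); a real system would call MT components instead.
MT_TO_ANCHOR = {"fr": lambda s: {"bonjour": "hello"}[s]}
MT_FROM_ANCHOR = {"fr": lambda s: {"hi there": "salut"}[s]}

def identify_language(text):
    # Trivial identifier; a real system would use a language-ID classifier.
    return "fr" if text in ("bonjour",) else "en"

def understand(anchor_text):
    # Stand-in for the existing anchor-language linguistic component.
    return "hi there" if anchor_text == "hello" else "?"

def process(text):
    lang = identify_language(text)
    anchor_in = MT_TO_ANCHOR[lang](text) if lang != "en" else text
    anchor_out = understand(anchor_in)
    return MT_FROM_ANCHOR[lang](anchor_out) if lang != "en" else anchor_out

print(process("bonjour"))  # → salut
```

Note that anchor-language input bypasses both translation steps, matching the abstract's point that the existing components are re-used unchanged.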
-
Publication number: 20140207448
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g., 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Type: Application
Filed: January 23, 2013
Publication date: July 24, 2014
Applicant: Microsoft Corporation
Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
-
Publication number: 20130304451
Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
Type: Application
Filed: May 10, 2012
Publication date: November 14, 2013
Applicant: Microsoft Corporation
Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
-
Patent number: 8352270
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
Type: Grant
Filed: June 9, 2009
Date of Patent: January 8, 2013
Assignee: Microsoft Corporation
Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
-
Publication number: 20100312565
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
Type: Application
Filed: June 9, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
-
Patent number: 7676365
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Grant
Filed: April 20, 2005
Date of Patent: March 9, 2010
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
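To make the idea of units "larger than a phoneme but smaller than a word" concrete, here is a greedy longest-match segmentation of a phoneme string into syllable-like units. The SLU inventory below is invented for illustration; the patent derives its units from data and pairs them with an SLU-level language model rather than using a fixed table.

```python
# Hypothetical SLU inventory. Real SLUs would be learned from data and
# scored by an SLU language model during decoding.
SLU_INVENTORY = {
    "k ah m", "p y uw", "t er",            # syllable-like units
    "k", "ah", "m", "p", "y", "uw", "t", "er",  # single-phoneme fallbacks
}

def segment_into_slus(phonemes):
    """Greedy longest-match segmentation of a phoneme sequence into SLUs."""
    slus, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate span first, shrinking until one matches.
        for j in range(len(phonemes), i, -1):
            unit = " ".join(phonemes[i:j])
            if unit in SLU_INVENTORY:
                slus.append(unit)
                i = j
                break
        else:
            raise ValueError(f"no SLU covers {phonemes[i]!r}")
    return slus

# "computer" as phonemes, segmented into syllable-like units:
print(segment_into_slus("k ah m p y uw t er".split()))
# → ['k ah m', 'p y uw', 't er']
```

The single-phoneme fallbacks ensure any pronunciation is coverable, while the longer units carry the extra acoustic and lexical context the abstract credits for the accuracy gain.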
-
Patent number: 7251600
Abstract: A language model for a language processing system such as a speech recognition system is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Grant
Filed: March 29, 2005
Date of Patent: July 31, 2007
Assignee: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno Alleva
-
Patent number: 7162423
Abstract: The present invention is directed to generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.
Type: Grant
Filed: November 23, 2004
Date of Patent: January 9, 2007
Assignee: Microsoft Corporation
Inventors: Chris Thrasher, Fileno A. Alleva
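The boundary-condition lookup this abstract describes can be sketched simply: alternates must span exactly the same start and end bounds as the selected reference subpath. The hypothesis entries and scores below are invented for illustration, and real hypothesis stores are lattices rather than flat lists.

```python
# Hypothetical hypothesis store: (start, end, words, score) entries that
# a recognizer produced for the same input speech.
hypotheses = [
    (10, 25, ("recognize", "speech"), -4.0),
    (10, 25, ("wreck", "a", "nice", "beach"), -5.5),
    (10, 25, ("recognized", "speech"), -6.0),
    (0, 25, ("they", "recognize", "speech"), -9.0),
]

def alternates(store, start, end, k=3):
    """Return up to k alternate subpaths whose bounds match the selected
    portion of the reference path, best score first."""
    matching = [h for h in store if h[0] == start and h[1] == end]
    return [h[2] for h in sorted(matching, key=lambda h: -h[3])[:k]]

# The user selected the words spanning frames 10..25 of the reference path.
print(alternates(hypotheses, 10, 25))
# → [('recognize', 'speech'), ('wreck', 'a', 'nice', 'beach'), ('recognized', 'speech')]
```

Filtering on the bounds is what guarantees each alternate can be swapped into the reference path without disturbing the words around the selection.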
-
Patent number: 7016838
Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
Type: Grant
Filed: November 12, 2004
Date of Patent: March 21, 2006
Assignee: Microsoft Corporation
Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
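The storage trick in this abstract — keep only per-state dimension sums and counts, then discard the audio — can be sketched as follows. The state IDs and feature values are invented for illustration, and mean re-estimation stands in for the full adaptive-training step.

```python
import numpy as np

# Per-state running sums and frame counts: the only statistics retained
# once the speech signal and alignment units are deleted.
sums, counts = {}, {}

def accumulate(state_ids, feature_vectors):
    """Add each aligned frame's feature vector to its state's sums.

    After accumulation the frames themselves can be discarded, which is
    what reduces the storage cost of adaptation.
    """
    for state, vec in zip(state_ids, feature_vectors):
        sums[state] = sums.get(state, 0.0) + vec
        counts[state] = counts.get(state, 0) + 1

def adapted_mean(state):
    # Simplified adaptation: re-estimate the state's mean from the sums.
    return sums[state] / counts[state]

feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
accumulate(["s1", "s1", "s2"], feats)
print(adapted_mean("s1"))  # → [2. 3.]
```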
-
Patent number: 6973427
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Grant
Filed: December 26, 2000
Date of Patent: December 6, 2005
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
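The selection step here is a simple argmax over two candidate pronunciations. In this sketch the candidate phoneme strings and the score table are invented; in the patent the scores come from acoustic correspondence with the user's utterance.

```python
def pick_pronunciation(candidates, score_fn):
    """Choose the phonetic description that best matches the user's
    spoken example; score_fn is a stand-in for an acoustic score."""
    return max(candidates, key=score_fn)

# Hypothetical candidates for the word "route":
text_derived = "r uw t"    # from the word's letters (letter-to-sound)
audio_derived = "r aw t"   # from decoding the user's utterance

# Toy scorer favoring the pronunciation the user actually produced.
scores = {"r uw t": 0.3, "r aw t": 0.9}
best = pick_pronunciation([text_derived, audio_derived], scores.get)
print(best)  # → r aw t
```

Keeping both candidate sources means a user's idiosyncratic pronunciation wins when it fits the audio better, while regular words fall back to the letter-derived form.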
-
Publication number: 20050187769
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Application
Filed: April 20, 2005
Publication date: August 25, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno Alleva, Rebecca Weiss
-
Patent number: 6934683
Abstract: A language model for a language processing system such as a speech recognition system is formed as a function of associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Grant
Filed: January 31, 2001
Date of Patent: August 23, 2005
Assignee: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno A. Alleva
-
Publication number: 20050171761
Abstract: A language model for a language processing system such as a speech recognition system is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.
Type: Application
Filed: March 29, 2005
Publication date: August 4, 2005
Applicant: Microsoft Corporation
Inventors: Yun-cheng Ju, Fileno Alleva
-
Patent number: 6917918
Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
Type: Grant
Filed: December 22, 2000
Date of Patent: July 12, 2005
Assignee: Microsoft Corporation
Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva