Patents by Inventor Horacio Franco
Horacio Franco has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10777188
Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises time and frequency convolutional layers and a series of additional layers, which include a bottleneck layer. The computation engine applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
Type: Grant
Filed: November 14, 2018
Date of Patent: September 15, 2020
Assignee: SRI International
Inventors: Julien van Hout, Vikramjit Mitra, Horacio Franco, Emre Yilmaz
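The final step of the abstract above compares a query feature vector against a reference feature vector to produce a detection score. As an illustration only (the patent does not specify the scoring function), a minimal Python sketch using cosine similarity between hypothetical bottleneck vectors:

```python
import math

def detection_score(query_vec, reference_vec):
    # Cosine similarity between two bottleneck feature vectors;
    # a higher score suggests the reference audio contains the query.
    dot = sum(q * r for q, r in zip(query_vec, reference_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_r = math.sqrt(sum(r * r for r in reference_vec))
    return dot / (norm_q * norm_r)

# Identical vectors score at the maximum of 1.0; orthogonal vectors score 0.
score = detection_score([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])
```

In practice the reference signal would be scanned window by window, with a score computed per window and compared against a detection threshold.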
-
Publication number: 20200152179
Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises time and frequency convolutional layers and a series of additional layers, which include a bottleneck layer. The computation engine applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
Type: Application
Filed: November 14, 2018
Publication date: May 14, 2020
Inventors: Julien van Hout, Vikramjit Mitra, Horacio Franco, Emre Yilmaz
-
Patent number: 9576570
Abstract: The present invention relates to a method and apparatus for adding new vocabulary to interactive translation and dialogue systems. In one embodiment, a method for adding a new word to a vocabulary of an interactive dialogue includes receiving an input signal that includes at least one word not currently in the vocabulary, inserting the word into a dynamic component of a search graph associated with the vocabulary, and compiling the dynamic component independently of a permanent component of the search graph to produce a new sub-grammar, where the permanent component comprises a plurality of words that are permanently part of the search graph.
Type: Grant
Filed: July 30, 2010
Date of Patent: February 21, 2017
Assignee: SRI International
Inventors: Kristin Precoda, Horacio Franco, Jing Zheng, Michael Frandsen, Victor Abrash, Murat Akbacak, Andreas Stolcke
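The key idea in the abstract above is that only the dynamic component of the search graph is recompiled when a word is added, leaving the permanent component untouched. A toy Python sketch of that split (class and method names are illustrative, not from the patent):

```python
class SearchGraph:
    """Toy model of a vocabulary split into a permanent component
    and a dynamic component that is compiled on its own."""

    def __init__(self, permanent_words):
        self.permanent = frozenset(permanent_words)  # never recompiled
        self.dynamic = set()                         # holds newly added words
        self.sub_grammar = None

    def add_word(self, word):
        # Only words outside the current vocabulary enter the dynamic part.
        if word not in self.permanent:
            self.dynamic.add(word)
            self.compile_dynamic()

    def compile_dynamic(self):
        # Recompile just the dynamic component into a new sub-grammar;
        # the permanent component is reused as-is.
        self.sub_grammar = sorted(self.dynamic)

    def vocabulary(self):
        return self.permanent | self.dynamic

g = SearchGraph(["hello", "world"])
g.add_word("sri")
print(sorted(g.vocabulary()))  # ['hello', 'sri', 'world']
```

The benefit of the split is cost: recompiling a small sub-grammar is much cheaper than rebuilding the full search graph for every new word.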
-
Patent number: 8527270
Abstract: The present invention relates to a method and apparatus for enhancing interactive translation and dialogue systems. In one embodiment, a method for conducting an interactive dialogue includes receiving an input signal in a first language, where the input signal includes one or more words, processing the words in accordance with a vocabulary, and adjusting a probability relating to at least one of the words in the vocabulary for an output signal. Subsequently, the method may output a translation of the input signal in a second language, in accordance with the vocabulary. In one embodiment, adjusting the probability involves adjusting a probability that the word will be used in actual conversation.
Type: Grant
Filed: July 30, 2010
Date of Patent: September 3, 2013
Assignee: SRI International
Inventors: Kristin Precoda, Horacio Franco
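The adjustment step described above can be pictured as reweighting one word's probability within the vocabulary. A minimal sketch (the boost-and-renormalize scheme is an illustrative assumption, not the patent's stated method):

```python
def adjust_probability(vocab_probs, word, boost):
    # Scale one word's probability by a boost factor, then renormalize
    # so the vocabulary still forms a valid probability distribution.
    probs = dict(vocab_probs)
    probs[word] *= boost
    total = sum(probs.values())
    return {w: p / total for w, p in probs.items()}

vocab = {"hello": 0.5, "medic": 0.5}
adjusted = adjust_probability(vocab, "medic", 3.0)
# "medic" is now three times as likely as "hello".
```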
-
Publication number: 20120029904
Abstract: The present invention relates to a method and apparatus for adding new vocabulary to interactive translation and dialogue systems. In one embodiment, a method for adding a new word to a vocabulary of an interactive dialogue includes receiving an input signal that includes at least one word not currently in the vocabulary, inserting the word into a dynamic component of a search graph associated with the vocabulary, and compiling the dynamic component independently of a permanent component of the search graph to produce a new sub-grammar, where the permanent component comprises a plurality of words that are permanently part of the search graph.
Type: Application
Filed: July 30, 2010
Publication date: February 2, 2012
Inventors: Kristin Precoda, Horacio Franco, Jing Zheng, Michael Frandsen, Victor Abrash, Murat Akbacak, Andreas Stolcke
-
Publication number: 20120029903
Abstract: The present invention relates to a method and apparatus for enhancing interactive translation and dialogue systems. In one embodiment, a method for conducting an interactive dialogue includes receiving an input signal in a first language, where the input signal includes one or more words, processing the words in accordance with a vocabulary, and adjusting a probability relating to at least one of the words in the vocabulary for an output signal. Subsequently, the method may output a translation of the input signal in a second language, in accordance with the vocabulary. In one embodiment, adjusting the probability involves adjusting a probability that the word will be used in actual conversation.
Type: Application
Filed: July 30, 2010
Publication date: February 2, 2012
Inventors: Kristin Precoda, Horacio Franco
-
Patent number: 7756710
Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
Type: Grant
Filed: July 13, 2006
Date of Patent: July 13, 2010
Assignee: SRI International
Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
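One simple way to picture "combining acoustic evidence" across a repeated utterance is to sum per-hypothesis log-likelihood scores from the two recognition attempts and pick the best total. This is an illustrative sketch, not the patent's specific combination method:

```python
def combine_evidence(scores_first, scores_repeat):
    # Sum log-likelihood scores per hypothesis across both utterances,
    # on the assumption that the user repeated the same phrase.
    combined = {}
    for hyp in set(scores_first) | set(scores_repeat):
        combined[hyp] = (scores_first.get(hyp, float("-inf"))
                         + scores_repeat.get(hyp, float("-inf")))
    return max(combined, key=combined.get)

# The first attempt slightly favors the wrong hypothesis, but the
# repeat disambiguates once the evidence is pooled.
first = {"recognize speech": -12.0, "wreck a nice beach": -11.5}
repeat = {"recognize speech": -10.0, "wreck a nice beach": -14.0}
print(combine_evidence(first, repeat))  # recognize speech
```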
-
Publication number: 20100125458
Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
Type: Application
Filed: July 13, 2006
Publication date: May 20, 2010
Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
-
Patent number: 7610199
Abstract: The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Type: Grant
Filed: September 1, 2005
Date of Patent: October 27, 2009
Assignee: SRI International
Inventors: Victor Abrash, Federico Cesari, Horacio Franco, Christopher George, Jing Zheng
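The circular-buffer idea above ensures that audio spoken just before a push-to-talk command is not lost. A minimal Python sketch using `collections.deque` with a fixed capacity (names and capacity are illustrative):

```python
from collections import deque

class FrameBuffer:
    # Circular buffer that always holds the most recent audio frames,
    # so frames preceding a user command can still be recovered.
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def record(self, frame):
        # Appending past capacity silently discards the oldest frame.
        self.frames.append(frame)

    def frames_before_command(self, n):
        # Return up to n of the most recent frames, i.e. those that
        # arrived just before the user command.
        return list(self.frames)[-n:]

buf = FrameBuffer(capacity=4)
for i in range(10):       # frames 0..9; only 6..9 survive
    buf.record(i)
print(buf.frames_before_command(2))  # [8, 9]
```

These recovered frames would then be prepended to the live audio to form the augmented signal handed to the endpoint detector.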
-
Patent number: 7571095
Abstract: An apparatus and a concomitant method for recognizing speech in a noisy environment are provided. The present method includes applying a first interpolation weight to a clean speech model to produce a weighted clean speech model, applying a second interpolation weight to a noise model to produce a weighted noise model, and deriving a noisy speech model directly from the weighted clean speech model and the weighted noise model. At least one of the first interpolation weight and the second interpolation weight is computed in a maximum likelihood framework.
Type: Grant
Filed: August 31, 2005
Date of Patent: August 4, 2009
Assignee: SRI International
Inventors: Martin Graciarena, Horacio Franco, Venkata Ramana Rao Gadde
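The interpolation step described above can be sketched on a single Gaussian mean vector. This is a simplified illustration (real acoustic models interpolate full model parameters, and the weights would come from maximum-likelihood estimation rather than being fixed by hand):

```python
def noisy_speech_mean(clean_mean, noise_mean, w_clean, w_noise):
    # Derive a noisy-speech mean directly as a weighted combination of
    # the clean-speech mean and the noise mean, dimension by dimension.
    return [w_clean * c + w_noise * n
            for c, n in zip(clean_mean, noise_mean)]

# Hypothetical two-dimensional means with assumed weights.
mean = noisy_speech_mean([1.0, 2.0], [0.0, 4.0], w_clean=0.8, w_noise=0.2)
```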
-
Patent number: 7533020
Abstract: A method and apparatus are provided for performing speech recognition using observable and meaningful relationships between words within a single utterance and using a structured data source as a source of constraints on the recognition process. Results from a first constrained speech recognition pass can be combined with information about the observable and meaningful word relationships to constrain or simplify subsequent recognition passes. This iterative process greatly reduces the search space required for each recognition pass, making the speech recognition process more efficient, faster and accurate.
Type: Grant
Filed: February 23, 2005
Date of Patent: May 12, 2009
Assignee: Nuance Communications, Inc.
Inventors: James F. Arnold, Michael W. Frandsen, Anand Venkataraman, Douglas A. Bercow, Gregory K. Myers, David J. Israel, Venkata Ramana Rao Gadde, Horacio Franco, Harry Bratt
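To make the constraint idea above concrete: if a first pass recognizes one field of an utterance (say, a city in an address), a structured data source can restrict what the next pass needs to consider for a related field. The database contents and field choice here are hypothetical:

```python
# Hypothetical structured data source: valid (city, street) pairs.
DATABASE = {
    ("menlo park", "ravenswood ave"),
    ("menlo park", "willow rd"),
    ("palo alto", "university ave"),
}

def second_pass_grammar(first_pass_city):
    # Use the first-pass result plus the word relationships encoded in
    # the data source to shrink the search space for the next pass.
    return sorted(street for city, street in DATABASE
                  if city == first_pass_city)

print(second_pass_grammar("menlo park"))  # ['ravenswood ave', 'willow rd']
```

Instead of decoding the street against the full street vocabulary, the second pass searches only streets consistent with the recognized city.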
-
Publication number: 20060241948
Abstract: The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Type: Application
Filed: September 1, 2005
Publication date: October 26, 2006
Inventors: Victor Abrash, Federico Cesari, Horacio Franco, Christopher George, Jing Zheng
-
Patent number: 7120580
Abstract: An apparatus and a concomitant method for speech recognition. In one embodiment, the present method is referred to as a “Dynamic Noise Compensation” (DNC) method where the method estimates the models for noisy speech using models for clean speech and a noise model. Specifically, the model for the noisy speech is estimated by interpolation between the clean speech model and the noise model. This approach reduces computational cycles and does not require large memory capacity.
Type: Grant
Filed: August 15, 2001
Date of Patent: October 10, 2006
Assignee: SRI International
Inventors: Venkata Ramana Rao Gadde, Horacio Franco, John Butzberger
-
Publication number: 20060195317
Abstract: An apparatus and a concomitant method for recognizing speech in a noisy environment are provided. The present method includes applying a first interpolation weight to a clean speech model to produce a weighted clean speech model, applying a second interpolation weight to a noise model to produce a weighted noise model, and deriving a noisy speech model directly from the weighted clean speech model and the weighted noise model. At least one of the first interpolation weight and the second interpolation weight is computed in a maximum likelihood framework.
Type: Application
Filed: August 31, 2005
Publication date: August 31, 2006
Inventors: Martin Graciarena, Horacio Franco, Venkata Gadde
-
Publication number: 20050234723
Abstract: A method and apparatus are provided for performing speech recognition using observable and meaningful relationships between words within a single utterance and using a structured data source as a source of constraints on the recognition process. Results from a first constrained speech recognition pass can be combined with information about the observable and meaningful word relationships to constrain or simplify subsequent recognition passes. This iterative process greatly reduces the search space required for each recognition pass, making the speech recognition process more efficient, faster and accurate.
Type: Application
Filed: February 23, 2005
Publication date: October 20, 2005
Inventors: James Arnold, Michael Frandsen, Anand Venkataraman, Douglas Bercow, Gregory Myers, David Israel, Venkata Gadde, Horacio Franco, Harry Bratt
-
Publication number: 20050125224
Abstract: A method and apparatus are provided for fusion of recognition results from multiple types of data sources. In one embodiment, the inventive method includes implementing a first processing technique to recognize at least a portion of terms contained in a first media source, implementing a second processing technique to recognize at least a portion of terms contained in a second media source that contains a different type of data than that contained in the first media source, and adapting the first processing technique based at least in part on results generated by the second processing technique.
Type: Application
Filed: November 8, 2004
Publication date: June 9, 2005
Inventors: Gregory Myers, Harry Bratt, Anand Venkataraman, Andreas Stolcke, Horacio Franco, Venkata Rao Gadde
-
Publication number: 20050055210
Abstract: A method and apparatus are provided for performing speech recognition using a dynamic vocabulary. Results from a preliminary speech recognition pass can be used to update or refine a language model in order to improve the accuracy of search results and to simplify subsequent recognition passes. This iterative process greatly reduces the number of alternative hypotheses produced during each speech recognition pass, as well as the time required to process subsequent passes, making the speech recognition process faster, more efficient and more accurate. The iterative process is characterized by the use of results from one or more data set queries, where the keys used to query the data set, as well as the queries themselves, are constructed in a manner that produces more effective language models for use in subsequent attempts at decoding a given speech signal.
Type: Application
Filed: August 5, 2004
Publication date: March 10, 2005
Inventors: Anand Venkataraman, Horacio Franco, Douglas A. Bercow
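The data-set-query step described above can be sketched as: take keywords from the preliminary pass, query a record collection with them, and build a narrower vocabulary from the matching records for the next decoding pass. The record collection and matching rule below are hypothetical:

```python
# Hypothetical data set of records (e.g. song titles) to query.
RECORDS = [
    "take me home country roads",
    "take five",
    "home on the range",
]

def refine_vocabulary(first_pass_words):
    # Use words from the preliminary recognition pass as query keys,
    # then build a reduced vocabulary from the matching records.
    matches = [r for r in RECORDS
               if any(w in r.split() for w in first_pass_words)]
    return sorted({w for r in matches for w in r.split()})

vocab = refine_vocabulary(["home"])
# Words from unmatched records (e.g. "five") drop out of the vocabulary.
```

A language model built over this reduced vocabulary then drives the next recognition pass, and the loop can repeat until the hypothesis stabilizes.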
-
Patent number: 6226611
Abstract: Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model such as an HMM, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
Type: Grant
Filed: January 26, 2000
Date of Patent: May 1, 2001
Assignee: SRI International
Inventors: Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price, Vassilios Digalakis
-
Patent number: 6055498
Abstract: Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model, such as a hidden Markov model, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
Type: Grant
Filed: October 2, 1997
Date of Patent: April 25, 2000
Assignee: SRI International
Inventors: Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price, Vassilios Digalakis
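The score-to-grade conversion in the two pronunciation patents above can be pictured as averaging per-phone posterior-based scores and mapping the average onto grade bands. The threshold values and grade labels below are purely illustrative, not taken from the patents:

```python
def pronunciation_grade(phone_log_posteriors, thresholds=(-4.0, -2.0)):
    # Average the per-phone log posterior scores for the utterance and
    # map the result to a human-style grade; in the patented system the
    # mapping is calibrated against grades assigned by human graders.
    score = sum(phone_log_posteriors) / len(phone_log_posteriors)
    low, high = thresholds
    if score >= high:
        return "good"
    if score >= low:
        return "fair"
    return "poor"

print(pronunciation_grade([-1.2, -0.8, -1.5]))  # good
```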