Patents by Inventor Horacio Franco
Horacio Franco has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10777188
Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises time and frequency convolutional layers and a series of additional layers, which include a bottleneck layer. The computation engine applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
Type: Grant
Filed: November 14, 2018
Date of Patent: September 15, 2020
Assignee: SRI International
Inventors: Julien van Hout, Vikramjit Mitra, Horacio Franco, Emre Yilmaz
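The final step of the abstract above compares a query feature vector against a reference feature vector to produce a detection score. As an illustration only (the patent does not specify the scoring function), a minimal Python sketch using cosine similarity between hypothetical bottleneck vectors:

```python
import math

def detection_score(query_vec, reference_vec):
    # Cosine similarity between two bottleneck feature vectors;
    # a higher score suggests the reference audio contains the query.
    dot = sum(q * r for q, r in zip(query_vec, reference_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_r = math.sqrt(sum(r * r for r in reference_vec))
    return dot / (norm_q * norm_r)

# Identical vectors score at the maximum of 1.0; orthogonal vectors score 0.
score = detection_score([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])
```

In practice the reference signal would be scanned window by window, with a score computed per window and compared against a detection threshold.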
-
Publication number: 20200152179
Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises time and frequency convolutional layers and a series of additional layers, which include a bottleneck layer. The computation engine applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
Type: Application
Filed: November 14, 2018
Publication date: May 14, 2020
Inventors: Julien van Hout, Vikramjit Mitra, Horacio Franco, Emre Yilmaz
-
Patent number: 9576570
Abstract: The present invention relates to a method and apparatus for adding new vocabulary to interactive translation and dialogue systems. In one embodiment, a method for adding a new word to a vocabulary of an interactive dialogue includes receiving an input signal that includes at least one word not currently in the vocabulary, inserting the word into a dynamic component of a search graph associated with the vocabulary, and compiling the dynamic component independently of a permanent component of the search graph to produce a new sub-grammar, where the permanent component comprises a plurality of words that are permanently part of the search graph.
Type: Grant
Filed: July 30, 2010
Date of Patent: February 21, 2017
Assignee: SRI International
Inventors: Kristin Precoda, Horacio Franco, Jing Zheng, Michael Frandsen, Victor Abrash, Murat Akbacak, Andreas Stolcke
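The key idea in the abstract above is that only the dynamic component of the search graph is recompiled when a word is added, leaving the permanent component untouched. A toy Python sketch of that split (class and method names are illustrative, not from the patent):

```python
class SearchGraph:
    """Toy model of a vocabulary split into a permanent component
    and a dynamic component that is compiled on its own."""

    def __init__(self, permanent_words):
        self.permanent = frozenset(permanent_words)  # never recompiled
        self.dynamic = set()                         # holds newly added words
        self.sub_grammar = None

    def add_word(self, word):
        # Only words outside the current vocabulary enter the dynamic part.
        if word not in self.permanent:
            self.dynamic.add(word)
            self.compile_dynamic()

    def compile_dynamic(self):
        # Recompile just the dynamic component into a new sub-grammar;
        # the permanent component is reused as-is.
        self.sub_grammar = sorted(self.dynamic)

    def vocabulary(self):
        return self.permanent | self.dynamic

g = SearchGraph(["hello", "world"])
g.add_word("sri")
print(sorted(g.vocabulary()))  # ['hello', 'sri', 'world']
```

The benefit of the split is cost: recompiling a small sub-grammar is much cheaper than rebuilding the full search graph for every new word.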
-
Patent number: 8527270
Abstract: The present invention relates to a method and apparatus for enhancing interactive translation and dialogue systems. In one embodiment, a method for conducting an interactive dialogue includes receiving an input signal in a first language, where the input signal includes one or more words, processing the words in accordance with a vocabulary, and adjusting a probability relating to at least one of the words in the vocabulary for an output signal. Subsequently, the method may output a translation of the input signal in a second language, in accordance with the vocabulary. In one embodiment, adjusting the probability involves adjusting a probability that the word will be used in actual conversation.
Type: Grant
Filed: July 30, 2010
Date of Patent: September 3, 2013
Assignee: SRI International
Inventors: Kristin Precoda, Horacio Franco
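The adjustment step described above can be pictured as reweighting one word's probability within the vocabulary. A minimal sketch (the boost-and-renormalize scheme is an illustrative assumption, not the patent's stated method):

```python
def adjust_probability(vocab_probs, word, boost):
    # Scale one word's probability by a boost factor, then renormalize
    # so the vocabulary still forms a valid probability distribution.
    probs = dict(vocab_probs)
    probs[word] *= boost
    total = sum(probs.values())
    return {w: p / total for w, p in probs.items()}

vocab = {"hello": 0.5, "medic": 0.5}
adjusted = adjust_probability(vocab, "medic", 3.0)
# "medic" is now three times as likely as "hello".
```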
-
Publication number: 20120029904
Abstract: The present invention relates to a method and apparatus for adding new vocabulary to interactive translation and dialogue systems. In one embodiment, a method for adding a new word to a vocabulary of an interactive dialogue includes receiving an input signal that includes at least one word not currently in the vocabulary, inserting the word into a dynamic component of a search graph associated with the vocabulary, and compiling the dynamic component independently of a permanent component of the search graph to produce a new sub-grammar, where the permanent component comprises a plurality of words that are permanently part of the search graph.
Type: Application
Filed: July 30, 2010
Publication date: February 2, 2012
Inventors: Kristin Precoda, Horacio Franco, Jing Zheng, Michael Frandsen, Victor Abrash, Murat Akbacak, Andreas Stolcke
-
Publication number: 20120029903
Abstract: The present invention relates to a method and apparatus for enhancing interactive translation and dialogue systems. In one embodiment, a method for conducting an interactive dialogue includes receiving an input signal in a first language, where the input signal includes one or more words, processing the words in accordance with a vocabulary, and adjusting a probability relating to at least one of the words in the vocabulary for an output signal. Subsequently, the method may output a translation of the input signal in a second language, in accordance with the vocabulary. In one embodiment, adjusting the probability involves adjusting a probability that the word will be used in actual conversation.
Type: Application
Filed: July 30, 2010
Publication date: February 2, 2012
Inventors: Kristin Precoda, Horacio Franco
-
Patent number: 7756710
Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
Type: Grant
Filed: July 13, 2006
Date of Patent: July 13, 2010
Assignee: SRI International
Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
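One simple way to picture "combining acoustic evidence" across a repeated utterance is to sum per-hypothesis log-likelihood scores from the two recognition attempts and pick the best total. This is an illustrative sketch, not the patent's specific combination method:

```python
def combine_evidence(scores_first, scores_repeat):
    # Sum log-likelihood scores per hypothesis across both utterances,
    # on the assumption that the user repeated the same phrase.
    combined = {}
    for hyp in set(scores_first) | set(scores_repeat):
        combined[hyp] = (scores_first.get(hyp, float("-inf"))
                         + scores_repeat.get(hyp, float("-inf")))
    return max(combined, key=combined.get)

# The first attempt slightly favors the wrong hypothesis, but the
# repeat disambiguates once the evidence is pooled.
first = {"recognize speech": -12.0, "wreck a nice beach": -11.5}
repeat = {"recognize speech": -10.0, "wreck a nice beach": -14.0}
print(combine_evidence(first, repeat))  # recognize speech
```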
-
Publication number: 20100125458
Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
Type: Application
Filed: July 13, 2006
Publication date: May 20, 2010
Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
-
Patent number: 7610199
Abstract: The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Type: Grant
Filed: September 1, 2005
Date of Patent: October 27, 2009
Assignee: SRI International
Inventors: Victor Abrash, Federico Cesari, Horacio Franco, Christopher George, Jing Zheng
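The circular-buffer idea above ensures that audio spoken just before a push-to-talk command is not lost. A minimal Python sketch using `collections.deque` with a fixed capacity (names and capacity are illustrative):

```python
from collections import deque

class FrameBuffer:
    # Circular buffer that always holds the most recent audio frames,
    # so frames preceding a user command can still be recovered.
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def record(self, frame):
        # Appending past capacity silently discards the oldest frame.
        self.frames.append(frame)

    def frames_before_command(self, n):
        # Return up to n of the most recent frames, i.e. those that
        # arrived just before the user command.
        return list(self.frames)[-n:]

buf = FrameBuffer(capacity=4)
for i in range(10):       # frames 0..9; only 6..9 survive
    buf.record(i)
print(buf.frames_before_command(2))  # [8, 9]
```

These recovered frames would then be prepended to the live audio to form the augmented signal handed to the endpoint detector.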
-
Patent number: 7571095
Abstract: An apparatus and a concomitant method for recognizing speech in a noisy environment are provided. The present method includes applying a first interpolation weight to a clean speech model to produce a weighted clean speech model, applying a second interpolation weight to a noise model to produce a weighted noise model, and deriving a noisy speech model directly from the weighted clean speech model and the weighted noise model. At least one of the first interpolation weight and the second interpolation weight is computed in a maximum likelihood framework.
Type: Grant
Filed: August 31, 2005
Date of Patent: August 4, 2009
Assignee: SRI International
Inventors: Martin Graciarena, Horacio Franco, Venkata Ramana Rao Gadde
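The interpolation step described above can be sketched on a single Gaussian mean vector. This is a simplified illustration (real acoustic models interpolate full model parameters, and the weights would come from maximum-likelihood estimation rather than being fixed by hand):

```python
def noisy_speech_mean(clean_mean, noise_mean, w_clean, w_noise):
    # Derive a noisy-speech mean directly as a weighted combination of
    # the clean-speech mean and the noise mean, dimension by dimension.
    return [w_clean * c + w_noise * n
            for c, n in zip(clean_mean, noise_mean)]

# Hypothetical two-dimensional means with assumed weights.
mean = noisy_speech_mean([1.0, 2.0], [0.0, 4.0], w_clean=0.8, w_noise=0.2)
```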
-
Patent number: 7533020
Abstract: A method and apparatus are provided for performing speech recognition using observable and meaningful relationships between words within a single utterance and using a structured data source as a source of constraints on the recognition process. Results from a first constrained speech recognition pass can be combined with information about the observable and meaningful word relationships to constrain or simplify subsequent recognition passes. This iterative process greatly reduces the search space required for each recognition pass, making the speech recognition process more efficient, faster and accurate.
Type: Grant
Filed: February 23, 2005
Date of Patent: May 12, 2009
Assignee: Nuance Communications, Inc.
Inventors: James F. Arnold, Michael W. Frandsen, Anand Venkataraman, Douglas A. Bercow, Gregory K. Myers, David J. Israel, Venkata Ramana Rao Gadde, Horacio Franco, Harry Bratt
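To make the constraint idea above concrete: if a first pass recognizes one field of an utterance (say, a city in an address), a structured data source can restrict what the next pass needs to consider for a related field. The database contents and field choice here are hypothetical:

```python
# Hypothetical structured data source: valid (city, street) pairs.
DATABASE = {
    ("menlo park", "ravenswood ave"),
    ("menlo park", "willow rd"),
    ("palo alto", "university ave"),
}

def second_pass_grammar(first_pass_city):
    # Use the first-pass result plus the word relationships encoded in
    # the data source to shrink the search space for the next pass.
    return sorted(street for city, street in DATABASE
                  if city == first_pass_city)

print(second_pass_grammar("menlo park"))  # ['ravenswood ave', 'willow rd']
```

Instead of decoding the street against the full street vocabulary, the second pass searches only streets consistent with the recognized city.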
-
Publication number: 20060241948
Abstract: The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
Type: Application
Filed: September 1, 2005
Publication date: October 26, 2006
Inventors: Victor Abrash, Federico Cesari, Horacio Franco, Christopher George, Jing Zheng
-
Patent number: 7120580
Abstract: An apparatus and a concomitant method for speech recognition. In one embodiment, the present method is referred to as a “Dynamic Noise Compensation” (DNC) method where the method estimates the models for noisy speech using models for clean speech and a noise model. Specifically, the model for the noisy speech is estimated by interpolation between the clean speech model and the noise model. This approach reduces computational cycles and does not require large memory capacity.
Type: Grant
Filed: August 15, 2001
Date of Patent: October 10, 2006
Assignee: SRI International
Inventors: Venkata Ramana Rao Gadde, Horacio Franco, John Butzberger
-
Publication number: 20060195317
Abstract: An apparatus and a concomitant method for recognizing speech in a noisy environment are provided. The present method includes applying a first interpolation weight to a clean speech model to produce a weighted clean speech model, applying a second interpolation weight to a noise model to produce a weighted noise model, and deriving a noisy speech model directly from the weighted clean speech model and the weighted noise model. At least one of the first interpolation weight and the second interpolation weight is computed in a maximum likelihood framework.
Type: Application
Filed: August 31, 2005
Publication date: August 31, 2006
Inventors: Martin Graciarena, Horacio Franco, Venkata Gadde
-
Publication number: 20050234723
Abstract: A method and apparatus are provided for performing speech recognition using observable and meaningful relationships between words within a single utterance and using a structured data source as a source of constraints on the recognition process. Results from a first constrained speech recognition pass can be combined with information about the observable and meaningful word relationships to constrain or simplify subsequent recognition passes. This iterative process greatly reduces the search space required for each recognition pass, making the speech recognition process more efficient, faster and accurate.
Type: Application
Filed: February 23, 2005
Publication date: October 20, 2005
Inventors: James Arnold, Michael Frandsen, Anand Venkataraman, Douglas Bercow, Gregory Myers, David Israel, Venkata Gadde, Horacio Franco, Harry Bratt
-
Publication number: 20050125224
Abstract: A method and apparatus are provided for fusion of recognition results from multiple types of data sources. In one embodiment, the inventive method includes implementing a first processing technique to recognize at least a portion of terms contained in a first media source, implementing a second processing technique to recognize at least a portion of terms contained in a second media source that contains a different type of data than that contained in the first media source, and adapting the first processing technique based at least in part on results generated by the second processing technique.
Type: Application
Filed: November 8, 2004
Publication date: June 9, 2005
Inventors: Gregory Myers, Harry Bratt, Anand Venkataraman, Andreas Stolcke, Horacio Franco, Venkata Rao Gadde
-
Publication number: 20050055210
Abstract: A method and apparatus are provided for performing speech recognition using a dynamic vocabulary. Results from a preliminary speech recognition pass can be used to update or refine a language model in order to improve the accuracy of search results and to simplify subsequent recognition passes. This iterative process greatly reduces the number of alternative hypotheses produced during each speech recognition pass, as well as the time required to process subsequent passes, making the speech recognition process faster, more efficient and more accurate. The iterative process is characterized by the use of results from one or more data set queries, where the keys used to query the data set, as well as the queries themselves, are constructed in a manner that produces more effective language models for use in subsequent attempts at decoding a given speech signal.
Type: Application
Filed: August 5, 2004
Publication date: March 10, 2005
Inventors: Anand Venkataraman, Horacio Franco, Douglas A. Bercow
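The data-set-query step described above can be sketched as: take keywords from the preliminary pass, query a record collection with them, and build a narrower vocabulary from the matching records for the next decoding pass. The record collection and matching rule below are hypothetical:

```python
# Hypothetical data set of records (e.g. song titles) to query.
RECORDS = [
    "take me home country roads",
    "take five",
    "home on the range",
]

def refine_vocabulary(first_pass_words):
    # Use words from the preliminary recognition pass as query keys,
    # then build a reduced vocabulary from the matching records.
    matches = [r for r in RECORDS
               if any(w in r.split() for w in first_pass_words)]
    return sorted({w for r in matches for w in r.split()})

vocab = refine_vocabulary(["home"])
# Words from unmatched records (e.g. "five") drop out of the vocabulary.
```

A language model built over this reduced vocabulary then drives the next recognition pass, and the loop can repeat until the hypothesis stabilizes.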
-
Patent number: 6226611
Abstract: Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model such as an HMM, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
Type: Grant
Filed: January 26, 2000
Date of Patent: May 1, 2001
Assignee: SRI International
Inventors: Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price, Vassilios Digalakis
-
Patent number: 6055498
Abstract: Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model, such as a hidden Markov model, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
Type: Grant
Filed: October 2, 1997
Date of Patent: April 25, 2000
Assignee: SRI International
Inventors: Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price, Vassilios Digalakis
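The score-to-grade conversion in the two pronunciation patents above can be pictured as averaging per-phone posterior-based scores and mapping the average onto grade bands. The threshold values and grade labels below are purely illustrative, not taken from the patents:

```python
def pronunciation_grade(phone_log_posteriors, thresholds=(-4.0, -2.0)):
    # Average the per-phone log posterior scores for the utterance and
    # map the result to a human-style grade; in the patented system the
    # mapping is calibrated against grades assigned by human graders.
    score = sum(phone_log_posteriors) / len(phone_log_posteriors)
    low, high = thresholds
    if score >= high:
        return "good"
    if score >= low:
        return "fair"
    return "poor"

print(pronunciation_grade([-1.2, -0.8, -1.5]))  # good
```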