Patents by Inventor Mehryar Mohri

Mehryar Mohri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for regularly approximating context-free grammars through transformation

Patent number: 7716041

Abstract: Context-free grammars generally comprise a large number of rules, where each rule defines how a string of symbols is generated from a different series of symbols. While techniques for creating finite-state automata from the rules of context-free grammars exist, these techniques require an input grammar to be strongly regular. Systems and methods that convert the rules of a context-free grammar into a strongly regular grammar include transforming each input rule into a set of output rules that approximate the input rule. The output rules are all right- or left-linear and are strongly regular. In various exemplary embodiments, the output rules are output in a specific format that specifies, for each rule, the left-hand non-terminal symbol, a single right-hand non-terminal symbol, and zero, one or more terminal symbols. If the input context-free grammar rule is weighted, the weight of that rule is distributed and assigned to the output rules.

Type: Grant

Filed: September 18, 2007

Date of Patent: May 11, 2010

Assignee: AT&T Intellectual Property II, L.P.

Inventor: Mehryar Mohri
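The abstract above describes turning each context-free rule into linear output rules. What follows is a minimal sketch of one transformation in that spirit, not the patented procedure. It assumes every non-terminal belongs to a single mutually recursive set, and it ignores rule weights. The function name `approximate` and the primed helper non-terminals (`S'`) are hypothetical. Each output rule is a triple (left-hand non-terminal, terminals, single right-hand non-terminal or `None`), mirroring the output format the abstract mentions.

```python
def approximate(rules, nonterminals):
    """Rewrite context-free rules as right-linear rules.

    rules: list of (lhs, rhs), where rhs is a list of symbols.
    Returns triples (lhs, terminals, next_nonterminal_or_None).
    """
    out = []
    # Each helper non-terminal A' may end the derivation: A' -> epsilon.
    for a in sorted(nonterminals):
        out.append((a + "'", [], None))
    for lhs, rhs in rules:
        current = lhs
        terminals = []
        for sym in rhs:
            if sym in nonterminals:
                # Emit "current -> terminals sym", then resume from sym'.
                out.append((current, terminals, sym))
                current = sym + "'"
                terminals = []
            else:
                terminals.append(sym)
        # Close the rule by handing over to the helper of its left-hand side.
        out.append((current, terminals, lhs + "'"))
    return out


# Grammar for a^n b^n: S -> a S b | epsilon
right_linear = approximate([("S", ["a", "S", "b"]), ("S", [])], {"S"})
```

For this grammar the output rules generate a*b*, a regular superset of the original language, which is the sense in which the output approximates the input.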
System and method of epsilon removal of weighted automata and transducers

Patent number: 7634408

Abstract: An improved ε-removal method is disclosed that computes for any input weighted automaton A with ε-transitions an equivalent weighted automaton B with no ε-transitions. The method comprises two main steps. The first step comprises computing for each state “p” of the automaton A its ε-closure. The second step in the method comprises modifying the outgoing transitions of each state “p” by removing those labeled with ε. The method next comprises adding to the set of transitions leaving the state “p” non-ε-transitions leaving each state “q” in the set of states reachable from “p” via a path labeled with ε with their weights pre-⊗-multiplied by the ε-distance from state “p” to state “q” in the automaton A. State “p” is a final state if some state “q” within the set of states reachable from “p” via a path labeled with ε is final and the final weight $\rho'[p] = \bigoplus_{q \in \varepsilon[p] \cap F} (d[p, q] \otimes \rho[q])$.

Type: Grant

Filed: April 28, 2008

Date of Patent: December 15, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventor: Mehryar Mohri
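The two steps in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the tropical semiring (⊕ is min, ⊗ is +) with non-negative weights, so ε-distances can be computed with Dijkstra's algorithm. The data layout (a dict of `(label, weight, next_state)` lists, with `None` as the ε label) and the function names are hypothetical.

```python
import heapq


def epsilon_closure(trans, p):
    """Shortest epsilon-distance from p to each state reachable by epsilon paths."""
    dist = {p: 0.0}
    heap = [(0.0, p)]
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue  # stale heap entry
        for label, w, r in trans.get(q, []):
            if label is None and d + w < dist.get(r, float("inf")):
                dist[r] = d + w
                heapq.heappush(heap, (d + w, r))
    return dist


def remove_epsilons(states, trans, final):
    """Return (transitions, final weights) of an equivalent epsilon-free automaton."""
    new_trans, new_final = {}, {}
    for p in states:
        best = {}
        for q, d in epsilon_closure(trans, p).items():
            # Copy the non-epsilon transitions of q, pre-multiplied by d[p, q].
            for label, w, r in trans.get(q, []):
                if label is not None:
                    key = (label, r)
                    best[key] = min(best.get(key, float("inf")), d + w)
            # p becomes final if some q in its closure is final.
            if q in final:
                new_final[p] = min(new_final.get(p, float("inf")), d + final[q])
        new_trans[p] = [(label, w, r) for (label, r), w in sorted(best.items())]
    return new_trans, new_final


trans = {0: [(None, 1.0, 1), ("a", 5.0, 2)], 1: [("a", 2.0, 2)]}
final = {1: 3.0, 2: 0.0}
new_trans, new_final = remove_epsilons([0, 1, 2], trans, final)
```

Parallel transitions with the same label and destination are merged with min, which is the ⊕ operation under the assumed semiring. States that become unreachable are left in place for simplicity.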
SYSTEMS AND METHODS FOR GENERATING WEIGHTED FINITE-STATE AUTOMATA REPRESENTING GRAMMARS

Publication number: 20080243484

Abstract: A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.

Type: Application

Filed: June 6, 2008

Publication date: October 2, 2008

Applicant: AT&T CORP.

Inventors: Mehryar Mohri, Mark-Jan Nederhof

Systems and methods for generating weighted finite-state automata representing grammars

Patent number: 7398197

Abstract: A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.

Type: Grant

Filed: December 5, 2006

Date of Patent: July 8, 2008

Assignee: AT&T Corp.

Inventors: Mehryar Mohri, Mark-Jan Nederhof

System and method of epsilon removal of weighted automata and transducers

Patent number: 7383185

Abstract: An improved ε-removal method is disclosed that computes for any input weighted automaton A with ε-transitions an equivalent weighted automaton B with no ε-transitions. The method comprises two main steps. The first step comprises computing for each state “p” of the automaton A its ε-closure. The second step in the method comprises modifying the outgoing transitions of each state “p” by removing those labeled with ε. The method next comprises adding to the set of transitions leaving the state “p” non-ε-transitions leaving each state “q” in the set of states reachable from “p” via a path labeled with ε with their weights pre-⊗-multiplied by the ε-distance from state “p” to state “q” in the automaton A. State “p” is a final state if some state “q” within the set of states reachable from “p” via a path labeled with ε is final and the final weight $\rho'[p] = \bigoplus_{q \in \varepsilon[p] \cap F} (d[p, q] \otimes \rho[q])$.

Type: Grant

Filed: August 29, 2005

Date of Patent: June 3, 2008

Assignee: AT&T Corp.

Inventor: Mehryar Mohri

Methods and apparatus for rapid acoustic unit selection from a large speech corpus

Patent number: 7369994

Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Accordingly, a method is disclosed for constructing an efficient concatenation cost database by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur.

Type: Grant

Filed: May 4, 2006

Date of Patent: May 6, 2008

Assignee: AT&T Corp.

Inventors: Mark C. Beutnagel, Mehryar Mohri, Michael D. Riley
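The caching idea in the abstract above can be illustrated with a short sketch. It assumes acoustic units are identified by hashable IDs, that `unit_sequences` holds the unit sequences chosen while synthesizing a training text, and that `cost_fn` is whatever expensive concatenation cost the synthesizer uses. All names and the `min_count` threshold are hypothetical and are not taken from the patent.

```python
from collections import Counter


def build_cost_cache(unit_sequences, cost_fn, min_count=1):
    """Pre-compute costs for unit pairs seen at least min_count times."""
    counts = Counter()
    for seq in unit_sequences:
        # Count every pair of adjacent units in the synthesized sequence.
        counts.update(zip(seq, seq[1:]))
    return {pair: cost_fn(*pair) for pair, n in counts.items() if n >= min_count}


def concatenation_cost(cache, cost_fn, left, right):
    """Look up a cached cost, computing it on the fly on a cache miss."""
    try:
        return cache[(left, right)]
    except KeyError:
        return cost_fn(left, right)


def toy_cost(a, b):
    # Stand-in for a real acoustic mismatch measure.
    return abs(a - b)


cache = build_cost_cache([[1, 2, 3], [1, 2, 4]], toy_cost, min_count=2)
```

Only pairs that actually occur are stored, so the cache stays far smaller than the full set of possible pairs, and rare pairs fall back to direct computation.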
SYSTEMS AND METHODS FOR REGULARLY APPROXIMATING CONTEXT-FREE GRAMMARS THROUGH TRANSFORMATION

Publication number: 20080010059

Abstract: Context-free grammars generally comprise a large number of rules, where each rule defines how a string of symbols is generated from a different series of symbols. While techniques for creating finite-state automata from the rules of context-free grammars exist, these techniques require an input grammar to be strongly regular. Systems and methods that convert the rules of a context-free grammar into a strongly regular grammar include transforming each input rule into a set of output rules that approximate the input rule. The output rules are all right- or left-linear and are strongly regular. In various exemplary embodiments, the output rules are output in a specific format that specifies, for each rule, the left-hand non-terminal symbol, a single right-hand non-terminal symbol, and zero, one or more terminal symbols. If the input context-free grammar rule is weighted, the weight of that rule is distributed and assigned to the output rules.

Type: Application

Filed: September 18, 2007

Publication date: January 10, 2008

Applicant: AT&T Corp.

Inventor: Mehryar Mohri

SYSTEMS AND METHODS FOR DETERMINING THE DETERMINIZABILITY OF FINITE-STATE AUTOMATA AND TRANSDUCERS

Publication number: 20070299668

Abstract: Finite-state transducers and weighted finite-state automata may not be determinizable. The twins property can be used to characterize the determinizability of such devices. For a weighted finite-state automaton or transducer, that weighted finite-state automaton or transducer and its inverse are intersected or composed, respectively. The resulting device is checked to determine if it has the cycle-identity property. If not, the original weighted finite-state automaton or transducer is not determinizable. For a weighted or unweighted finite-state transducer, that device is checked to determine if it is functional. If not, that device is not determinizable. That device is then composed with its inverse. The composed device is checked to determine if every edge in the composed device having a cycle-accessible end state meets at least one of a number of conditions. If so, the original device has the twins property. If the original device has the twins property, then it is determinizable.

Type: Application

Filed: June 29, 2007

Publication date: December 27, 2007

Inventors: Cyril Allauzen, Mehryar Mohri

Systems and methods for regularly approximating context-free grammars through transformation

Patent number: 7289948

Abstract: Context-free grammars generally comprise a large number of rules, where each rule defines how a string of symbols is generated from a different series of symbols. While techniques for creating finite-state automata from the rules of context-free grammars exist, these techniques require an input grammar to be strongly regular. Systems and methods that convert the rules of a context-free grammar into a strongly regular grammar include transforming each input rule into a set of output rules that approximate the input rule. The output rules are all right- or left-linear and are strongly regular. In various exemplary embodiments, the output rules are output in a specific format that specifies, for each rule, the left-hand non-terminal symbol, a single right-hand non-terminal symbol, and zero, one or more terminal symbols. If the input context-free grammar rule is weighted, the weight of that rule is distributed and assigned to the output rules.

Type: Grant

Filed: July 22, 2002

Date of Patent: October 30, 2007

Assignee: AT&T Corp.

Inventor: Mehryar Mohri

Systems and methods for determining the determinizability of finite-state automata and transducers

Patent number: 7240004

Abstract: Finite-state transducers and weighted finite-state automata may not be determinizable. The twins property can be used to characterize the determinizability of such devices. For a weighted finite-state automaton or transducer, that weighted finite-state automaton or transducer and its inverse are intersected or composed, respectively. The resulting device is checked to determine if it has the cycle-identity property. If not, the original weighted finite-state automaton or transducer is not determinizable. For a weighted or unweighted finite-state transducer, that device is checked to determine if it is functional. If not, that device is not determinizable. That device is then composed with its inverse. The composed device is checked to determine if every edge in the composed device having a cycle-accessible end state meets at least one of a number of conditions. If so, the original device has the twins property. If the original device has the twins property, then it is determinizable.

Type: Grant

Filed: June 20, 2002

Date of Patent: July 3, 2007

Assignee: AT&T Corp.

Inventors: Cyril Allauzen, Mehryar Mohri

Systems and methods for generating weighted finite-state automata representing grammars

Patent number: 7181386

Abstract: A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.

Type: Grant

Filed: July 18, 2002

Date of Patent: February 20, 2007

Assignee: AT&T Corp.

Inventors: Mehryar Mohri, Mark-Jan Nederhof

Methods and apparatus for rapid acoustic unit selection from a large speech corpus

Patent number: 7082396

Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.

Type: Grant

Filed: December 19, 2003

Date of Patent: July 25, 2006

Assignee: AT&T Corp.

Inventors: Mark C. Beutnagel, Mehryar Mohri, Michael D. Riley

System and method of ε removal of weighted automata and transducers

Patent number: 7027988

Abstract: An improved ε-removal method is disclosed that computes for any input weighted automaton A with ε-transitions an equivalent weighted automaton B with no ε-transitions. The method comprises two main steps. The first step comprises computing for each state “p” of the automaton A its ε-closure. The second step in the method comprises modifying the outgoing transitions of each state “p” by removing those labeled with ε. The method next comprises adding to the set of transitions leaving the state “p” non-ε-transitions leaving each state “q” in the set of states reachable from “p” via a path labeled with ε with their weights pre-⊗-multiplied by the ε-distance from state “p” to state “q” in the automaton A. State “p” is a final state if some state “q” within the set of states reachable from “p” via a path labeled with ε is final and the final weight $\rho'[p] = \bigoplus_{q \in \varepsilon[p] \cap F} (d[p, q] \otimes \rho[q])$.

Type: Grant

Filed: July 20, 2001

Date of Patent: April 11, 2006

Assignee: AT&T Corp.

Inventor: Mehryar Mohri

Methods and apparatus for rapid acoustic unit selection from a large speech corpus

Patent number: 6701295

Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.

Type: Grant

Filed: February 6, 2003

Date of Patent: March 2, 2004

Assignee: AT&T Corp.

Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley

Method and apparatus for rapid acoustic unit selection from a large speech corpus

Patent number: 6697780

Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.

Type: Grant

Filed: April 25, 2000

Date of Patent: February 24, 2004

Assignee: AT&T Corp.

Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley

Systems and methods for determining the N-best strings

Publication number: 20030187644

Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.

Type: Application

Filed: November 21, 2002

Publication date: October 2, 2003

Inventors: Mehryar Mohri, Michael Dennis Riley

System and methods for optimizing networks of weighted unweighted directed graphs

Patent number: 6587844

Abstract: Unweighted finite state automata may be used in speech recognition systems, but considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task is time consuming and expensive, if it is even possible. Additionally, it is unlikely that a training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state. The directed graph optimizing systems and methods determine the shortest distances between source and end nodes of a weighted directed graph. These various directed graph optimizing systems and methods also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient.

Type: Grant

Filed: February 1, 2000

Date of Patent: July 1, 2003

Assignee: AT&T Corp.

Inventor: Mehryar Mohri
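The reweighting step described in the abstract above can be sketched as follows. This is an illustrative sketch rather than the patented method. It assumes the tropical semiring (path weight is a sum, the best path is the minimum) with non-negative edge weights, so shortest distances to the final states can be found with Dijkstra's algorithm on the reversed graph. The function names and the `(source, destination, weight)` edge layout are hypothetical.

```python
import heapq


def distances_to_final(edges, final):
    """Shortest distance from each state to the final states, final weight included."""
    reverse = {}
    for src, dst, w in edges:
        reverse.setdefault(dst, []).append((src, w))
    dist = dict(final)
    heap = [(d, q) for q, d in final.items()]
    heapq.heapify(heap)
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue  # stale heap entry
        for p, w in reverse.get(q, []):
            if d + w < dist.get(p, float("inf")):
                dist[p] = d + w
                heapq.heappush(heap, (d + w, p))
    return dist


def push_weights(edges, start, final):
    """Move weight toward the start state while preserving every path total."""
    dist = distances_to_final(edges, final)
    # Edges that cannot reach a final state are dropped.
    pushed = [(p, q, w + dist[q] - dist[p])
              for p, q, w in edges if p in dist and q in dist]
    new_final = {q: rho - dist[q] for q, rho in final.items()}
    # dist[start] becomes the initial weight of the reweighted graph.
    return dist[start], pushed, new_final


edges = [(0, 1, 3.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 5.0)]
initial, pushed, new_final = push_weights(edges, 0, {3: 0.0})
```

After pushing, every edge on a shortest path has weight zero and the cost of the best completion is paid up front, which is why a search guided by accumulated path weight can prune worse alternatives earlier.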
Systems and methods for generating weighted finite-state automata representing grammars

Publication number: 20030120480

Abstract: A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.

Type: Application

Filed: July 18, 2002

Publication date: June 26, 2003

Inventors: Mehryar Mohri, Mark-Jan Nederhof

Methods and apparatus for rapid acoustic unit selection from a large speech corpus

Publication number: 20030115049

Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.

Type: Application

Filed: February 6, 2003

Publication date: June 19, 2003

Applicant: AT&T CORP.

Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley

Fully expanded context-dependent networks for speech recognition

Patent number: 6574597

Abstract: A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other coder based on sequences of labels provided by feature analysis of input speech.

Type: Grant

Filed: February 11, 2000

Date of Patent: June 3, 2003

Assignee: AT&T Corp.

Inventors: Mehryar Mohri, Michael Dennis Riley
