Patents by Inventor Geoffrey G. Zweig

Geoffrey G. Zweig has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20110224982
    Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are derived from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
    Type: Application
    Filed: March 12, 2010
    Publication date: September 15, 2011
    Applicant: c/o Microsoft Corporation
    Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
  • Publication number: 20100076765
    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation of the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
    Type: Application
    Filed: September 19, 2008
    Publication date: March 25, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
  • Publication number: 20100076752
    Abstract: The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary.
    Type: Application
    Filed: September 17, 2009
    Publication date: March 25, 2010
    Inventors: Geoffrey G. Zweig, Yun-Cheng Ju
  • Publication number: 20080312921
    Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
    Type: Application
    Filed: August 20, 2008
    Publication date: December 18, 2008
    Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
  • Patent number: 7464031
    Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
    Type: Grant
    Filed: November 28, 2003
    Date of Patent: December 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
  • Publication number: 20080281806
    Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
    Type: Application
    Filed: May 10, 2007
    Publication date: November 13, 2008
    Applicant: Microsoft Corporation
    Inventors: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
  • Patent number: 7251599
    Abstract: Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.
    Type: Grant
    Filed: December 10, 2002
    Date of Patent: July 31, 2007
    Assignee: International Business Machines Corporation
    Inventors: Benoit Maison, Geoffrey G. Zweig
  • Patent number: 7216077
    Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state; that is, the statistics of a particular state are updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
    Type: Grant
    Filed: September 26, 2000
    Date of Patent: May 8, 2007
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George A. Saon, Geoffrey G. Zweig
  • Patent number: 6842796
    Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and where exactly in these phrases the particular information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using such enumerated phrases, the invention is able to automatically identify them in an input data stream and then identify and extract the particular information associated with the phrase that is being sought, e.g., important or relevant information.
    Type: Grant
    Filed: July 3, 2001
    Date of Patent: January 11, 2005
    Assignee: International Business Machines Corporation
    Inventors: Geoffrey G. Zweig, Mukund Padmanabhan
  • Publication number: 20040111262
    Abstract: Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.
    Type: Application
    Filed: December 10, 2002
    Publication date: June 10, 2004
    Applicant: IBM Corporation
    Inventors: Benoit Maison, Geoffrey G. Zweig
  • Publication number: 20030050782
    Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and where exactly in these phrases the particular information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using such enumerated phrases, the invention is able to automatically identify them in an input data stream and then identify and extract the particular information associated with the phrase that is being sought, e.g., important or relevant information.
    Type: Application
    Filed: July 3, 2001
    Publication date: March 13, 2003
    Applicant: International Business Machines Corporation
    Inventors: Geoffrey G. Zweig, Mukund Padmanabhan
  • Patent number: 6411933
    Abstract: A method of validating production of a biometric attribute allegedly associated with a user comprises the following steps. A first signal is generated representing data associated with the biometric attribute allegedly received in association with the user. A second signal is also generated representing data associated with at least one feature detected in association with the production of the biometric attribute allegedly received from the user. Then, the first signal and the second signal are compared to determine a correlation level between the biometric attribute and the production feature, wherein the validation of the production of the biometric attribute depends on the correlation level. Accordingly, the invention serves to provide substantial assurance that the biometric attribute offered by the user has been physically generated by the user.
    Type: Grant
    Filed: November 22, 1999
    Date of Patent: June 25, 2002
    Assignee: International Business Machines Corporation
    Inventors: Stephane Herman Maes, Geoffrey G. Zweig
  • Patent number: 6349296
    Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. Under each of a plurality of pseudo-random permutations of the set of all fingerprints, the minimum element of the image of the set of fingerprints associated with a document is selected, and these minima form a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
    Type: Grant
    Filed: August 21, 2000
    Date of Patent: February 19, 2002
    Assignee: AltaVista Company
    Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig
  • Patent number: 6119124
    Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. Under each of a plurality of pseudo-random permutations of the set of all fingerprints, the minimum element of the image of the set of fingerprints associated with a document is selected, and these minima form a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: September 12, 2000
    Assignee: Digital Equipment Corporation
    Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig
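
The retrieval step in application 20110224982 (decoded acoustic units turned into features, then matched with TF-IDF) can be illustrated with a minimal sketch. It assumes the decoder emits a phone sequence, treats phone bigrams as the IR "terms", and ranks target words by TF-IDF cosine similarity. The lexicon, pronunciations, and function names are hypothetical and not taken from the application.

```python
import math
from collections import Counter

def ngrams(units, n=2):
    """Turn a sequence of decoded acoustic units into n-gram 'terms'."""
    padded = ["<s>"] + list(units) + ["</s>"]
    return [" ".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def build_index(lexicon, n=2):
    """Build a TF-IDF vector for every target word from its pronunciation."""
    docs = {word: Counter(ngrams(pron, n)) for word, pron in lexicon.items()}
    df = Counter(term for tf in docs.values() for term in tf)
    idf = {term: math.log(len(docs) / count) + 1.0 for term, count in df.items()}
    vectors = {w: {t: c * idf[t] for t, c in tf.items()} for w, tf in docs.items()}
    return vectors, idf

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(decoded_units, vectors, idf, n=2):
    """Return the target word whose vector best matches the decoded units."""
    query_tf = Counter(ngrams(decoded_units, n))
    # Terms that never occur in the lexicon carry no weight.
    query = {t: c * idf[t] for t, c in query_tf.items() if t in idf}
    return max(vectors, key=lambda word: cosine(query, vectors[word]))

# Hypothetical lexicon of target words and their phone sequences.
lexicon = {
    "seattle": ["s", "iy", "ae", "t", "ax", "l"],
    "boston": ["b", "ao", "s", "t", "ax", "n"],
    "austin": ["ao", "s", "t", "ax", "n"],
}
vectors, idf = build_index(lexicon)
# A decoding with one phone error ("d" for "t") still retrieves the right word.
print(retrieve(["s", "iy", "ae", "d", "ax", "l"], vectors, idf))  # seattle
```

Because the match is made on sub-word terms, a single misrecognized phone removes only a couple of terms from the query rather than ruling out the correct word.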
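
For application 20100076765, the simplest case is an exact repetition. The sketch below assumes each recognition pass returns an N-best list with normalized posteriors, that the two passes are conditionally independent given the intended words, and that the prior over candidates is uniform; under those assumptions the joint score is the product of the two posteriors. The N-best lists and the floor value are made up, and the extensions, truncations, and spellings that the application also covers are not modeled.

```python
import math

def combine_repetitions(nbest_first, nbest_second, floor=1e-4):
    """Rank hypotheses for a repeated utterance using evidence from both passes.

    Assumes an exact repetition, conditionally independent passes, and a uniform
    prior. Each argument maps a word sequence to its posterior from one pass;
    a hypothesis missing from one list gets a small floor probability.
    """
    candidates = set(nbest_first) | set(nbest_second)
    joint = {
        words: math.log(nbest_first.get(words, floor))
        + math.log(nbest_second.get(words, floor))
        for words in candidates
    }
    return sorted(candidates, key=lambda words: joint[words], reverse=True)

# Hypothetical N-best lists: the first pass alone ranks the candidates differently.
first = {"call john smith": 0.45, "call jon smyth": 0.40, "all john smith": 0.15}
second = {"call jon smyth": 0.50, "call john smith": 0.35, "cal john smith": 0.15}
print(combine_repetitions(first, second)[:2])  # ['call jon smyth', 'call john smith']
```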
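
Application 20100076752 pairs a language model trained on seed text with a dictionary of possible substitutions. A minimal sketch, assuming an add-one-smoothed bigram model and a dictionary that maps a raw token to its candidate rewrites, is shown here; a Viterbi search over those candidates stands in for the transducer rather than a real weighted finite-state implementation. The seed sentences and dictionary entries are hypothetical.

```python
import math
from collections import Counter

def train_bigram_lm(seed_sentences):
    """Add-one-smoothed bigram language model built from seed text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in seed_sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(unigrams) + 1  # +1 for the end-of-sentence token

    def logprob(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

    return logprob

def cleanse(sentence, substitutions, logprob):
    """Choose, per token, the dictionary rewrite that maximizes the LM score (Viterbi)."""
    # Maps the last chosen token to (best log score, best path ending in that token).
    beams = {"<s>": (0.0, [])}
    for token in sentence.split():
        beams = {
            option: max(
                (score + logprob(prev, option), path + [option])
                for prev, (score, path) in beams.items()
            )
            for option in substitutions.get(token, [token])
        }
    _, best_path = max(
        (score + logprob(prev, "</s>"), path) for prev, (score, path) in beams.items()
    )
    return " ".join(best_path)

# Hypothetical seed text and substitution dictionary.
seed = [
    "meet me on main street",
    "the clinic of doctor jones",
    "saint louis is a city",
    "drive down oak street",
]
substitutions = {"st": ["st", "street", "saint"], "dr": ["dr", "drive", "doctor"]}
lm = train_bigram_lm(seed)
print(cleanse("meet me on main st", substitutions, lm))  # meet me on main street
print(cleanse("st louis is a city", substitutions, lm))  # saint louis is a city
```

The same abbreviation is expanded differently depending on its neighbors, which is the behavior a context-aware cleanup needs.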
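
The log-linear posterior in application 20080312921 and patent 7464031 has the standard form P(h | x) = exp(Σ_i λ_i f_i(h, x)) / Σ_h' exp(Σ_i λ_i f_i(h', x)), where the f_i are feature functions of a hypothesis h and the observations x, and the λ_i are trained weights. A minimal sketch of scoring competing hypotheses this way follows; the feature names, weights, and values are hypothetical. A feature that is absent for a hypothesis simply contributes nothing, which loosely mirrors the abstract's point that not every training feature must appear at recognition time.

```python
import math

def log_linear_posterior(hypothesis_features, weights):
    """Posterior P(h | x) over competing hypotheses under a log-linear model.

    hypothesis_features maps each hypothesis to a dict of feature values f_i(h, x).
    Missing features contribute zero to the weighted sum.
    """
    scores = {
        h: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for h, feats in hypothesis_features.items()
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {h: math.exp(s - max_score) for h, s in scores.items()}
    total = sum(exp_scores.values())
    return {h: e / total for h, e in exp_scores.items()}

# Hypothetical trained weights; "speaking_rate_match" is never observed below.
weights = {"acoustic_logprob": 1.0, "lm_logprob": 0.8,
           "duration_ok": 0.5, "speaking_rate_match": 0.3}
hyps = {
    "recognize speech": {"acoustic_logprob": -10.0, "lm_logprob": -4.0, "duration_ok": 1.0},
    "wreck a nice beach": {"acoustic_logprob": -9.5, "lm_logprob": -7.0},
}
post = log_linear_posterior(hyps, weights)
print(round(post["recognize speech"], 3))  # 0.917
```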
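
Application 20080281806 applies TF-IDF to a database of short listings rather than long documents. One way to sketch that, assuming whitespace tokenization, binary term frequencies (a term rarely repeats within a short listing), a smoothed IDF, and cosine similarity, is below. The listings and the function name are hypothetical.

```python
import math
from collections import Counter

def rank_listings(query, listings):
    """Rank short listings against a query by TF-IDF cosine similarity."""
    term_sets = [set(listing.lower().split()) for listing in listings]
    df = Counter(term for terms in term_sets for term in terms)
    n = len(listings)
    # Smoothed inverse document frequency: rarer terms weigh more.
    idf = {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}
    query_terms = set(query.lower().split()) & set(idf)
    query_norm = math.sqrt(sum(idf[t] ** 2 for t in query_terms))

    def score(terms):
        # Binary term frequency: each term counts once per listing (an assumption).
        listing_norm = math.sqrt(sum(idf[t] ** 2 for t in terms))
        overlap = sum(idf[t] ** 2 for t in query_terms & terms)
        return overlap / (query_norm * listing_norm) if query_norm and listing_norm else 0.0

    order = sorted(range(n), key=lambda i: score(term_sets[i]), reverse=True)
    return [listings[i] for i in order]

# Hypothetical business listings.
listings = [
    "Pike Place Chowder",
    "Pike Street Pizza",
    "Lake Union Chowder House",
    "Union Street Coffee",
]
print(rank_listings("pike chowder", listings)[0])  # Pike Place Chowder
```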
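
The subsequence record described in patent 7251599 and application 20040111262 can be sketched as an index from each subsequence to the database entries it could refer to. This sketch assumes entries are word sequences and that any order-preserving subset of their words counts as a possible subsequence; the example entries are hypothetical. Enumerating every subsequence grows exponentially with entry length, so this is only reasonable for short entries such as names.

```python
from collections import defaultdict
from itertools import combinations

def subsequences(entry):
    """All non-empty, order-preserving word subsequences of a database entry."""
    words = entry.split()
    return {
        " ".join(combo)
        for size in range(1, len(words) + 1)
        for combo in combinations(words, size)
    }

def analyze(entries):
    """Record subsequence-to-entry correspondences, then split them into two groups."""
    index = defaultdict(set)
    for entry in entries:
        for sub in subsequences(entry):
            index[sub].add(entry)
    # Unique signature: the subsequence corresponds to exactly one entry.
    signatures = {sub: next(iter(owners)) for sub, owners in index.items() if len(owners) == 1}
    # Repeated subsequence: its entries are grouped into a confusion set.
    confusion_sets = {sub: owners for sub, owners in index.items() if len(owners) > 1}
    return signatures, confusion_sets

# Hypothetical directory entries.
signatures, confusion_sets = analyze(["john smith", "john a smith", "mary smith"])
print(signatures["mary"])                        # mary smith
print(sorted(confusion_sets["john smith"]))      # ['john a smith', 'john smith']
```

Here "john smith" has no unique signature of its own, so a recognizer hearing those words would need to disambiguate within its confusion set.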
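
The confidence gating in patent 7216077 can be sketched as follows, assuming the per-frame state posteriors have already been computed from the word lattice (for example by a forward-backward pass, not shown). A frame contributes to a state's statistics only when that state's posterior exceeds the threshold. The frames, posteriors, and threshold are hypothetical, and the accumulated statistics would still have to feed a transform estimate, which is omitted.

```python
def accumulate_stats(frames, posteriors, threshold=0.5):
    """Accumulate per-state adaptation statistics, gated by state posteriors.

    frames:     one feature vector (list of floats) per time step
    posteriors: one dict {state: posterior occupancy probability} per time step,
                assumed to come from the word lattice
    """
    occupancy, first_order = {}, {}
    for x, gammas in zip(frames, posteriors):
        for state, gamma in gammas.items():
            if gamma <= threshold:
                continue  # not confident enough: this frame does not update the state
            occupancy[state] = occupancy.get(state, 0.0) + gamma
            acc = first_order.setdefault(state, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += gamma * value
    means = {s: [v / occupancy[s] for v in first_order[s]] for s in occupancy}
    return occupancy, means

# Hypothetical two-dimensional frames and lattice-derived state posteriors.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
posteriors = [{"s1": 0.9, "s2": 0.1}, {"s1": 0.6, "s2": 0.4}, {"s1": 0.05, "s2": 0.95}]
occupancy, means = accumulate_stats(frames, posteriors)
print(occupancy)  # s1 is updated by frames 0 and 1; s2 only by frame 2
```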
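
Patent 6842796 and application 20030050782 describe enumerating stereotypical phrases, which one embodiment calls regular expressions, along with where the wanted information sits in each. A minimal sketch with Python's `re` module marks that position with a named group; the phrase list and field names are hypothetical and far smaller than a real enumeration would be.

```python
import re

# Hypothetical stereotypical phrases; the named group marks where the information sits.
PHRASE_PATTERNS = {
    "phone_number": [
        r"my (?:phone )?number is (?P<value>[\d\- ]{7,})",
        r"(?:call|reach) me at (?P<value>[\d\- ]{7,})",
    ],
    "name": [
        r"my name is (?P<value>[a-z]+(?: [a-z]+)?)",
        r"this is (?P<value>[a-z]+(?: [a-z]+)?) calling",
    ],
}

def extract(text):
    """Find known phrases in the input and pull out the information each carries."""
    lowered = text.lower()
    found = []
    for field, patterns in PHRASE_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, lowered):
                found.append((field, match.group("value").strip()))
    return found

message = "Hi, this is Jane Doe calling. You can reach me at 555-0142 after five."
print(extract(message))  # [('phone_number', '555-0142'), ('name', 'jane doe')]
```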
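
Patent 6411933 validates a biometric by correlating it with a feature observed while the biometric is being produced. The abstract does not fix the signals or the correlation measure, so this sketch assumes a per-frame audio energy track as the first signal, a per-frame lip-opening measurement as the second, and Pearson correlation with a hypothetical acceptance threshold of 0.7. All sample values are made up.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long, time-aligned signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def validate_production(attribute_signal, production_signal, min_correlation=0.7):
    """Accept the biometric only if it correlates with the observed production feature."""
    level = pearson(attribute_signal, production_signal)
    return level >= min_correlation, level

# Hypothetical per-frame measurements.
audio_energy = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
lip_opening_live = [0.0, 0.7, 1.0, 0.1, 0.6, 0.2]    # mouth moves with the speech
lip_opening_replay = [0.5, 0.5, 0.4, 0.6, 0.5, 0.5]  # mouth barely moves, e.g. playback
print(validate_production(audio_energy, lip_opening_live)[0])    # True
print(validate_production(audio_energy, lip_opening_replay)[0])  # False
```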
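
Patents 6349296 and 6119124 describe shingling, min-wise sketches, and grouped features. A compact sketch follows. It swaps in BLAKE2b hashes for the fingerprints and random linear hash functions modulo a prime as an approximation to pseudo-random permutations; the shingle width, sketch size, group size, and sample documents are all hypothetical choices. The fraction of agreeing sketch positions estimates the resemblance, that is, the Jaccard similarity of the two shingle sets.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime for the linear hash family

def fingerprint(text):
    """64-bit fingerprint of a string (BLAKE2b, a stand-in for the patents' fingerprints)."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")

def shingles(document, w=4):
    """Set of overlapping w-token windows of the document."""
    tokens = document.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(max(1, len(tokens) - w + 1))}

def make_permutations(k, seed=0):
    """k random linear hash functions approximating random permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def sketch(document, perms, w=4):
    """Minimum image of the shingle fingerprints under each permutation."""
    prints = [fingerprint(s) for s in shingles(document, w)]
    return [min((a * f + b) % PRIME for f in prints) for a, b in perms]

def features(doc_sketch, group_size=2):
    """Fingerprint consecutive groups of sketch elements, keyed by group position."""
    return {
        fingerprint(f"{i}:{doc_sketch[i:i + group_size]}")
        for i in range(0, len(doc_sketch), group_size)
    }

def resemblance(sketch_a, sketch_b):
    """Fraction of agreeing sketch positions (estimated Jaccard similarity)."""
    return sum(x == y for x, y in zip(sketch_a, sketch_b)) / len(sketch_a)

def nearly_identical(features_a, features_b, min_shared):
    """Documents sharing more than min_shared features are treated as near-duplicates."""
    return len(features_a & features_b) > min_shared

# Hypothetical documents: doc2 differs from doc1 only in its last word.
perms = make_permutations(12)
doc1 = "the quick brown fox jumps over the lazy dog near the old river bank"
doc2 = "the quick brown fox jumps over the lazy dog near the old river road"
doc3 = "completely unrelated text about patent filings and speech recognition systems"
s1, s2, s3 = (sketch(d, perms) for d in (doc1, doc2, doc3))
print(resemblance(s1, s2), resemblance(s1, s3))
```

Grouping sketch elements into features makes the near-duplicate test a cheap set intersection, at the cost of requiring several consecutive sketch positions to agree before a feature matches.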