Patents by Inventor Geoffrey G. Zweig

Geoffrey G. Zweig has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

AUTOMATIC SPEECH RECOGNITION BASED UPON INFORMATION RETRIEVAL METHODS

Publication number: 20110224982

Abstract: Described is a technology in which information retrieval (IR) techniques are used in an automatic speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are extracted from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).

Type: Application

Filed: March 12, 2010

Publication date: September 15, 2011

Applicant: c/o Microsoft Corporation

Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig

STRUCTURED MODELS OF REPITITION FOR SPEECH RECOGNITION

Publication number: 20100076765

Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

Type: Application

Filed: September 19, 2008

Publication date: March 25, 2010

Applicant: MICROSOFT CORPORATION

Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

Automated Data Cleanup

Publication number: 20100076752

Abstract: The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary.

Type: Application

Filed: September 17, 2009

Publication date: March 25, 2010

Inventors: Geoffrey G. Zweig, Yun-Cheng Ju

SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES

Publication number: 20080312921

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
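The posterior described in this abstract can be illustrated with a minimal sketch. It assumes the usual log-linear form, in which each candidate unit is scored by a weighted sum of feature values and the scores are normalized with a softmax. The function names, feature names, and weights below are hypothetical and are not taken from the patent.

```python
import math


def log_linear_posterior(weights, feature_fn, candidates, observation):
    """Return P(unit | observation) for each candidate under a log-linear model.

    weights    -- dict mapping feature name to its weight (hypothetical values)
    feature_fn -- function (unit, observation) -> dict of feature name to value
    """
    scores = {}
    for unit in candidates:
        feats = feature_fn(unit, observation)
        # Features without a trained weight contribute nothing, and trained
        # features that do not fire for this input are simply absent.
        scores[unit] = sum(weights.get(name, 0.0) * value
                           for name, value in feats.items())
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {unit: math.exp(s - max_score) for unit, s in scores.items()}
    normalizer = sum(exp_scores.values())
    return {unit: e / normalizer for unit, e in exp_scores.items()}


def toy_features(unit, observation):
    """Hypothetical feature extractor: one match feature and one per-unit bias."""
    feats = {"bias_" + unit: 1.0}
    if observation.get("decoded") == unit:
        feats["decoded_match"] = 1.0
    return feats


toy_weights = {"decoded_match": 2.0, "bias_yes": 0.5}
posterior = log_linear_posterior(toy_weights, toy_features,
                                 ["yes", "no"], {"decoded": "yes"})
```

Because the model only sums whichever features are present, overlapping or statistically dependent features can be combined without an independence assumption, which is the property the abstract emphasizes.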

Type: Application

Filed: August 20, 2008

Publication date: December 18, 2008

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

Speech recognition utilizing multitude of speech features

Patent number: 7464031

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Grant

Filed: November 28, 2003

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

SEARCHING A DATABASE OF LISTINGS

Publication number: 20080281806

Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
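As a rough illustration of TF-IDF search over short listings, the sketch below treats every listing as a tiny document, weights terms by inverse document frequency, and ranks listings by cosine similarity to the query. The abstract does not say how the weighting is adapted to listings, so the smoothing, tokenization, and sample listings here are assumptions for illustration only.

```python
import math
from collections import Counter


def build_index(listings):
    """Compute IDF weights and one TF-IDF vector per listing."""
    tokenized = [listing.lower().split() for listing in listings]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    total = len(tokenized)
    # The "+ 1.0" keeps terms that occur in every listing from vanishing (an assumption).
    idf = {term: math.log(total / count) + 1.0 for term, count in doc_freq.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, listings, idf, vectors):
    """Return (score, listing) pairs sorted from best to worst match."""
    counts = Counter(query.lower().split())
    # Query terms that never occur in any listing are ignored.
    query_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(cosine(query_vec, vec), listing)
              for listing, vec in zip(listings, vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sample_listings = ["Joe's Pizza", "Pizza Hut", "Main Street Hardware"]
sample_idf, sample_vectors = build_index(sample_listings)
results = search("joe's pizza", sample_listings, sample_idf, sample_vectors)
```

With listings this short, term frequency is almost always 1, so the ranking is driven mainly by IDF and by length normalization.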

Type: Application

Filed: May 10, 2007

Publication date: November 13, 2008

Applicant: Microsoft Corporation

Inventors: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig

Automatic construction of unique signatures and confusable sets for database access

Patent number: 7251599

Abstract: Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.
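The bookkeeping described in this abstract can be sketched in a few lines. The sketch assumes that "subsequences" means contiguous word subsequences of each database entry, which the abstract does not specify; the function names and sample entries are hypothetical.

```python
from collections import defaultdict


def contiguous_subsequences(words):
    """Yield every contiguous run of words as a tuple (an assumed definition)."""
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            yield tuple(words[start:end])


def build_signature_table(entries):
    """Split subsequences into unique signatures and confusable sets.

    Returns (unique, confusable):
      unique     -- subsequence -> the single entry it identifies
      confusable -- subsequence -> set of entries that share it
    """
    table = defaultdict(set)
    for entry in entries:
        for sub in contiguous_subsequences(entry.lower().split()):
            table[sub].add(entry)
    unique = {sub: next(iter(owners))
              for sub, owners in table.items() if len(owners) == 1}
    confusable = {sub: owners
                  for sub, owners in table.items() if len(owners) > 1}
    return unique, confusable


unique, confusable = build_signature_table(["John Smith", "John Jones"])
```

In this toy database, "smith" alone identifies one entry, while "john" maps to a confusion set containing both entries, so a recognizer hearing only "john" would need to disambiguate.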

Type: Grant

Filed: December 10, 2002

Date of Patent: July 31, 2007

Assignee: International Business Machines Corporation

Inventors: Benoit Maison, Geoffrey G. Zweig

Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation

Patent number: 7216077

Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.

Type: Grant

Filed: September 26, 2000

Date of Patent: May 8, 2007

Assignee: International Business Machines Corporation

Inventors: Mukund Padmanabhan, George A. Saon, Geoffrey G. Zweig

Information extraction from documents with regular expression matching

Patent number: 6842796

Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and where exactly in these phrases the particular information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using such enumerated phrases, the invention is able to automatically identify them in an input data stream and then identify and extract the particular information associated with the phrase that is being sought, e.g., important or relevant information.
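A small sketch can make the idea concrete: each stereotypical phrase is written as a pattern, and a named group marks where the sought information sits inside the phrase. The abstract uses "regular expressions" for the enumerated phrases themselves; using Python's `re` module, and the specific phrases and labels below, are assumptions made for illustration.

```python
import re

# Hypothetical enumerated phrases. The named group "info" marks the position
# in each phrase where the sought information is expected to appear.
PATTERNS = [
    ("phone", re.compile(r"\bmy (?:phone )?number is (?P<info>[\d\- ]+\d)",
                         re.IGNORECASE)),
    ("phone", re.compile(r"\bcall me (?:back )?at (?P<info>[\d\- ]+\d)",
                         re.IGNORECASE)),
]


def extract(text):
    """Return (label, value) pairs for every enumerated phrase found in text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group("info")))
    return found


message = "Hi, this is Pat. Call me back at 555-0123 when you can."
extracted = extract(message)  # [("phone", "555-0123")]
```

Keeping the phrases in a plain list means new stereotypical phrasings can be added without touching the matching logic.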

Type: Grant

Filed: July 3, 2001

Date of Patent: January 11, 2005

Assignee: International Business Machines Corporation

Inventors: Geoffrey G. Zweig, Mukund Padmanabhan

Automatic construction of unique signatures and confusable sets for database access

Publication number: 20040111262

Abstract: Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.

Type: Application

Filed: December 10, 2002

Publication date: June 10, 2004

Applicant: IBM Corporation

Inventors: Benoit Maison, Geoffrey G. Zweig

Information extraction from documents with regular expression matching

Publication number: 20030050782

Abstract: Techniques are provided for enumerating regularly identifiable or stereotypical phrases that people commonly use to convey particular information, and where exactly in these phrases the particular information is to be found. In one embodiment, such phrases are referred to as “regular expressions.” Using such enumerated phrases, the invention is able to automatically identify them in an input data stream and then identify and extract the particular information associated with the phrase that is being sought, e.g., important or relevant information.

Type: Application

Filed: July 3, 2001

Publication date: March 13, 2003

Applicant: International Business Machines Corporation

Inventors: Geoffrey G. Zweig, Mukund Padmanabhan

Methods and apparatus for correlating biometric attributes and biometric attribute production features

Patent number: 6411933

Abstract: A method of validating production of a biometric attribute allegedly associated with a user comprises the following steps. A first signal is generated representing data associated with the biometric attribute allegedly received in association with the user. A second signal is also generated representing data associated with at least one feature detected in association with the production of the biometric attribute allegedly received from the user. Then, the first signal and the second signal are compared to determine a correlation level between the biometric attribute and the production feature, wherein the validation of the production of the biometric attribute depends on the correlation level. Accordingly, the invention serves to provide substantial assurance that the biometric attribute offered by the user has been physically generated by the user.

Type: Grant

Filed: November 22, 1999

Date of Patent: June 25, 2002

Assignee: International Business Machines Corporation

Inventors: Stephane Herman Maes, Geoffrey G. Zweig

Method for clustering closely resembling data objects

Patent number: 6349296

Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. A minimum element from each of the images of the set of fingerprints associated with a document under each of a plurality of pseudo random permutations of the set of all fingerprints is selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
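The shingle, fingerprint, and sketch steps in this abstract can be sketched as follows. Several choices here are assumptions rather than details from the patent: truncated SHA-1 digests stand in for the fingerprints, affine maps modulo a prime stand in for the pseudo-random permutations, and the final step of grouping sketch elements into features is omitted.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus for the affine maps


def shingles(text, width=3):
    """Return the set of overlapping token windows ("shingles") of the text."""
    tokens = text.lower().split()
    count = max(len(tokens) - width + 1, 1)
    return {tuple(tokens[i:i + width]) for i in range(count)}


def fingerprint(shingle):
    """Map a shingle to an integer fingerprint (SHA-1 is an assumed stand-in)."""
    digest = hashlib.sha1(" ".join(shingle).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PRIME


def make_permutations(count, seed=0):
    """Draw (a, b) pairs defining the maps x -> (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(count)]


def sketch(text, permutations, width=3):
    """Keep the minimum image of the fingerprint set under each permutation."""
    prints = [fingerprint(s) for s in shingles(text, width)]
    return [min((a * x + b) % PRIME for x in prints) for a, b in permutations]


def resemblance(sketch_a, sketch_b):
    """Estimate resemblance as the fraction of sketch positions that agree."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


perms = make_permutations(64)
doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = doc_a + " today"
doc_c = "completely different words appear in this other short sample text here"
near = resemblance(sketch(doc_a, perms), sketch(doc_b, perms))
far = resemblance(sketch(doc_a, perms), sketch(doc_c, perms))
```

Two documents agree at a given sketch position only when the same shingle yields the minimum under that map, so the fraction of agreeing positions approximates the overlap of their shingle sets.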

Type: Grant

Filed: August 21, 2000

Date of Patent: February 19, 2002

Assignee: AltaVista Company

Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig

Method for clustering closely resembling data objects

Patent number: 6119124

Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. A minimum element from each of the images of the set of fingerprints associated with a document under each of a plurality of pseudo random permutations of the set of all fingerprints is selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.

Type: Grant

Filed: March 26, 1998

Date of Patent: September 12, 2000

Assignee: Digital Equipment Corporation

Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig
