Patents by Inventor Colin Andrew Cherry

Colin Andrew Cherry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Inverted Projection for Robust Speech Translation

Publication number: 20230021824

Abstract: The technology provides an approach to train translation models that are robust to transcription errors and punctuation errors. The approach includes introducing errors from actual automatic speech recognition and automatic punctuation systems into the source side of the machine translation training data. A method for training a machine translation model includes performing automatic speech recognition on input source audio to generate a system transcript. The method aligns a human transcript of the source audio to the system transcript, including projecting system segmentation onto the human transcript. Then the method performs segment robustness training of a machine translation model according to the aligned human and system transcripts, and performs system robustness training of the machine translation model, e.g., by injecting token errors into training data.
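The projection step named in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes whitespace tokens, uses Python's `difflib.SequenceMatcher` as a stand-in for whatever alignment the method actually uses, and the function name `project_segmentation` is hypothetical.

```python
from difflib import SequenceMatcher


def project_segmentation(human_tokens, system_segments):
    """Cut human_tokens at positions aligned to the system segment boundaries.

    Illustrative only: SequenceMatcher is an assumed stand-in aligner.
    """
    system_tokens = [tok for seg in system_segments for tok in seg]

    # Token offsets in the system transcript where one segment ends
    # and the next begins (the final segment needs no boundary).
    boundaries = []
    offset = 0
    for seg in system_segments[:-1]:
        offset += len(seg)
        boundaries.append(offset)

    opcodes = SequenceMatcher(
        a=system_tokens, b=human_tokens, autojunk=False
    ).get_opcodes()

    def to_human(pos):
        # Map a system-side boundary to a human-side offset using the
        # first aligned block that contains it, clamped to that block.
        for _tag, i1, i2, j1, j2 in opcodes:
            if i1 <= pos <= i2:
                return j1 + min(pos - i1, j2 - j1)
        return len(human_tokens)

    cuts = [0] + [to_human(b) for b in boundaries] + [len(human_tokens)]
    return [human_tokens[s:e] for s, e in zip(cuts, cuts[1:])]


# Toy data: the recognizer misheard two words and split the audio in three.
human = "i want to recognize speech today please".split()
system = [["i", "want", "to"],
          ["wreck", "a", "nice", "beach"],
          ["today", "please"]]
projected = project_segmentation(human, system)
```

In this toy case the clean human transcript inherits the recognizer's three-way split, which is the kind of segment mismatch the abstract says the translation model is trained to tolerate.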

Type: Application

Filed: July 7, 2022

Publication date: January 26, 2023

Applicant: Google LLC

Inventors: Dirk Ryan Padfield, Colin Andrew Cherry

Re-translation for simultaneous, spoken-language machine translation

Patent number: 11562152

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for re-translation for simultaneous, spoken-language machine translation. In some implementations, a stream of audio data comprising speech in a first language is received. A transcription for the speech in the stream of audio data is generated using an automated speech recognizer through a series of updates. A translation of the transcription into a second language is generated using a machine translation module. The translation is generated with translation iterations that translate increasing amounts of the transcription, including re-translating previously translated portions of the transcription. A series of translation updates are provided to a client device based on the translation iterations.
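The re-translation loop in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented system: `retranslate_stream`, `toy_translate`, and `LEXICON` are hypothetical names, and the word-for-word lexicon stands in for a real machine translation module.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy lexicon standing in for a machine translation module.
LEXICON = {"guten": "good", "morgen": "morning", "alle": "everyone"}


def toy_translate(transcript: str) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(LEXICON.get(word, word) for word in transcript.split())


def retranslate_stream(
    asr_updates: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Yield one translation update per transcription update.

    Each iteration translates the whole transcript so far from scratch,
    so earlier output can change when the recognizer revises its text.
    """
    for transcript in asr_updates:
        yield translate(transcript)


# The recognizer first mishears "morgen" as "morgan", then corrects it.
updates = ["guten", "guten morgan", "guten morgen alle"]
translations = list(retranslate_stream(updates, toy_translate))
```

Here the second update contains a recognition error that the third update fixes. Because every iteration starts over from the full transcript, the correction reaches the translated output without any special revision logic.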

Type: Grant

Filed: September 23, 2020

Date of Patent: January 24, 2023

Assignee: Google LLC

Inventors: Naveen Arivazhagan, Colin Andrew Cherry, Wolfgang Macherey, Te I, George Foster, Pallavi N Baljekar

RE-TRANSLATION FOR SIMULTANEOUS, SPOKEN-LANGUAGE MACHINE TRANSLATION

Publication number: 20220092274

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for re-translation for simultaneous, spoken-language machine translation. In some implementations, a stream of audio data comprising speech in a first language is received. A transcription for the speech in the stream of audio data is generated using an automated speech recognizer through a series of updates. A translation of the transcription into a second language is generated using a machine translation module. The translation is generated with translation iterations that translate increasing amounts of the transcription, including re-translating previously translated portions of the transcription. A series of translation updates are provided to a client device based on the translation iterations.

Type: Application

Filed: September 23, 2020

Publication date: March 24, 2022

Inventors: Naveen Arivazhagan, Colin Andrew Cherry, Wolfgang Macherey, Te I, George Foster, Pallavi N. Baljekar

Unsupervised learning using global features, including for log-linear model word segmentation

Patent number: 8909514

Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.

Type: Grant

Filed: December 15, 2009

Date of Patent: December 9, 2014

Assignee: Microsoft Corporation

Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon

TRANSLITERATION USING INDICATOR AND HYBRID GENERATIVE FEATURES

Publication number: 20110218796

Abstract: Described is a transliteration engine/substring decoder that back-transliterates an input string from a source language into an output string in a target language. The transliteration engine may be based upon discriminately weighted indicator features and/or generative models in which the decoder's discriminative parameters are learned. The training data may be based on source-target pairs, which may be transformed into derivations. Features extracted from these derivations include indicator features and hybrid generative model features.

Type: Application

Filed: March 5, 2010

Publication date: September 8, 2011

Applicant: Microsoft Corporation

Inventors: Hisami Suzuki, Colin Andrew Cherry

UNSUPERVISED LEARNING USING GLOBAL FEATURES, INCLUDING FOR LOG-LINEAR MODEL WORD SEGMENTATION

Publication number: 20110144992

Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.

Type: Application

Filed: December 15, 2009

Publication date: June 16, 2011

Applicant: Microsoft Corporation

Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon