Patents by Inventor Colin Andrew Cherry

Colin Andrew Cherry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230021824
    Abstract: The technology provides an approach to train translation models that are robust to transcription errors and punctuation errors. The approach includes introducing errors from actual automatic speech recognition and automatic punctuation systems into the source side of the machine translation training data. A method for training a machine translation model includes performing automatic speech recognition on input source audio to generate a system transcript. The method aligns a human transcript of the source audio to the system transcript, including projecting system segmentation onto the human transcript. Then the method performs segment robustness training of a machine translation model according to the aligned human and system transcripts, and performs system robustness training of the machine translation model, e.g., by injecting token errors into training data.
    Type: Application
    Filed: July 7, 2022
    Publication date: January 26, 2023
    Applicant: Google LLC
    Inventors: Dirk Ryan Padfield, Colin Andrew Cherry
  • Patent number: 11562152
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for re-translation for simultaneous, spoken-language machine translation. In some implementations, a stream of audio data comprising speech in a first language is received. A transcription for the speech in the stream of audio data is generated using an automated speech recognizer through a series of updates. A translation of the transcription into a second language is generated using a machine translation module. The translation is generated with translation iterations that translate increasing amounts of the transcription, including re-translating previously translated portions of the transcription. A series of translation updates is provided to a client device based on the translation iterations.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: January 24, 2023
    Assignee: Google LLC
    Inventors: Naveen Arivazhagan, Colin Andrew Cherry, Wolfgang Macherey, Te I, George Foster, Pallavi N. Baljekar
  • Publication number: 20220092274
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for re-translation for simultaneous, spoken-language machine translation. In some implementations, a stream of audio data comprising speech in a first language is received. A transcription for the speech in the stream of audio data is generated using an automated speech recognizer through a series of updates. A translation of the transcription into a second language is generated using a machine translation module. The translation is generated with translation iterations that translate increasing amounts of the transcription, including re-translating previously translated portions of the transcription. A series of translation updates is provided to a client device based on the translation iterations.
    Type: Application
    Filed: September 23, 2020
    Publication date: March 24, 2022
    Inventors: Naveen Arivazhagan, Colin Andrew Cherry, Wolfgang Macherey, Te I, George Foster, Pallavi N. Baljekar
  • Patent number: 8909514
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: December 9, 2014
    Assignee: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Publication number: 20110218796
    Abstract: Described is a transliteration engine/substring decoder that back-transliterates an input string from a source language into an output string in a target language. The transliteration engine may be based upon discriminatively weighted indicator features and/or generative models in which the decoder's discriminative parameters are learned. The training data may be based on source-target pairs, which may be transformed into derivations. Features extracted from these derivations include indicator features and hybrid generative model features.
    Type: Application
    Filed: March 5, 2010
    Publication date: September 8, 2011
    Applicant: Microsoft Corporation
    Inventors: Hisami Suzuki, Colin Andrew Cherry
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
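
Illustrative sketch (not taken from any of the filings above): the re-translation abstracts for patent 11562152 and publication 20220092274 describe translating a growing transcript in iterations, re-translating earlier portions each time. The minimal Python sketch below shows that loop under loud assumptions: `toy_translate`, `TOY_LEXICON`, and `retranslate` are hypothetical names, and the word-by-word lexicon merely stands in for a real machine translation module.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy lexicon standing in for a real machine translation model.
TOY_LEXICON = {"hola": "hello", "mundo": "world", "amigos": "friends"}


def toy_translate(source: str) -> str:
    """Translate word by word with the toy lexicon; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in source.split())


def retranslate(
    transcript_updates: Iterable[str],
    translate: Callable[[str], str] = toy_translate,
) -> Iterator[str]:
    """Yield one translation update per transcript update.

    Each iteration translates the whole transcript received so far, so
    portions translated in earlier iterations are translated again.
    """
    for transcript in transcript_updates:
        yield translate(transcript)


# Simulated speech recognizer output: the transcript grows with each update.
updates = list(retranslate(["hola", "hola mundo", "hola mundo amigos"]))
for update in updates:
    print(update)
```

Each printed line is what a client device would receive as the next translation update in this toy setup; a real system would replace `toy_translate` with an actual translation model.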