Patents by Inventor Kevin Stefan Clark

Kevin Stefan Clark has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents granted by the United States Patent and Trademark Office (USPTO). Illustrative code sketches of the techniques described in the abstracts appear after the listing.

  • Publication number: 20240160857
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: January 25, 2024
    Publication date: May 16, 2024
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11922281
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Grant
    Filed: October 31, 2022
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11914969
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Grant
    Filed: September 19, 2022
    Date of Patent: February 27, 2024
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20230049747
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Application
    Filed: October 31, 2022
    Publication date: February 16, 2023
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20230015737
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11488067
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Grant
    Filed: May 11, 2020
    Date of Patent: November 1, 2022
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11449684
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: September 20, 2022
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20220067304
    Abstract: Systems and methods are provided for training and using energy-based language models such as cloze language models. In particular, one aspect of the present disclosure is directed to an energy-based cloze language model for representation learning over text. In some instances, the models provided herein can be referred to as the “Electric” model. Similar to the BERT model, example models proposed herein can be a conditional generative model of tokens given their contexts. However, example models proposed herein do not mask text or output a full distribution over tokens that could occur in a context. Instead, the example proposed models assign a scalar energy score to each input token. Another aspect of the present disclosure provides techniques to train the proposed models to assign low energies to data tokens and high energies to other ones using an algorithm based on noise-contrastive estimation.
    Type: Application
    Filed: August 27, 2021
    Publication date: March 3, 2022
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20210089724
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: September 21, 2020
    Publication date: March 25, 2021
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20200364617
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Application
    Filed: May 11, 2020
    Publication date: November 19, 2020
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark