Patents by Inventor Kevin Stefan Clark

Kevin Stefan Clark has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents granted by the United States Patent and Trademark Office (USPTO). Illustrative code sketches of the techniques described in the abstracts appear after the listing.

  • Publication number: 20240160857
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: January 25, 2024
    Publication date: May 16, 2024
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11922281
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Grant
    Filed: October 31, 2022
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11914969
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Grant
    Filed: September 19, 2022
    Date of Patent: February 27, 2024
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20230049747
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Application
    Filed: October 31, 2022
    Publication date: February 16, 2023
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20230015737
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11488067
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Grant
    Filed: May 11, 2020
    Date of Patent: November 1, 2022
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Patent number: 11449684
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: September 20, 2022
    Assignee: Google LLC
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20220067304
    Abstract: Systems and methods are provided for training and using energy-based language models such as cloze language models. In particular, one aspect of the present disclosure is directed to an energy-based cloze language model for representation learning over text. In some instances, the models provided herein can be referred to as the “Electric” model. Similar to the BERT model, example models proposed herein can be a conditional generative model of tokens given their contexts. However, example models proposed herein do not mask text or output a full distribution over tokens that could occur in a context. Instead, the example proposed models assign a scalar energy score to each input token. Another aspect of the present disclosure provides techniques to train the proposed models to assign low energies to data tokens and high energies to other ones using an algorithm based on noise-contrastive estimation.
    Type: Application
    Filed: August 27, 2021
    Publication date: March 3, 2022
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20210089724
    Abstract: Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a “generator” (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
    Type: Application
    Filed: September 21, 2020
    Publication date: March 25, 2021
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark
  • Publication number: 20200364617
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a machine learning model using teacher annealing.
    Type: Application
    Filed: May 11, 2020
    Publication date: November 19, 2020
    Inventors: Thang Minh Luong, Quoc V. Le, Kevin Stefan Clark