Patents by Inventor Joshua Timothy Ainslie

Joshua Timothy Ainslie has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250111210
    Abstract: Systems and methods for processing inputs using attention neural networks. In particular, one or more of the attention layers within the attention neural network compute relative position biases using functional interpolation. (A minimal illustrative sketch follows this entry.)
    Type: Application
    Filed: September 27, 2024
    Publication date: April 3, 2025
    Inventors: Chong You, Guru Guruganesh, Joshua Timothy Ainslie, Manzil Zaheer, Sanjiv Kumar, Santiago Ontañón, Shanda Li, Venkata Sesha Pavana Srinadh Bhojanapalli, Sumit Sanghai
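
A minimal sketch of the mechanism the abstract above describes, in Python/NumPy. It loosely follows the FIRE approach published by several of these inventors (a small MLP applied to a log-normalized relative distance); the function names, the choice of tanh, and the exact normalization are my assumptions, not the claimed implementation.

```python
import numpy as np

def fire_bias(seq_len, w1, b1, w2, b2, c=1.0, eps=1e-6):
    """Relative position bias via functional interpolation (illustrative).

    A tiny MLP (w1, b1, w2, b2) maps a *normalized* relative distance to a
    scalar bias for each (query, key) pair. Normalizing keeps the MLP input
    bounded, so the learned function interpolates smoothly across positions.
    """
    i = np.arange(seq_len, dtype=float)[:, None]   # query positions
    j = np.arange(seq_len, dtype=float)[None, :]   # key positions
    rel = np.maximum(i - j, 0.0)                   # causal relative distance
    psi = lambda x: np.log(c * x + 1.0)            # monotone log transform
    z = psi(rel) / np.maximum(psi(i), eps)         # normalize by query position
    h = np.tanh(z[..., None] @ w1 + b1)            # (seq, seq, hidden)
    return (h @ w2 + b2)[..., 0]                   # (seq, seq) bias matrix

# The bias would be added to the pre-softmax attention logits, e.g.:
L, H = 8, 16
rng = np.random.default_rng(0)
bias = fire_bias(L, rng.normal(size=(1, H)), np.zeros(H),
                 rng.normal(size=(H, 1)), np.zeros(1))
```

Because the MLP input is squashed into a bounded range, the learned bias function can be evaluated at sequence lengths longer than any seen during training, which is the practical point of interpolating a function rather than indexing a fixed bias table.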
  • Publication number: 20240386256
    Abstract: Improved multi-layer machine learning model architectures are provided that exhibit increased accuracy, decreased training time, decreased inference compute cost, and/or increased stability while training. These improved models include a plurality of sequential layers, each layer comprising a mixing layer that feeds into a feedforward layer. These improved models achieve these benefits by ‘enhancing’ a subset of the feedforward layers with mixture-of-experts or other sparse multi-network architectures while ‘degrading’ a subset of the mixing layers to be simple linear mixing layers (e.g., that multiply inputs by one or more mixing matrices) rather than more complicated attentional mixing mechanisms (e.g., including a number of matrix multiplications, dot products, and nonlinear operations). (A minimal illustrative sketch follows this entry.)
    Type: Application
    Filed: May 16, 2023
    Publication date: November 21, 2024
    Inventors: James Lee-Thorp, Joshua Timothy Ainslie
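
A minimal sketch of the alternating pattern described above, assuming NumPy, top-1 routing, and simple residual placement; none of these specifics come from the application itself.

```python
import numpy as np

def linear_mix(x, seq_mix):
    """'Degraded' mixing layer: a single matrix multiply along the sequence
    dimension -- no dot products, softmax, or other attention machinery.
    x: (seq_len, d_model); seq_mix: (seq_len, seq_len)."""
    return seq_mix @ x

def moe_ffn(x, expert_w, gate_w):
    """'Enhanced' feed-forward layer: a learned gate routes each token to
    its top-1 expert, so only one expert's weights run per token.
    expert_w: (n_experts, d_model, d_model); gate_w: (d_model, n_experts)."""
    choice = (x @ gate_w).argmax(axis=-1)          # expert index per token
    out = np.empty_like(x)
    for e in range(expert_w.shape[0]):
        sel = choice == e
        out[sel] = np.maximum(x[sel] @ expert_w[e], 0.0)  # ReLU expert
    return out

def sparse_mixer_layer(x, seq_mix, expert_w, gate_w):
    """One layer in the alternating pattern: cheap linear mixing feeding a
    sparse mixture-of-experts feed-forward block, each with a residual."""
    h = x + linear_mix(x, seq_mix)
    return h + moe_ffn(h, expert_w, gate_w)
```

The trade is cost for cost: the linear mixing layer is a single matrix multiply, while the mixture-of-experts feed-forward layer touches only one expert's weights per token, so model capacity grows without a matching growth in per-token compute.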
  • Publication number: 20230077928
    Abstract: Transformer systems and methods of using such transformer systems, including computer programs encoded on a computer storage medium, for performing a deep learning task on an input sequence to generate an encoded output. In one aspect, one of the transformer systems includes an encoder architecture block, comprising: a spectral transform mixing layer that receives input embeddings of input tokens and generates, as output, a spectral transform output along a sequence dimension of the input embeddings; and a feed forward layer that receives an input based on the input embeddings of input tokens and the spectral transform output and generates an output for a subsequent processing block. (A minimal illustrative sketch follows this entry.)
    Type: Application
    Filed: September 14, 2021
    Publication date: March 16, 2023
    Inventors: James Patrick Lee-Thorp, Joshua Timothy Ainslie, Ilya Eckstein, Santiago Ontañón
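
As a rough illustration of such an encoder block: the sketch below uses the real part of a discrete Fourier transform along the sequence dimension as the spectral transform, echoing the FNet model published by these inventors (which also transforms the hidden dimension); the residual placement and the omission of layer normalization are simplifying assumptions.

```python
import numpy as np

def spectral_encoder_block(x, w1, b1, w2, b2):
    """One encoder block: parameter-free spectral mixing followed by a
    position-wise feed-forward layer, each wrapped in a residual.
    x: (seq_len, d_model)."""
    # Spectral transform along the sequence dimension; keeping the real
    # part returns the activations to real values for the layers above.
    mixed = np.fft.fft(x, axis=0).real
    h = x + mixed                                   # input + spectral output
    ffn = np.maximum(h @ w1 + b1, 0.0) @ w2 + b2    # ReLU feed-forward
    return h + ffn                                  # output for the next block
```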
  • Publication number: 20220156553
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset. (A minimal illustrative sketch follows this entry.)
    Type: Application
    Filed: January 31, 2022
    Publication date: May 19, 2022
    Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
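
This abstract (shared verbatim by the granted patent and the earlier publication listed below, which belong to the same family) describes attention that treats a proper subset of positions differently from the rest. The sketch below shows one such pattern, a 'global' subset plus a local window, in the spirit of the ETC and BigBird models published by these inventors; the subset choice, window size, and dense-mask construction are illustrative only.

```python
import numpy as np

def sparse_attention_mask(seq_len, n_global=2, window=3):
    """Boolean mask: True where a query (row) may attend to a key (column).

    Positions in the 'global' proper subset attend to, and are attended by,
    every position; all remaining positions attend only within a local
    window (and to the global subset).
    """
    i = np.arange(seq_len)
    mask = np.abs(i[:, None] - i[None, :]) <= window   # local band
    mask[:n_global, :] = True                          # global rows see all
    mask[:, :n_global] = True                          # all rows see globals
    return mask

# Example: logits outside the mask would be set to -inf before the softmax.
print(sparse_attention_mask(8).astype(int))
```

A practical implementation would exploit the sparsity for roughly O(L·(window + n_global)) cost instead of O(L²); the dense boolean mask here is only for readability.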
  • Patent number: 11238332
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed
  • Publication number: 20210383191
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 9, 2021
    Inventors: Joshua Timothy Ainslie, Santiago Ontañón, Philip Pham, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Amr Ahmed