Patents by Inventor Noam M. Shazeer
Noam M. Shazeer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250118064
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output image. In one aspect, one of the methods includes generating the output image intensity value by intensity value according to a generation order of pixel-color channel pairs from the output image, comprising, for each particular generation order position in the generation order: generating a current output image representation of a current output image, processing the current output image representation using a decoder neural network to generate a probability distribution over possible intensity values for the pixel-color channel pair at the particular generation order position, wherein the decoder neural network includes one or more local masked self-attention sub-layers; and selecting an intensity value for the pixel-color channel pair at the particular generation order position using the probability distribution.
Type: Application
Filed: October 11, 2024
Publication date: April 10, 2025
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Jakob D. Uszkoreit, Niki J. Parmar, Ashish Teku Vaswani
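For illustration, here is a minimal Python sketch of the per-position generation loop the abstract describes. The `DecoderStub` class, the raster-scan generation order, and all dimensions are assumptions for the sketch, not details from the patent; the actual decoder uses local masked self-attention sub-layers.

```python
# Minimal sketch of intensity-by-intensity image generation.
# DecoderStub is a hypothetical placeholder for the decoder neural network.
import torch

class DecoderStub(torch.nn.Module):
    """Placeholder decoder: maps the current output image representation
    to logits over the 256 possible intensity values for the next position."""
    def __init__(self, height, width, channels, num_intensities=256):
        super().__init__()
        self.proj = torch.nn.Linear(height * width * channels, num_intensities)

    def forward(self, current_image):
        return self.proj(current_image.flatten())

def generate_image(decoder, height, width, channels=3):
    # Assumed raster-scan generation order over (pixel, color channel) pairs.
    image = torch.zeros(height, width, channels)
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                logits = decoder(image)                         # current representation
                probs = torch.softmax(logits, dim=-1)           # distribution over intensities
                intensity = torch.multinomial(probs, 1).item()  # select a value
                image[y, x, c] = intensity
    return image

decoder = DecoderStub(8, 8, 3)
print(generate_image(decoder, 8, 8).shape)  # torch.Size([8, 8, 3])
```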
-
Patent number: 12271817
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
Type: Grant
Filed: January 4, 2024
Date of Patent: April 8, 2025
Assignee: Google LLC
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Etienne Pot, Mohammad Saleh, Ben David Goodrich, Peter J. Liu, Ryan Sepassi
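A minimal sketch of the decoding loop described above: at each step the input sequence is concatenated with the tokens generated so far, and a decoder scores the next token. Here `score_tokens` is a hypothetical stand-in for the self-attention decoder, and greedy argmax is just one way to "select using the time step output".

```python
# Minimal sketch of combined-sequence decoding; the decoder is mocked.
import torch

VOCAB_SIZE = 1000
EOS = 0

def score_tokens(combined_sequence):
    # Stand-in for the self-attention decoder neural network: returns a
    # score distribution over the vocabulary for the next output token.
    torch.manual_seed(len(combined_sequence))  # deterministic toy scores
    return torch.randn(VOCAB_SIZE)

def generate(input_sequence, max_steps=20):
    output_tokens = []
    for _ in range(max_steps):
        combined = input_sequence + output_tokens  # input followed by outputs so far
        scores = score_tokens(combined)
        next_token = int(scores.argmax())          # greedy selection
        if next_token == EOS:
            break
        output_tokens.append(next_token)
    return output_tokens

print(generate([5, 17, 42]))
```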
-
Patent number: 12265903
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing tensor computations across computing devices. One of the methods includes: receiving specification data that specifies a distribution of tensor computations among a plurality of computing devices, wherein each tensor computation (i) is defined to receive, as input, one or more respective input tensors each having one or more respective input dimensions, (ii) is defined to generate, as output, one or more respective output tensors each having one or more respective output dimensions, or both, wherein the specification data specifies a respective layout for each input and output tensor that assigns each dimension of the input or output tensor to one or more of the plurality of computing devices; assigning, based on the layouts for the input and output tensors, respective device-local operations to each of the computing devices; and causing the tensor computations to be executed.
Type: Grant
Filed: October 5, 2020
Date of Patent: April 1, 2025
Assignee: Google LLC
Inventor: Noam M. Shazeer
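A minimal sketch of the layout idea in the abstract: specification data assigns a tensor dimension to devices, and each device receives the slice that assignment implies. The names `layout` and `shard_for_device` are illustrative, not from the patent.

```python
# Minimal sketch of layout-driven tensor sharding across devices.
import numpy as np

def shard_for_device(tensor, layout, device_index, num_devices):
    """Slice `tensor` along the dimension the layout assigns to the
    devices; dimensions with no assignment are replicated."""
    split_dim = layout.get("split_dim")
    if split_dim is None:
        return tensor  # fully replicated on every device
    chunks = np.array_split(tensor, num_devices, axis=split_dim)
    return chunks[device_index]

# A 4x6 tensor whose first dimension is split across 2 devices.
x = np.arange(24).reshape(4, 6)
layout = {"split_dim": 0}
for d in range(2):
    local = shard_for_device(x, layout, d, num_devices=2)
    print(f"device {d}: shape {local.shape}")  # (2, 6) on each device
```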
-
Patent number: 12254411
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer that applies an element-wise multiplication between two vectors generated as a result of two different linear transformations performed on the same attended layer input.
Type: Grant
Filed: February 12, 2021
Date of Patent: March 18, 2025
Assignee: Google LLC
Inventor: Noam M. Shazeer
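The feed-forward sub-layer described here is a gated-linear-unit-style layer: two different linear transformations of the same attended input, combined by element-wise multiplication. A minimal sketch, with illustrative dimensions:

```python
# Minimal sketch of a multiplicative feed-forward sub-layer.
import torch

class MultiplicativeFeedForward(torch.nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.linear_a = torch.nn.Linear(d_model, d_ff)  # first transformation
        self.linear_b = torch.nn.Linear(d_model, d_ff)  # second transformation
        self.out = torch.nn.Linear(d_ff, d_model)

    def forward(self, x):
        # Element-wise product of two projections of the same input.
        return self.out(self.linear_a(x) * self.linear_b(x))

layer = MultiplicativeFeedForward(d_model=16, d_ff=64)
x = torch.randn(2, 10, 16)   # (batch, sequence, features)
print(layer(x).shape)        # torch.Size([2, 10, 16])
```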
-
Publication number: 20250053815
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more switch layers.
Type: Application
Filed: August 15, 2024
Publication date: February 13, 2025
Inventors: William Bradley Fedus, Barret Zoph, Noam M. Shazeer
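The abstract only names "switch layers"; a common reading of this line of work is a router that sends each token to exactly one expert feed-forward network. A minimal sketch under that assumption:

```python
# Minimal sketch of a switch layer: one expert per token, chosen by a router.
import torch

class SwitchLayer(torch.nn.Module):
    def __init__(self, d_model, num_experts):
        super().__init__()
        self.router = torch.nn.Linear(d_model, num_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(d_model, d_model) for _ in range(num_experts)
        )

    def forward(self, tokens):  # tokens: (num_tokens, d_model)
        gates = torch.softmax(self.router(tokens), dim=-1)
        top_gate, expert_index = gates.max(dim=-1)  # one expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = expert_index == e
            if mask.any():
                # Scale by the gate value so routing stays differentiable.
                out[mask] = expert(tokens[mask]) * top_gate[mask].unsqueeze(-1)
        return out

layer = SwitchLayer(d_model=8, num_experts=4)
print(layer(torch.randn(6, 8)).shape)  # torch.Size([6, 8])
```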
-
Patent number: 12217173
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. In one aspect, one of the systems includes an encoder neural network configured to receive the input sequence and generate encoded representations of the network inputs, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the input positions and to generate a respective subnetwork output for each of the input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the input positions and, for each particular input position in the input order: apply an attention mechanism over the encoder subnetwork inputs using one or more queries derived from the encoder subnetwork input at the particular input position.
Type: Grant
Filed: September 3, 2021
Date of Patent: February 4, 2025
Assignee: Google LLC
Inventors: Noam M. Shazeer, Aidan Nicholas Gomez, Lukasz Mieczyslaw Kaiser, Jakob D. Uszkoreit, Llion Owen Jones, Niki J. Parmar, Illia Polosukhin, Ashish Teku Vaswani
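The encoder self-attention sub-layer described here can be sketched as scaled dot-product self-attention: queries derived from each position's input attend over all encoder subnetwork inputs. The single-head form and the scaling choice below are simplifying assumptions.

```python
# Minimal sketch of an encoder self-attention sub-layer (single head).
import torch, math

def encoder_self_attention(inputs, w_q, w_k, w_v):
    # inputs: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_model)
    q = inputs @ w_q  # queries derived from each position's subnetwork input
    k = inputs @ w_k
    v = inputs @ w_v
    scores = q @ k.T / math.sqrt(q.shape[-1])  # attend over all positions
    return torch.softmax(scores, dim=-1) @ v

d = 16
x = torch.randn(10, d)
out = encoder_self_attention(x, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
print(out.shape)  # torch.Size([10, 16])
```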
-
Publication number: 20250037711
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
Type: Application
Filed: October 10, 2024
Publication date: January 30, 2025
Inventors: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
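A minimal sketch of the flow the abstract describes: candidate assistant outputs are rewritten using LLM output(s) conditioned on the dialog context, and one modified output is presented. All three helper functions (`transcribe`, `generate_candidates`, `llm_rewrite`) are hypothetical stand-ins.

```python
# Minimal sketch of LLM-modified assistant responses; all components mocked.
def transcribe(audio_stream):
    return "what's the weather like"  # stand-in ASR result

def generate_candidates(query):
    # Stand-in for the set of assistant outputs predicted to be responsive.
    return ["It is 15 degrees.", "15 degrees and cloudy."]

def llm_rewrite(candidate, context):
    # Stand-in for applying LLM output(s) to a candidate given dialog context.
    return f"{candidate} By the way, {context['note']}"

def respond(audio_stream, context):
    query = transcribe(audio_stream)
    candidates = generate_candidates(query)
    modified = [llm_rewrite(c, context) for c in candidates]
    return modified[0]  # provide a given modified output for presentation

print(respond(b"...", {"note": "don't forget your umbrella tomorrow."}))
```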
-
Publication number: 20250021799
Abstract: A system includes a neural network that includes a Mixture of Experts (MoE) subnetwork between a first neural network layer and a second neural network layer. The MoE subnetwork includes multiple expert neural networks. Each expert neural network is configured to process a first layer output generated by the first neural network layer to generate a respective expert output. The MoE subnetwork further includes a gating subsystem that selects, based on the first layer output, one or more of the expert neural networks and determines a respective weight for each selected expert neural network, provides the first layer output as input to each of the selected expert neural networks, combines the expert outputs generated by the selected expert neural networks in accordance with the weights for the selected expert neural networks to generate an MoE output, and provides the MoE output as input to the second neural network layer.
Type: Application
Filed: July 18, 2024
Publication date: January 16, 2025
Inventors: Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Stanislaw Maziarz
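A minimal sketch of the MoE subnetwork described above: a gating subsystem selects experts and weights, each selected expert processes the first layer's output, and the expert outputs are combined by those weights. Top-k gating is an assumption; the abstract only says "one or more" experts are selected.

```python
# Minimal sketch of a Mixture-of-Experts subnetwork with top-k gating.
import torch

class MixtureOfExperts(torch.nn.Module):
    def __init__(self, d_model, num_experts, k=2):
        super().__init__()
        self.k = k
        self.gate = torch.nn.Linear(d_model, num_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(d_model, d_model) for _ in range(num_experts)
        )

    def forward(self, first_layer_output):  # (d_model,)
        logits = self.gate(first_layer_output)
        weights, indices = torch.topk(logits, self.k)  # select experts
        weights = torch.softmax(weights, dim=-1)       # respective weights
        moe_output = torch.zeros_like(first_layer_output)
        for w, i in zip(weights, indices):
            moe_output += w * self.experts[i](first_layer_output)
        return moe_output  # becomes the input to the second layer

moe = MixtureOfExperts(d_model=8, num_experts=4)
print(moe(torch.randn(8)).shape)  # torch.Size([8])
```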
-
Publication number: 20240428071
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task on a network input to generate a network output. One of the systems includes an attention neural network configured to perform the machine learning task. The attention neural network includes one or more attention layers that each include a squared ReLU activation layer, a depth-wise convolution layer, or both.
Type: Application
Filed: September 3, 2024
Publication date: December 26, 2024
Inventors: David Richard So, Quoc V. Le, Hanxiao Liu, Wojciech Andrzej Manke, Zihang Dai, Noam M. Shazeer
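The two layer types named in the abstract are simple to state in isolation; how they are wired into the attention layers is not specified here, so this sketch just shows the primitives with illustrative shapes.

```python
# Minimal sketch of a squared ReLU activation and a depth-wise convolution.
import torch

def squared_relu(x):
    return torch.relu(x) ** 2  # ReLU followed by squaring

# Depth-wise convolution: one filter per channel (groups == channels).
depthwise = torch.nn.Conv1d(in_channels=8, out_channels=8,
                            kernel_size=3, padding=1, groups=8)

x = torch.randn(2, 8, 10)  # (batch, channels, sequence length)
print(squared_relu(x).shape, depthwise(x).shape)  # both (2, 8, 10)
```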
-
Publication number: 20240403639
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating candidate output sequences using language model neural networks. In particular, an auto-regressive language model neural network is used to generate a candidate output sequence. The same auto-regressive language model neural network is used to evaluate the candidate output sequence to determine rating scores for each of one or more criteria. The rating score(s) are then used to determine whether to provide the candidate output sequence.
Type: Application
Filed: August 8, 2024
Publication date: December 5, 2024
Inventors: Daniel De Freitas Adiwardana, Noam M. Shazeer
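A minimal sketch of the evaluate-with-the-same-model idea: one language model both generates a candidate and rates it against criteria, and the candidate is provided only if every rating clears a threshold. `lm_generate` and `lm_rate` are hypothetical stand-ins for calls into the same auto-regressive model, and the threshold rule is an assumption.

```python
# Minimal sketch of self-evaluation of candidate outputs; the model is mocked.
def lm_generate(prompt):
    return "Sure, here is a short poem about rain."  # stand-in generation

def lm_rate(candidate, criterion):
    # Stand-in: the same model scores the candidate for one criterion.
    return {"sensible": 0.9, "safe": 0.8}[criterion]

def generate_if_acceptable(prompt, criteria, threshold=0.5):
    candidate = lm_generate(prompt)
    ratings = {c: lm_rate(candidate, c) for c in criteria}
    if all(score >= threshold for score in ratings.values()):
        return candidate  # provide the candidate output sequence
    return None           # withhold it

print(generate_if_acceptable("write a poem", ["sensible", "safe"]))
```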
-
Patent number: 12148421
Abstract: As part of a dialog session between a user and an automated assistant, implementations can receive a stream of audio data that captures a spoken utterance including an assistant query, determine, based on processing the stream of audio data, a set of assistant outputs that are each predicted to be responsive to the assistant query, process, using large language model (LLM) output(s), the assistant outputs and context of the dialog session to generate a set of modified assistant outputs, and cause given modified assistant output, from among the set of modified assistant outputs, to be provided for presentation to the user in response to the spoken utterance. In some implementations, the LLM output(s) can be generated in an offline manner for subsequent use in an online manner. In additional or alternative implementations, the LLM output(s) can be generated in an online manner when the spoken utterance is received.
Type: Grant
Filed: November 22, 2021
Date of Patent: November 19, 2024
Assignee: Google LLC
Inventors: Martin Baeuml, Thushan Amarasiriwardena, Roberto Pieraccini, Vikram Sridar, Daniel De Freitas Adiwardana, Noam M. Shazeer, Quoc Le
-
Patent number: 12142034
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output image. In one aspect, one of the methods includes generating the output image intensity value by intensity value according to a generation order of pixel-color channel pairs from the output image, comprising, for each particular generation order position in the generation order: generating a current output image representation of a current output image, processing the current output image representation using a decoder neural network to generate a probability distribution over possible intensity values for the pixel-color channel pair at the particular generation order position, wherein the decoder neural network includes one or more local masked self-attention sub-layers; and selecting an intensity value for the pixel-color channel pair at the particular generation order position using the probability distribution.
Type: Grant
Filed: November 8, 2023
Date of Patent: November 12, 2024
Assignee: Google LLC
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Jakob D. Uszkoreit, Niki J. Parmar, Ashish Teku Vaswani
-
Patent number: 12100391
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
Type: Grant
Filed: October 7, 2021
Date of Patent: September 24, 2024
Assignee: Google LLC
Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Noam M. Shazeer
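A minimal sketch of the decoding described above: an encoder (the "first neural network") converts acoustic features into an alternative representation, and an attention-based RNN emits substring scores at each output position. The module choices (GRU encoder, dot-product attention) and sizes are assumptions for the sketch.

```python
# Minimal sketch of attention-based RNN decoding over substring scores.
import torch

class SubstringDecoder(torch.nn.Module):
    def __init__(self, d_feat, d_hidden, num_substrings):
        super().__init__()
        self.encoder = torch.nn.GRU(d_feat, d_hidden, batch_first=True)
        self.decoder_cell = torch.nn.GRUCell(d_hidden, d_hidden)
        self.scorer = torch.nn.Linear(d_hidden, num_substrings)

    def forward(self, acoustics, out_len):  # acoustics: (1, T, d_feat)
        enc, _ = self.encoder(acoustics)     # alternative representation
        state = torch.zeros(1, enc.shape[-1])
        scores = []
        for _ in range(out_len):
            attn = torch.softmax(enc[0] @ state[0], dim=0)  # attention weights
            context = attn @ enc[0]                         # weighted summary
            state = self.decoder_cell(context.unsqueeze(0), state)
            scores.append(self.scorer(state))               # substring scores
        return torch.stack(scores)

model = SubstringDecoder(d_feat=13, d_hidden=32, num_substrings=100)
print(model(torch.randn(1, 50, 13), out_len=5).shape)  # (5, 1, 100)
```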
-
Patent number: 12093829
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more switch layers.
Type: Grant
Filed: July 7, 2023
Date of Patent: September 17, 2024
Assignee: Google LLC
Inventors: William Bradley Fedus, Barret Zoph, Noam M. Shazeer
-
Patent number: 12086713
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating candidate output sequences using language model neural networks. In particular, an auto-regressive language model neural network is used to generate a candidate output sequence. The same auto-regressive language model neural network is used to evaluate the candidate output sequence to determine rating scores for each of one or more criteria. The rating score(s) are then used to determine whether to provide the candidate output sequence.
Type: Grant
Filed: July 28, 2022
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: Daniel De Freitas Adiwardana, Noam M. Shazeer
-
Patent number: 12067476
Abstract: A system includes a neural network that includes a Mixture of Experts (MoE) subnetwork between a first neural network layer and a second neural network layer. The MoE subnetwork includes multiple expert neural networks. Each expert neural network is configured to process a first layer output generated by the first neural network layer to generate a respective expert output. The MoE subnetwork further includes a gating subsystem that selects, based on the first layer output, one or more of the expert neural networks and determines a respective weight for each selected expert neural network, provides the first layer output as input to each of the selected expert neural networks, combines the expert outputs generated by the selected expert neural networks in accordance with the weights for the selected expert neural networks to generate an MoE output, and provides the MoE output as input to the second neural network layer.
Type: Grant
Filed: September 8, 2023
Date of Patent: August 20, 2024
Assignee: Google LLC
Inventors: Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Stanislaw Maziarz
-
Publication number: 20240256859
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
Type: Application
Filed: January 4, 2024
Publication date: August 1, 2024
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Etienne Pot, Mohammad Saleh, Ben David Goodrich, Peter J. Liu, Ryan Sepassi
-
Publication number: 20240220796
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
Type: Application
Filed: January 4, 2024
Publication date: July 4, 2024
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Etienne Pot, Mohammad Saleh, Ben David Goodrich, Peter J. Liu, Ryan Sepassi
-
Publication number: 20240211751
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
Type: Application
Filed: January 4, 2024
Publication date: June 27, 2024
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Etienne Pot, Mohammad Saleh, Ben David Goodrich, Peter J. Liu, Ryan Sepassi
-
Publication number: 20240211752
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
Type: Application
Filed: January 4, 2024
Publication date: June 27, 2024
Inventors: Noam M. Shazeer, Lukasz Mieczyslaw Kaiser, Etienne Pot, Mohammad Saleh, Ben David Goodrich, Peter J. Liu, Ryan Sepassi