Patents by Inventor Tal Schuster

Tal Schuster is a named inventor on the following patents and patent applications. This listing includes both published patent applications and patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamic selection from among multiple candidate generative models with differing computational efficiencies

Patent number: 12664187

Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified.

Type: Grant

Filed: June 19, 2023

Date of Patent: June 23, 2026

Assignee: Google LLC

Inventors: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
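
The abstract does not say how the selection is made. The Python sketch below shows one plausible reading and is an illustration of cost-aware model routing, not the patented method. It assumes a hypothetical per-request quality predictor (`predict_quality`) and a hypothetical `relative_cost` for each candidate. Candidates are tried from most to least computationally efficient, and the request goes to the first one predicted to answer it well enough. If none qualifies, it falls back to the least efficient model, which is assumed to be the most capable.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    relative_cost: float  # hypothetical: lower = more computationally efficient
    generate: Callable[[str], str]


def select_model(
    request: str,
    candidates: Sequence[Candidate],
    predict_quality: Callable[[str, str], float],  # hypothetical predictor
    threshold: float = 0.8,
) -> Candidate:
    """Pick the cheapest candidate predicted to handle `request` well enough."""
    if not candidates:
        raise ValueError("need at least one candidate model")
    ordered = sorted(candidates, key=lambda c: c.relative_cost)
    for cand in ordered:
        if predict_quality(request, cand.name) >= threshold:
            return cand
    # No efficient model is trusted: fall back to the most expensive one.
    return ordered[-1]


# Toy usage with stand-in models and a made-up quality heuristic.
small = Candidate("small", 1.0, lambda r: f"[small] {r}")
large = Candidate("large", 10.0, lambda r: f"[large] {r}")


def toy_quality(request: str, model_name: str) -> float:
    # Hypothetical: the small model is trusted only on short requests.
    if model_name == "small":
        return 0.9 if len(request) < 20 else 0.3
    return 0.95


for req in ["What is 2+2?", "Summarize this long technical report."]:
    chosen = select_model(req, [large, small], toy_quality)
    print(chosen.name, "->", chosen.generate(req))
```

Here the short question is routed to the cheaper model and the longer request to the larger one. In practice, the threshold controls the trade-off between computational savings and the risk of an inaccurate or under-specified response.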
Efficient decoding of output sequences using adaptive early exiting

Patent number: 12657436

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Grant

Filed: January 29, 2024

Date of Patent: June 16, 2026

Assignee: Google LLC

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
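
In general terms, adaptive early exiting lets each generated token leave the decoder stack as soon as an intermediate prediction looks confident enough. The minimal Python sketch below rests on several assumptions:

- Confidence is the top softmax probability from a shared output head.
- `threshold` is a fixed hyperparameter.
- The toy `layers`, `embed` and `output_head` are hypothetical stand-ins.

Attention and key/value caching are omitted entirely.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(
    hidden: Vector,
    layers: Sequence[Layer],
    output_head: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[int, int]:
    """Predict one token, exiting as soon as the top-token probability from
    the shared output head reaches `threshold` (or the last layer is hit).

    Returns (token_id, number_of_layers_used).
    """
    if not layers:
        raise ValueError("need at least one layer")
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = softmax(output_head(hidden))
        confidence = max(probs)
        if confidence >= threshold or depth == len(layers):
            return probs.index(confidence), depth
    raise AssertionError("unreachable")


def generate(
    start_token: int,
    embed: Callable[[int], Vector],
    layers: Sequence[Layer],
    output_head: Callable[[Vector], Vector],
    threshold: float,
    max_len: int,
) -> Tuple[List[int], List[int]]:
    """Auto-regressive loop: each step feeds back the previous token."""
    tokens: List[int] = [start_token]
    depths: List[int] = []
    for _ in range(max_len):
        token, depth = decode_step(embed(tokens[-1]), layers, output_head, threshold)
        tokens.append(token)
        depths.append(depth)
    return tokens[1:], depths


# Toy model: every layer pushes logit 0 up by 2, so confidence grows with depth.
layers = [lambda h: [h[0] + 2.0] + h[1:] for _ in range(4)]


def identity_head(h: Vector) -> Vector:
    return list(h)


print(generate(0, lambda t: [0.0, 0.0, 0.0], layers, identity_head, 0.9, 3))
# ([0, 0, 0], [2, 2, 2])
```

With `threshold=0.9`, every token in the toy run exits after 2 of the 4 layers. Raising the threshold trades speed for staying closer to the full-depth prediction.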
EFFICIENT DECODING OF OUTPUT SEQUENCES USING PARAMETER SHARING

Publication number: 20260093982

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task. One of the methods includes generating an output sequence by, at each of a plurality of output time steps: generating a current input sequence from at least the tokens at output time steps that precede the output time step in the output sequence; generating a respective embedding for each input in the current input sequence; and processing the respective embeddings for the inputs in the current input sequence through one or more layer blocks in the sequence of layer blocks until a termination criterion is satisfied.

Type: Application

Filed: October 1, 2025

Publication date: April 2, 2026

Inventors: Adam Joshua Fisch, Tal Schuster, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Sangmin Bae
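
The per-time-step loop in this abstract can be sketched in Python as follows. The sketch assumes that parameter sharing means the same block function is reused at every depth. For the termination criterion, which the abstract does not specify, it uses a hypothetical rule: stop once no hidden value changes by more than `tol` between consecutive block applications. `embed`, `shared_block` and `readout` are illustrative stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[List[Vector]], List[Vector]]


def run_shared_block(
    embeddings: Sequence[Vector],
    shared_block: Block,
    max_blocks: int,
    tol: float,
) -> Tuple[List[Vector], int]:
    """Apply one parameter-shared block repeatedly to every position.

    Hypothetical termination criterion: stop when no hidden value moves by
    more than `tol` between consecutive applications, or after `max_blocks`.
    Returns the final hidden states and the number of applications used.
    """
    hidden = [list(v) for v in embeddings]
    for step in range(1, max_blocks + 1):
        new_hidden = shared_block(hidden)
        delta = max(
            (abs(a - b) for new_v, old_v in zip(new_hidden, hidden) for a, b in zip(new_v, old_v)),
            default=0.0,
        )
        hidden = new_hidden
        if delta <= tol:
            return hidden, step
    return hidden, max_blocks


def generate(
    prompt: Sequence[int],
    embed: Callable[[int], Vector],
    shared_block: Block,
    readout: Callable[[Vector], int],
    steps: int,
    max_blocks: int,
    tol: float,
) -> Tuple[List[int], List[int]]:
    tokens = list(prompt)
    blocks_used: List[int] = []
    for _ in range(steps):
        # Current input: the tokens that precede this output time step.
        embeddings = [embed(t) for t in tokens]
        hidden, n = run_shared_block(embeddings, shared_block, max_blocks, tol)
        tokens.append(readout(hidden[-1]))  # predict from the last position
        blocks_used.append(n)
    return tokens[len(prompt):], blocks_used


# Toy block that halves every value, so repeated application converges to 0.
def halve(hs: List[Vector]) -> List[Vector]:
    return [[0.5 * x for x in h] for h in hs]


print(generate([4], lambda t: [float(t)], halve, lambda h: round(h[0] * 2), 2, 10, 0.5))
# ([1, 0], [3, 3])
```

Because the block's parameters are shared, extra depth costs compute but no extra weights. The termination check decides, per output time step, how many applications are actually spent.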
Efficient Estimation & Verification with Early Exits

Publication number: 20260073287

Abstract: One example aspect is directed to a computer-implemented method for performing model decoding with reduced latency. The method includes obtaining a pre-trained sequence processing model comprising a plurality of layers. The method includes modifying the sequence processing model to contain an adapter layer that is configured to receive and process an intermediate representation generated by a particular intermediate layer of the plurality of layers to predict an output token. The method includes training the adapter layer while holding the plurality of layers of the sequence processing model frozen. The method includes deploying the sequence processing model for speculative decoding in which the adapter layer, the particular intermediate layer, and the plurality of layers that precede the particular intermediate layer perform speculative token decoding and the plurality of layers that are subsequent to the particular intermediate layer perform token verification.

Type: Application

Filed: September 12, 2024

Publication date: March 12, 2026

Inventors: Tal Schuster, Ivan Korotkov, Ziwei Ji, Seungyeon Kim
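
The following minimal, greedy Python sketch shows the draft-then-verify loop described above. Both callables are hypothetical:

- `draft_next` stands in for the shallow path, meaning the layers up to the intermediate layer plus the trained adapter.
- `full_next` stands in for the full-depth model used for verification.

For clarity, each drafted token is checked with a separate call. Speculative decoding normally scores all drafted tokens in a single forward pass, and sampling-based acceptance is not shown.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_decode(
    prompt: List[int],
    draft_next: NextToken,  # hypothetical: early layers + adapter head
    full_next: NextToken,   # hypothetical: full-depth model (verifier)
    num_draft: int,
    max_new_tokens: int,
) -> List[int]:
    """Greedy draft-then-verify decoding.

    The output always matches greedy decoding with `full_next` alone; the
    cheap drafts only change how many tokens each verification round emits.
    """
    if num_draft < 1:
        raise ValueError("num_draft must be at least 1")
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1) Draft a short run of tokens with the cheap early-exit path.
        draft: List[int] = []
        for _ in range(min(num_draft, target_len - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2) Verify left to right; on the first mismatch keep the verifier's
        #    token instead and discard the rest of the draft.
        accepted: List[int] = []
        for tok in draft:
            expected = full_next(tokens + accepted)
            accepted.append(expected)
            if tok != expected:
                break
        tokens.extend(accepted)
    return tokens[len(prompt):]


def toy_full(ts: List[int]) -> int:
    return (ts[-1] + 1) % 10  # toy "full model": counts upward


def toy_draft(ts: List[int]) -> int:
    return 0 if ts[-1] == 3 else toy_full(ts)  # toy drafter, wrong after a 3


print(speculative_decode([0], toy_draft, toy_full, num_draft=3, max_new_tokens=6))
# [1, 2, 3, 4, 5, 6]
```

In this toy run, six tokens are emitted in three verification rounds despite one rejected draft. In a real system, the gain depends on how often the adapter's drafts agree with the full model.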
DYNAMIC SELECTION FROM AMONG MULTIPLE CANDIDATE GENERATIVE MODELS WITH DIFFERING COMPUTATIONAL EFFICIENCIES

Publication number: 20240311405

Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified.

Type: Application

Filed: June 19, 2023

Publication date: September 19, 2024

Inventors: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
EFFICIENT DECODING OF OUTPUT SEQUENCES USING ADAPTIVE EARLY EXITING

Publication number: 20240169184

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Application

Filed: January 29, 2024

Publication date: May 23, 2024

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
Efficient decoding of output sequences using adaptive early exiting

Patent number: 11886976

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Grant

Filed: July 14, 2023

Date of Patent: January 30, 2024

Assignee: Google LLC

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
EFFICIENT DECODING OF OUTPUT SEQUENCES USING ADAPTIVE EARLY EXITING

Publication number: 20240020516

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Application

Filed: July 14, 2023

Publication date: January 18, 2024

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.