Patents by Inventor Tal Schuster

Tal Schuster has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12657436
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Grant
    Filed: January 29, 2024
    Date of Patent: June 16, 2026
    Assignee: Google LLC
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Publication number: 20260093982
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task. One of the methods includes generating an output sequence by, at each of a plurality of output time steps: generating a current input sequence from at least the tokens at output time steps that precede the output time step in the output sequence; generating a respective embedding for each input in the current input sequence; and processing the respective embeddings for the inputs in the current input sequence through one or more layer blocks in the sequence of layer blocks until a termination criterion is satisfied.
    Type: Application
    Filed: October 1, 2025
    Publication date: April 2, 2026
    Inventors: Adam Joshua Fisch, Tal Schuster, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Sangmin Bae
  • Publication number: 20260073287
    Abstract: One example aspect is directed to a computer-implemented method for performing model decoding with reduced latency. The method includes obtaining a pre-trained sequence processing model comprising a plurality of layers. The method includes modifying the sequence processing model to contain an adapter layer that is configured to receive and process an intermediate representation generated by a particular intermediate layer of the plurality of layers to predict an output token. The method includes training the adapter layer while holding the plurality of layers of the sequence processing model frozen. The method includes deploying the sequence processing model for speculative decoding in which the adapter layer, the particular intermediate layer, and the plurality of layers that precede the particular intermediate layer perform speculative token decoding and the plurality of layers that are subsequent to the particular intermediate layer perform token verification.
    Type: Application
    Filed: September 12, 2024
    Publication date: March 12, 2026
    Inventors: Tal Schuster, Ivan Korotkov, Ziwei Ji, Seungyeon Kim
  • Publication number: 20240311405
    Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified.
    Type: Application
    Filed: June 19, 2023
    Publication date: September 19, 2024
    Inventors: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
  • Publication number: 20240169184
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Application
    Filed: January 29, 2024
    Publication date: May 23, 2024
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Patent number: 11886976
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Grant
    Filed: July 14, 2023
    Date of Patent: January 30, 2024
    Assignee: Google LLC
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Publication number: 20240020516
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Application
    Filed: July 14, 2023
    Publication date: January 18, 2024
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
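
Several of the abstracts above describe adaptive early exiting in an auto-regressive decoder: at each output step, the model may stop after an intermediate layer instead of always running every layer. The Python below is a minimal, generic sketch of that idea. It is not the claimed method of any listed patent. Its assumptions are that a shared output head can read a prediction after every layer, that "confidence" means the maximum softmax probability, and that a fixed threshold decides when to exit. The names `decode_with_early_exit`, `toy_embed`, `toy_layer` and `toy_head` are hypothetical, and the toy "layers" only rescale a logit vector rather than implementing a real Transformer block.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_with_early_exit(
    embed: Callable[[List[int]], Vector],
    layers: Sequence[Callable[[Vector], Vector]],
    output_head: Callable[[Vector], Vector],
    max_len: int,
    threshold: float,
    eos_id: int,
) -> Tuple[List[int], List[int]]:
    """Greedy auto-regressive decoding that may stop each step at an intermediate layer.

    Returns the generated tokens and, for each step, how many layers were run.
    """
    if not layers:
        raise ValueError("need at least one layer")
    tokens: List[int] = []
    layers_used: List[int] = []
    for _ in range(max_len):
        hidden = embed(tokens)  # condition on the tokens generated so far
        for depth, layer in enumerate(layers, start=1):
            hidden = layer(hidden)
            # Assumed shared head: read out a distribution after every layer.
            probs = softmax(output_head(hidden))
            confidence = max(probs)
            # Exit once the prediction is confident enough; otherwise the
            # loop falls through to the final layer.
            if confidence >= threshold:
                break
        token = probs.index(confidence)  # greedy choice at the exit layer
        tokens.append(token)
        layers_used.append(depth)
        if token == eos_id:
            break
    return tokens, layers_used


# Toy model (purely illustrative): 3-token vocabulary, token 2 is end-of-sequence.
STRENGTHS = [3.0, 1.0, 2.0]  # how strongly each position signals its next token


def toy_embed(tokens: List[int]) -> Vector:
    pos = min(len(tokens), 2)
    vec = [0.0, 0.0, 0.0]
    vec[pos] = STRENGTHS[pos]
    return vec


def toy_layer(hidden: Vector) -> Vector:
    return [2.0 * x for x in hidden]  # each "layer" sharpens the logits


def toy_head(hidden: Vector) -> Vector:
    return hidden  # identity read-out


toy_layers = [toy_layer] * 4
tokens, used = decode_with_early_exit(toy_embed, toy_layers, toy_head,
                                      max_len=5, threshold=0.9, eos_id=2)
print(tokens, used)  # [0, 1, 2] [1, 2, 1]
```

A full-depth decoder would run all four toy layers at every step. Here the three steps run 1, 2 and 1 layers, because the second step starts from a weaker signal and needs more layers to pass the threshold. Raising the threshold makes early exits rarer. Lowering it saves more compute but increases the chance that an early prediction differs from the full model's. A real Transformer decoder would also have to supply attention states for the layers it skipped at earlier positions, which this toy ignores.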