Patents by Inventor Yi Tay

Yi Tay has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12657436
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Grant
    Filed: January 29, 2024
    Date of Patent: June 16, 2026
    Assignee: Google LLC
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Patent number: 12608594
    Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to some or all of the other tokens across the entire network.
    Type: Grant
    Filed: February 4, 2022
    Date of Patent: April 21, 2026
    Assignee: Google LLC
    Inventors: Yi Tay, Da-Cheng Juan, Dara Bahri, Donald Arthur Metzler, Jr., Jai Prakash Gupta, Mostafa Dehghani, Phillip Pham, Vamsi Krishna Aribandi, Zhen Qin
  • Patent number: 12511521
    Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then get N echoed attentions for free (or at a relatively cheap cost). As compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representation power at a better cost efficiency.
    Type: Grant
    Filed: February 3, 2022
    Date of Patent: December 30, 2025
    Assignee: Google LLC
    Inventors: Yi Tay, Donald Arthur Metzler, Jr., Dara Bahri, Mostafa Dehghani
  • Patent number: 12346793
    Abstract: A system for performing a machine learning task on a network input is described. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement (i) multiple sorting networks in which each sorting network is configured to sort vector blocks in a sequence of vector blocks to generate a sorted sequence of vector blocks; and (ii) a sorting attention neural network configured to perform the machine learning task on the input sequence by executing multiple sorting attention mechanisms using the sorting networks.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: July 1, 2025
    Assignee: Google LLC
    Inventors: Yi Tay, Liu Yang, Donald Arthur Metzler, Jr., Dara Bahri, Da-Cheng Juan
  • Publication number: 20250165469
    Abstract: Provided are systems and methods for training and/or use of a machine learning model that can directly predict one or more resources that are responsive to a query as an output of the model. In particular, the present disclosure demonstrates that information retrieval can be accomplished with a single machine learning model (e.g., that has a neural network architecture such as, for example, a Transformer architecture) in which all information about the corpus is encoded in the parameters of the model. To this end, the present disclosure introduces the Differentiable Search Index (DSI), a new paradigm that learns a query-to-result (e.g., in text-to-text format) model that will map queries (e.g., text strings) directly to relevant resource identifiers (“docids”) (e.g.
    Type: Application
    Filed: February 9, 2023
    Publication date: May 22, 2025
    Inventors: Yi Tay, Vinh Quoc Tran, William Weston Cohen, Donald Arthur Metzler, Jr.
  • Publication number: 20250156756
    Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.
    Type: Application
    Filed: December 30, 2022
    Publication date: May 15, 2025
    Inventors: Yi Tay, Mostafa Dehghani
  • Publication number: 20240403636
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing and training a multi-modal, multi-task self-attention neural network.
    Type: Application
    Filed: October 5, 2022
    Publication date: December 5, 2024
    Inventors: Valerii Likhosherstov, Mostafa Dehghani, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay
  • Publication number: 20240289552
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task on an input sequence of characters that has a respective character at each of a plurality of character positions to generate a network output. One of the systems includes a neural network configured to perform the machine learning task, the neural network comprising a gradient-based sub-word tokenizer and an output neural network. The gradient-based sub-word tokenizer is configured to apply a learned, i.e., flexible, sub-word tokenization strategy to the input sequence of characters to generate a sequence of latent sub-word representations. The output neural network is configured to process the latent sub-word representations to generate the network output for the task.
    Type: Application
    Filed: May 27, 2022
    Publication date: August 29, 2024
    Inventors: Yi Tay, Dara Bahri, Donald Arthur Metzler, Jr., Hyung Won Chung, Jai Prakash Gupta, Sebastian Nikolas Ruder, Simon Baumgartner, Vinh Quoc Tran, Zhen Qin
  • Publication number: 20240256965
    Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model, a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
    Type: Application
    Filed: January 26, 2024
    Publication date: August 1, 2024
    Inventors: Hyung Won Chung, Barret Zoph, Dengyong Zhou, Liam Fedus, Shayne Longpre, Le Hou, Yi Tay, Jason Weng Wei, Siddhartha Brahma, Quoc V. Le
  • Publication number: 20240256964
    Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set of one or more training examples can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into the model; obtaining, from the model, a plurality of outputs respectively generated by the model based on the plurality of corrupted training examples; and updating one or more parameters of the model based on an evaluation of the plurality of outputs.
    Type: Application
    Filed: January 26, 2024
    Publication date: August 1, 2024
    Inventors: Yi Tay, Mostafa Dehghani
  • Publication number: 20240169184
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Application
    Filed: January 29, 2024
    Publication date: May 23, 2024
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Patent number: 11886976
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Grant
    Filed: July 14, 2023
    Date of Patent: January 30, 2024
    Assignee: Google LLC
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Publication number: 20240020516
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
    Type: Application
    Filed: July 14, 2023
    Publication date: January 18, 2024
    Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
  • Publication number: 20230244938
    Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.
    Type: Application
    Filed: January 27, 2023
    Publication date: August 3, 2023
    Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
  • Publication number: 20220383120
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network having a plurality of network parameters. One of the methods includes obtaining an unlabeled training input from a set of unlabeled training data; processing the unlabeled training input to generate a first embedding; generating a corrupted version of the unlabeled training input, comprising determining a proper subset of the feature dimensions and, for each feature dimension that is in the proper subset of feature dimensions, applying a corruption to the respective feature in the feature dimension using one or more feature values sampled from a marginal distribution of the feature dimension as specified in the set of unlabeled training data; processing the corrupted version of the unlabeled training input to generate a second embedding; and determining an update to the current values of the plurality of network parameters.
    Type: Application
    Filed: May 27, 2022
    Publication date: December 1, 2022
    Inventors: Dara Bahri, Donald Arthur Metzler, Jr., Hanxi Heinrich Jiang, Yi Tay
  • Publication number: 20220245432
    Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then get N echoed attentions for free (or at a relatively cheap cost). As compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representation power at a better cost efficiency.
    Type: Application
    Filed: February 3, 2022
    Publication date: August 4, 2022
    Inventors: Yi Tay, Donald Arthur Metzler, Jr., Dara Bahri, Mostafa Dehghani
  • Publication number: 20220245428
    Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to some or all of the other tokens across the entire network.
    Type: Application
    Filed: February 4, 2022
    Publication date: August 4, 2022
    Inventors: Yi Tay, Da-Cheng Juan, Dara Bahri, Donald Arthur Metzler, Jr., Jai Prakash Gupta, Mostafa Dehghani, Phillip Pham, Vamsi Krishna Aribandi, Zhen Qin
  • Publication number: 20210248450
    Abstract: A system for performing a machine learning task on a network input is described. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement (i) multiple sorting networks in which each sorting network is configured to sort vector blocks in a sequence of vector blocks to generate a sorted sequence of vector blocks; and (ii) a sorting attention neural network configured to perform the machine learning task on the input sequence by executing multiple sorting attention mechanisms using the sorting networks.
    Type: Application
    Filed: February 8, 2021
    Publication date: August 12, 2021
    Inventors: Yi Tay, Liu Yang, Donald Arthur Metzler, Jr., Dara Bahri, Da-Cheng Juan
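The feature-corruption step in publication 20220383120 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed method: rows are plain lists of floats, each feature's marginal distribution is approximated by its observed values in the unlabeled training set, and `corrupt_row` and `corruption_rate` are hypothetical names. The embedding networks and the parameter update that follow the corruption step are omitted.

```python
import random
from typing import List, Sequence


def corrupt_row(
    row: Sequence[float],
    data: Sequence[Sequence[float]],
    corruption_rate: float,
    rng: random.Random,
) -> List[float]:
    """Return a corrupted copy of `row` (hypothetical helper, for illustration).

    A random proper subset of the feature dimensions is chosen; each chosen
    feature is overwritten with the same feature taken from a randomly drawn
    training row, i.e. a sample from that feature's empirical marginal.
    """
    num_features = len(row)
    # Subset size: at least one feature, but never all of them (a proper subset).
    k = min(max(1, round(corruption_rate * num_features)), num_features - 1)
    corrupted = list(row)
    for d in rng.sample(range(num_features), k):
        donor = rng.choice(data)  # independent draw of a training row per feature
        corrupted[d] = donor[d]   # value from the empirical marginal of feature d
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    train = [[float(i), 10.0 + i, 20.0 + i, 30.0 + i] for i in range(5)]
    anchor = [100.0, 200.0, 300.0, 400.0]
    print(corrupt_row(anchor, train, corruption_rate=0.5, rng=rng))
```

In the abstract's terms, `row` would be encoded to produce the first embedding and the returned copy to produce the second; how the two embeddings are compared to compute the update is not specified in the abstract and is left out here.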
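The adaptive early exiting described in patent 11886976 (and the related filings 20240020516, 20240169184 and 12657436) can be illustrated for a single decoding step. The sketch assumes that every decoder layer is followed by a shared readout head and that confidence is the top softmax probability compared against a fixed threshold. These choices, and the names `decode_step_with_early_exit`, `readout` and `threshold`, are illustrative assumptions rather than details taken from the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step_with_early_exit(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    readout: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[int, int]:
    """Predict one token, stopping at the first layer whose prediction is confident.

    Returns (token_id, number_of_layers_used). `layers` must be non-empty.
    """
    if not layers:
        raise ValueError("need at least one layer")
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = softmax(readout(hidden))  # intermediate prediction from a shared head
        if max(probs) >= threshold:       # confident enough: skip the remaining layers
            break
    return probs.index(max(probs)), depth
```

For example, with three toy layers that each double the hidden vector `[1.0, 0.0]` and an identity readout, a threshold of 0.95 stops after the second layer, because the softmax of `[4.0, 0.0]` puts about 0.982 on token 0. A complete decoder would also have to supply states for the skipped layers to later tokens; the abstract does not say how, so that part is not shown.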
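The corruption step shared by publications 20250156756 and 20240256964, in which training examples are corrupted under several combinations of configuration parameters, might look like the following T5-style span corruption. Assumptions: the two configuration parameters are a corruption rate and a fixed span length, sentinels use the `<extra_id_k>` naming familiar from T5, and the values in `DENOISER_CONFIGS` are placeholders chosen for illustration. None of these details come from the abstracts, and `span_corrupt` and `mixture_of_denoisers` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str],
    corruption_rate: float,
    span_length: int,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Replace fixed-length spans with sentinels; return (inputs, targets).

    The number of spans is chosen so that roughly `corruption_rate` of the
    tokens are corrupted. Spans never overlap.
    """
    n = len(tokens)
    span_length = max(1, min(span_length, n))
    num_spans = max(1, round(n * corruption_rate / span_length))
    num_spans = min(num_spans, n // span_length)
    # Lay out the kept tokens plus one slot per span block, then choose
    # which slots hold the span blocks.
    num_slots = n - num_spans * span_length + num_spans
    span_slots = set(rng.sample(range(num_slots), num_spans))

    inputs: List[str] = []
    targets: List[str] = []
    pos, k = 0, 0
    for slot in range(num_slots):
        if slot in span_slots:
            sentinel = f"<extra_id_{k}>"
            inputs.append(sentinel)                          # span hidden from the input
            targets.append(sentinel)                         # target names the span...
            targets.extend(tokens[pos:pos + span_length])    # ...then restores it
            pos += span_length
            k += 1
        else:
            inputs.append(tokens[pos])                       # uncorrupted token kept
            pos += 1
    return inputs, targets


# Placeholder (corruption_rate, span_length) combinations, not from the filings.
DENOISER_CONFIGS = [(0.15, 3), (0.5, 8)]


def mixture_of_denoisers(examples, configs, rng):
    """Corrupt every example once under each configuration."""
    return [span_corrupt(ex, rate, length, rng)
            for ex in examples for rate, length in configs]
```

Each configuration yields a differently corrupted copy of the same example; the model would then be trained to generate `targets` (the uncorrupted subportions) from `inputs`.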