Patents by Inventor Yi Tay
Yi Tay has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12657436Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.Type: GrantFiled: January 29, 2024Date of Patent: June 16, 2026Assignee: Google LLCInventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
-
Patent number: 12608594Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in some or all of the other tokens across the entire network.Type: GrantFiled: February 4, 2022Date of Patent: April 21, 2026Assignee: GOOGLE LLCInventors: Yi Tay, Da-Cheng Juan, Dara Bahri, Donald Arthur Metzler, Jr., Jai Prakash Gupta, Mostafa Dehghani, Phillip Pham, Vamsi Krishna Aribandi, Zhen Qin
-
Patent number: 12511521Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then get N echo-ed attentions for free (or at a relatively cheap cost). As compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representation power at a better cost efficiency.Type: GrantFiled: February 3, 2022Date of Patent: December 30, 2025Assignee: GOOGLE LLCInventors: Yi Tay, Donald Arthur Metzler, Jr., Dara Bahri, Mostafa Dehghani
-
Patent number: 12346793Abstract: A system for performing a machine learning task on a network input is described. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement (i) multiple sorting networks in which each sorting network is configured to sort vector blocks in a sequence of vector blocks to generate a sorted sequence of vector blocks; and (ii) a sorting attention neural network configured to perform the machine learning task on the input sequence by executing multiple sorting attention mechanisms using the sorting networks.Type: GrantFiled: February 8, 2021Date of Patent: July 1, 2025Assignee: Google LLCInventors: Yi Tay, Liu Yang, Donald Arthur Metzler, Jr., Dara Bahri, Da-Cheng Juan
-
Publication number: 20250165469Abstract: Provided are systems and methods for training and/or use of a machine learning model that can directly predict one or more resources that are responsive to a query as an output of the model. In particular, the present disclosure demonstrates that information retrieval can be accomplished with a single machine learning model (e.g., that has a neural network architecture such as, for example, a Transformer architecture) in which all information about the corpus is encoded in the parameters of the model. To this end, the present disclosure introduces the Differentiable Search Index (DSI), a new paradigm that learns a query-to-result (e.g., in text-to-text format) model that will map queries (e.g., text strings) directly to relevant resource identifiers (“docids”) (e.g.Type: ApplicationFiled: February 9, 2023Publication date: May 22, 2025Inventors: Yi Tay, Vinh Quoc Tran, William Weston Cohen, Donald Arthur Metzler, JR.
-
Publication number: 20250156756Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.Type: ApplicationFiled: December 30, 2022Publication date: May 15, 2025Inventors: Yi Tay, Mostafa Dehghani
-
Publication number: 20240403636Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing and training a multi-modal, multi-task self-attention neural network.Type: ApplicationFiled: October 5, 2022Publication date: December 5, 2024Inventors: Valerii Likhosherstov, Mostafa Dehghani, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay
-
Publication number: 20240289552Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task on an input sequence of characters that has a respective character at each of a plurality of character positions to generate a network output. One of the systems includes a neural network configured to perform the machine learning task, the neural network comprising a gradient-based sub-word tokenizer and an output neural network. The gradient-based sub-word tokenizer is configured to apply a learned, i.e., flexible, sub-word tokenization strategy to the input sequence of characters to generate a sequence of latent sub-word representations. The output neural network is configured to process the latent sub-word representation to generate the network output for the task.Type: ApplicationFiled: May 27, 2022Publication date: August 29, 2024Inventors: Yi Tay, Dara Bahri, Donald Arthur Metzler, Jr., Hyung Won Chung, Jai Prakash Gupta, Sebastian Nikolas Ruder, Simon Baumgartner, Vinh Quoc Tran, Zhen Qin
-
Publication number: 20240256965Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.Type: ApplicationFiled: January 26, 2024Publication date: August 1, 2024Inventors: Hyung Won Chung, Barret Zoph, Dengyong Zhou, Liam Fedus, Shayne Longpre, Le Hou, Yi Tay, Jason Weng Wei, Siddhartha Brahma, Quoc V. Le
-
Publication number: 20240256964Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into model; obtaining from the model, a plurality of outputs respectively generated by model based on the plurality of corrupted training examples; and updating one or more parameters of model based on an evaluation of the plurality of outputs.Type: ApplicationFiled: January 26, 2024Publication date: August 1, 2024Inventors: Yi Tay, Mostafa Dehghani
-
Publication number: 20240169184Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.Type: ApplicationFiled: January 29, 2024Publication date: May 23, 2024Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, JR.
-
Patent number: 11886976Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.Type: GrantFiled: July 14, 2023Date of Patent: January 30, 2024Assignee: Google LLCInventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
-
Publication number: 20240020516Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.Type: ApplicationFiled: July 14, 2023Publication date: January 18, 2024Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
-
Publication number: 20230244938Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.Type: ApplicationFiled: January 27, 2023Publication date: August 3, 2023Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
-
Publication number: 20220383120Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network having a plurality of network parameters. One of the methods includes obtaining an unlabeled training input from a set of unlabeled training data; processing the unlabeled training input to generate a first embedding; generating a corrupted version of the unlabeled training input, comprising determining a proper subset of the feature dimensions and, for each feature dimension that is in the proper subset of feature dimensions, applying a corruption to the respective feature in the feature dimension using one or more feature values sampled from a marginal distribution of the feature dimension as specified in the set of unlabeled training data; processing the corrupted version of the unlabeled training input to generate a second embedding; and determining an update to the current values of the plurality of network parameters.Type: ApplicationFiled: May 27, 2022Publication date: December 1, 2022Inventors: Dara Bahri, Donald Arthur Metzler, JR., Hanxi Heinrich Jiang, Yi Tay
-
Publication number: 20220245432Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then get N echo-ed attentions for free (or at a relatively cheap cost). As compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representation power at a better cost efficiency.Type: ApplicationFiled: February 3, 2022Publication date: August 4, 2022Inventors: Yi Tay, Donald Arthur Metzler, JR., Dara Bahri, Mostafa Dehghani
-
Publication number: 20220245428Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in some or all of the other tokens across the entire network.Type: ApplicationFiled: February 4, 2022Publication date: August 4, 2022Inventors: Yi Tay, Da-Cheng Juan, Dara Bahri, Donald Arthur Metzler, JR., Jai Prakash Gupta, Mostafa Dehghani, Phillip Pham, Vamsi Krishna Aribandi, Zhen Qin
-
Publication number: 20210248450Abstract: A system for performing a machine learning task on a network input is described. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement (i) multiple sorting networks in which each sorting network is configured to sort vector blocks in a sequence of vector blocks to generate a sorted sequence of vector blocks; and (ii) a sorting attention neural network configured to perform the machine learning task on the input sequence by executing multiple sorting attention mechanisms using the sorting networks.Type: ApplicationFiled: February 8, 2021Publication date: August 12, 2021Inventors: Yi Tay, Liu Yang, Donald Arthur Metzler, JR., Dara Bahri, Da-Cheng Juan