Patents by Inventor Rohan Anil
Rohan Anil has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12353981
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
Type: Grant
Filed: May 10, 2024
Date of Patent: July 8, 2025
Assignee: Google LLC
Inventors: Slav Petrov, Yonghui Wu, Andrew M. Dai, David Richard So, Dmitry Lepikhin, Erica Ann Moreira, Gaurav Mishra, Jonathan Hudson Clark, Maxim Krikun, Melvin Jose Johnson Premkumar, Nan Du, Orhan Firat, Rohan Anil, Siamak Shakeri, Xavier Garcia, Yanping Huang, Yong Cheng, Yuanzhong Xu, Yujing Zhang, Zachary Alexander Nado, Eric Jun Jie Ni, Kefan Xiao, Vladimir Feinberg, Jin Young Sohn, Aurko Roy
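The abstract describes training an autoregressive generative neural network. As a minimal illustration of that training objective, here is a next-token cross-entropy step on a toy NumPy bigram model; the model, names, and shapes are illustrative assumptions standing in for a real network, not the patented method:

```python
import numpy as np

# Toy autoregressive setup: predict token t+1 from token t using a
# learned logit table (a stand-in for a real generative network).
VOCAB = 16
rng = np.random.default_rng(0)
logits_table = rng.normal(scale=0.1, size=(VOCAB, VOCAB))  # parameters

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def train_step(tokens, lr=0.1):
    """One cross-entropy step on a token sequence."""
    inputs, targets = tokens[:-1], tokens[1:]
    probs = softmax(logits_table[inputs])          # (T-1, VOCAB)
    loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
    # Gradient of mean cross-entropy w.r.t. the selected logit rows.
    grad = probs
    grad[np.arange(len(targets)), targets] -= 1.0
    grad /= len(targets)
    np.add.at(logits_table, inputs, -lr * grad)
    return loss

seq = rng.integers(0, VOCAB, size=64)
for step in range(200):
    loss = train_step(seq)
print(f"final loss: {loss:.3f}")
```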
-
Publication number: 20250156716
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using layer-wise Fisher approximations.
Type: Application
Filed: February 10, 2023
Publication date: May 15, 2025
Inventors: Ehsan Amid, Rohan Anil
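For a sense of what a per-layer Fisher approximation looks like in practice, here is a sketch that keeps a running diagonal Fisher estimate for each layer and uses it to precondition that layer's gradient. The diagonal choice, decay, and damping values are assumptions for illustration; the filing's exact approximation is not described in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

class FisherPreconditionedLayer:
    """Keeps a running diagonal Fisher estimate for one layer and uses
    it to precondition that layer's gradient (natural-gradient style)."""

    def __init__(self, shape, decay=0.95, damping=1e-3):
        self.fisher = np.zeros(shape)
        self.decay = decay
        self.damping = damping

    def precondition(self, grad):
        # An EMA of squared per-parameter gradients approximates the
        # diagonal of the Fisher information for this layer.
        self.fisher = self.decay * self.fisher + (1 - self.decay) * grad**2
        return grad / (self.fisher + self.damping)

# Example: two layers, each with its own local Fisher estimate.
layers = {"w1": FisherPreconditionedLayer((4, 3)),
          "w2": FisherPreconditionedLayer((3, 2))}
params = {k: rng.normal(size=l.fisher.shape) for k, l in layers.items()}

for step in range(5):
    grads = {k: rng.normal(size=p.shape) for k, p in params.items()}  # stand-in grads
    for k in params:
        params[k] -= 0.01 * layers[k].precondition(grads[k])
```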
-
Publication number: 20250111671
Abstract: Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.
Type: Application
Filed: September 27, 2024
Publication date: April 3, 2025
Inventors: Tao Zhu, Jiahui Yu, Jingchen Feng, Kai Chen, Pooya Abolghasemi, Gagan Bansal, Jieren Xu, Hui Miao, Yaping Zhang, Shuchao Bi, Yonghui Wu, Claire Cui, Rohan Anil
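A minimal sketch of the pipeline the abstract outlines: per-frame video and audio embeddings are fused into per-frame audiovisual embeddings, which are then pooled to score media characteristics. Concatenation plus a learned projection is one plausible fusion choice; the abstract does not commit to it, and all dimensions here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_VID, D_AUD, D_AV, N_TAGS = 32, 128, 64, 96, 5

# Per-frame embeddings for one media item (stand-ins for real encoders).
video_emb = rng.normal(size=(T, D_VID))   # visual features per frame
audio_emb = rng.normal(size=(T, D_AUD))   # audio features per frame

# Fuse each frame's two embeddings into one audiovisual embedding.
W_fuse = rng.normal(scale=0.02, size=(D_VID + D_AUD, D_AV))
av_emb = np.concatenate([video_emb, audio_emb], axis=1) @ W_fuse  # (T, D_AV)

# Pool over frames and score media characteristics (multi-label).
W_cls = rng.normal(scale=0.02, size=(D_AV, N_TAGS))
scores = av_emb.mean(axis=0) @ W_cls
probs = 1 / (1 + np.exp(-scores))
print(np.round(probs, 3))
```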
-
Publication number: 20250005453
Abstract: Provided is an approach for knowledge distillation based on exporting Principal Components approximations (e.g., Bregman representations) of one or more layer-wise representations of the teacher model. In particular, the present disclosure provides an extension to the original Bregman PCA formulation by incorporating a mean vector and orthonormalizing the principal directions with respect to the geometry of the local convex function around the mean. This extended formulation allows viewing the learned representation as a dense layer, thus casting the problem as learning the linear coefficients of the compressed examples, as the input to this layer, by the student network. Example empirical data indicates that example implementations of the approach improve performance when compared to typical teacher-student training using soft labels.
Type: Application
Filed: December 12, 2022
Publication date: January 2, 2025
Inventors: Ehsan Amid, Christopher James Fifty, Manfred Klaus Warmuth, Rohan Anil
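To make the "mean vector plus orthonormal principal directions, with the student learning the linear coefficients" idea concrete, here is a sketch using standard Euclidean PCA as a stand-in for the Bregman PCA of the filing (the Bregman geometry itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_TEACHER, K = 256, 64, 8

# Stand-in teacher layer activations for N examples.
teacher_acts = rng.normal(size=(N, D_TEACHER))

# Standard PCA as a stand-in for Bregman PCA: a mean vector plus
# orthonormal principal directions.
mean = teacher_acts.mean(axis=0)
centered = teacher_acts - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
directions = vt[:K]                      # (K, D_TEACHER), orthonormal rows

# The compressed example: linear coefficients in the PCA basis. These
# are the distillation targets the student would be trained to predict.
coeffs = centered @ directions.T         # (N, K)

# The (mean, directions) pair acts as a fixed dense layer appended to
# the student: coefficients in, approximate teacher representation out.
def reconstruct(c):
    return mean + c @ directions

rel_err = (np.linalg.norm(reconstruct(coeffs) - teacher_acts)
           / np.linalg.norm(teacher_acts))
print(f"relative reconstruction error with K={K}: {rel_err:.3f}")
```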
-
Publication number: 20240378427
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
Type: Application
Filed: May 10, 2024
Publication date: November 14, 2024
Inventors: Slav Petrov, Yonghui Wu, Andrew M. Dai, David Richard So, Dmitry Lepikhin, Erica Ann Moreira, Gaurav Mishra, Jonathan Hudson Clark, Maxim Krikun, Melvin Jose Johnson Premkumar, Nan Du, Orhan Firat, Rohan Anil, Siamak Shakeri, Xavier Garcia, Yanping Huang, Yong Cheng, Yuanzhong Xu, Yujing Zhang, Zachary Alexander Nado, Eric Jun Jie Ni, Kefan Xiao, Vladimir Feinberg, Jin Young Sohn, Aurko Roy
-
Publication number: 20240378441
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
Type: Application
Filed: May 10, 2024
Publication date: November 14, 2024
Inventors: Slav Petrov, Yonghui Wu, Andrew M. Dai, David Richard So, Dmitry Lepikhin, Erica Ann Moreira, Gaurav Mishra, Jonathan Hudson Clark, Maxim Krikun, Melvin Jose Johnson Premkumar, Nan Du, Orhan Firat, Rohan Anil, Siamak Shakeri, Xavier Garcia, Yanping Huang, Yong Cheng, Yuanzhong Xu, Yujing Zhang, Zachary Alexander Nado, Eric Jun Jie Ni, Kefan Xiao, Vladimir Feinberg, Jin Young Sohn, Aurko Roy
-
Publication number: 20240249193
Abstract: Generally, the present disclosure is directed to enhanced federated learning (FL) that employs a set of clients with varying amounts of computational resources (e.g., system memory, storage, and processing bandwidth). To overcome limitations of conventional FL methods that employ a set of clients with varying amounts of computational resources, the embodiments run multi-directional knowledge distillation between the server models produced by each federated averaging (FedAvg) pool, using unlabeled server data as the distillation dataset. By co-distilling the two (or more) models frequently over the course of FedAvg rounds, information is shared between the pools without sharing model parameters. This leads to increased performance and faster convergence (in fewer federated rounds).
Type: Application
Filed: January 19, 2024
Publication date: July 25, 2024
Inventors: Jared Alexander Lichtarge, Rajiv Mathews, Rohan Anil, Ehsan Amid, Shankar Kumar
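The co-distillation step described above, in miniature: two server models (one per FedAvg pool) each take gradient steps toward the other's soft predictions on unlabeled server data, so information flows between pools without any parameter sharing. The toy linear models and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_UNLABELED = 10, 512

# Server-side models produced by two FedAvg pools (toy linear scorers).
model_a = rng.normal(size=D)
model_b = rng.normal(size=D)
x_server = rng.normal(size=(N_UNLABELED, D))  # unlabeled distillation data

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def distill_toward(student, teacher, x, lr=0.05):
    """One step moving `student` toward `teacher`'s soft predictions."""
    p_s, p_t = sigmoid(x @ student), sigmoid(x @ teacher)
    grad = x.T @ (p_s - p_t) / len(x)  # grad of cross-entropy to soft labels
    return student - lr * grad

# Multi-directional: each pool's model learns from the other.
for round_ in range(100):
    model_a, model_b = (distill_toward(model_a, model_b, x_server),
                        distill_toward(model_b, model_a, x_server))
print("agreement:", np.mean((x_server @ model_a > 0) == (x_server @ model_b > 0)))
```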
-
Publication number: 20240095582
Abstract: During a round of decentralized learning for updating of a global machine learning (ML) model, remote processor(s) of a remote system may transmit, to a population of computing devices, primary weights for a primary version of the global ML model, and cause each of the computing devices to generate a corresponding update for the primary version of the global ML model. Further, the remote processor(s) may cause the primary version of the global ML model to be updated based on the corresponding updates that are received during the round of decentralized learning. However, the remote processor(s) may receive other corresponding updates subsequent to the round of decentralized learning. Accordingly, various techniques described herein (e.g., FARe-DUST, FeAST on MSG, and/or other techniques) enable the other corresponding updates to be utilized in achieving a final version of the global ML model.
Type: Application
Filed: December 6, 2022
Publication date: March 21, 2024
Inventors: Andrew Hard, Sean Augenstein, Rohan Anil, Rajiv Mathews, Lara McConnaughey, Ehsan Amid, Antonious Girgis
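To illustrate the problem the abstract sets up, here is a sketch that folds straggler updates arriving after a round into the global model with a staleness discount. This down-weighting is a generic stale-update rule for illustration only; it is not the FARe-DUST or FeAST on MSG algorithms, which the abstract names but does not specify:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
global_model = np.zeros(D)

def apply_round(model, updates, lr=1.0):
    """Standard round: average the on-time client updates."""
    return model + lr * np.mean(updates, axis=0)

# On-time updates received during the current round.
on_time = [rng.normal(size=D) for _ in range(5)]
global_model = apply_round(global_model, on_time)

def apply_stale(model, update, rounds_late, base_lr=1.0):
    """Fold in a late update, discounted by how stale it is,
    rather than discarding it."""
    discount = 1.0 / (1 + rounds_late)
    return model + base_lr * discount * update

# Straggler updates arriving after the round has closed.
for rounds_late, update in [(1, rng.normal(size=D)), (3, rng.normal(size=D))]:
    global_model = apply_stale(global_model, update, rounds_late)
print(global_model.round(3))
```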
-
Publication number: 20240078379
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network comprising an N-grammer layer and an output neural network, the N-grammer layer configured to: at each of one or more heads: receive a sequence of input embeddings; generate a discrete latent representation of the sequence of input embeddings by using a learned product quantization codebook; generate a plurality of n-gram indices from the discrete latent representation; and generate a latent n-gram representation of the sequence of input embeddings; and generate a sequence of output embeddings, and the output neural network configured to: receive the sequence of output embeddings; and process the sequence of output embeddings to generate the network output.
Type: Application
Filed: September 6, 2022
Publication date: March 7, 2024
Inventors: Rohan Anil, Aurko Roy
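The abstract's sequence of steps, sketched for a single head and a single quantization subspace: quantize each input embedding against a codebook, form bigram indices from consecutive codes, look up latent n-gram embeddings, and emit output embeddings. The hashing trick, concatenation, and all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, N_CODES, N_GRAM_VOCAB, D_NGRAM = 12, 16, 32, 4096, 8

x = rng.normal(size=(T, D))                    # sequence of input embeddings
codebook = rng.normal(size=(N_CODES, D))       # learned PQ codebook (1 subspace here)
ngram_table = rng.normal(scale=0.02, size=(N_GRAM_VOCAB, D_NGRAM))

# 1) Discrete latent representation: nearest codebook entry per position.
dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (T, N_CODES)
codes = dists.argmin(axis=1)                   # (T,) discrete ids

# 2) N-gram (here: bigram) indices from consecutive discrete codes,
#    hashed into a fixed n-gram vocabulary.
bigram_ids = (codes[:-1] * N_CODES + codes[1:]) % N_GRAM_VOCAB

# 3) Latent n-gram representation, combined with the inputs by
#    concatenation (position 0 has no preceding code, so it gets zeros).
ngram_emb = np.vstack([np.zeros((1, D_NGRAM)), ngram_table[bigram_ids]])
output_emb = np.concatenate([x, ngram_emb], axis=1)   # (T, D + D_NGRAM)
print(output_emb.shape)   # these feed the output neural network
```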
-
Publication number: 20240070530
Abstract: Implementations disclosed herein are directed to a hybrid federated learning (FL) technique that utilizes both federated averaging (FA) and federated distillation (FD) during a given round of FL of a given global machine learning (ML) model. Implementations may identify a population of client devices to participate in the given round of FL, determine a corresponding quantity of instances of client data available at each of the client devices that may be utilized during the given round of FL, and select different subsets of the client devices based on the corresponding quantity of instances of client data. Further, implementations may cause a first subset of the client devices to generate a corresponding FA update and a second subset of client devices to generate a corresponding FD update. Moreover, implementations may subsequently update the given global ML model based on the corresponding FA updates and the corresponding FD updates.
Type: Application
Filed: December 5, 2022
Publication date: February 29, 2024
Inventors: Ehsan Amid, Rajiv Mathews, Rohan Anil, Shankar Kumar, Jared Lichtarge
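One round of the hybrid scheme, in miniature: clients are partitioned by how much local data they hold, the data-rich subset contributes FA updates (parameter deltas) and the data-poor subset contributes FD updates (soft predictions), and the server applies both. The threshold, client counts, and toy linear models are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6
global_w = rng.normal(size=D)
x_public = rng.normal(size=(128, D))   # server-side distillation inputs

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Clients with varying quantities of local data (counts are made up).
counts = [20, 50, 80, 120, 200, 300, 400, 450]
clients = [{"n": n, "w": global_w + rng.normal(scale=0.1, size=D)} for n in counts]

# Partition by data quantity: one subset does FA, the other does FD.
fa_pool = [c for c in clients if c["n"] >= 100]
fd_pool = [c for c in clients if c["n"] < 100]

# FA update: data-weighted average of parameter deltas.
fa_delta = np.average([c["w"] - global_w for c in fa_pool],
                      axis=0, weights=[c["n"] for c in fa_pool])
global_w = global_w + fa_delta

# FD update: move the global model toward the FD pool's averaged soft labels.
soft = np.mean([sigmoid(x_public @ c["w"]) for c in fd_pool], axis=0)
for _ in range(50):
    grad = x_public.T @ (sigmoid(x_public @ global_w) - soft) / len(x_public)
    global_w -= 0.1 * grad
print(global_w.round(3))
```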
-
Publication number: 20230105476
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing computational graphs on distributed computing devices.
Type: Application
Filed: March 6, 2020
Publication date: April 6, 2023
Inventors: Rohan Anil, Battulga Bayarsaikhan, Ryan P. Doherty, Emanuel Taropa
-
Publication number: 20220253713
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using local layer-wise losses.
Type: Application
Filed: February 7, 2022
Publication date: August 11, 2022
Inventors: Ehsan Amid, Manfred Klaus Warmuth, Rohan Anil
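As a generic illustration of training with local layer-wise losses, here is a sketch where each layer is updated by its own local objective and gradients never cross layer boundaries. The abstract does not spell out the specific local losses used, so the auxiliary-classifier form below is purely an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_IN, D_HID, N_CLS = 128, 10, 16, 3

x = rng.normal(size=(N, D_IN))
y = rng.integers(0, N_CLS, size=N)
y1h = np.eye(N_CLS)[y]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

w1 = rng.normal(scale=0.1, size=(D_IN, D_HID))
aux = rng.normal(scale=0.1, size=(D_HID, N_CLS))   # local head for layer 1
w2 = rng.normal(scale=0.1, size=(D_HID, N_CLS))    # output layer

lr = 0.5
for _ in range(200):
    h = np.tanh(x @ w1)
    # Local loss for layer 1, computed via its auxiliary classifier only.
    p1 = softmax(h @ aux)
    g_h = (p1 - y1h) @ aux.T * (1 - h**2)
    w1 -= lr * x.T @ g_h / N
    aux -= lr * h.T @ (p1 - y1h) / N
    # Output layer trains on a detached copy of h: no gradient to w1.
    p2 = softmax(h.copy() @ w2)
    w2 -= lr * h.T @ (p2 - y1h) / N
loss = -np.log(p2[np.arange(N), y]).mean()
print(f"final output-layer loss: {loss:.3f}")
```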
-
Publication number: 20200372359
Abstract: A system includes one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the computers to implement a combined machine learning model for processing an input including multiple features to generate a predicted output for the machine learning input. The combined model includes: a deep machine learning model configured to process the features to generate a deep model output; a wide machine learning model configured to process the features to generate a wide model output; and a combining layer configured to process the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output, in which the deep model and the wide model have been trained jointly on training data to generate the deep model output and the wide model output.
Type: Application
Filed: August 12, 2020
Publication date: November 26, 2020
Inventors: Tal Shaked, Rohan Anil, Hrishikesh Balkrishna Aradhye, Mustafa Ispir, Glen Anderson, Wei Chai, Mehmet Levent Koc, Jeremiah Joseph Harmsen, Xiaobing Liu, Gregory Sean Corrado, Tushar Deepak Chandra, Heng-Tze Cheng
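The combined model the abstract describes, in a minimal form: a deep model (small MLP) and a wide model (linear) process the same features, a combining layer merges their outputs, and all parameters train jointly. Summing the two logits is one common instantiation of the combining layer, and using raw features for the wide part (rather than cross-product transformations) is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 256, 12

x = rng.normal(size=(N, D))
y = (x[:, 0] * x[:, 1] + x[:, 2] > 0).astype(float)  # toy labels

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Deep part: small MLP. Wide part: linear model over the features.
w1 = rng.normal(scale=0.1, size=(D, 16))
w2 = rng.normal(scale=0.1, size=16)
w_wide = np.zeros(D)

lr = 0.5
for _ in range(300):
    h = np.tanh(x @ w1)
    deep_out = h @ w2
    wide_out = x @ w_wide
    # Combining layer: sum the two outputs; everything trains jointly.
    p = sigmoid(deep_out + wide_out)
    err = (p - y) / N                      # d(log-loss)/d(logit)
    w_wide -= lr * x.T @ err
    w2 -= lr * h.T @ err
    w1 -= lr * x.T @ (np.outer(err, w2) * (1 - h**2))
print("train accuracy:", ((p > 0.5) == y).mean())
```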
-
Patent number: 10762422
Abstract: A system includes one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the computers to implement a combined machine learning model for processing an input including multiple features to generate a predicted output for the machine learning input. The combined model includes: a deep machine learning model configured to process the features to generate a deep model output; a wide machine learning model configured to process the features to generate a wide model output; and a combining layer configured to process the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output, in which the deep model and the wide model have been trained jointly on training data to generate the deep model output and the wide model output.
Type: Grant
Filed: December 29, 2016
Date of Patent: September 1, 2020
Assignee: Google LLC
Inventors: Tal Shaked, Rohan Anil, Hrishikesh Balkrishna Aradhye, Mustafa Ispir, Glen Anderson, Wei Chai, Mehmet Levent Koc, Jeremiah Harmsen, Xiaobing Liu, Gregory Sean Corrado, Tushar Deepak Chandra, Heng-Tze Cheng
-
Publication number: 20190019228
Abstract: An in-store customer tracking and engagement system for tracking customers in a retail store is provided. The system includes a memory having computer-readable instructions stored therein. The system further includes a processor configured to access customer identification data of one or more customers visiting a retail store. The processor is further configured to determine a unique customer identity signature (UCIS) of each of the one or more customers based upon the customer identification data of the respective customer. In addition, the processor is configured to track the activity of each of the customers within the retail store using the unique customer identity signature of the respective customer. Further, the processor is configured to generate an offline clickstream for each of the one or more customers using the unique customer identity signature as the respective customer moves across the retail store. The system includes a data analytics cloud platform communicatively coupled to the processor.
Type: Application
Filed: July 11, 2018
Publication date: January 17, 2019
Applicant: Capillary Technologies International Pte Ltd.
Inventors: Boddu Aneesh REDDY, Mahadar Rohan ANIL, Panda Subrat KUMAR, Banerjee SUMANDEEP
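A minimal sketch of the two data structures the abstract centers on: a UCIS derived from customer identification data, and a per-customer offline clickstream of ordered in-store events. Hashing the identification signals is one simple realization; the filing's actual UCIS construction is not detailed here, and all field names are hypothetical:

```python
import hashlib
from collections import defaultdict

def ucis(identification_signals: dict) -> str:
    """Derive a stable unique customer identity signature by hashing
    the available identification signals (a simple stand-in for the
    UCIS construction)."""
    canonical = "|".join(f"{k}={v}" for k, v in sorted(identification_signals.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Offline clickstream: ordered zone-visit events per customer, the
# in-store analogue of a web clickstream.
clickstreams = defaultdict(list)

def track(signals: dict, zone: str, timestamp: str):
    clickstreams[ucis(signals)].append((timestamp, zone))

customer = {"device_mac": "aa:bb:cc:dd:ee:ff", "loyalty_id": "L-1042"}
track(customer, "entrance", "2018-07-11T10:00:00")
track(customer, "electronics", "2018-07-11T10:04:30")
track(customer, "checkout", "2018-07-11T10:21:05")
print(clickstreams[ucis(customer)])
```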
-
Publication number: 20170300814
Abstract: A system includes one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the computers to implement a combined machine learning model for processing an input including multiple features to generate a predicted output for the machine learning input. The combined model includes: a deep machine learning model configured to process the features to generate a deep model output; a wide machine learning model configured to process the features to generate a wide model output; and a combining layer configured to process the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output, in which the deep model and the wide model have been trained jointly on training data to generate the deep model output and the wide model output.
Type: Application
Filed: December 29, 2016
Publication date: October 19, 2017
Inventors: Tal Shaked, Rohan Anil, Hrishikesh Balkrishna Aradhye, Mustafa Ispir, Glen Anderson, Wei Chai, Mehmet Levent Koc, Jeremiah Harmsen, Xiaobing Liu, Gregory Sean Corrado, Tushar Deepak Chandra, Heng-Tze Cheng