Patents by Inventor Gil Shamir

Gil Shamir has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DIRECT POSTERIOR PREFERENCE FINE-TUNING

Publication number: 20250252292

Abstract: Provided is a methodology for direct supervised preference fine-tuning of sequence processing models such as, for example, so-called large language models (LLMs) and large multimodal models (LMMs). The proposed approaches can fine-tune the model to directly predict the posterior token probabilities conditioned on a positive preference of the sequence for which the token is the last token on a sequence of tokens that are the prefix to the sequence. This method offers a simpler fine-tuning approach that directly generates the desired posteriors for use in decoding, without requiring additional inference per vocabulary token at decoding time.

Type: Application

Filed: February 6, 2024

Publication date: August 7, 2025

Inventors: Gil Shamir, Yonghui Wu
Forecasting uncertainty in machine learning models

Patent number: 12380683

Abstract: Provided are systems and methods for generating a score for any model which can be updated online, regardless of model type architecture and parameters, leveraging relations between regret and uncertainty.

Type: Grant

Filed: November 17, 2022

Date of Patent: August 5, 2025

Assignee: GOOGLE LLC

Inventor: Gil Shamir
Cross-list learning to rank

Patent number: 12314275

Abstract: Provided are systems and methods that perform learning to rank using training data for two or more different training lists. Specifically, a training dataset can include a number of training examples. Each training example can include a query and a plurality of items that are potentially responsive to the query. The ranking model can be trained using pairs of items taken from two different training examples.

Type: Grant

Filed: August 14, 2023

Date of Patent: May 27, 2025

Assignee: GOOGLE LLC

Inventor: Gil Shamir
MACHINE LEARNING RANKING DISTILLATION

Publication number: 20250077934

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium for training and using distilled machine learning models. In one aspect, a method includes obtaining a first input that includes training example sets that each include one or more feature values and, for each item, an outcome label that represents whether the item had a positive outcome. A first machine learning model is trained using the first input and is configured to generate a set of scores that represents whether the item will have a positive outcome when presented in the context of the training example set and with each other item in the example set. A distilled machine learning model is trained using the set of scores for each example set. The distilled machine learning model is configured to generate a distilled score.

Type: Application

Filed: September 23, 2022

Publication date: March 6, 2025

Inventors: Gil Shamir, Zhuoshu Li
Cross-List Learning to Rank

Publication number: 20250061117

Abstract: Provided are systems and methods that perform learning to rank using training data for two or more different training lists. Specifically, a training dataset can include a number of training examples. Each training example can include a query and a plurality of items that are potentially responsive to the query. The ranking model can be trained using pairs of items taken from two different training examples.

Type: Application

Filed: August 14, 2023

Publication date: February 20, 2025

Inventor: Gil Shamir
Minimum Deep Learning with Gating Multiplier

Publication number: 20250028966

Abstract: Systems and methods according to the present disclosure can employ a computer-implemented method for inference using a machine-learned model. The method can be implemented by a computing system having one or more computing devices. The method can include obtaining data descriptive of a neural network including one or more network units and one or more gating paths, wherein each of the gating path(s) includes one or more gating units. The method can include obtaining data descriptive of one or more input features. The method can include determining one or more network unit outputs from the network unit(s) based at least in part on the input feature(s). The method can include determining one or more gating values from the gating path(s). The method can include determining one or more gated network unit outputs based at least in part on a combination of the network unit output(s) and the gating value(s).

Type: Application

Filed: October 3, 2024

Publication date: January 23, 2025

Inventor: Gil Shamir
Minimum deep learning with gating multiplier

Patent number: 12141703

Abstract: Systems and methods according to the present disclosure can employ a computer-implemented method for inference using a machine-learned model. The method can be implemented by a computing system having one or more computing devices. The method can include obtaining data descriptive of a neural network including one or more network units and one or more gating paths, wherein each of the gating path(s) includes one or more gating units. The method can include obtaining data descriptive of one or more input features. The method can include determining one or more network unit outputs from the network unit(s) based at least in part on the input feature(s). The method can include determining one or more gating values from the gating path(s). The method can include determining one or more gated network unit outputs based at least in part on a combination of the network unit output(s) and the gating value(s).

Type: Grant

Filed: September 14, 2023

Date of Patent: November 12, 2024

Assignee: GOOGLE LLC

Inventor: Gil Shamir
MACHINE LEARNING RANK AND PREDICTION CALIBRATION

Publication number: 20240242106

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium for training and using machine learning (ML) models. In one aspect, a method includes receiving a digital component request. A first ML model can output scores indicating a likelihood of a positive outcome for digital components. Input data can be provided to a second ML model and can include feature values for a subset of digital components that were selected based on the output scores. The second ML model can be trained to output an engagement predictions and/or ranking of digital components based at least in part on feature values of digital components that will be provided together as recommendations, and can produce a second output that includes ranking and engagement predictions of the digital components in the subset of digital components. At least one digital component can be provided based on the second output.

Type: Application

Filed: September 23, 2022

Publication date: July 18, 2024

Inventors: Gil Shamir, Zhuoshu Li
Forecasting Uncertainty in Machine Learning Models

Publication number: 20240169707

Abstract: Provided are systems and methods for generating a score for any model which can be updated online, regardless of model type architecture and parameters, leveraging relations between regret and uncertainty.

Type: Application

Filed: November 17, 2022

Publication date: May 23, 2024

Inventor: Gil Shamir
Minimum Deep Learning with Gating Multiplier

Publication number: 20240005166

Abstract: Systems and methods according to the present disclosure can employ a computer-implemented method for inference using a machine-learned model. The method can be implemented by a computing system having one or more computing devices. The method can include obtaining data descriptive of a neural network including one or more network units and one or more gating paths, wherein each of the gating path(s) includes one or more gating units. The method can include obtaining data descriptive of one or more input features. The method can include determining one or more network unit outputs from the network unit(s) based at least in part on the input feature(s). The method can include determining one or more gating values from the gating path(s). The method can include determining one or more gated network unit outputs based at least in part on a combination of the network unit output(s) and the gating value(s).

Type: Application

Filed: September 14, 2023

Publication date: January 4, 2024

Inventor: Gil Shamir
Minimum deep learning with gating multiplier

Patent number: 11790236

Abstract: Systems and methods according to the present disclosure can employ a computer-implemented method for inference using a machine-learned model. The method can be implemented by a computing system having one or more computing devices. The method can include obtaining data descriptive of a neural network including one or more network units and one or more gating paths, wherein each of the gating path(s) includes one or more gating units. The method can include obtaining data descriptive of one or more input features. The method can include determining one or more network unit outputs from the network unit(s) based at least in part on the input feature(s). The method can include determining one or more gating values from the gating path(s). The method can include determining one or more gated network unit outputs based at least in part on a combination of the network unit output(s) and the gating value(s).

Type: Grant

Filed: March 4, 2020

Date of Patent: October 17, 2023

Assignee: GOOGLE LLC

Inventor: Gil Shamir
TRAINING MACHINE LEARNING MODELS USING QUANTILE AND MEDIAN RANKING DISTILLATION

Publication number: 20230252281

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that obtain a first machine learning model that is configured to output a score. The training examples can each include feature values that represent features of an item, and an outcome label for the item. From the training examples, training pairs of training examples are determined. For each training pair: (i) a score is generated for each training example in the training pair using the first machine learning model; and (ii) for the training pair, a score difference of the scores generated for the training examples in the training pair is determined. Using the training pairs and the score differences, a second machine learning model is trained to produce score differences that, for the same training examples, are within a threshold value of the score differences produced by the first machine learning model.

Type: Application

Filed: June 2, 2022

Publication date: August 10, 2023

Inventors: Gil Shamir, Zhuoshu Li
Asymmetric functionality activation for improved stability in neural networks

Patent number: 11475309

Abstract: Thus, aspects of the present disclosure address model “blow up” by changing the functionality of the activation, thereby providing “dead” or “dying” neurons with the ability to recover from this situation. As one example, for activation functions that have an input region in which the neuron is turned off by a 0 or close to 0 gradient, a training computing system can keep the neuron turned off when the gradient pushes the unit farther into the region (e.g., by applying an update with zero or reduced magnitude). However, if the gradient for the current training example (or batch) attempts to push the unit towards a region in which the neuron is active again, the system can allow for a non-zero gradient (e.g., by applying an update with standard or increased magnitude).

Type: Grant

Filed: April 14, 2020

Date of Patent: October 18, 2022

Assignee: GOOGLE LLC

Inventor: Gil Shamir
Approximate Bayesian Logistic Regression For Sparse Online Learning

Publication number: 20220108219

Abstract: Systems and methods leverage low complexity (e.g., linear overall, fixed per example) analytical approximations to perform machine learning problems such as, for example, the sparse online logistic regression problem. Unlike variational inference and other methods, the proposed systems and methods lead to analytical closed forms, lowering the practical number of computations. Further, unlike techniques used for dense features sets, such as Gaussian Mixtures, the proposed systems and methods allow for sparse problems with huge feature sets without increasing complexity. With the analytical closed forms, there is also no need for applying stochastic gradient methods on surrogate losses, and for tuning and balancing learning and regularization parameters of such methods.

Type: Application

Filed: October 1, 2021

Publication date: April 7, 2022

Inventors: Gil Shamir, Wojciech Szpankowski
Asymmetric Functionality Activation for Improved Stability in Neural Networks

Publication number: 20210319320

Abstract: Thus, aspects of the present disclosure address model “blow up” by changing the functionality of the activation, thereby providing “dead” or “dying” neurons with the ability to recover from this situation. As one example, for activation functions that have an input region in which the neuron is turned off by a 0 or close to 0 gradient, a training computing system can keep the neuron turned off when the gradient pushes the unit farther into the region (e.g., by applying an update with zero or reduced magnitude). However, if the gradient for the current training example (or batch) attempts to push the unit towards a region in which the neuron is active again, the system can allow for a non-zero gradient (e.g., by applying an update with standard or increased magnitude).

Type: Application

Filed: April 14, 2020

Publication date: October 14, 2021

Inventor: Gil Shamir
Minimum Deep Learning with Gating Multiplier

Publication number: 20210279591

Abstract: Systems and methods according to the present disclosure can employ a computer-implemented method for inference using a machine-learned model. The method can be implemented by a computing system having one or more computing devices. The method can include obtaining data descriptive of a neural network including one or more network units and one or more gating paths, wherein each of the gating path(s) includes one or more gating units. The method can include obtaining data descriptive of one or more input features. The method can include determining one or more network unit outputs from the network unit(s) based at least in part on the input feature(s). The method can include determining one or more gating values from the gating path(s). The method can include determining one or more gated network unit outputs based at least in part on a combination of the network unit output(s) and the gating value(s).

Type: Application

Filed: March 4, 2020

Publication date: September 9, 2021

Inventor: Gil Shamir
Distilling from Ensembles to Improve Reproducibility of Neural Networks

Publication number: 20210158156

Abstract: Systems and methods can improve the reproducibility of neural networks by distilling from ensembles. In particular, aspects of the present disclosure are directed to a training scheme that utilizes a combination of an ensemble of neural networks and a single, “wide” neural network that is more powerful (e.g., exhibits a greater accuracy) than the ensemble. Specifically, the output of the ensemble can be distilled into the single neural network during training of the single neural network. After training, the single neural network can be deployed to generate inferences. In such fashion, the single neural model can provide a superior prediction accuracy while, during training, the ensemble can serve to influence the single neural network to be more reproducible. In addition, an additional single wide tower can be added to generate another output, that can be distilled to the single neural network, to further improve its accuracy.

Type: Application

Filed: September 18, 2020

Publication date: May 27, 2021

Inventors: Gil Shamir, Lorenzo Coviello
Smooth Continuous Piecewise Constructed Activation Functions

Publication number: 20210133565

Abstract: Aspects of the present disclosure are directed to novel activation functions which enable improved reproducibility and accuracy tradeoffs in neural networks. In particular, the present disclosure provides a family of activation functions that, on one hand, are smooth with continuous gradient and optionally monotonic but, on the other hand, also mimic the mathematical behavior of a Rectified Linear Unit (ReLU). As examples, the activation functions described herein include a smooth rectified linear unit function and also a leaky version of such function. In various implementations, the proposed functions can provide both a complete stop region and a constant positive gradient (e.g., that can be 1) pass region like a ReLU, thereby matching accuracy performance of a ReLU. Additional implementations include a leaky version and/or functions that feature different constant gradients in the pass region.

Type: Application

Filed: June 16, 2020

Publication date: May 6, 2021

Inventors: Gil Shamir, Dong Lin, Sergey Ioffe
Regularization of machine learning models

Patent number: 10600000

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage medium, for regularizing feature weights maintained by a machine learning model. The method includes actions of obtaining a set of training data that includes multiple training feature vectors, and training the machine learning model on each of the training feature vectors, comprising, for each feature vector and for each of a plurality of the features of the feature vector: determining a first loss for the feature vector with the feature, determining a second loss for the feature vector without the feature, and updating a current benefit score for the feature using the first loss and the second loss, wherein the benefit score for the feature is indicative of the usefulness of the feature in generating accurate predicted outcomes for training feature vectors.

Type: Grant

Filed: December 2, 2016

Date of Patent: March 24, 2020

Assignee: Google LLC

Inventor: Gil Shamir
Systems and Methods for Improved Generalization, Reproducibility, and Stabilization of Neural Networks via Error Control Code Constraints

Publication number: 20190258936

Abstract: The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to systems and methods for improved generalization, reproducibility, and stabilization of neural networks via the application of error control, modulation, and/or lattice code constraints during training.

Type: Application

Filed: February 14, 2019

Publication date: August 22, 2019

Inventor: Gil Shamir

1 2 next