Patents by Inventor Tayler Hetherington

Tayler Hetherington has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11720751
    Abstract: A model-agnostic global explainer for textual data processing (NLP) machine learning (ML) models, “NLP-MLX”, is described herein. NLP-MLX explains global behavior of arbitrary NLP ML models by identifying globally-important tokens within a textual dataset containing text data. NLP-MLX accommodates any arbitrary combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: August 8, 2023
    Assignee: Oracle International Corporation
    Inventors: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
  • Patent number: 11687540
    Abstract: Techniques are described for fast approximate conditional sampling by randomly sampling a dataset and then performing a nearest neighbor search on the pre-sampled dataset to reduce the data over which the nearest neighbor search must be performed and, according to an embodiment, to effectively reduce the number of nearest neighbors that are to be found within the random sample. Furthermore, KD-Tree-based stratified sampling is used to generate a representative sample of a dataset. KD-Tree-based stratified sampling may be used to identify the random sample for fast approximate conditional sampling, which reduces variance in the resulting data sample. As such, using KD-Tree-based stratified sampling to generate the random sample for fast approximate conditional sampling ensures that any nearest neighbor selected, for a target data instance, from the random sample is likely to be among the nearest neighbors of the target data instance within the unsampled dataset.
    Type: Grant
    Filed: February 18, 2021
    Date of Patent: June 27, 2023
    Assignee: Oracle International Corporation
    Inventors: Yasha Pushak, Tayler Hetherington, Karoon Rashedi Nia, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
  • Patent number: 11531915
    Abstract: Herein are techniques to generate candidate rulesets for machine learning (ML) explainability (MLX) for black-box ML models. In an embodiment, an ML model generates classifications that each associates a distinct example with a label. A decision tree that, based on the classifications, contains tree nodes is received or generated. Each node contains label(s), a condition that identifies a feature of examples, and a split value for the feature. When a node has child nodes, the feature and the split value that are identified by the condition of the node are set to maximize information gain of the child nodes. Candidate rules are generated by traversing the tree. Each rule is built from a combination of nodes in a tree traversal path. Each rule contains a condition of at least one node and is assigned to a rule level. Candidate rules are subsequently optimized into an optimal ruleset for actual use.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: December 20, 2022
    Assignee: Oracle International Corporation
    Inventors: Tayler Hetherington, Zahra Zohrevand, Onur Kocberber, Karoon Rashedi Nia, Sam Idicula, Nipun Agarwal
  • Publication number: 20220366297
    Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. For each feature, and for each of many original tuples, the computer: a) randomly selects many perturbed values from original values of the feature in the original tuples, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference of the perturbed tuples and the particular inference. For each feature, a respective importance of the feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a local ML explainability (MLX) explanation.
    Type: Application
    Filed: May 13, 2021
    Publication date: November 17, 2022
    Inventors: Yasha Pushak, Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220335255
    Abstract: In an embodiment, a computer assigns a respective probability distribution to each of many features that include a first feature and a second feature that are assigned different probability distributions. For each of many original tuples that are based on the features, a machine learning (ML) model infers a respective original inference. For each feature, and for each original tuple, the computer: a) generates perturbed values based on the probability distribution of the feature, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference and the original inference. A respective importance of each feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a global or local ML explainability (MLX) explanation.
    Type: Application
    Filed: April 16, 2021
    Publication date: October 20, 2022
    Inventors: Zahra Zohrevand, Yasha Pushak, Tayler Hetherington, Karoon Rashedi Nia, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220309360
    Abstract: Herein are techniques for topic modeling and content perturbation that provide machine learning (ML) explainability (MLX) for natural language processing (NLP). A computer hosts an ML model that infers an original inference for each of many text documents that contain many distinct terms. To each text document (TD) is assigned, based on terms in the TD, a topic that contains a subset of the distinct terms. In a perturbed copy of each TD, a perturbed subset of the distinct terms is replaced. For the perturbed copy of each TD, the ML model infers a perturbed inference. For TDs of a topic, the computer detects that a difference between original inferences of the TDs of the topic and perturbed inferences of the TDs of the topic exceeds a threshold. Based on terms in the TDs of the topic, the topic is replaced with multiple, finer-grained new topics. After sufficient topic modeling, a regional explanation of the ML model is generated.
    Type: Application
    Filed: March 25, 2021
    Publication date: September 29, 2022
    Inventors: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220261400
    Abstract: Techniques are described for fast approximate conditional sampling by randomly sampling a dataset and then performing a nearest neighbor search on the pre-sampled dataset to reduce the data over which the nearest neighbor search must be performed and, according to an embodiment, to effectively reduce the number of nearest neighbors that are to be found within the random sample. Furthermore, KD-Tree-based stratified sampling is used to generate a representative sample of a dataset. KD-Tree-based stratified sampling may be used to identify the random sample for fast approximate conditional sampling, which reduces variance in the resulting data sample. As such, using KD-Tree-based stratified sampling to generate the random sample for fast approximate conditional sampling ensures that any nearest neighbor selected, for a target data instance, from the random sample is likely to be among the nearest neighbors of the target data instance within the unsampled dataset.
    Type: Application
    Filed: February 18, 2021
    Publication date: August 18, 2022
    Inventors: Yasha Pushak, Tayler Hetherington, Karoon Rashedi Nia, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220229983
    Abstract: A model-agnostic global explainer for textual data processing (NLP) machine learning (ML) models, “NLP-MLX”, is described herein. NLP-MLX explains global behavior of arbitrary NLP ML models by identifying globally-important tokens within a textual dataset containing text data. NLP-MLX accommodates any arbitrary combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions.
    Type: Application
    Filed: January 11, 2021
    Publication date: July 21, 2022
    Inventors: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220198277
    Abstract: Herein are generative adversarial networks to ensure realistic local samples and surrogate models to provide machine learning (ML) explainability (MLX). Based on many features, an embodiment trains an ML model. The ML model inferences an original inference for original feature values respectively for many features. Based on the same features, a generator model is trained to generate realistic local samples that are distinct combinations of feature values for the features. A surrogate model is trained based on the generator model and based on the original inference by the ML model and/or the original feature values that the original inference is based on. Based on the surrogate model, the ML model is explained. The local samples may be weighted based on semantic similarity to the original feature values, which may facilitate training the surrogate model and/or ranking the relative importance of the features. Local sample weighting may be based on populating a random forest with the local samples.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220188645
    Abstract: Herein are counterfactual explanations of machine learning (ML) inferencing provided by generative adversarial networks (GANs) that ensure realistic counterfactuals and use latent spaces to optimize perturbations. In an embodiment, a first computer trains a generator model in a GAN. A same or second computer hosts a classifier model that inferences an original label for original feature values respectively for many features. Runtime ML explainability (MLX) occurs on the first or second or a third computer as follows. The generator model from the GAN generates a sequence of revised feature values that are based on noise. The noise is iteratively optimized based on a distance between the original feature values and current revised feature values in the sequence of revised feature values. The classifier model inferences a current label respectively for each counterfactual in the sequence of revised feature values.
    Type: Application
    Filed: December 16, 2020
    Publication date: June 16, 2022
    Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220172105
    Abstract: End-to-end explanation techniques, which efficiently explain the behavior (feature importance) of any machine learning model on large tabular datasets, are disclosed. These techniques comprise two down-sampling methods to efficiently select a small set of representative samples of a high-dimensional dataset for explaining a machine learning model by making use of the characteristics of the dataset or of an explainer of a machine learning model to optimize the explanation quality. These techniques significantly improve the explanation speed while maintaining the explanation quality of a full dataset evaluation.
    Type: Application
    Filed: November 30, 2020
    Publication date: June 2, 2022
    Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20220129791
    Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample.
    Type: Application
    Filed: October 28, 2020
    Publication date: April 28, 2022
    Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
  • Publication number: 20200302318
    Abstract: Herein are techniques to generate candidate rulesets for machine learning (ML) explainability (MLX) for black-box ML models. In an embodiment, an ML model generates classifications that each associates a distinct example with a label. A decision tree that, based on the classifications, contains tree nodes is received or generated. Each node contains label(s), a condition that identifies a feature of examples, and a split value for the feature. When a node has child nodes, the feature and the split value that are identified by the condition of the node are set to maximize information gain of the child nodes. Candidate rules are generated by traversing the tree. Each rule is built from a combination of nodes in a tree traversal path. Each rule contains a condition of at least one node and is assigned to a rule level. Candidate rules are subsequently optimized into an optimal ruleset for actual use.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: Tayler Hetherington, Zahra Zohrevand, Onur Kocberber, Karoon Rashedi Nia, Sam Idicula, Nipun Agarwal
  • Patent number: 10768982
    Abstract: Herein are techniques for analysis of data streams. In an embodiment, a computer associates each software actor with data streams. Each software actor has its own backlog queue of data to analyze. In response to receiving some stream content and based on the received stream content, data is distributed to some software actors. In response to determining that the data satisfies completeness criteria of a particular software actor, an indication of the data is appended onto the backlog queue of the particular software actor. The particular software actor is reset to an initial state by loading an execution snapshot of a previous initial execution of an embedded virtual machine. Based on the particular software actor, execution of the execution snapshot of the previous initial execution is resumed to dequeue and process the indication of the data from the backlog queue of the particular software actor to generate a result.
    Type: Grant
    Filed: September 19, 2018
    Date of Patent: September 8, 2020
    Assignee: Oracle International Corporation
    Inventors: Andrew Brownsword, Tayler Hetherington, Pavan Chandrashekar, Akhilesh Singhania, Stuart Wray, Pravin Shinde, Felix Schmidt, Craig Schelp, Onur Kocberber, Juan Fernandez Peinador, Rod Reddekopp, Manel Fernandez Gomez, Nipun Agarwal
  • Publication number: 20200097810
    Abstract: Techniques are described herein for automatically generating statistical features describing trends in time-series data that may then become inputs to machine learning models. The framework involves a set of algorithms for selecting a number and size of window-based statistical features to use as input features, and for evaluating them during a series of training phases with a machine learning model using training, test, and validation time-series data. The training and evaluation phases provide particular values for a number and a size of window-based statistical features that yield the best scores in terms of prediction accuracy. The particular values are then used with input time-series data to generate augmented time-series data to input to the trained machine learning model for obtaining predictions regarding the time series as well as identified anomalies in the input time-series data.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 26, 2020
    Inventors: Tayler Hetherington, Sam Idicula, Nipun Agarwal
  • Publication number: 20200089529
    Abstract: Herein are techniques for analysis of data streams. In an embodiment, a computer associates each software actor with data streams. Each software actor has its own backlog queue of data to analyze. In response to receiving some stream content and based on the received stream content, data is distributed to some software actors. In response to determining that the data satisfies completeness criteria of a particular software actor, an indication of the data is appended onto the backlog queue of the particular software actor. The particular software actor is reset to an initial state by loading an execution snapshot of a previous initial execution of an embedded virtual machine. Based on the particular software actor, execution of the execution snapshot of the previous initial execution is resumed to dequeue and process the indication of the data from the backlog queue of the particular software actor to generate a result.
    Type: Application
    Filed: September 19, 2018
    Publication date: March 19, 2020
    Inventors: Andrew Brownsword, Tayler Hetherington, Pavan Chandrashekar, Akhilesh Singhania, Stuart Wray, Pravin Shinde, Felix Schmidt, Craig Schelp, Onur Kocberber, Juan Fernandez Peinador, Rod Reddekopp, Manel Fernandez Gomez, Nipun Agarwal
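
Several abstracts in this listing (for example, publication 20220366297) describe scoring a feature by perturbing its value and measuring how much the model's inference changes. The general idea can be illustrated with a minimal sketch; it is not the patented method. The names `perturbation_importance` and `toy_model`, the sample count, and the use of the mean absolute difference as the aggregate are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def perturbation_importance(
    model: Callable[[Row], float],
    rows: List[Row],
    samples_per_row: int = 20,
    seed: int = 0,
) -> List[float]:
    """Return one importance score per feature (hypothetical helper)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        # Perturbed values are drawn from the values feature j takes
        # in the original rows.
        pool = [row[j] for row in rows]
        diffs = []
        for row in rows:
            original = model(row)
            for _ in range(samples_per_row):
                perturbed = list(row)
                perturbed[j] = rng.choice(pool)
                # Difference between the perturbed and original inference.
                diffs.append(abs(model(perturbed) - original))
        # Assumption: aggregate the differences with a plain mean.
        importances.append(mean(diffs))
    return importances


def toy_model(row: Row) -> float:
    # Stand-in black-box model: feature 0 dominates, feature 1 is ignored.
    return 3.0 * row[0] + 0.0 * row[1] + 0.5 * row[2]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(50)]
scores = perturbation_importance(toy_model, rows)
# Rank feature indices from most to least influential.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Because the toy model ignores feature 1, its score is exactly zero, and the ranking places feature 0 first. A real explainer would call a trained model in place of `toy_model`.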