Patents by Inventor Orna Raz

Orna Raz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Generating an error policy for a machine learning engine

Patent number: 12670440

Abstract: A computer hardware system includes a slice generator and a policy generator and performs the following. The slice generator slices a first dataset including true values and predicted values of a class variable into a plurality of slices each defining a plurality of observations within the first dataset. A first one and another one of the plurality of slices are selected, and a union of observations is generated by adding observations within the selected another one to observations within the selected first one of the plurality of slices. The selecting another one of the plurality of slices and the generating the union is repeated until a number of observations within the union reaches a predetermined value. Using the policy generator and after the number of observations within the union reaches the predetermined value, an error policy is generated. The predicted values were generated by a machine learning engine.
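
The slice-union loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `union_until` and `error_rate`, the representation of slices as sets of row indices, and the choice to take slices in list order are all assumptions. The abstract does not say what the policy generator does with the union, so the sketch stops at a simple error-rate summary.

```python
def union_until(slices, target):
    """Merge slices (sets of observation indices) in order until the
    union holds at least `target` observations or slices run out.
    Hypothetical helper; the selection order is an assumption."""
    if not slices:
        return set()
    union = set(slices[0])          # the selected "first one" of the slices
    for other in slices[1:]:        # repeatedly select "another one"
        if len(union) >= target:
            break
        union |= set(other)         # add its observations to the union
    return union


def error_rate(indices, y_true, y_pred):
    """Share of observations in `indices` where prediction != truth.
    Illustrative stand-in for whatever input the policy generator uses."""
    if not indices:
        return 0.0
    wrong = sum(1 for i in indices if y_true[i] != y_pred[i])
    return wrong / len(indices)


# Toy data: true and predicted values of a class variable.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
slices = [{0, 1}, {1, 2}, {3, 4}, {5}]

union = union_until(slices, target=4)
print(sorted(union), error_rate(union, y_true, y_pred))
```

Overlapping slices contribute each observation only once, which is why the union is held as a set rather than a list.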

Type: Grant

Filed: December 20, 2022

Date of Patent: June 30, 2026

Assignee: International Business Machines Corporation

Inventors: Samuel Solomon Ackerman, Orna Raz, Eitan Daniel Farchi, Marcel Zalmanovici

Generating data slice rules for data generation

Patent number: 12645962

Abstract: An example system includes a processor to receive a data set. The processor can generate a data slice rule based on a data observation for a data point in the data set. The processor can generate an instance of data based on the generated data slice rule.

Type: Grant

Filed: February 28, 2022

Date of Patent: June 2, 2026

Assignee: International Business Machines Corporation

Inventors: Orna Raz, George Kour, Ramasuri Narayanam, Samuel Solomon Ackerman, Marcel Zalmanovici

Detecting labels of a data catalog incorrectly assigned to data set fields

Patent number: 12579127

Abstract: Described are techniques for detecting labels incorrectly assigned to data set fields. The data of each data set field, such as those data set fields assigned to the same label, are represented using a set of characteristics. The data set fields are then clustered into clusters based on the characteristics of the data of the data set fields. Those clusters of data set fields with a homogeneity (being assigned the same label) that exceeds a first threshold value and is below a second threshold value are identified. One or more labels assigned to the data set fields of the identified clusters are identified as being suspect for incorrect assignments by having a frequency below a third threshold value (e.g., 3%), which may be user-designated. The label(s) identified as being suspect for incorrect assignment are then presented to a user for review.
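
The three-threshold filter in this abstract might look like the sketch below. It assumes the clustering step has already run and that each cluster is given simply as the list of labels assigned to its fields. The function name `suspect_labels`, the parameter names, the default values for the first two thresholds, the reading of homogeneity as the share of the most common label, and the reading of frequency as frequency within the cluster are all assumptions made for illustration.

```python
from collections import Counter


def suspect_labels(clusters, low=0.5, high=1.0, rare=0.03):
    """Return {cluster_id: [labels suspected of incorrect assignment]}.

    clusters: dict mapping a cluster id to the list of labels assigned
              to the data set fields in that cluster (assumed input).
    low/high: first and second homogeneity thresholds (assumed values).
    rare:     third threshold on label frequency, e.g. 3%.
    """
    suspects = {}
    for cluster_id, labels in clusters.items():
        if not labels:
            continue
        counts = Counter(labels)
        total = len(labels)
        # Homogeneity taken here as the share of the most common label.
        homogeneity = counts.most_common(1)[0][1] / total
        if low < homogeneity < high:
            rare_labels = sorted(
                label for label, n in counts.items() if n / total < rare
            )
            if rare_labels:
                suspects[cluster_id] = rare_labels
    return suspects


clusters = {
    "a": ["email"] * 98 + ["phone"] * 2,        # mostly one label, a few odd ones
    "b": ["name"] * 50,                          # perfectly homogeneous
    "c": ["x"] * 10 + ["y"] * 10 + ["z"] * 10,   # too mixed to judge
}
print(suspect_labels(clusters))
```

Only clusters that are mostly, but not entirely, one label are inspected: a perfectly homogeneous cluster has nothing to flag, and a heavily mixed one gives no dominant label to compare against.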

Type: Grant

Filed: July 8, 2023

Date of Patent: March 17, 2026

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Yannick Saillet, Maya Zohar, Marcel Zalmanovici

MODIFYING ARTIFICIAL NEURAL NETWORKS FOR TESTING SECURITY

Publication number: 20260050673

Abstract: Systems, methods, and computer program products for modifying an artificial neural network are described herein. A method comprises reading an input artificial neural network; iteratively generating a modified artificial neural network, wherein generating the modified artificial neural network comprises removing at least one node from the artificial neural network; determining a performance score for the modified artificial neural network; and selecting a subset of the nodes of the artificial neural network. Determining the performance score may comprise providing a plurality of input prompts to the modified artificial neural network; generating a plurality of outputs based on the plurality of input prompts, determining output scores for the plurality of outputs, and determining the performance score based on the output scores.

Type: Application

Filed: August 14, 2024

Publication date: February 19, 2026

Inventors: Ora Nova Fandina, Orna Raz, George Kour, Marcel Zalmanovici, Eitan Daniel Farchi, Ateret Anaby-Tavor

GROUND-TRUTH-LESS PERFORMANCE PREDICTION OF GENERATIVE QUESTION-ANSWERING SYSTEMS

Publication number: 20260017346

Abstract: Systems and techniques that facilitate ground-truth-less performance prediction of generative question-answering systems are provided. In various embodiments, a system can access a large language model (LLM) and a natural language question for which a ground-truth answer is unavailable. In various aspects, the system can generate, via a machine learning classifier that receives as input a set of properties associated with the natural language question, a classification label indicating whether or not the large language model will correctly answer the natural language question. In various instances, the set of properties can include a semantic category of the natural language question, a subject popularity of the natural language question, a semantic consistency exhibited by the LLM in response to repeated executions on the natural language question, or a semantic consistency exhibited by the LLM in response to execution on paraphrases of the natural language question.

Type: Application

Filed: July 12, 2024

Publication date: January 15, 2026

Inventors: Ella Rabinovich, Samuel Solomon Ackerman, Orna Raz, Eitan Daniel Farchi, Ateret Anaby-Tavor

Reliable and interpretable drift detection in streams of short texts

Patent number: 12499878

Abstract: Various systems and methods are presented regarding detecting data drift. The data of interest can be batches of utterances received at an interface (e.g., a chatbot). The batches of utterances can be compared with topics present in training data utilized to train a data classifier (e.g., an autoencoder), wherein topics identified in the batches of utterances that are not present in the training data can be considered to be novel topics. The greater the presence of novel topics in a batch of utterances, the greater the divergence of the batch of utterances from the content of the training data. The novel topics can be identified and subsequently applied to the training data such that the data classifier can be re-trained with the novel topics, thereby causing the data classifier to be contemporaneous with the novel topics. In an embodiment, the utterances can be short streams of text, symbols, and suchlike.

Type: Grant

Filed: April 5, 2023

Date of Patent: December 16, 2025

Assignee: International Business Machines Corporation

Inventors: Ella Rabinovich, Matan Vetzler, Samuel Solomon Ackerman, Ateret Anaby-Tavor, Eitan Daniel Farchi, Orna Raz

DETECTING LABELS OF A DATA CATALOG INCORRECTLY ASSIGNED TO DATA SET FIELDS

Publication number: 20250013629

Abstract: Described are techniques for detecting labels incorrectly assigned to data set fields. The data of each data set field, such as those data set fields assigned to the same label, are represented using a set of characteristics. The data set fields are then clustered into clusters based on the characteristics of the data of the data set fields. Those clusters of data set fields with a homogeneity (being assigned the same label) that exceeds a first threshold value and is below a second threshold value are identified. One or more labels assigned to the data set fields of the identified clusters are identified as being suspect for incorrect assignments by having a frequency below a third threshold value (e.g., 3%), which may be user-designated. The label(s) identified as being suspect for incorrect assignment are then presented to a user for review.

Type: Application

Filed: July 8, 2023

Publication date: January 9, 2025

Inventors: Orna Raz, Yannick Saillet, Maya Zohar, Marcel Zalmanovici

PROVIDING AND COMPARING CUSTOMIZED RISK SCORES FOR ARTIFICIAL INTELLIGENCE MODELS

Publication number: 20240362337

Abstract: One or more systems, devices, computer program products and/or computer-implemented methods provided herein relate to risk assessment for artificial intelligence models, and more specifically, to the generation of customized risk scores and converted comparable scores. In an embodiment, the customized risk assessment scores can be based on a risk profile determined from risk assessment requirements and measurements of an artificial intelligence model. In another embodiment, one or more customized risk assessment scores can be converted to a converted risk assessment score that is comparable to a customized risk assessment score or another converted risk assessment score.

Type: Application

Filed: April 28, 2023

Publication date: October 31, 2024

Inventors: Abigail Goldsteen, Michael Hind, Jacquelyn Martino, David John Piorkowski, Orna Raz, John Thomas Richards, Moninder Singh, Marcel Zalmanovici

RELIABLE AND INTERPRETABLE DRIFT DETECTION IN STREAMS OF SHORT TEXTS

Publication number: 20240339112

Abstract: Various systems and methods are presented regarding detecting data drift. The data of interest can be batches of utterances received at an interface (e.g., a chatbot). The batches of utterances can be compared with topics present in training data utilized to train a data classifier (e.g., an autoencoder), wherein topics identified in the batches of utterances that are not present in the training data can be considered to be novel topics. The greater the presence of novel topics in a batch of utterances, the greater the divergence of the batch of utterances from the content of the training data. The novel topics can be identified and subsequently applied to the training data such that the data classifier can be re-trained with the novel topics, thereby causing the data classifier to be contemporaneous with the novel topics. In an embodiment, the utterances can be short streams of text, symbols, and suchlike.

Type: Application

Filed: April 5, 2023

Publication date: October 10, 2024

Inventors: Ella Rabinovich, Matan Vetzler, Samuel Solomon Ackerman, Ateret Anaby-Tavor, Eitan Daniel Farchi, Orna Raz

Method and apparatus for enhancing effectivity of machine learning solutions

Patent number: 12056580

Abstract: A method, system and computer program product, the method comprising: creating a model representing underperforming cases; from a case collection having a total performance, and which comprises for each of a multiplicity of records: a value for each feature from a collection of features, a ground truth label and a prediction of a machine learning (ML) engine, obtaining one or more features; dividing the records into groups, based on values of the features in each record; for one group of the groups, calculating a performance parameter of the ML engine over the portion of the records associated with the group; subject to the performance parameter of the group being below the total performance in at least a predetermined threshold: determining a characteristic for the group; adding the characteristic of the group to the model; and providing the model to a user, thus indicating under-performing parts of the test collection.
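
As a rough illustration of the grouping step in this abstract, the sketch below divides records by the value of one feature, measures accuracy per group, and keeps the groups that fall below the overall accuracy by at least a margin. The record layout (dicts with `label` and `prediction` keys), the function name `underperforming_groups`, the use of accuracy as the performance parameter, and the use of the feature value itself as the group's "characteristic" are assumptions, not details taken from the patent.

```python
from collections import defaultdict


def underperforming_groups(records, feature, margin=0.1):
    """Return (overall_accuracy, {feature_value: group_accuracy}) for the
    groups whose accuracy trails the overall accuracy by >= margin."""
    if not records:
        return 0.0, {}
    overall = sum(r["label"] == r["prediction"] for r in records) / len(records)

    # Divide the records into groups based on the value of the feature.
    hits_by_value = defaultdict(list)
    for r in records:
        hits_by_value[r[feature]].append(r["label"] == r["prediction"])

    # The "model" of underperforming cases: one entry per weak group.
    model = {}
    for value, hits in hits_by_value.items():
        accuracy = sum(hits) / len(hits)
        if overall - accuracy >= margin:
            model[value] = accuracy
    return overall, model


records = (
    [{"color": "red", "label": 1, "prediction": 1}]
    + [{"color": "red", "label": 1, "prediction": 0}] * 3
    + [{"color": "blue", "label": 0, "prediction": 0}] * 6
)
overall, model = underperforming_groups(records, "color", margin=0.1)
print(overall, model)
```

A fuller version would presumably group on combinations of features and on binned numeric values; a single categorical feature keeps the example short.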

Type: Grant

Filed: October 24, 2019

Date of Patent: August 6, 2024

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Marcel Zalmanovici, Aviad Zlotnick

GENERATING AN ERROR POLICY FOR A MACHINE LEARNING ENGINE

Publication number: 20240202575

Abstract: A computer hardware system includes a slice generator and a policy generator and performs the following. The slice generator slices a first dataset including true values and predicted values of a class variable into a plurality of slices each defining a plurality of observations within the first dataset. A first one and another one of the plurality of slices are selected, and a union of observations is generated by adding observations within the selected another one to observations within the selected first one of the plurality of slices. The selecting another one of the plurality of slices and the generating the union is repeated until a number of observations within the union reaches a predetermined value. Using the policy generator and after the number of observations within the union reaches the predetermined value, an error policy is generated. The predicted values were generated by a machine learning engine.

Type: Application

Filed: December 20, 2022

Publication date: June 20, 2024

Inventors: Samuel Solomon Ackerman, Orna Raz, Eitan Daniel Farchi, Marcel Zalmanovici

GENERATING DATA SLICE RULES FOR DATA GENERATION

Publication number: 20230274169

Abstract: An example system includes a processor to receive a data set. The processor can generate a data slice rule based on a data observation for a data point in the data set. The processor can generate an instance of data based on the generated data slice rule.

Type: Application

Filed: February 28, 2022

Publication date: August 31, 2023

Inventors: Orna Raz, George Kour, Ramasuri Narayanam, Samuel Solomon Ackerman, Marcel Zalmanovici

Performance measurement of predictors

Patent number: 11734143

Abstract: A method, apparatus and a product for determining a performance measurement of predictors. The method comprises: obtaining a dataset comprising data instances, each data instance being associated with a label; obtaining a predictor configured to provide a prediction of a label for a data instance; determining a plurality of data slices that are subsets of the dataset; computing, for each data slice in the plurality of data slices and based on an application of the predictor on each data instance that is mapped to the data slice, a performance measurement that is indicative of a successful label prediction for a data instance comprised by the data slice, thereby obtaining a plurality of performance measurements; based on the plurality of performance measurements, computing a performance measurement of the predictor over the dataset; and, if the performance measurement of the predictor is below a threshold, performing a mitigating action.
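
One way to picture the per-slice measurement in this abstract is the sketch below. The abstract does not say how the per-slice measurements are combined, so the unweighted mean used here is an assumption, as are the function name `slice_based_score`, the representation of slices as predicates over instances, and the use of accuracy as the measurement.

```python
def slice_based_score(dataset, predictor, slices):
    """Compute accuracy per data slice and an overall score.

    dataset:   list of (instance, label) pairs.
    predictor: callable mapping an instance to a predicted label.
    slices:    dict mapping a slice name to a predicate over instances
               (hypothetical representation of a data slice).
    The overall score is the unweighted mean of the per-slice
    accuracies, which is an assumed aggregation rule.
    """
    per_slice = {}
    for name, belongs in slices.items():
        members = [(x, y) for x, y in dataset if belongs(x)]
        if members:  # skip slices that match no instance
            correct = sum(predictor(x) == y for x, y in members)
            per_slice[name] = correct / len(members)
    if not per_slice:
        raise ValueError("no slice matched any data instance")
    overall = sum(per_slice.values()) / len(per_slice)
    return per_slice, overall


dataset = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even"),
           (10, "even"), (11, "odd")]
# Toy predictor that only recognises small even numbers.
predictor = lambda x: "even" if x < 5 and x % 2 == 0 else "odd"
slices = {"small": lambda x: x < 5, "large": lambda x: x >= 5}

per_slice, overall = slice_based_score(dataset, predictor, slices)
needs_mitigation = overall < 0.8  # the threshold value is illustrative
print(per_slice, overall, needs_mitigation)
```

Averaging over slices instead of over instances lets a small, badly served slice pull the score down even when plain accuracy over the whole dataset looks acceptable.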

Type: Grant

Filed: April 10, 2020

Date of Patent: August 22, 2023

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Eitan Farchi, Marcel Zalmanovici

RANKING DATA SLICES USING MEASURES OF INTEREST

Publication number: 20230237343

Abstract: An example system includes a processor to receive a test set, data slices, and a measure of interest. The processor can rank the data slices based on the test set, the data slices, and the set of measures of interest. The test set includes data points from the same feature space used to train a machine learning model. Each data slice is ranked according to generated slice grades representing unique information contribution of each data slice to the measure of interest with respect to the other data slices. The processor can then present the ranked data slices.

Type: Application

Filed: January 26, 2022

Publication date: July 27, 2023

Inventors: Orna Raz, Samuel Solomon Ackerman, Marcel Zalmanovici, Eitan Daniel Farchi, Ramasuri Narayanam

METHODS AND SYSTEMS FOR AUTOMATICALLY IDENTIFY IN A DATASET INSUFFICIENT DATA FOR LEARNING, OR RECORDS WITH ANOMALOUS COMBINATIONS OF FEATURE VALUES

Publication number: 20230205847

Abstract: Systems and methods for automatically identifying in a dataset insufficient data for learning, or records with anomalous combinations of feature values, by partition of numeric and/or categorical data space into human-interpretable regions are disclosed. The method comprises: receiving a dataset of numeric and/or categorical features with a plurality of observations; calculating observation density for each observation according to a distance- or anomaly-based metric, and receiving a density measurement; and partitioning the dataset along the numeric and/or categorical features according to the density measurement of each observation by a perpendicular cut along the feature spaces, receiving a map of a plurality of hyper-rectangular shapes representing various levels of density, including empty spaces.

Type: Application

Filed: December 26, 2021

Publication date: June 29, 2023

Inventors: Samuel Solomon Ackerman, Orna Raz, Marcel Zalmanovici, Eitan Daniel Farchi, Avi Ziv

Optimizing hierarchical classification with adaptive node collapses

Patent number: 11676043

Abstract: A mechanism is provided in a data processing system having a processor and a memory. The memory comprises instructions which are executed by the processor to cause the processor to implement a training system for finding an optimal surface for a hierarchical classification task on an ontology. The training system receives a training data set and a hierarchical classification ontology data structure. The training system generates a neural network architecture based on the training data set and the hierarchical classification ontology data structure. The neural network architecture comprises an indicative layer, a parent tier (PT) output and a lower leaf tier (LLT) output. The training system trains the neural network architecture to classify the training data set to leaf nodes at the LLT output and parent nodes at the PT output. The indicative layer in the neural network architecture determines a surface that passes through each path from a root to a leaf node in the hierarchical ontology data structure.

Type: Grant

Filed: March 4, 2019

Date of Patent: June 13, 2023

Assignee: International Business Machines Corporation

Inventors: Pathirage Dinindu Sujan Udayanga Perera, Orna Raz, Ramani Routray, Vivek Krishnamurthy, Sheng Hua Bao, Eitan D. Farchi

AUTOMATIC DETECTION OF CHANGES IN DATA SET RELATIONS

Publication number: 20230102152

Abstract: A system, program product, and method for automatic detection of data drift in a data set are presented. The method includes determining changes to relations in the data set through generating baseline and production data sets. The method further includes generating a production data set with some inserted data distortion, and defining, for a plurality of features in the baseline data set, potential relations for participant features. The method also includes determining a first likelihood and a second likelihood of each potential relation in the baseline and production data sets, respectively, for the participant features. The method further includes comparing each first likelihood with each second likelihood, generating a comparison value that is compared with a threshold value, and determining, subject to the comparison value exceeding the threshold value, that the potential relation in the baseline data set does not describe a relation in the production data set.

Type: Application

Filed: September 24, 2021

Publication date: March 30, 2023

Inventors: Eliran Roffe, Samuel Solomon Ackerman, Eitan Daniel Farchi, Orna Raz

Identifying data drifts that have an adverse effect on predictors

Patent number: 11568169

Abstract: A method, apparatus and product for identifying data drifts.

Type: Grant

Filed: April 28, 2019

Date of Patent: January 31, 2023

Assignee: International Business Machines Corporation

Inventors: Eitan Farchi, Orna Raz, Marcel Zalmanovici

Method and apparatus for employing machine learning solutions

Patent number: 11556847

Abstract: A method, system and computer program product, the method comprising: obtaining computer code of an employed system comprising a plurality of components; obtaining data related to operating the plurality of components; based on the computer code and the data, identifying: a first component from the plurality of components, to be maintained; and a second component from the plurality of components, to be at least partly replaced by a machine learning component; and providing to a user an identification of the first component and the second component.

Type: Grant

Filed: October 17, 2019

Date of Patent: January 17, 2023

Assignee: International Business Machines Corporation

Inventors: Eitan Daniel Farchi, Howard Michael Hess, Orna Raz

Estimating feasibility and effort for a machine learning solution

Patent number: 11556810

Abstract: A method, computer system, and a computer program product for assessing a likelihood of success associated with developing at least one machine learning (ML) solution is provided. The present invention may include generating a set of questions based on a set of raw training data. The present invention may also include computing a feasibility score based on an answer corresponding with each question from the generated set of questions. The present invention may then include, in response to determining that the computed feasibility score satisfies a threshold, computing a level of effort associated with developing the at least one ML solution to address a problem. The present invention may further include presenting, to a user, a plurality of results associated with assessing the likelihood of success of the at least one ML solution.

Type: Grant

Filed: July 11, 2019

Date of Patent: January 17, 2023

Assignee: International Business Machines Corporation

Inventors: Pathirage Dinindu Sujan Udayanga Perera, Orna Raz, Ramani Routray, Eitan Daniel Farchi
