Patents by Inventor Marcel Zalmanovici

Marcel Zalmanovici has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DETECTING LABELS OF A DATA CATALOG INCORRECTLY ASSIGNED TO DATA SET FIELDS

Publication number: 20250013629

Abstract: Described are techniques for detecting labels incorrectly assigned to data set fields. The data of each data set field, such as those data set fields assigned to the same label, are represented using a set of characteristics. The data set fields are then clustered into clusters based on the characteristics of the data of the data set fields. Those clusters of data set fields with a homogeneity (being assigned the same label) that exceeds a first threshold value and is below a second threshold value are identified. One or more labels assigned to the data set fields of the identified clusters are identified as being suspect for incorrect assignment by having a frequency below a third threshold value (e.g., 3%), which may be user-designated. The label(s) identified as being suspect for incorrect assignment are then presented to a user for review.

Type: Application

Filed: July 8, 2023

Publication date: January 9, 2025

Inventors: Orna Raz, Yannick Saillet, Maya Zohar, Marcel Zalmanovici
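
The thresholding step described in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `suspect_labels`, the default thresholds, and the definition of homogeneity as the share of the most common label in a cluster are all assumptions made for illustration, and the clustering itself is taken as given.

```python
from collections import Counter

def suspect_labels(clusters, low=0.5, high=1.0, min_freq=0.03):
    """Return {cluster_index: [labels suspected of incorrect assignment]}.

    clusters: list of lists; each inner list holds the labels assigned to
    the data set fields that were grouped into one cluster.
    low/high: hypothetical first and second homogeneity thresholds.
    min_freq: hypothetical third threshold (3% in the abstract's example).
    """
    suspects = {}
    for idx, labels in enumerate(clusters):
        if not labels:
            continue
        counts = Counter(labels)
        # Assumed homogeneity measure: share of the most common label.
        homogeneity = counts.most_common(1)[0][1] / len(labels)
        if low < homogeneity < high:
            # Labels that are rare inside a mostly uniform cluster are suspect.
            rare = sorted(l for l, c in counts.items() if c / len(labels) < min_freq)
            if rare:
                suspects[idx] = rare
    return suspects

clusters = [
    ["email"] * 98 + ["phone"] * 2,  # mostly uniform, one rare label
    ["name"] * 10,                   # fully uniform, nothing to flag
    ["a"] * 5 + ["b"] * 5,           # too mixed to be informative
]
print(suspect_labels(clusters))
```

In this sketch only the first cluster qualifies, so `phone` would be the label presented to a user for review.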
PROVIDING AND COMPARING CUSTOMIZED RISK SCORES FOR ARTIFICIAL INTELLIGENCE MODELS

Publication number: 20240362337

Abstract: One or more systems, devices, computer program products and/or computer-implemented methods provided herein relate to risk assessment for artificial intelligence models, and more specifically, to the generation of customized risk scores and converted comparable scores. In an embodiment, the customized risk assessment scores can be based on a risk profile determined from risk assessment requirements and measurements of an artificial intelligence model. In another embodiment, one or more customized risk assessment scores can be converted to a converted risk assessment score that is comparable to a customized risk assessment score or another converted risk assessment score.

Type: Application

Filed: April 28, 2023

Publication date: October 31, 2024

Inventors: Abigail Goldsteen, Michael Hind, Jacquelyn Martino, David John Piorkowski, Orna Raz, John Thomas Richards, Moninder Singh, Marcel Zalmanovici
Method and apparatus for enhancing effectivity of machine learning solutions

Patent number: 12056580

Abstract: A method, system and computer program product, the method comprising: creating a model representing underperforming cases; from a case collection having a total performance, and which comprises for each of a multiplicity of records: a value for each feature from a collection of features, a ground truth label and a prediction of a machine learning (ML) engine, obtaining one or more features; dividing the records into groups, based on values of the features in each record; for one group of the groups, calculating a performance parameter of the ML engine over the portion of the records associated with the group; subject to the performance parameter of the group being below the total performance in at least a predetermined threshold: determining a characteristic for the group; adding the characteristic of the group to the model; and providing the model to a user, thus indicating under-performing parts of the test collection.

Type: Grant

Filed: October 24, 2019

Date of Patent: August 6, 2024

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
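
As a rough illustration of the grouping idea in the abstract above, the sketch below groups records by the value of one feature and reports groups whose accuracy falls below the overall accuracy by at least a margin. The record layout, the use of accuracy as the performance parameter, and the names `underperforming_groups` and `margin` are assumptions, not details taken from the patent.

```python
from collections import defaultdict

def underperforming_groups(records, feature, margin=0.1):
    """Return (total_accuracy, model), where model lists weak groups.

    records: list of dicts with the chosen feature plus 'label' (ground
    truth) and 'prediction' (output of the ML engine).
    margin: hypothetical threshold on the gap to the total accuracy.
    """
    total_acc = sum(r["label"] == r["prediction"] for r in records) / len(records)
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r)
    model = []
    for value, members in groups.items():
        acc = sum(m["label"] == m["prediction"] for m in members) / len(members)
        if total_acc - acc >= margin:
            # The group's characteristic here is simply its feature value.
            model.append({"feature": feature, "value": value,
                          "accuracy": acc, "size": len(members)})
    return total_acc, model

records = (
    [{"region": "north", "label": 1, "prediction": 1}] * 4
    + [{"region": "south", "label": 1, "prediction": 1}]
    + [{"region": "south", "label": 1, "prediction": 0}] * 3
)
print(underperforming_groups(records, "region"))
```

A real system would likely consider several features and their combinations; a single feature keeps the sketch short.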
GENERATING AN ERROR POLICY FOR A MACHINE LEARNING ENGINE

Publication number: 20240202575

Abstract: A computer hardware system includes a slice generator and a policy generator and performs the following. The slice generator slices a first dataset including true values and predicted values of a class variable into a plurality of slices each defining a plurality of observations within the first dataset. A first one and another one of the plurality of slices are selected, and a union of observations is generated by adding observations within the selected another one to observations within the selected first one of the plurality of slices. The selecting another one of the plurality of slices and the generating the union is repeated until a number of observations within the union reaches a predetermined value. Using the policy generator and after the number of observations within the union reaches the predetermined value, an error policy is generated. The predicted values were generated by a machine learning engine.

Type: Application

Filed: December 20, 2022

Publication date: June 20, 2024

Inventors: Samuel Solomon Ackerman, Orna Raz, Eitan Daniel Farchi, Marcel Zalmanovici
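
The repeated union step in the abstract above might look like the following sketch. It assumes slices are sets of observation indices that arrive already ordered by selection priority; how slices are selected and how the error policy is then generated are not specified in the abstract and are not sketched here. The name `union_of_slices` is hypothetical.

```python
def union_of_slices(slices, target_size):
    """Merge slices in order until the union holds target_size observations.

    slices: list of sets of observation indices (assumed ordering).
    Returns the union and the indices of the slices that were used.
    """
    if not slices:
        return set(), []
    union = set(slices[0])
    used = [0]
    for i in range(1, len(slices)):
        if len(union) >= target_size:
            break  # predetermined number of observations reached
        union |= slices[i]
        used.append(i)
    return union, used

slices = [{1, 2}, {2, 3}, {4, 5, 6}, {7}]
print(union_of_slices(slices, target_size=5))
```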
GENERATING DATA SLICE RULES FOR DATA GENERATION

Publication number: 20230274169

Abstract: An example system includes a processor to receive a data set. The processor can generate a data slice rule based on a data observation for a data point in the data set. The processor can generate an instance of data based on the generated data slice rule.

Type: Application

Filed: February 28, 2022

Publication date: August 31, 2023

Inventors: Orna Raz, George Kour, Ramasuri Narayanam, Samuel Solomon Ackerman, Marcel Zalmanovici
Performance measurement of predictors

Patent number: 11734143

Abstract: A method, apparatus and product for determining a performance measurement of predictors. The method comprises: obtaining a dataset comprising data instances, each data instance being associated with a label; obtaining a predictor configured to provide a prediction of a label for a data instance; determining a plurality of data slices that are subsets of the dataset; computing, for each data slice in the plurality of data slices and based on an application of the predictor on each data instance that is mapped to the data slice, a performance measurement that is indicative of a successful label prediction for a data instance comprised by the data slice, thereby obtaining a plurality of performance measurements; based on the plurality of performance measurements, computing a performance measurement of the predictor over the dataset; and, if the performance measurement of the predictor is below a threshold, performing a mitigating action.

Type: Grant

Filed: April 10, 2020

Date of Patent: August 22, 2023

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Eitan Farchi, Marcel Zalmanovici
RANKING DATA SLICES USING MEASURES OF INTEREST

Publication number: 20230237343

Abstract: An example system includes a processor to receive a test set, data slices, and a measure of interest. The processor can rank the data slices based on the test set, the data slices, and the set of measures of interest. The test set includes data points from the same feature space used to train a machine learning model. Each data slice is ranked according to generated slice grades representing unique information contribution of each data slice to the measure of interest with respect to the other data slices. The processor can then present the ranked data slices.

Type: Application

Filed: January 26, 2022

Publication date: July 27, 2023

Inventors: Orna Raz, Samuel Solomon Ackerman, Marcel Zalmanovici, Eitan Daniel Farchi, Ramasuri Narayanam
METHODS AND SYSTEMS FOR AUTOMATICALLY IDENTIFY IN A DATASET INSUFFICIENT DATA FOR LEARNING, OR RECORDS WITH ANOMALOUS COMBINATIONS OF FEATURE VALUES

Publication number: 20230205847

Abstract: Systems and methods for automatically identifying in a dataset insufficient data for learning, or records with anomalous combinations of feature values, by partition of numeric and/or categorical data space into human-interpretable regions are disclosed. The method comprises: receiving a dataset of numeric and/or categorical features with a plurality of observations; calculating observation density for each observation according to a distance- or anomaly-based metric, and receiving a density measurement; and partitioning the dataset along the numeric and/or categorical features according to the density measurement of each observation by a perpendicular cut along the feature spaces, receiving a map of a plurality of hyper-rectangular shapes representing various levels of density including empty spaces.

Type: Application

Filed: December 26, 2021

Publication date: June 29, 2023

Inventors: Samuel Solomon Ackerman, Orna Raz, Marcel Zalmanovici, Eitan Daniel Farchi, Avi Ziv
Identifying data drifts that have an adverse effect on predictors

Patent number: 11568169

Abstract: A method, apparatus and product for identifying data drifts.

Type: Grant

Filed: April 28, 2019

Date of Patent: January 31, 2023

Assignee: International Business Machines Corporation

Inventors: Eitan Farchi, Orna Raz, Marcel Zalmanovici
Generating training sets to train machine learning models

Patent number: 11514691

Abstract: A computer system trains a machine learning model. A vector representation is generated for each document in a collection of documents. The documents are clustered based on the vector representations of the documents to produce a plurality of clusters. A training set is produced by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model. The machine learning model is trained by applying the training set to the machine learning model. Embodiments of the present invention further include a method and program product for training a machine learning model in substantially the same manner described above.

Type: Grant

Filed: June 12, 2019

Date of Patent: November 29, 2022

Assignee: International Business Machines Corporation

Inventors: Pathirage D. S. U. Perera, Eitan D. Farchi, Orna Raz, Ramani Routray, Sheng Hua Bao, Marcel Zalmanovici
Classifier confidence as a means for identifying data drift

Patent number: 11481667

Abstract: Embodiments of the present systems and methods may provide improved machine learning performance even though data drift has occurred. For example, a method may comprise providing a machine learning model in a computer system, operating the machine learning model using a first dataset to obtain results of the first dataset, operating the machine learning model using a second dataset to obtain results of the second dataset, performing statistical testing on a confidence distribution of results of the first dataset and of results of the second dataset to determine a difference in a result confidence distribution between the first dataset and the second dataset, and determining whether data included in the second dataset has data drift relative to the first dataset based on the difference in a result confidence distribution between the first dataset and the second dataset.

Type: Grant

Filed: January 24, 2019

Date of Patent: October 25, 2022

Assignee: International Business Machines Corporation

Inventors: Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
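
One way to picture the statistical testing on confidence distributions described in the abstract above is a two-sample Kolmogorov-Smirnov statistic computed over the classifier's confidence scores for the two datasets. The abstract does not name a specific test, so the choice of test, the fixed cutoff `threshold`, and the function names below are assumptions; a production setup would more likely rely on a p-value, for instance from `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )

def confidence_drift(baseline_conf, new_conf, threshold=0.3):
    """Flag drift when the confidence distributions differ by more than
    a hypothetical cutoff on the KS statistic."""
    return ks_statistic(baseline_conf, new_conf) > threshold

baseline = [0.90, 0.95, 0.92, 0.97]  # confidences on the first dataset
incoming = [0.50, 0.60, 0.55, 0.65]  # confidences on the second dataset
print(ks_statistic(baseline, incoming), confidence_drift(baseline, incoming))
```

Because only the model's confidence values are compared, a check of this kind needs no ground truth labels for the second dataset.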
GENERATING DATA SLICES FOR MACHINE LEARNING VALIDATION

Publication number: 20220172124

Abstract: A system and method for generating data slices for validating a classifier and validating the classifier. The classifier is trained using a training data set to train the underlying machine learning algorithm. Data is passed through the trained classifier to obtain results. The results are scored to determine the likelihood that the classifier correctly classified the data. Features are identified in the data set that can be used to validate the classifier. Based on the identified features at least one data slice in the data set is identified. The classifier is validated using the at least one data slice.

Type: Application

Filed: December 2, 2020

Publication date: June 2, 2022

Inventors: Orna Raz, Marcel Zalmanovici, Eitan Daniel Farchi, Raviv Gal, Avi Ziv
Mitigating governance impact on machine learning

Patent number: 11314892

Abstract: A method, a computerized apparatus and a computer program product for mitigating governance and regulation implications on machine learning. A governance impact assessment is generated for a partial data set generated by applying a data governance enforcement on a data set of instances comprising valuations of a feature vector. The partial data set comprises partial instances each comprising partial feature vectors. The governance impact assessment comprises information about data excluded from the data set. A machine learning model trained based on the partial data set and configured to provide an estimated prediction for a partial instance is obtained. A set of core features is determined. A bias introduced by the data governance is identified based on a core feature being affected by the data governance. In response to identifying a bias, an anti-bias procedure is applied on the machine learning model, whereby mitigating the bias introduced by the data governance.

Type: Grant

Filed: June 26, 2019

Date of Patent: April 26, 2022

Assignee: International Business Machines Corporation

Inventors: Sima Nadler, Orna Raz, Marcel Zalmanovici
PERFORMANCE MEASUREMENT OF PREDICTORS

Publication number: 20210319354

Abstract: A method, apparatus and product for determining a performance measurement of predictors. The method comprises: obtaining a dataset comprising data instances, each data instance being associated with a label; obtaining a predictor configured to provide a prediction of a label for a data instance; determining a plurality of data slices that are subsets of the dataset; computing, for each data slice in the plurality of data slices and based on an application of the predictor on each data instance that is mapped to the data slice, a performance measurement that is indicative of a successful label prediction for a data instance comprised by the data slice, thereby obtaining a plurality of performance measurements; based on the plurality of performance measurements, computing a performance measurement of the predictor over the dataset; and, if the performance measurement of the predictor is below a threshold, performing a mitigating action.

Type: Application

Filed: April 10, 2020

Publication date: October 14, 2021

Inventors: Orna Raz, Eitan Farchi, Marcel Zalmanovici
METHOD AND APPARATUS FOR ENHANCING EFFECTIVITY OF MACHINE LEARNING SOLUTIONS

Publication number: 20210125080

Abstract: A method, system and computer program product, the method comprising: creating a model representing underperforming cases; from a case collection having a total performance, and which comprises for each of a multiplicity of records: a value for each feature from a collection of features, a ground truth label and a prediction of a machine learning (ML) engine, obtaining one or more features; dividing the records into groups, based on values of the features in each record; for one group of the groups, calculating a performance parameter of the ML engine over the portion of the records associated with the group; subject to the performance parameter of the group being below the total performance in at least a predetermined threshold: determining a characteristic for the group; adding the characteristic of the group to the model; and providing the model to a user, thus indicating under-performing parts of the test collection.

Type: Application

Filed: October 24, 2019

Publication date: April 29, 2021

Inventors: Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
MITIGATING GOVERNANCE IMPACT ON MACHINE LEARNING

Publication number: 20200410129

Abstract: A method, a computerized apparatus and a computer program product for mitigating governance and regulation implications on machine learning. A governance impact assessment is generated for a partial data set generated by applying a data governance enforcement on a data set of instances comprising valuations of a feature vector. The partial data set comprises partial instances each comprising partial feature vectors. The governance impact assessment comprises information about data excluded from the data set. A machine learning model trained based on the partial data set and configured to provide an estimated prediction for a partial instance is obtained. A set of core features is determined. A bias introduced by the data governance is identified based on a core feature being affected by the data governance. In response to identifying a bias, an anti-bias procedure is applied on the machine learning model, whereby mitigating the bias introduced by the data governance.

Type: Application

Filed: June 26, 2019

Publication date: December 31, 2020

Inventors: Sima Nadler, Orna Raz, Marcel Zalmanovici
GENERATING TRAINING SETS TO TRAIN MACHINE LEARNING MODELS

Publication number: 20200394461

Abstract: A computer system trains a machine learning model. A vector representation is generated for each document in a collection of documents. The documents are clustered based on the vector representations of the documents to produce a plurality of clusters. A training set is produced by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model. The machine learning model is trained by applying the training set to the machine learning model. Embodiments of the present invention further include a method and program product for training a machine learning model in substantially the same manner described above.

Type: Application

Filed: June 12, 2019

Publication date: December 17, 2020

Inventors: Pathirage D. S. U. Perera, Eitan D. Farchi, Orna Raz, Ramani Routray, Sheng Hua Bao, Marcel Zalmanovici
IDENTIFYING DATA DRIFTS

Publication number: 20200342310

Abstract: A method, apparatus and product for identifying data drifts. The method comprising: obtaining a seen dataset, wherein the seen dataset comprises seen instances, each of which comprising feature values in a feature space; determining a first measurement of a statistical metric of the seen dataset; obtaining an unseen dataset, wherein the unseen dataset comprises unseen instances, each of which comprising features values in the feature space; determining a second measurement of the statistical metric of the unseen dataset; identifying a data drift in the unseen dataset with respect to the seen dataset based on the first and second measurements of the statistical metric; and performing a responsive action based on the identification of the data drift.

Type: Application

Filed: April 28, 2019

Publication date: October 29, 2020

Inventors: Eitan Farchi, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
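
The comparison of a statistical metric between a seen and an unseen dataset, as outlined in the abstract above, can be sketched for a single numeric feature. The choice of the mean as the metric, the scaling by the seen dataset's standard deviation, and the `tolerance` parameter are all assumptions made for illustration; the abstract leaves the metric and the decision rule open.

```python
from statistics import mean, stdev

def detect_drift(seen, unseen, metric=mean, tolerance=0.5):
    """Compare one statistical metric across the seen and unseen data.

    Returns (drift_flag, first_measurement, second_measurement).
    tolerance: hypothetical limit on the shift, in units of the seen
    dataset's standard deviation.
    """
    first = metric(seen)
    second = metric(unseen)
    scale = stdev(seen) or 1.0  # avoid dividing by zero for constant data
    return abs(second - first) / scale > tolerance, first, second

seen = [1.0, 2.0, 3.0, 4.0, 5.0]
unseen = [6.0, 7.0, 8.0, 9.0, 10.0]
drifted, first, second = detect_drift(seen, unseen)
if drifted:
    # Stand-in for the responsive action mentioned in the abstract.
    print(f"drift suspected: metric moved from {first} to {second}")
```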
IDENTIFYING DATA DRIFTS THAT HAVE AN ADVERSE EFFECT ON PREDICTORS

Publication number: 20200342260

Abstract: A method, apparatus and product for identifying data drifts.

Type: Application

Filed: April 28, 2019

Publication date: October 29, 2020

Inventors: Eitan Farchi, Orna Raz, Marcel Zalmanovici
CLASSIFIER CONFIDENCE AS A MEANS FOR IDENTIFYING DATA DRIFT

Publication number: 20200242505

Abstract: Embodiments of the present systems and methods may provide improved machine learning performance even though data drift has occurred. For example, a method may comprise providing a machine learning model in a computer system, operating the machine learning model using a first dataset to obtain results of the first dataset, operating the machine learning model using a second dataset to obtain results of the second dataset, performing statistical testing on a confidence distribution of results of the first dataset and of results of the second dataset to determine a difference in a result confidence distribution between the first dataset and the second dataset, and determining whether data included in the second dataset has data drift relative to the first dataset based on the difference in a result confidence distribution between the first dataset and the second dataset.

Type: Application

Filed: January 24, 2019

Publication date: July 30, 2020

Inventors: Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
