Patents by Inventor Abigail Goldsteen

Abigail Goldsteen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Discovery of personal data in machine learning models

Patent number: 11893132

Abstract: A method, computer system, and a computer program product for personal data discovery is provided. The present invention may include determining at least one feature used to train a target machine learning (ML) model. The present invention may also include mapping the determined at least one feature to at least one location of a data store including at least one personal data associated with the determined at least one feature. The present invention may further include retrieving a data record of the at least one personal data associated with the mapped at least one feature from the at least one location of the data store. The present invention may also include determining that the target ML model includes a trace of the retrieved data record. The present invention may further include marking the target ML model as containing the at least one personal data.

Type: Grant

Filed: February 23, 2021

Date of Patent: February 6, 2024

Assignee: International Business Machines Corporation

Inventors: Abigail Goldsteen, Micha Gideon Moffie, Ariel Farkash

Training anonymized machine learning models via generalized data generated using received trained machine learning models

Patent number: 11841977

Abstract: An example system includes a processor to receive training data and predictions on the training data of a trained machine learning model to be anonymized. The processor is to generate generalized data from training data based on the predictions of the trained machine learning model on the training data. The processor is to train an anonymized machine learning model using the generalized data.

Type: Grant

Filed: February 11, 2021

Date of Patent: December 12, 2023

Assignee: International Business Machines Corporation

Inventors: Abigail Goldsteen, Ariel Farkash, Micha Gideon Moffie, Gilad Ezov, Ron Shmelkin

VERIFICATION OF DATA REMOVAL FROM MACHINE LEARNING MODELS

Publication number: 20220309381

Abstract: An example system includes a processor to receive one or more target data samples from a training set used to train a machine learning model, a training data sample including a different data sample from the training set, and a forgotten model including the machine learning model with a forgetting mechanism applied on the target data sample. The processor can calculate a model uncertainty or a model similarity based on the forgotten model, the target data sample, and the training data sample. The processor can verify a removal of the target data sample from the forgotten model based on the model similarity or the model uncertainty.

Type: Application

Filed: March 23, 2021

Publication date: September 29, 2022

Inventors: Abigail Goldsteen, Ron Shmelkin

DATA MARK CLASSIFICATION TO VERIFY DATA REMOVAL

Publication number: 20220300837

Abstract: A method, computer system, and a computer program product for testing a data removal are provided. Data elements are marked with a respective mark per represented entity. The marked data elements, with labels indicating the respective marks, are input into a machine learning model to form a trained machine learning model. The trained machine learning model is configured to perform a dual task that includes a main task and a secondary task that includes a classification based on the labels. A forgetting mechanism is applied to the trained machine learning model to remove a data element including a test mark of the marked data elements. A test data element marked with the test mark is input into the revised machine learning model. The classification of the secondary task of an output of the revised machine learning model is determined for the input test data element.

Type: Application

Filed: March 22, 2021

Publication date: September 22, 2022

Inventors: Ron Shmelkin, Abigail Goldsteen, Gilad Ezov, Ariel Farkash

FORGETTING DATA SAMPLES FROM PRETRAINED NEURAL NETWORK MODELS

Publication number: 20220300822

Abstract: A method for forgetting data samples from a pretrained neural network (NN) model is provided. The method includes training an adversarial model to classify training data samples as members of the NN model and test data samples as non-members of the NN model. The method includes performing the following iteratively until the NN model has forgotten a specified threshold of data samples to be forgotten: (1) classifying the data samples as members or non-members using the trained adversarial model; (2) for the member data samples, determining a subset that includes data samples to be forgotten; (3) labeling the data samples within the subset as non-members and updating the NN model based on weight update techniques that cause the NN model to forget the data samples; (4) retraining the NN model without the data samples that have been forgotten; and (5) retraining the adversarial model for the next iteration.

Type: Application

Filed: March 17, 2021

Publication date: September 22, 2022

Inventors: Ron Shmelkin, Abigail Goldsteen, Ariel Farkash

MEMBERSHIP LEAKAGE QUANTIFICATION TO VERIFY DATA REMOVAL

Publication number: 20220284341

Abstract: A method, computer system, and a computer program product for testing a data removal from a trained machine learning model trained with a training data set are provided. A new machine learning model is trained by using an altered data set that includes training data from the training data set. The altered data set is without removal data. A first forgetting mechanism is applied to the trained machine learning model to form a first revised machine learning model. The applying includes removing the removal data from the trained machine learning model. A first membership leakage quantification on the first revised machine learning model is performed to quantify a first membership leakage of the removal data and that uses the new machine learning model for comparison. A first leakage score is determined from the first membership leakage quantification to test the forgetting mechanism.

Type: Application

Filed: March 3, 2021

Publication date: September 8, 2022

Inventors: Abigail Goldsteen, Ron Shmelkin

DISCOVERY OF PERSONAL DATA IN MACHINE LEARNING MODELS

Publication number: 20220269814

Abstract: A method, computer system, and a computer program product for personal data discovery is provided. The present invention may include determining at least one feature used to train a target machine learning (ML) model. The present invention may also include mapping the determined at least one feature to at least one location of a data store including at least one personal data associated with the determined at least one feature. The present invention may further include retrieving a data record of the at least one personal data associated with the mapped at least one feature from the at least one location of the data store. The present invention may also include determining that the target ML model includes a trace of the retrieved data record. The present invention may further include marking the target ML model as containing the at least one personal data.

Type: Application

Filed: February 23, 2021

Publication date: August 25, 2022

Inventors: Abigail Goldsteen, Micha Gideon Moffie, Ariel Farkash

TRAINING ANONYMIZED MACHINE LEARNING MODELS VIA GENERALIZED DATA GENERATED USING RECEIVED TRAINED MACHINE LEARNING MODELS

Publication number: 20220253554

Abstract: An example system includes a processor to receive training data and predictions on the training data of a trained machine learning model to be anonymized. The processor is to generate generalized data from training data based on the predictions of the trained machine learning model on the training data. The processor is to train an anonymized machine learning model using the generalized data.

Type: Application

Filed: February 11, 2021

Publication date: August 11, 2022

Inventors: Abigail Goldsteen, Ariel Farkash, Micha Gideon Moffie, Gilad Ezov, Ron Shmelkin

Data generalization for predictive models

Patent number: 11281728

Abstract: A method, apparatus and a product for data generalization for predictive models. The method comprising: based on a labeled dataset, determining a plurality of buckets, each of which has an associated label; determining a plurality of clusters, grouping similar instances in the same bucket; based on the plurality of clusters, determining an alternative set of features comprising a set of generalized features, wherein each generalized feature corresponds to a cluster of the plurality of clusters, wherein a generalized feature that corresponds to a cluster is indicative of the instance being mapped to the corresponding cluster; obtaining a second instance; determining a generalized second instance that comprises a valuation of the alternative set of features for the second instance; and based on the generalized second instance, determining a label for the second instance.
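The steps in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the coarse grid grouping stands in for whatever clustering the patent uses, and the names `build_clusters`, `generalize`, `predict`, and the `cell` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def build_clusters(dataset, cell=10.0):
    """dataset: list of (features, label). Returns a list of (centroid, label)."""
    groups = defaultdict(list)
    for features, label in dataset:
        # Bucket by label, then group nearby instances of that bucket by
        # coarse grid cell (an assumed, very crude clustering step).
        key = (label, tuple(int(x // cell) for x in features))
        groups[key].append(features)
    clusters = []
    for (label, _), members in sorted(groups.items()):
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        clusters.append((centroid, label))
    return clusters


def generalize(instance, clusters):
    """Generalized features: a one-hot vector marking the nearest cluster."""
    nearest = min(range(len(clusters)),
                  key=lambda i: dist(instance, clusters[i][0]))
    return [1 if i == nearest else 0 for i in range(len(clusters))]


def predict(generalized, clusters):
    """Label the instance using only its generalized representation."""
    return clusters[generalized.index(1)][1]
```

In this sketch the label is derived from the cluster indicator alone, so the raw feature values of the second instance are not needed once it has been generalized.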

Type: Grant

Filed: August 6, 2019

Date of Patent: March 22, 2022

Assignee: International Business Machines Corporation

Inventors: Gilad Ezov, Ariel Farkash, Abigail Goldsteen, Ron Shmelkin, Micha Gideon Moffie

Verifying purpose of data usage at sub-application granularity

Patent number: 11240044

Abstract: Embodiments of the present systems and methods may provide techniques for verifying the correct application purpose for applications that serve multiple purposes and to determine the correct purpose for each requested data access. For example, in an embodiment, a method for controlling application access to data implemented in a computer comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor may comprise: receiving an application comprising a plurality of application parts, each application part associated with a declared data access purpose and generating a cryptographic certificate for each application part to be certified by determining whether a declared data access purpose for each application part to be certified is correct and the only data access purpose for that part, wherein the declared purpose is included in purpose information associated with each application part to be certified.

Type: Grant

Filed: November 22, 2018

Date of Patent: February 1, 2022

Assignee: International Business Machines Corporation

Inventors: Ariel Farkash, Abigail Goldsteen, Micha Gideon Moffie

Data protection using functional encryption

Patent number: 11182491

Abstract: A method of limiting data usage for certified purposes by using functional encryption, comprising: receiving from a software publisher an application code and declared privacy information, the declared privacy information specifies at least one declared usage for at least one data type; analyzing the application's usage of data collected by the application, to identify an actual usage of the at least one data type by a function; identifying when the actual usage is compliant with the at least one declared usage according to the analysis; in response to the identification, creating a pair of a public key and a master private key; creating a function private key for the function using the master private key; and sending the function private key to the software publisher to be used for operating the function on data which is encrypted using the public key.

Type: Grant

Filed: February 4, 2020

Date of Patent: November 23, 2021

Assignee: International Business Machines Corporation

Inventors: Abigail Goldsteen, Ron Shmelkin, Gilad Ezov, Muhammad Barham

DATA PROTECTION USING FUNCTIONAL ENCRYPTION

Publication number: 20210240840

Abstract: A method of limiting data usage for certified purposes by using functional encryption, comprising: receiving from a software publisher an application code and declared privacy information, the declared privacy information specifies at least one declared usage for at least one data type; analyzing the application's usage of data collected by the application, to identify an actual usage of the at least one data type by a function; identifying when the actual usage is compliant with the at least one declared usage according to the analysis; in response to the identification, creating a pair of a public key and a master private key; creating a function private key for the function using the master private key; and sending the function private key to the software publisher to be used for operating the function on data which is encrypted using the public key.

Type: Application

Filed: February 4, 2020

Publication date: August 5, 2021

Inventors: Abigail Goldsteen, Ron Shmelkin, Gilad Ezov, Muhammad Barham

DATA GENERALIZATION FOR PREDICTIVE MODELS

Publication number: 20210042356

Abstract: A method, apparatus and a product for data generalization for predictive models. The method comprising: based on a labeled dataset, determining a plurality of buckets, each of which has an associated label; determining a plurality of clusters, grouping similar instances in the same bucket; based on the plurality of clusters, determining an alternative set of features comprising a set of generalized features, wherein each generalized feature corresponds to a cluster of the plurality of clusters, wherein a generalized feature that corresponds to a cluster is indicative of the instance being mapped to the corresponding cluster; obtaining a second instance; determining a generalized second instance that comprises a valuation of the alternative set of features for the second instance; and based on the generalized second instance, determining a label for the second instance.

Type: Application

Filed: August 6, 2019

Publication date: February 11, 2021

Inventors: Gilad Ezov, Ariel Farkash, Abigail Goldsteen, Ron Shmelkin, Micha Gideon Moffie

DATA GENERALIZATION FOR PREDICTIVE MODELS

Publication number: 20210042629

Abstract: A method, apparatus and a product for data generalization for predictive models. The method comprising: obtaining a training dataset that comprises a plurality of training instances and predicted labels thereof, wherein each training instance is a valuation of a set of features, wherein the set of features comprises a feature having a domain, wherein the predicted label of each training instance is a label predicted thereto by a predictive model; training an auxiliary model using the training dataset; based on the auxiliary model, determining an alternative set of features that is a generalization of the set of features, wherein the alternative set of features comprises a generalized feature having a generalized domain, wherein each value in the generalized domain corresponds to one or more values in the domain; obtaining a generalized instance having a valuation of the alternative set of features; and determining a label for the generalized instance.

Type: Application

Filed: August 6, 2019

Publication date: February 11, 2021

Inventors: Gilad Ezov, Ariel Farkash, Abigail Goldsteen, Ron Shmelkin, Micha Gideon Moffie

Method for watermarking through format preserving encryption

Patent number: 10831869

Abstract: Embodiments of the present systems and methods may provide data watermarking without reliance on error-tolerant fields, thereby providing for the incorporation of watermarks in data that was not considered suitable for watermarking. For example, in an embodiment, a computer-implemented method for watermarking data may comprise inserting watermark data into a field that requires format-preserving encryption.

Type: Grant

Filed: July 2, 2018

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Abigail Goldsteen, Lev Greenberg, Ariel Farkash, Boris Rozenberg, Omri Soceanu

PRIVACY VULNERABILITY SCANNING OF SOFTWARE APPLICATIONS

Publication number: 20200320202

Abstract: Conducting a privacy vulnerability assessment of a software application that comprises program code, by performing at least one of: (i) evaluating the program code to identify code segments presenting a potential dissemination of specified data to an unauthorized destination, (ii) detecting one or more execution paths in the software application which use the specified data for an unauthorized purpose, and (iii) analyzing the content of data flows from the software application to detect the specified data in the data flows. Then, generating one or more vulnerability summaries, based, at least in part, on the results of the evaluating, the detecting, and the analyzing.

Type: Application

Filed: April 4, 2019

Publication date: October 8, 2020

Inventors: Ariel Farkash, Abigail Goldsteen, Ron Shmelkin

VERIFYING PURPOSE OF DATA USAGE AT SUB-APPLICATION GRANULARITY

Publication number: 20200169421

Abstract: Embodiments of the present systems and methods may provide techniques for verifying the correct application purpose for applications that serve multiple purposes and to determine the correct purpose for each requested data access. For example, in an embodiment, a method for controlling application access to data implemented in a computer comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor may comprise: receiving an application comprising a plurality of application parts, each application part associated with a declared data access purpose and generating a cryptographic certificate for each application part to be certified by determining whether a declared data access purpose for each application part to be certified is correct and the only data access purpose for that part, wherein the declared purpose is included in purpose information associated with each application part to be certified.

Type: Application

Filed: November 22, 2018

Publication date: May 28, 2020

Inventors: Ariel Farkash, Abigail Goldsteen, Micha Gideon Moffie

Digital certificate for verifying application purpose of data usage

Patent number: 10616206

Abstract: A method of creating an application purpose certificate, comprising: receiving from a software publisher an application code and declared privacy information, the declared privacy information includes at least one allowed usage purpose for each of a plurality of data types; analyzing the application's usage of data of each of the plurality of data types; verifying the usage is compliant with the at least one allowed usage purpose according to the analysis; creating an encrypted digital purpose certificate, the digital purpose certificate is unique for the application code; and sending the digital purpose certificate to the software publisher to be bundled with the application code and a publisher authentication certificate.
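A rough sketch of the certificate flow in the abstract above, under stated assumptions. The usage analysis itself is out of scope and is passed in as `observed`. An HMAC tag stands in for the encrypted certificate, which is a substitution and not the patented construction. `AUTHORITY_KEY` and all function names are hypothetical.

```python
import hashlib
import hmac
import json

AUTHORITY_KEY = b"hypothetical-certifier-secret"  # assumed secret of the certifying party


def is_compliant(declared: dict, observed: dict) -> bool:
    """True when every observed purpose of each data type was declared as allowed."""
    return all(set(purposes) <= set(declared.get(data_type, []))
               for data_type, purposes in observed.items())


def issue_certificate(code: bytes, declared: dict, observed: dict):
    """Return a certificate bound to this exact code, or None if usage is not compliant."""
    if not is_compliant(declared, observed):
        return None
    payload = {"code_sha256": hashlib.sha256(code).hexdigest(), "declared": declared}
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(AUTHORITY_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_certificate(code: bytes, cert: dict) -> bool:
    """Check the tag and that the certificate matches the given code."""
    body = json.dumps(cert["payload"], sort_keys=True).encode()
    expected = hmac.new(AUTHORITY_KEY, body, hashlib.sha256).hexdigest()
    same_code = cert["payload"]["code_sha256"] == hashlib.sha256(code).hexdigest()
    return hmac.compare_digest(expected, cert["tag"]) and same_code
```

Binding the certificate to a hash of the code is one simple way to make it unique per application build; a real deployment would more likely use public-key signatures so that verifiers do not need the issuer's secret.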

Type: Grant

Filed: September 27, 2016

Date of Patent: April 7, 2020

Assignee: International Business Machines Corporation

Inventors: Sima Nadler, Abigail Goldsteen

METHOD FOR WATERMARKING THROUGH FORMAT PRESERVING ENCRYPTION

Publication number: 20200004935

Abstract: Embodiments of the present systems and methods may provide data watermarking without reliance on error-tolerant fields, thereby providing for the incorporation of watermarks in data that was not considered suitable for watermarking. For example, in an embodiment, a computer-implemented method for watermarking data may comprise inserting watermark data into a field that requires format-preserving encryption.

Type: Application

Filed: July 2, 2018

Publication date: January 2, 2020

Inventors: Abigail Goldsteen, Lev Greenberg, Ariel Farkash, Boris Rozenberg, Omri Soceanu

Data security system with identifiable format-preserving encryption

Patent number: 10148423

Abstract: A data security method including creating a token-including plaintext by including a predefined token into a plaintext, generating a cyphertext by encrypting the token-including plaintext using format-preserving encryption, generating a decrypted cyphertext by decrypting an input text, determining whether the decrypted cyphertext includes a first predefined token, if the decrypted cyphertext includes the first predefined token, recreating the plaintext by removing the first predefined token from the decrypted cyphertext, and if the decrypted cyphertext does not include the first predefined token, using the input text as the plaintext.

Type: Grant

Filed: July 20, 2015

Date of Patent: December 4, 2018

Assignee: International Business Machines Corporation

Inventors: Ariel Farkash, Abigail Goldsteen, Micha Moffie
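
The token check described in the abstract of patent 10148423 can be sketched as follows. This is an illustration only: the keyed digit shift is a toy stand-in for format-preserving encryption and is not secure (a real system would use a vetted mode such as NIST FF1), and `KEY`, `TOKEN`, and the function names are hypothetical.

```python
KEY = [3, 1, 4, 1, 5, 9]   # hypothetical key for the toy cipher
TOKEN = "0000"             # hypothetical predefined token


def fpe_encrypt(digits: str) -> str:
    """Toy digit-preserving cipher: shift each digit by a key digit (NOT secure)."""
    return "".join(str((int(d) + KEY[i % len(KEY)]) % 10)
                   for i, d in enumerate(digits))


def fpe_decrypt(digits: str) -> str:
    """Inverse of fpe_encrypt."""
    return "".join(str((int(d) - KEY[i % len(KEY)]) % 10)
                   for i, d in enumerate(digits))


def protect(plaintext: str) -> str:
    """Include the token in the plaintext, then encrypt the result."""
    return fpe_encrypt(TOKEN + plaintext)


def recover(input_text: str) -> str:
    """Decrypt; if the token is present, strip it, otherwise treat the input as plaintext."""
    decrypted = fpe_decrypt(input_text)
    if decrypted.startswith(TOKEN):
        return decrypted[len(TOKEN):]
    return input_text
```

Because ciphertext and plaintext share the same format, the token is what lets `recover` tell them apart. An unencrypted value whose decryption happens to begin with the token would be misclassified, so the token length trades off against that collision risk.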
