Patents by Inventor Stephen Korte

Stephen Korte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Determining journalist risk of a dataset using population equivalence class distribution estimation

Publication number: 20230307104

Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

Type: Application

Filed: May 26, 2023

Publication date: September 28, 2023

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose
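
The final step named in the abstract of publication 20230307104 above, computing risk from an equivalence class distribution, can be illustrated with a minimal sketch. It assumes a common textbook formulation that the abstract does not itself state: a record in an equivalence class of size k is re-identified with probability 1/k, so the maximum risk is 1 over the smallest class size and the average risk over records is the number of classes divided by the number of records. The function name `journalist_risk` and the input shape (a mapping from class size to the number of classes of that size) are hypothetical.

```python
from typing import Dict, Tuple


def journalist_risk(class_distribution: Dict[int, int]) -> Tuple[float, float]:
    """Return (max_risk, avg_risk) for an equivalence class distribution.

    class_distribution maps an equivalence class size k to the number of
    classes of that size. Assumption (not from the abstract): a record in
    a class of size k can be re-identified with probability 1/k.
    """
    # Ignore sizes that have no classes.
    sizes = {k: n for k, n in class_distribution.items() if n > 0}
    if not sizes or min(sizes) < 1:
        raise ValueError("need at least one class, and all sizes must be >= 1")

    # Worst case: a record in the smallest equivalence class.
    max_risk = 1.0 / min(sizes)

    # Each class of size k holds k records at risk 1/k, so the mean risk
    # over all records reduces to (number of classes) / (number of records).
    n_classes = sum(sizes.values())
    n_records = sum(k * n for k, n in sizes.items())
    avg_risk = n_classes / n_records
    return max_risk, avg_risk


# Hypothetical distribution: 2 classes of size 2 and 3 classes of size 5.
max_risk, avg_risk = journalist_risk({2: 2, 5: 3})
```

In this made-up example the smallest class has two records, so the maximum risk is 0.5, and five classes spread over 19 records give an average risk of 5/19.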

Determining journalist risk of a dataset using population equivalence class distribution estimation

Patent number: 11664098

Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

Type: Grant

Filed: December 23, 2021

Date of Patent: May 30, 2023

Assignee: PRIVACY ANALYTICS INC.

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

Determining journalist risk of a dataset using population equivalence class distribution estimation

Publication number: 20220115101

Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

Type: Application

Filed: December 23, 2021

Publication date: April 14, 2022

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

Determining journalist risk of a dataset using population equivalence class distribution estimation

Patent number: 11238960

Abstract: A system, method and computer readable memory for determining journalist risk of a dataset using population equivalence class distribution estimation. The dataset may be a cross-sectional dataset or a longitudinal dataset. The determined risk of identification can be used in a de-identification process for the dataset.

Type: Grant

Filed: November 27, 2015

Date of Patent: February 1, 2022

Assignee: Privacy Analytics Inc.

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

Re-identification risk measurement estimation of a dataset

Patent number: 10685138

Abstract: There is provided a system and method executed by a processor for estimating re-identification risk of a single individual in a dataset. The individual, subject or patient is described by a data subject profile such as a record in the dataset. A population distribution is retrieved from a storage device; the population distribution is determined by one or more quasi-identifying fields identified in the data subject profile. An information score is then assigned to each quasi-identifying (QI) value of the one or more quasi-identifying fields associated with the data subject profile. The assigned information scores of the quasi-identifying values for the data subject profile are aggregated into an aggregated information value. An anonymity value is then calculated from the aggregated information value and a size of a population associated with the dataset. A re-identification metric for the individual is then calculated from the anonymity value.

Type: Grant

Filed: April 1, 2016

Date of Patent: June 16, 2020

Assignee: PRIVACY ANALYTICS INC.

Inventors: Martin Scaiano, Stephen Korte, Andrew Baker, Geoffrey Green, Khaled El Emam, Luk Arbuckle
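
The pipeline in the abstract of patent 10685138 above (information score per quasi-identifier value, aggregation, anonymity value, re-identification metric) can be sketched as follows. The abstract gives no formulas, so everything concrete here is an assumption: the information score is taken as the surprisal of a value in bits, scores are summed as if the fields were independent, anonymity is the base-2 logarithm of the population size minus the summed bits, and the metric is 2 to the power of minus the anonymity, capped at 1. The function and parameter names are hypothetical.

```python
import math
from typing import Mapping


def reidentification_risk(
    profile: Mapping[str, str],
    population_distribution: Mapping[str, Mapping[str, float]],
    population_size: int,
) -> float:
    """Estimate re-identification risk for one data subject profile.

    population_distribution[field][value] is the assumed fraction of the
    population sharing that quasi-identifier value.
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    # Information score per QI value: surprisal in bits (assumption).
    information_bits = 0.0
    for field, value in profile.items():
        p = population_distribution[field][value]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid probability for {field}={value!r}")
        information_bits += -math.log2(p)

    # Anonymity: bits needed to single out one person, minus bits disclosed.
    anonymity = math.log2(population_size) - information_bits

    # 2**anonymity is the expected number of matching people; risk is its
    # reciprocal, capped at 1 when fewer than one match is expected.
    return min(1.0, 2.0 ** (-anonymity))


# Hypothetical profile and population statistics.
distribution = {"sex": {"F": 0.5}, "birth_year": {"1980": 0.125}}
risk = reidentification_risk(
    {"sex": "F", "birth_year": "1980"}, distribution, population_size=1024
)
```

With these made-up numbers the profile discloses 4 bits against a 10-bit population, leaving 6 bits of anonymity, or about 64 expected matches and a risk of 1/64.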

Smart suppression using re-identification risk measurement

Patent number: 10423803

Abstract: System and method to produce an anonymized cohort, members of the cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits to request in an anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and calculating an anonymity histogram of the dataset. For each patient record within the dataset, the method anonymizes the dataset by calculating using a threshold selector whether a predetermined patient profile within the dataset should be perturbed, calculating using a value selector whether a value within the indicated patient profile should be perturbed, and suppressing an indicated value within the indicated patient profile. The anonymized dataset then is returned.

Type: Grant

Filed: December 23, 2016

Date of Patent: September 24, 2019

Assignee: PRIVACY ANALYTICS INC.

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte

Re-identification risk prediction

Patent number: 10380381

Abstract: System and method to predict risk of re-identification of a cohort if the cohort is anonymized using a de-identification strategy. An input anonymity histogram and a de-identification strategy are used to predict the anonymity histogram that would result from applying the de-identification strategy to the dataset. System embodiments compute a risk of re-identification from the predicted anonymity histogram.

Type: Grant

Filed: January 9, 2017

Date of Patent: August 13, 2019

Assignee: PRIVACY ANALYTICS INC.

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte

Asymmetric journalist risk model of data re-identification

Patent number: 10242213

Abstract: System and method to produce an anonymized cohort, members of the cohort having less than a predetermined risk of re-identification. The system includes a user-facing communication interface to receive an anonymized cohort request comprising traits to include in members of the cohort; a data source-facing communication channel to query a data source, to find anonymized records that possess at least some of the requested traits; and a processor programmed to carry out the instructions of: forming a dataset from at least some of the anonymized records; calculating a risk of re-identification of the anonymized records in the dataset based upon the data query; perturbing anonymized records in the dataset that exceed a predetermined risk of re-identification, until the risk of re-identification is not greater than the pre-determined threshold, to produce the anonymized cohort; and providing, via a user-facing communication channel, the anonymized cohort.

Type: Grant

Filed: September 21, 2016

Date of Patent: March 26, 2019

Assignee: PRIVACY ANALYTICS INC.

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte, Khaled El Emam
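
The "perturb until the risk is not greater than the threshold" loop in the abstract of patent 10242213 above can be sketched in simplified form. This is not the patented asymmetric model: the risk measure (1 over the size of a record's equivalence class), the perturbation (suppressing one more field per pass, in a fixed order), and all names are assumptions chosen for illustration. Suppressed values are treated as matching each other, and the loop stops once every field has been tried, so the threshold is not guaranteed to be met.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def record_risks(records: List[Record], fields: List[str]) -> List[float]:
    """Risk per record, assumed to be 1 / (size of its equivalence class)."""
    keys = [tuple(r[f] for f in fields) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def perturb_until_safe(
    records: List[Record], fields: List[str], threshold: float
) -> List[Record]:
    """Suppress fields of risky records, one field per pass, on a copy."""
    out = [dict(r) for r in records]  # leave the input untouched
    for field in fields:
        risks = record_risks(out, fields)
        risky = [i for i, risk in enumerate(risks) if risk > threshold]
        if not risky:
            break  # every record is at or below the threshold
        for i in risky:
            out[i][field] = None  # suppression as the perturbation
    return out


# Hypothetical cohort: two records share a class, two are unique.
cohort = [
    {"age": "30", "zip": "A"},
    {"age": "30", "zip": "A"},
    {"age": "30", "zip": "B"},
    {"age": "30", "zip": "C"},
]
qi_fields = ["zip", "age"]
safe_cohort = perturb_until_safe(cohort, qi_fields, threshold=0.5)
```

In this example the two unique records lose their `zip` value in the first pass, after which they fall into the same class and no further suppression is needed.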

Method of re-identification risk measurement and suppression on a longitudinal dataset

Patent number: 9990515

Abstract: In longitudinal datasets, it is usually unrealistic that an adversary would know the value of every quasi-identifier. De-identifying a dataset under this assumption results in high levels of generalization and suppression as every patient is unique. Adversary power gives an upper bound on the number of values an adversary knows about a patient. Considering all subsets of quasi-identifiers with the size of the adversary power is computationally infeasible. A method is provided to assess re-identification risk by determining a representative risk which can be used as a proxy for the overall risk measurement and enable suppression of identifiable quasi-identifiers.

Type: Grant

Filed: November 30, 2015

Date of Patent: June 5, 2018

Assignee: PRIVACY ANALYTICS INC.

Inventors: Andrew Baker, Luk Arbuckle, Khaled El Emam, Ben Eze, Stephen Korte, Sean Rose, Cristina Ilie
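
The infeasibility claim in the abstract of patent 9990515 above follows from simple combinatorics: with n quasi-identifier values and adversary power p, there are "n choose p" subsets to consider per patient. The sketch below counts them and, as a stand-in for the representative-risk method (which the abstract does not detail), draws a few random subsets instead of enumerating all of them. Random sampling is an assumption made here for illustration, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def count_subsets(n_quasi_identifiers: int, adversary_power: int) -> int:
    """Number of distinct subsets an exhaustive analysis would examine."""
    return math.comb(n_quasi_identifiers, adversary_power)


def sample_subsets(
    quasi_identifiers: Sequence[str],
    adversary_power: int,
    n_samples: int,
    seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Draw random subsets of size adversary_power (illustrative proxy)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    pool = list(quasi_identifiers)
    return [tuple(rng.sample(pool, adversary_power)) for _ in range(n_samples)]


# Hypothetical patient with 50 quasi-identifier values, adversary power 5.
qi_values = [f"qi{i}" for i in range(50)]
total = count_subsets(len(qi_values), 5)
samples = sample_subsets(qi_values, 5, n_samples=3)
```

For 50 values and an adversary power of 5, `total` is 2,118,760 subsets for a single patient, which is why exhaustive enumeration across a large longitudinal dataset quickly becomes impractical.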

Re-identification risk measurement estimation of a dataset

Publication number: 20180114037

Abstract: There is provided a system and method executed by a processor for estimating re-identification risk of a single individual in a dataset. The individual, subject or patient is described by a data subject profile such as a record in the dataset. A population distribution is retrieved from a storage device; the population distribution is determined by one or more quasi-identifying fields identified in the data subject profile. An information score is then assigned to each quasi-identifying (QI) value of the one or more quasi-identifying fields associated with the data subject profile. The assigned information scores of the quasi-identifying values for the data subject profile are aggregated into an aggregated information value. An anonymity value is then calculated from the aggregated information value and a size of a population associated with the dataset. A re-identification metric for the individual is then calculated from the anonymity value.

Type: Application

Filed: April 1, 2016

Publication date: April 26, 2018

Inventors: Martin Scaiano, Stephen Korte, Andrew Baker, Geoffrey Green, Khaled El Emam, Luk Arbuckle

Re-identification risk prediction

Publication number: 20170124351

Abstract: System and method to predict risk of re-identification of a cohort if the cohort is anonymized using a de-identification strategy. An input anonymity histogram and a de-identification strategy are used to predict the anonymity histogram that would result from applying the de-identification strategy to the dataset. System embodiments compute a risk of re-identification from the predicted anonymity histogram.

Type: Application

Filed: January 9, 2017

Publication date: May 4, 2017

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte

Smart suppression using re-identification risk measurement

Publication number: 20170103232

Abstract: System and method to produce an anonymized cohort, members of the cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits to request in an anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and calculating an anonymity histogram of the dataset. For each patient record within the dataset, the method anonymizes the dataset by calculating using a threshold selector whether a predetermined patient profile within the dataset should be perturbed, calculating using a value selector whether a value within the indicated patient profile should be perturbed, and suppressing an indicated value within the indicated patient profile. The anonymized dataset then is returned.

Type: Application

Filed: December 23, 2016

Publication date: April 13, 2017

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte

Asymmetric journalist risk model of data re-identification

Publication number: 20170083719

Abstract: System and method to produce an anonymized cohort, members of the cohort having less than a predetermined risk of re-identification. The system includes a user-facing communication interface to receive an anonymized cohort request comprising traits to include in members of the cohort; a data source-facing communication channel to query a data source, to find anonymized records that possess at least some of the requested traits; and a processor programmed to carry out the instructions of: forming a dataset from at least some of the anonymized records; calculating a risk of re-identification of the anonymized records in the dataset based upon the data query; perturbing anonymized records in the dataset that exceed a predetermined risk of re-identification, until the risk of re-identification is not greater than the pre-determined threshold, to produce the anonymized cohort; and providing, via a user-facing communication channel, the anonymized cohort.

Type: Application

Filed: September 21, 2016

Publication date: March 23, 2017

Inventors: Martin Scaiano, Andrew Baker, Stephen Korte, Khaled El Emam

Determining journalist risk of a dataset using population equivalence class distribution estimation

Publication number: 20160155061

Abstract: A system, method and computer readable memory for determining journalist risk of a dataset using population equivalence class distribution estimation. The dataset may be a cross-sectional dataset or a longitudinal dataset. The determined risk of identification can be used in a de-identification process for the dataset.

Type: Application

Filed: November 27, 2015

Publication date: June 2, 2016

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

Method of re-identification risk measurement and suppression on a longitudinal dataset

Publication number: 20160154978

Abstract: In longitudinal datasets, it is usually unrealistic that an adversary would know the value of every quasi-identifier. De-identifying a dataset under this assumption results in high levels of generalization and suppression as every patient is unique. Adversary power gives an upper bound on the number of values an adversary knows about a patient. Considering all subsets of quasi-identifiers with the size of the adversary power is computationally infeasible. A method is provided to assess re-identification risk by determining a representative risk which can be used as a proxy for the overall risk measurement and enable suppression of identifiable quasi-identifiers.

Type: Application

Filed: November 30, 2015

Publication date: June 2, 2016

Inventors: Andrew Baker, Luk Arbuckle, Khaled El Emam, Ben Eze, Stephen Korte, Sean Rose, Cristina Ilie