Patents Assigned to PRIVACY ANALYTICS INC.

Systems and methods of data transformation for data pooling

Patent number: 12651090

Abstract: A data anonymization pipeline system for managing the holding and pooling of data is disclosed. The data anonymization pipeline system transforms personal data at a source and then stores the transformed data in a safe environment. Furthermore, a re-identification risk assessment is performed before providing access to a user to fetch the de-identified data for secondary purposes.

Type: Grant

Filed: December 5, 2024

Date of Patent: June 9, 2026

Assignee: PRIVACY ANALYTICS INC.

Inventors: Lon Michel Luk Arbuckle, Jordan Elijah Collins, Khaldoun Zine El Abidine, Khaled El Emam

Machine learning for data anonymization

Patent number: 12455984

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for anonymizing unstructured data. In some implementations, a server can receive unstructured data. The server can automatically detect attributes in the unstructured data using a trained machine-learning model and can determine an amount of undetected attributes and detected attributes in the unstructured data. The server can simulate additional attributes for the unstructured data according to the amount of undetected attributes. The server can analyze a risk of disclosure in the unstructured data using the detected attributes and the simulated additional attributes. The server can modify the detected attributes according to the analyzed risk of disclosure and replace the detected attributes with the modified detected attributes in the unstructured data.

Type: Grant

Filed: April 21, 2023

Date of Patent: October 28, 2025

Assignee: Privacy Analytics Inc.

Inventors: Grant Howard George Middleton, Brian Joseph Rasquinha

System and method for regulatory intelligence evaluation, intermediary mapping and thresholding to generate insights, and accelerating conformance through continuous monitoring

Patent number: 12361043

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system of one or more computers located in one or more locations. The system performs operations including: obtaining input data from one or more regulatory resources; analyzing, using a first set of models, the obtained input data to determine insights related to industry regulations; further analyzing, using a second set of models, overlap between the input data and a control matrix, wherein the control matrix summarizes existing regulations; and, based on the overlap, determining a score to represent a degree of the overlap. Based on the degree of the overlap, the system provides a summary of recommended next steps.

Type: Grant

Filed: December 30, 2024

Date of Patent: July 15, 2025

Assignee: Privacy Analytics Inc.

Inventors: Lon Michel Luk Arbuckle, Devyani Priyambada Biswal, Muhammad Oneeb Rehman Mian

System and method for intermediary mapping and de-identification of non-standard datasets

Patent number: 12326880

Abstract: Disclosed is a method for intermediary mapping and de-identification comprising steps of: retrieving datasets and metadata from a data source; selecting a target standard; mapping the retrieved datasets and the metadata to the target standard, wherein the datasets and the metadata are mapped to the target standard using one of a schema mapping, a variable mapping, or a combination thereof; inferring one or more of variable classifications, variable connections, groupings, disclosure risk settings, and de-identification settings using the dataset mapping and metadata; and performing a de-identification propagation using the mapped datasets, the mapped metadata, the inferred variable classifications, the inferred variable connections, the inferred groupings, the inferred disclosure risk settings, the inferred de-identification settings, or a combination thereof.

Type: Grant

Filed: October 10, 2023

Date of Patent: June 10, 2025

Assignee: Privacy Analytics Inc.

Inventors: Muhammad Oneeb Rehman Mian, David Nicholas Maurice Di Valentino, George Wesley Bradley

Mixed noise mechanism for data anonymization

Patent number: 12321494

Abstract: A method includes collecting one or more datasets of information. The method also includes separating the one or more datasets into respective blocks of data. The method further includes determining whether the information within the blocks of data is consistent, or whether one or more violations occur within the blocks of data. In addition, the method includes applying a first noise function based on the determination that the information within the blocks of data is consistent, wherein the first noise function is applied when a loss of privacy and/or confidentiality exceeds a threshold. The method also includes displaying the blocks of data with the first noise function.

Type: Grant

Filed: September 30, 2022

Date of Patent: June 3, 2025

Assignee: Privacy Analytics Inc.

Inventors: Lon Michel Luk Arbuckle, Devyani Biswal

Systems and methods of data transformation for data pooling

Patent number: 12189820

Abstract: A data anonymization pipeline system for managing the holding and pooling of data is disclosed. The data anonymization pipeline system transforms personal data at a source and then stores the transformed data in a safe environment. Furthermore, a re-identification risk assessment is performed before providing access to a user to fetch the de-identified data for secondary purposes.

Type: Grant

Filed: March 30, 2023

Date of Patent: January 7, 2025

Assignee: Privacy Analytics Inc.

Inventors: Lon Michel Luk Arbuckle, Jordan Elijah Collins, Khaldoun Zine El Abidine, Khaled El Emam

System and method for active learning to detect personally identifying information

Patent number: 12182307

Abstract: Using active learning to detect Protected Health Information (“PHI”) in documents stored as unannotated natural language data by: selecting an initial chunk of text from the documents; forming gold standard data by having a human annotate the text, wherein the annotating identifies and tags the PHI required to de-identify the text; training, using machine learning and the text before and after the annotating, a model having rules for PHI detection; querying, using a strategy, the documents to select a next chunk of text; machine annotating the text using the trained model; updating the gold standard data by having the human correct the machine annotation of the text, wherein the amount of corrections in the updated gold standard data indicates the quality of the machine annotation; and iterating the steps, starting at training, until the quality of the machine annotation is higher than a predetermined quality threshold.

Type: Grant

Filed: September 12, 2018

Date of Patent: December 31, 2024

Assignee: Privacy Analytics Inc.

Inventors: Muqun Li, Hazel Joyce Nicholls, Martin Scaiano

Geo-clustering for data de-identification

Patent number: 12142383

Abstract: Methods and systems to de-identify data records, including to merge pairs of clusters of data records of individuals until a number of data records of each cluster meets a minimum size threshold, de-identify the clusters when each cluster meets the minimum size threshold, assess a risk of re-identification of the de-identified clusters based on k-anonymity, increase the minimum size threshold and re-perform the merging, the de-identifying, and the risk assessment if the assessed risk does not meet a risk criterion, and present the de-identified clusters on a display when the assessed risk meets the risk criterion.

Type: Grant

Filed: May 31, 2022

Date of Patent: November 12, 2024

Assignee: Privacy Analytics Inc.

Inventors: Andrew Richard Baker, Khaled El Emam

Smart de-identification using date jittering

Patent number: 12135821

Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

Type: Grant

Filed: September 1, 2023

Date of Patent: November 5, 2024

Assignee: PRIVACY ANALYTICS INC.

Inventors: Sean Rose, Weilong Song, Martin Scaiano
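
Illustrative sketch (not taken from the patent): the capped group jitter described in the abstract of patent 12135821 might look roughly like the Python below. The function name jitter_groups, the uniform random draw, the single shared jitter limit, and the choice to leave the two boundary groups unshifted are all assumptions made for this example, and it assumes the groups are already ordered in time and strictly non-overlapping.

```python
import random
from datetime import date, timedelta


def jitter_groups(groups, jitter_limit_days, rng):
    """Shift each non-boundary group by a capped random number of days.

    groups: non-empty list of non-empty lists of dates, ordered in time and
    strictly non-overlapping (assumption). The first and last groups are
    treated as boundary groups and left unshifted (assumption).
    """
    jittered = [list(groups[0])]
    for i in range(1, len(groups) - 1):
        group = groups[i]
        # Largest backward/forward shift (in days) that keeps this group
        # strictly between the already-shifted previous group and the
        # original position of the next group.
        max_back = (min(group) - max(jittered[-1])).days - 1
        max_forward = (min(groups[i + 1]) - max(group)).days - 1
        # Raw jitter drawn within the predetermined limit (hypothetical rule).
        raw = rng.randint(-jitter_limit_days, jitter_limit_days)
        # Cap the jitter by the maximum time limits.
        capped = max(-max_back, min(max_forward, raw))
        jittered.append([d + timedelta(days=capped) for d in group])
    if len(groups) > 1:
        jittered.append(list(groups[-1]))
    return jittered


example = [
    [date(2020, 1, 1), date(2020, 1, 3)],
    [date(2020, 1, 10), date(2020, 1, 12)],
    [date(2020, 1, 15)],
    [date(2020, 2, 1)],
]
shifted = jitter_groups(example, jitter_limit_days=30, rng=random.Random(0))
```

Every date in a group moves by the same amount, so intervals inside a group are preserved while the order of the groups stays intact.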

System and method for intermediary mapping and de-identification of non-standard datasets

Patent number: 11782956

Abstract: Disclosed is a method for intermediary mapping and de-identification comprising steps of: retrieving datasets and metadata from a data source; selecting a target standard; mapping the retrieved datasets and the metadata to the target standard, wherein the datasets and the metadata are mapped to the target standard using one of a schema mapping, a variable mapping, or a combination thereof; inferring one or more of variable classifications, variable connections, groupings, disclosure risk settings, and de-identification settings using the dataset mapping and metadata; and performing a de-identification propagation using the mapped datasets, the mapped metadata, the inferred variable classifications, the inferred variable connections, the inferred groupings, the inferred disclosure risk settings, the inferred de-identification settings, or a combination thereof.

Type: Grant

Filed: October 20, 2021

Date of Patent: October 10, 2023

Assignee: PRIVACY ANALYTICS INC.

Inventors: Muhammad Oneeb Rehman Mian, David Nicholas Maurice Di Valentino, George Wesley Bradley

Smart de-identification using date jittering

Patent number: 11748517

Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

Type: Grant

Filed: April 27, 2022

Date of Patent: September 5, 2023

Assignee: Privacy Analytics Inc.

Inventors: Sean Rose, Weilong Song, Martin Scaiano

Determining journalist risk of a dataset using population equivalence class distribution estimation

Patent number: 11664098

Abstract: Methods and systems to de-identify a longitudinal dataset of personal records based on journalistic risk computed from a sample set of the personal records, including determining a similarity distribution of the sample set based on quasi-identifiers of the respective personal records, converting the similarity distribution of the sample set to an equivalence class distribution, and computing journalistic risk based on the equivalence distribution. In an embodiment, multiple similarity measures are determined for a personal record based on comparisons with multiple combinations of other personal records of the sample set, and an average of the multiple similarity measures is rounded. In an embodiment, similarity measures are determined for a subset of the sample set and, for each similarity measure, the number of records having the similarity measure is projected to the subset of personal records. Journalistic risk may be computed for multiple types of attacks.

Type: Grant

Filed: December 23, 2021

Date of Patent: May 30, 2023

Assignee: PRIVACY ANALYTICS INC.

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

Systems and methods of data transformation for data pooling

Patent number: 11620408

Abstract: A data anonymization pipeline system for managing the holding and pooling of data is disclosed. The data anonymization pipeline system transforms personal data at a source and then stores the transformed data in a safe environment. Furthermore, a re-identification risk assessment is performed before providing access to a user to fetch the de-identified data for secondary purposes.

Type: Grant

Filed: March 27, 2020

Date of Patent: April 4, 2023

Assignee: Privacy Analytics Inc.

Inventors: Lon Michel Luk Arbuckle, Jordan Elijah Collins, Khaldoun Zine El Abidine, Khaled El Emam

Geo-clustering for data de-identification

Patent number: 11380441

Abstract: The present disclosure is related to a method of geo-clustering of data for de-identification of a dataset. The method includes generating a plurality of geoclusters based on a plurality of geocodes. The geocodes may include ZIP codes or postal codes. The method further includes identifying the geoclusters having the smallest population. The geocluster having the smallest population is iteratively merged with the nearest geocluster until a minimum population threshold is met. Once the smallest geocluster meets the minimum population threshold, the plurality of geoclusters can be used to cluster the geocodes within a dataset to be de-identified.

Type: Grant

Filed: May 10, 2017

Date of Patent: July 5, 2022

Assignee: PRIVACY ANALYTICS INC.

Inventors: Andrew Richard Baker, Khaled El Emam
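
Illustrative sketch (not taken from the patent): the iterative merge described in the abstract of patent 11380441 could be approximated as below. The GeoCluster class, the use of planar (x, y) centroids with Euclidean distance as the meaning of "nearest", and the population-weighted centroid update are assumptions for this example only.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class GeoCluster:
    codes: list       # geocodes (e.g. ZIP or postal codes) in this cluster
    population: int
    x: float          # hypothetical planar centroid coordinates
    y: float


def merge_geoclusters(geocodes, min_population):
    """geocodes: dict mapping geocode -> (population, x, y)."""
    clusters = [GeoCluster([c], p, x, y) for c, (p, x, y) in geocodes.items()]
    while len(clusters) > 1:
        # Identify the geocluster with the smallest population.
        smallest = min(clusters, key=lambda c: c.population)
        if smallest.population >= min_population:
            break  # every cluster now meets the threshold
        others = [c for c in clusters if c is not smallest]
        # Merge it with the nearest other geocluster (Euclidean, assumption).
        nearest = min(
            others,
            key=lambda c: hypot(c.x - smallest.x, c.y - smallest.y),
        )
        total = smallest.population + nearest.population
        if total > 0:
            # Population-weighted centroid of the merged cluster.
            x = (smallest.x * smallest.population + nearest.x * nearest.population) / total
            y = (smallest.y * smallest.population + nearest.y * nearest.population) / total
        else:
            x = (smallest.x + nearest.x) / 2
            y = (smallest.y + nearest.y) / 2
        merged = GeoCluster(smallest.codes + nearest.codes, total, x, y)
        clusters = [c for c in others if c is not nearest] + [merged]
    return clusters


# Made-up geocodes and populations, for illustration only.
example = {"A": (100, 0.0, 0.0), "B": (50, 1.0, 0.0), "C": (1000, 10.0, 0.0), "D": (30, 1.5, 0.0)}
result = merge_geoclusters(example, min_population=150)
```

The resulting clusters can then serve as a lookup from each geocode in a dataset to its merged geocluster. If the total population is below the threshold, the loop ends with a single cluster that still falls short, which a caller would need to handle.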

Smart de-identification using date jittering

Patent number: 11334685

Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

Type: Grant

Filed: February 26, 2020

Date of Patent: May 17, 2022

Assignee: PRIVACY ANALYTICS INC.

Inventors: Sean Rose, Weilong Song, Martin Scaiano

Determining journalist risk of a dataset using population equivalence class distribution estimation

Patent number: 11238960

Abstract: A system, method and computer readable memory for determining journalist risk of a dataset using population equivalence class distribution estimation. The dataset may be a cross-sectional dataset or a longitudinal dataset. The determined risk of identification can be used in a de-identification process of the dataset.

Type: Grant

Filed: November 27, 2015

Date of Patent: February 1, 2022

Assignee: Privacy Analytics Inc.

Inventors: Stephen Korte, Luk Arbuckle, Andrew Baker, Khaled El Emam, Sean Rose

System and method for local thresholding of re-identification risk measurement and mitigation

Patent number: 10803201

Abstract: System and method to produce an anonymized electronic data product having an individually-determined threshold of re-identification risk, and adjusting re-identification risk measurement parameters based on individual characteristics such as geographic location, in order to provide an anonymized electronic data product having a sensitivity-based reduced risk of re-identification.

Type: Grant

Filed: February 26, 2018

Date of Patent: October 13, 2020

Assignee: PRIVACY ANALYTICS INC.

Inventors: Hazel Joyce Nicholls, Andrew Richard Baker, Yasser Jafer, Martin Scaiano

Re-identification risk measurement estimation of a dataset

Patent number: 10685138

Abstract: There is provided a system and method executed by a processor for estimating re-identification risk of a single individual in a dataset. The individual, subject or patient is described by a data subject profile such as a record in the dataset. A population distribution is retrieved from a storage device; the population distribution is determined by one or more quasi-identifying fields identified in the data subject profile. An information score is then assigned to each quasi-identifying (QI) value of the one or more quasi-identifying fields associated with the data subject profile. The assigned information scores of the quasi-identifying values for the data subject profile are aggregated into an aggregated information value. An anonymity value is then calculated from the aggregated information value and a size of a population associated with the dataset. A re-identification metric for the individual is then calculated from the anonymity value.

Type: Grant

Filed: April 1, 2016

Date of Patent: June 16, 2020

Assignee: PRIVACY ANALYTICS INC.

Inventors: Martin Scaiano, Stephen Korte, Andrew Baker, Geoffrey Green, Khaled El Emam, Luk Arbuckle
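
Illustrative sketch (not taken from the patent): one plausible reading of the scoring steps in the abstract of patent 10685138 is shown below. The specific formulas are assumptions chosen for this example: the information score of a value is taken as -log2 of its population frequency, scores are summed (which treats the quasi-identifying fields as independent), the anonymity value is log2 of the population size minus the aggregated information, and the metric is 2 to the power of minus the anonymity, capped at 1.

```python
import math


def reidentification_risk(profile, population_distribution, population_size):
    """Return (aggregated_information, anonymity, risk) for one data subject.

    profile: dict of quasi-identifying field -> value for the individual.
    population_distribution: dict of field -> {value: population frequency}.
    All formulas here are illustrative assumptions, not the patented method.
    """
    aggregated_information = 0.0
    for field, value in profile.items():
        frequency = population_distribution[field][value]
        # Information score in bits: rarer values carry more information.
        aggregated_information += -math.log2(frequency)
    # Bits of anonymity left after the quasi-identifiers are known.
    anonymity = math.log2(population_size) - aggregated_information
    # Roughly 1 / (expected number of matching people), capped at 1.
    risk = min(1.0, 2.0 ** (-anonymity))
    return aggregated_information, anonymity, risk


# Made-up frequencies for illustration only.
distribution = {
    "sex": {"F": 0.5, "M": 0.5},
    "age_band": {"30-39": 0.25, "other": 0.75},
}
info, anonymity, risk = reidentification_risk(
    {"sex": "F", "age_band": "30-39"}, distribution, population_size=1024
)
```

With these made-up inputs the profile carries 3 bits of information against 10 bits of population, leaving 7 bits of anonymity.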

Smart de-identification using date jittering

Patent number: 10586074

Abstract: System and method to produce an anonymized cohort having less than a predetermined risk of re-identification. The method includes receiving a data query of requested traits for the anonymized cohort, querying a data source to find records that possess at least some of the traits, forming a dataset from at least some of the records, and grouping the dataset in time into a first boundary group, a second boundary group, and one or more non-boundary groups temporally between the first boundary group and second boundary group. For each non-boundary group, calculating maximum time limits the non-boundary group can be time-shifted without overlapping an adjacent group, calculating a group jitter amount, capping the group jitter amount by the maximum time limits and by respective predetermined jitter limits, and jittering said non-boundary group by the capped group jitter amount to produce an anonymized dataset. Return the anonymized dataset.

Type: Grant

Filed: April 30, 2019

Date of Patent: March 10, 2020

Assignee: PRIVACY ANALYTICS INC.

Inventors: Sean Rose, Weilong Song, Martin Scaiano

Methods and systems for watermarking of anonymized datasets

Patent number: 10424406

Abstract: A method includes receiving an initial dataset. Each record of the initial dataset comprises a set of quasi-identifier attributes and a set of non-quasi-identifier attributes. A processor assigns a link identifier to each record and replaces each set of quasi-identifier attributes with a range to form a generalized set. The processor removes duplicate records based on identical generalized sets to generate de-duplicated records. The processor generates a randomized record by replacing the generalized set of each de-duplicated record with a corresponding set of random values. The processor passes the set of random values of each randomized record through multiple hash functions to generate multiple outputs. The multiple outputs are mapped to a Bloom filter. The processor forms a dataset by combining each randomized record with one or more sets of non-quasi-identifier attributes. The set of random values is a fingerprint for a corresponding record of the dataset.

Type: Grant

Filed: February 12, 2017

Date of Patent: September 24, 2019

Assignee: PRIVACY ANALYTICS INC.

Inventors: Yasser Jafer, Khaled El Emam
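
Illustrative sketch (not taken from the patent): the fingerprinting flow in the abstract of patent 10424406 might be approximated as follows. Integer quasi-identifiers, fixed-width ranges, salted SHA-256 as the "multiple hash functions", and all names (generalize, bloom_positions, watermark, might_contain) are assumptions made for this example.

```python
import hashlib
import random


def generalize(value, width):
    """Replace an integer value with the (low, high) range containing it."""
    low = (value // width) * width
    return (low, low + width - 1)


def bloom_positions(values, num_hashes, num_bits):
    """Map a tuple of values to Bloom filter bit positions via salted SHA-256."""
    key = ",".join(str(v) for v in values).encode()
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % num_bits
        for i in range(num_hashes)
    ]


def watermark(records, qi_fields, width, rng, num_hashes=3, num_bits=1024):
    bloom = [False] * num_bits
    fingerprints = {}  # generalized set -> random values (one per distinct set)
    output = []
    for link_id, record in enumerate(records):
        generalized = tuple(generalize(record[f], width) for f in qi_fields)
        if generalized not in fingerprints:
            # De-duplicated generalized set: draw random values inside each range.
            values = tuple(rng.randint(low, high) for low, high in generalized)
            fingerprints[generalized] = values
            for position in bloom_positions(values, num_hashes, num_bits):
                bloom[position] = True
        # Combine the randomized quasi-identifiers with the other attributes.
        row = {"link_id": link_id}
        row.update(zip(qi_fields, fingerprints[generalized]))
        row.update({k: v for k, v in record.items() if k not in qi_fields})
        output.append(row)
    return output, bloom


def might_contain(values, bloom, num_hashes=3):
    """True if the values could be a recorded fingerprint (false positives possible)."""
    return all(bloom[p] for p in bloom_positions(values, num_hashes, len(bloom)))


# Made-up records for illustration only.
rows = [
    {"age": 34, "zip3": 902, "dx": "flu"},
    {"age": 36, "zip3": 905, "dx": "asthma"},
    {"age": 71, "zip3": 100, "dx": "flu"},
]
marked, bloom_filter = watermark(rows, ["age", "zip3"], width=10, rng=random.Random(0))
```

Checking a suspect record's quasi-identifier values with might_contain then indicates whether it may have come from this release. A Bloom filter can return false positives but not false negatives.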
