Patents by Inventor Aris Gkoulalas-Divanis

Aris Gkoulalas-Divanis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Augmenting datasets with selected de-identified data records

Patent number: 11093640

Abstract: A computer system utilizes a dataset to support a research study. Regions of interestingness are determined within a model of data records of a first dataset that are authorized for the research study by associated entities. Data records from a second dataset are represented within the model, wherein the data records from the second dataset are relevant for supporting objectives of the research study. Data records from the second dataset that fail to satisfy de-identification requirements are removed. A resulting dataset is generated that includes the first dataset records within a selected region of interestingness and selected records of the second dataset within the same region. The second dataset records within the resulting dataset are de-identified based on the de-identification requirements. Embodiments of the present invention further include a method and program product for utilizing a dataset to support a research study in substantially the same manner described above.

Type: Grant

Filed: April 12, 2018

Date of Patent: August 17, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
Augmenting datasets with selected de-identified data records

Patent number: 11093646

Abstract: A computer system utilizes a dataset to support a research study. Regions of interestingness are determined within a model of data records of a first dataset that are authorized for the research study by associated entities. Data records from a second dataset are represented within the model, wherein the data records from the second dataset are relevant for supporting objectives of the research study. Data records from the second dataset that fail to satisfy de-identification requirements are removed. A resulting dataset is generated that includes the first dataset records within a selected region of interestingness and selected records of the second dataset within the same region. The second dataset records within the resulting dataset are de-identified based on the de-identification requirements. Embodiments of the present invention further include a method and program product for utilizing a dataset to support a research study in substantially the same manner described above.

Type: Grant

Filed: June 24, 2019

Date of Patent: August 17, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
Coordinated de-identification of a dataset across a network

Patent number: 11093639

Abstract: Methods, systems, and computer program products are provided. A network device receives, from a client device, a description of a dataset to be de-identified, and a list of one or more data de-identification techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and their configuration options supported by the client device. A first technique, from the at least one group of techniques and its configuration options supported by the client device and the network device are determined. The network device receives a dataset produced at the client device by applying the first technique and selected configuration options to corresponding attributes from the client device. The network device applies a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and its configuration options to de-identify the dataset.

Type: Grant

Filed: February 23, 2018

Date of Patent: August 17, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
DATA DEDUPLICATION IN DATA PLATFORMS

Publication number: 20210216511

Abstract: One embodiment of the invention provides a method for data deduplication storage management in a data platform including a plurality of data stores. The method comprises, for each data store of the plurality of data stores, determining a corresponding multi-level signature mapping data content of the data store into an ordered logical form comprising a plurality of data abstraction levels, determining a data similarity between the data store and each other data store of the plurality of data stores based on the multi-level signature corresponding to the data store and another multi-level signature corresponding to the other data store, and determining data usage of the data content of the data store. The method further comprises improving storage in the data platform by detecting duplicate data across the plurality of data stores based on each data similarity determined and each data usage determined.

Type: Application

Filed: January 10, 2020

Publication date: July 15, 2021

Inventors: Rohit Ranchal, Aris Gkoulalas-Divanis, Paul R. Bastide
DATA LINEAGE AND DATA PROVENANCE ENHANCEMENT

Publication number: 20210216229

Abstract: One embodiment of the invention provides a method for data lineage and data provenance enhancement. The method comprises arranging a data set into a logical ordering, and partitioning the data set into at least one set of partitions based on the logical ordering. The method further comprises, for each partition of the at least one set of partitions, determining a corresponding score for the partition, and determining a data similarity between the partition and each other partition of each other data set based on the corresponding score for the partition and another score corresponding to the other partition. The method further comprises determining data lineage of the data set based on each data similarity determined.

Type: Application

Filed: January 10, 2020

Publication date: July 15, 2021

Inventors: Paul R. Bastide, Aris Gkoulalas-Divanis, Rohit Ranchal
Iterative execution of data de-identification processes

Patent number: 11036886

Abstract: A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute based on the evaluation. The dataset is de-identified by applying the de-identification technique that is recommended for each attribute. Embodiments of the present invention further include a method and program product for de-identifying data in substantially the same manner described above.

Type: Grant

Filed: June 20, 2019

Date of Patent: June 15, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
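
The abstract above describes scoring each candidate de-identification technique for its impact on privacy and on utility, then recommending one technique per attribute. The following is only a minimal sketch of that selection step, not the patented method: the function name `recommend_techniques`, the weighted-sum scoring, the `privacy_weight` parameter, and the example scores are all hypothetical assumptions made for illustration.

```python
def recommend_techniques(candidates, privacy_weight=0.5):
    """Pick one technique per attribute by a weighted privacy/utility score.

    candidates maps attribute -> {technique_name: (privacy, utility)},
    where both scores are assumed to lie in [0, 1], higher being better.
    The weighted sum is an illustrative assumption, not the patent's metric.
    """
    recommendations = {}
    for attribute, techniques in candidates.items():
        def combined(name, techniques=techniques):
            privacy, utility = techniques[name]
            return privacy_weight * privacy + (1 - privacy_weight) * utility
        # Keep the technique with the best combined score for this attribute.
        recommendations[attribute] = max(techniques, key=combined)
    return recommendations


# Hypothetical scores for two attributes of a dataset.
example = {
    "zip_code": {"suppress": (1.0, 0.0), "generalize": (0.8, 0.7)},
    "name": {"mask": (0.9, 0.1), "pseudonymize": (0.9, 0.6)},
}
print(recommend_techniques(example))
```

Raising `privacy_weight` toward 1.0 favors techniques that protect privacy at the cost of utility; an iterative process could re-run the selection with adjusted weights until the result is acceptable.
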
Iterative execution of data de-identification processes

Patent number: 11036884

Abstract: A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute based on the evaluation. The dataset is de-identified by applying the de-identification technique that is recommended for each attribute. Embodiments of the present invention further include a method and program product for de-identifying data in substantially the same manner described above.

Type: Grant

Filed: February 26, 2018

Date of Patent: June 15, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
RESILIENT WATERMARKING

Publication number: 20210173903

Abstract: A method, system, and computer program product for detecting data tampering with resilient watermarking is provided. The method accesses a first relational data set on a data repository. The first relational data set includes a plurality of data elements. The first relational data set is sorted to generate a first sorted list and a second sorted list of the plurality of data elements. The method generates a watermark from the first sorted list and the second sorted list. The watermark contains a hash corresponding to the first sorted list and the second sorted list of the plurality of data elements. In response to an access request for the first relational data set, the method verifies an integrity of the first relational data set based on the watermark.

Type: Application

Filed: December 10, 2019

Publication date: June 10, 2021

Inventors: Olivia Choudhury, Aris Gkoulalas-Divanis
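
The abstract above describes sorting a relational data set into two sorted lists, hashing them into a watermark, and checking that watermark on access. The following is only a rough sketch of that idea under stated assumptions, not the patented method: rows are modeled as dictionaries, the two lists are produced by sorting on two different columns, SHA-256 is used as the hash, and the names `make_watermark` and `verify` are hypothetical.

```python
import hashlib


def _canonical(row):
    """Stable text form of a row, independent of dict key order."""
    return repr(sorted(row.items()))


def make_watermark(rows, key_a, key_b):
    """Hash two differently sorted views of the same rows into one digest."""
    # Using the canonical form as a tie-breaker makes the result
    # independent of the order in which the rows are stored.
    first_sorted = sorted(rows, key=lambda r: (r[key_a], _canonical(r)))
    second_sorted = sorted(rows, key=lambda r: (r[key_b], _canonical(r)))
    digest = hashlib.sha256()
    for ordered in (first_sorted, second_sorted):
        for row in ordered:
            digest.update(_canonical(row).encode("utf-8"))
        digest.update(b"|")  # separator between the two sorted lists
    return digest.hexdigest()


def verify(rows, key_a, key_b, watermark):
    """Return True if the rows still match the stored watermark."""
    return make_watermark(rows, key_a, key_b) == watermark


records = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
stored = make_watermark(records, "id", "name")
print(verify(list(reversed(records)), "id", "name", stored))
```

Because both views are sorted before hashing, reordering the stored rows does not change the watermark, while changing any value does.
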
ATTRIBUTE-BASED QUASI-IDENTIFIER DISCOVERY

Publication number: 20210176215

Abstract: A method, system, and computer program product for privacy protection of records based on attribute-based determination of quasi-identifiers within the records is provided. The method receives a first set of records containing a first set of attributes for a set of individuals. The method receives a second set of records for the set of individuals, with the second set of records containing a second set of attributes. A first set of quasi-identifiers, based on the first set of attributes, is accessed for the first set of records. The method determines a set of new attributes of the second set of attributes based on the first set of attributes. A second set of quasi-identifiers is generated based on the first set of quasi-identifiers and the set of new attributes. The method generates an anonymized set of records from the second set of records based on the second set of quasi-identifiers.

Type: Application

Filed: December 10, 2019

Publication date: June 10, 2021

Inventors: Aris Gkoulalas-Divanis, Rohit Ranchal, Paul R. Bastide
Method/system for the online identification and blocking of privacy vulnerabilities in data streams

Patent number: 11030340

Abstract: A method, system and computer program product for providing privacy protection to data streams in a distributed computing environment. The method includes concurrently processing, by a plurality of computer machines, data streams of attributes containing data values received by each of the plurality of local computer machines; indexing the data values for each attribute of the plurality of data streams received by each of the plurality of local computer machines; providing the indexed data values to a main computer machine; integrating, by the main computer machine, the local computer machine indexed data values into a global index data structure for the plurality of data streams; and identifying privacy vulnerabilities of the attributes that are direct identifiers and quasi-identifiers based on the global index.

Type: Grant

Filed: July 17, 2018

Date of Patent: June 8, 2021

Assignee: International Business Machines Corporation

Inventors: Spyridon Antonatos, Stefano Braghin, Aris Gkoulalas-Divanis, Olivier Verscheure
ANONYMIZING DATA FOR PRESERVING PRIVACY DURING USE FOR FEDERATED MACHINE LEARNING

Publication number: 20210150269

Abstract: A computer-implemented method for training a global federated learning model using an aggregator server includes training multiple local models at respective local nodes. Each local node selects a set of attributes from its training dataset for training its local model. Each local node generates an anonymized training dataset by using a syntactic anonymization method, and by selecting quasi-identifying attributes from training attributes, and generalizing the quasi-identifying attributes using a syntactic algorithm. Further, each local node computes a syntactic mapping based on equivalence classes produced in the anonymized training dataset. The aggregator server computes a union of mappings received from all the local nodes. Further, federated learning includes training the global federated learning model by iteratively sending, by the local nodes to the aggregator server, parameter updates computed over the local models.

Type: Application

Filed: November 18, 2019

Publication date: May 20, 2021

Inventors: OLIVIA CHOUDHURY, ARIS GKOULALAS-DIVANIS, THEODOROS SALONIDIS, ISSA SYLLA
Identification of optimal data utility-preserving anonymization techniques by evaluation of a plurality of anonymization techniques on sample data sets that correspond to different anonymization categories

Patent number: 11003795

Abstract: Systems, methods and computer readable media are provided herein for de-identification of a dataset. Each of a plurality of anonymization techniques is assigned to a corresponding one of a plurality of anonymization categories, with each anonymization category corresponding to particular types of operations applied by the anonymization techniques. A sample dataset is generated from the dataset for each anonymization category based on a sampling technique associated with that anonymization category, wherein the sampling technique is selected based on a particular category of anonymization techniques. Each anonymization technique is applied to the sample dataset corresponding to the anonymization category assigned for the anonymization technique, and each anonymization technique is evaluated with respect to data utility based on a utility of the anonymized sample data produced.

Type: Grant

Filed: June 18, 2019

Date of Patent: May 11, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
Identification of optimal data utility-preserving anonymization techniques by evaluation of a plurality of anonymization techniques on sample data sets that correspond to different anonymization categories

Patent number: 11003793

Abstract: Systems, methods and computer readable media are provided herein for de-identification of a dataset. Each of a plurality of anonymization techniques is assigned to a corresponding one of a plurality of anonymization categories, with each anonymization category corresponding to particular types of operations applied by the anonymization techniques. A sample dataset is generated from the dataset for each anonymization category based on a sampling technique associated with that anonymization category, wherein the sampling technique is selected based on a particular category of anonymization techniques. Each anonymization technique is applied to the sample dataset corresponding to the anonymization category assigned for the anonymization technique, and each anonymization technique is evaluated with respect to data utility based on a utility of the anonymized sample data produced.

Type: Grant

Filed: February 22, 2018

Date of Patent: May 11, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
BUILT-IN LEGAL FRAMEWORK FILE MANAGEMENT

Publication number: 20210089678

Abstract: An approach is disclosed that enforces a privacy legal framework filesystem along with an operating system (OS) to enforce the privacy legal framework. An access of a datum in a selected file in the filesystem includes accessing a metadata associated with the selected file where the metadata includes a privacy state and an owner consent-based access policy. The consent-based access policy is enforced by the OS.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
POLICY DRIVEN DATA MOVEMENT

Publication number: 20210089680

Abstract: An approach is disclosed for moving personal and sensitive data from a source filesystem to a destination filesystem while enforcing a source privacy legal framework. A request to copy information from a file residing in the source filesystem enabled to enforce the privacy and control legal framework to a destination filesystem is received. Metadata associated with the file and the request is analyzed to determine a copying policy. The copying policy is applied to the contents of the file to ensure compliance with the privacy and control legal framework of the source filesystem.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
CREATING RESEARCH STUDY CORPUS

Publication number: 20210089681

Abstract: An approach is disclosed for building a study cohort by collecting information related to a plurality of people according to a collection request from a user. An authority of the request from the user is validated and, if validated, a file policy associated with each file containing personal data is checked to verify a type of processing consented to by the corresponding individual for their data. The data is transformed, and information related to the plurality of people is copied according to the collection request, the request assessment, the consent information as part of each file's metadata, and the privacy legal framework. The privacy legal framework may be based on source files wherein each source file has a source file consent permission to form copied content. The copied content is used to process the collection request.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
CREDENTIALS FOR CONSENT BASED FILE ACCESS

Publication number: 20210089671

Abstract: An approach is disclosed for providing built-in consent permissions for users, groups, processes, and programs accessing a part of a filesystem. The method includes integrating a consent access control into a plurality of files in a filesystem. A first request from a first requestor to access information from a first file in the filesystem is received. A first access policy for the first requestor is determined. The first access policy includes a first selective data conversion. A second request from a second requestor to access information from the first file in the filesystem is received. A second access policy for the second requestor different from the first access policy for the first requestor is determined.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
MANAGING DATA ON VOLUMES

Publication number: 20210089220

Abstract: An approach is disclosed for placing data to manage access based on access and data sensitivity on volumes. An infrastructure is provided to separate data according to a data sensitivity and a data usage, wherein highly accessed data is separated from lightly accessed data. The infrastructure facilitates efficiency of access by automatically adjusting placement of data and adjusting encryption policies based on a type of data, a sensitivity of the data, and an access activity to the data. The infrastructure enforces the access to the separated data.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
FILE ACCESS RESTRICTIONS ENFORCEMENT

Publication number: 20210089675

Abstract: An approach is disclosed that enforces restrictions to data in a filesystem based on metadata for a file including a name for an attribute, a type, and a location in the file for the type. The restrictions for file access may be placed directly into a file handler, which is driven by the file structure metadata which identifies types of information, where in the file each type of information is located, and consent information which specifies what type of information is accessible to a requestor retrieving data for a specific purpose.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: ARIS GKOULALAS-DIVANIS, CORVILLE O. ALLEN
Data de-identification across different data sources using a common data model

Patent number: 10936750

Abstract: A computer system migrates and de-identifies data. Data is migrated from a dataset to a common data model that is configured to accommodate data comprising a plurality of different data types to be de-identified. Data is analyzed in the common data model to identify privacy vulnerabilities and determine corresponding data de-identification techniques and configuration options to be applied to the data. The automatically determined data de-identification techniques are applied to the data to address all of the identified privacy vulnerabilities, and the resulting de-identified data is migrated from the common data model back to the dataset. Embodiments of the present invention further include a computer-implemented method and program product for migrating and de-identifying data in substantially the same manner described above.

Type: Grant

Filed: March 1, 2018

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventor: Aris Gkoulalas-Divanis
