Patents by Inventor Aris Gkoulalas-Divanis

Aris Gkoulalas-Divanis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11093640
    Abstract: A computer system utilizes a dataset to support a research study. Regions of interestingness are determined within a model of data records of a first dataset that are authorized for the research study by associated entities. Data records from a second dataset are represented within the model, wherein the data records from the second dataset are relevant for supporting objectives of the research study. Data records from the second dataset that fail to satisfy de-identification requirements are removed. A resulting dataset is generated that includes the first dataset records within a selected region of interestingness and selected records of the second dataset within the same region. The second dataset records within the resulting dataset are de-identified based on the de-identification requirements. Embodiments of the present invention further include a method and program product for utilizing a dataset to support a research study in substantially the same manner described above.
    Type: Grant
    Filed: April 12, 2018
    Date of Patent: August 17, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Patent number: 11093639
    Abstract: Methods, systems, and computer program products are provided. A network device receives, from a client device, a description of a dataset to be de-identified, and a list of one or more data de-identification techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and their configuration options supported by the client device. A first technique, from the at least one group of techniques and its configuration options supported by the client device and the network device are determined. The network device receives a dataset produced at the client device by applying the first technique and selected configuration options to corresponding attributes from the client device. The network device applies a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and its configuration options to de-identify the dataset.
    Type: Grant
    Filed: February 23, 2018
    Date of Patent: August 17, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Publication number: 20210216229
    Abstract: One embodiment of the invention provides a method for data lineage and data provenance enhancement. The method comprises arranging a data set into a logical ordering, and partitioning the data set into at least one set of partitions based on the logical ordering. The method further comprises, for each partition of the at least one set of partitions, determining a corresponding score for the partition, and determining a data similarity between the partition and each other partition of each other data set based on the corresponding score for the partition and another score corresponding to the other partition. The method further comprises determining data lineage of the data set based on each data similarity determined.
    Type: Application
    Filed: January 10, 2020
    Publication date: July 15, 2021
    Inventors: Paul R. Bastide, Aris Gkoulalas-Divanis, Rohit Ranchal
  • Publication number: 20210216511
    Abstract: One embodiment of the invention provides a method for data deduplication storage management in a data platform including a plurality of data stores. The method comprises, for each data store of the plurality of data stores, determining a corresponding multi-level signature mapping data content of the data store into an ordered logical form comprising a plurality of data abstraction levels, determining a data similarity between the data store and each other data store of the plurality of data stores based on the multi-level signature corresponding to the data store and another multi-level signature corresponding to the other data store, and determining data usage of the data content of the data store. The method further comprises improving storage in the data platform by detecting duplicate data across the plurality of data stores based on each data similarity determined and each data usage determined.
    Type: Application
    Filed: January 10, 2020
    Publication date: July 15, 2021
    Inventors: Rohit Ranchal, Aris Gkoulalas-Divanis, Paul R. Bastide
  • Patent number: 11036886
    Abstract: A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute based on the evaluation. The dataset is de-identified by applying the de-identification technique that is recommended for each attribute. Embodiments of the present invention further include a method and program product for de-identifying data in substantially the same manner described above.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: June 15, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Patent number: 11036884
    Abstract: A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute based on the evaluation. The dataset is de-identified by applying the de-identification technique that is recommended for each attribute. Embodiments of the present invention further include a method and program product for de-identifying data in substantially the same manner described above.
    Type: Grant
    Filed: February 26, 2018
    Date of Patent: June 15, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Publication number: 20210176215
    Abstract: A method, system, and computer program product for privacy protection of records based on attribute-based determination of quasi-identifiers within the records is provided. The method receives a first set of records containing a first set of attributes for a set of individuals. The method receives a second set of records for the set of individuals, with the second set of records containing a second set of attributes. A first set of quasi-identifiers, based on the first set of attributes, is accessed for the first set of records. The method determines a set of new attributes of the second set of attributes based on the first set of attributes. A second set of quasi-identifiers is generated based on the first set of quasi-identifiers and the set of new attributes. The method generates an anonymized set of records from the second set of records based on the second set of quasi-identifiers.
    Type: Application
    Filed: December 10, 2019
    Publication date: June 10, 2021
    Inventors: Aris Gkoulalas-Divanis, Rohit Ranchal, Paul R. Bastide
  • Publication number: 20210173903
    Abstract: A method, system, and computer program product for detecting data tampering with resilient watermarking is provided. The method accesses a first relational data set on a data repository. The first relational data set includes a plurality of data elements. The first relational data set is sorted to generate a first sorted list and a second sorted list of the plurality of data elements. The method generates a watermark from the first sorted list and the second sorted list. The watermark contains a hash corresponding to the first sorted list and the second sorted list of the plurality of data elements. In response to an access request for the first relational data set, the method verifies an integrity of the first relational data set based on the watermark.
    Type: Application
    Filed: December 10, 2019
    Publication date: June 10, 2021
    Inventors: Olivia Choudhury, Aris Gkoulalas-Divanis
  • Patent number: 11030340
    Abstract: A method, system and computer program product for providing privacy protection to data streams in a distributed computing environment. The method includes concurrently processing, by a plurality of computer machines, data streams of attributes containing data values received by each of the plurality of local computer machines; indexing the data values for each attribute of the plurality of data streams received by each of the plurality of local computer machines; providing the indexed data values to a main computer machine; integrating, by the main computer machine, the local computer machine indexed data values into a global index data structure for the plurality of data streams; and identifying privacy vulnerabilities of the attributes that are direct identifiers and quasi-identifiers based on the global index.
    Type: Grant
    Filed: July 17, 2018
    Date of Patent: June 8, 2021
    Assignee: International Business Machines Corporation
    Inventors: Spyridon Antonatos, Stefano Braghin, Aris Gkoulalas-Divanis, Olivier Verscheure
  • Publication number: 20210150269
    Abstract: A computer-implemented method for training a global federated learning model using an aggregator server includes training multiple local models at respective local nodes. Each local node selects a set of attributes from its training dataset for training its local model. Each local node generates an anonymized training dataset by using a syntactic anonymization method, and by selecting quasi-identifying attributes from training attributes, and generalizing the quasi-identifying attributes using a syntactic algorithm. Further, each local node computes a syntactic mapping based on equivalence classes produced in the anonymized training dataset. The aggregator server computes a union of mappings received from all the local nodes. Further, federated learning includes training the global federated learning model by iteratively sending, by the local nodes to the aggregator server, parameter updates computed over the local models.
    Type: Application
    Filed: November 18, 2019
    Publication date: May 20, 2021
    Inventors: Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa Sylla
  • Patent number: 11003795
    Abstract: Systems, methods and computer readable media are provided herein for de-identification of a dataset. Each of a plurality of anonymization techniques is assigned to a corresponding one of a plurality of anonymization categories, with each anonymization category corresponding to particular types of operations applied by the anonymization techniques. A sample dataset is generated from the dataset for each anonymization category based on a sampling technique associated with that anonymization category, wherein the sampling technique is selected based on a particular category of anonymization techniques. Each anonymization technique is applied to the sample dataset corresponding to the anonymization category assigned for the anonymization technique, and each anonymization technique is evaluated with respect to data utility based on a utility of the anonymized sample data produced.
    Type: Grant
    Filed: June 18, 2019
    Date of Patent: May 11, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Patent number: 11003793
    Abstract: Systems, methods and computer readable media are provided herein for de-identification of a dataset. Each of a plurality of anonymization techniques is assigned to a corresponding one of a plurality of anonymization categories, with each anonymization category corresponding to particular types of operations applied by the anonymization techniques. A sample dataset is generated from the dataset for each anonymization category based on a sampling technique associated with that anonymization category, wherein the sampling technique is selected based on a particular category of anonymization techniques. Each anonymization technique is applied to the sample dataset corresponding to the anonymization category assigned for the anonymization technique, and each anonymization technique is evaluated with respect to data utility based on a utility of the anonymized sample data produced.
    Type: Grant
    Filed: February 22, 2018
    Date of Patent: May 11, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Publication number: 20210089671
    Abstract: An approach is disclosed for providing built-in consent permissions for users, groups, processes, and programs accessing a part of a filesystem. The method includes integrating a consent access control into a plurality of files in a filesystem. A first request from a first requestor to access information from a first file in the filesystem is received. A first access policy for the first requestor is determined. The first access policy includes a first selective data conversion. A second request from a second requestor to access information from the first file in the filesystem is received. A second access policy for the second requestor different from the first access policy for the first requestor is determined.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Publication number: 20210089681
    Abstract: An approach is disclosed for building a study cohort by collecting information related to a plurality of people according to a collection request from a user. An authority of the request from the user is validated and, if validated, a file policy associated with each file containing personal data is checked to verify a type of processing consented to by the corresponding individual for their data. The data is transformed, and information related to the plurality of people is copied according to the collection request, the request assessment, the consent information as part of each file's metadata, and the privacy legal framework. The privacy legal framework may be based on source files wherein each source file has a source file consent permission to form copied content. The copied content is used to process the collection request.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Publication number: 20210089675
    Abstract: An approach is disclosed that enforces restrictions to data in a filesystem based on metadata for a file including a name for an attribute, a type, and a location in the file for the type. The restrictions for file access may be placed directly into a file handler, which is driven by the file structure metadata which identifies types of information, where in the file each type of information is located, and consent information which specifies what type of information is accessible to a requestor retrieving data for a specific purpose.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Publication number: 20210089680
    Abstract: An approach is disclosed for moving personal and sensitive data from a source filesystem to a destination filesystem while enforcing a source privacy legal framework. A request to copy information from a file residing in the source filesystem enabled to enforce the privacy and control legal framework to a destination filesystem is received. Metadata associated with the file and the request is analyzed to determine a copying policy. The copying policy is applied to the contents of the file to ensure compliance with the privacy and control legal framework of the source filesystem.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Publication number: 20210089678
    Abstract: An approach is disclosed that enforces a privacy legal framework filesystem along with an operating system (OS) to enforce the privacy legal framework. An access of a datum in a selected file in the filesystem includes accessing a metadata associated with the selected file where the metadata includes a privacy state and an owner consent-based access policy. The consent-based access policy is enforced by the OS.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Publication number: 20210089220
    Abstract: An approach is disclosed for placing data to manage access based on access and data sensitivity on volumes. An infrastructure is provided to separate data according to a data sensitivity and a data usage, wherein highly accessed data is separated from lightly accessed data. The infrastructure facilitates efficiency of access by automatically adjusting placement of data and adjusting encryption policies based on a type of data, a sensitivity of the data, and an access activity to the data. The infrastructure enforces the access to the separated data.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Aris Gkoulalas-Divanis, Corville O. Allen
  • Patent number: 10936750
    Abstract: A computer system migrates and de-identifies data. Data is migrated from a dataset to a common data model that is configured to accommodate data comprising a plurality of different data types to be de-identified. Data is analyzed in the common data model to identify privacy vulnerabilities and determine corresponding data de-identification techniques and configuration options to be applied to the data. The automatically determined data de-identification techniques are applied to the data to address all of the identified privacy vulnerabilities, and the resulting de-identified data is migrated from the common data model back to the dataset. Embodiments of the present invention further include a computer-implemented method and program product for migrating and de-identifying data in substantially the same manner described above.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
  • Patent number: 10936752
    Abstract: A computer system migrates and de-identifies data. Data is migrated from a dataset to a common data model that is configured to accommodate data comprising a plurality of different data types to be de-identified. Data is analyzed in the common data model to identify privacy vulnerabilities and determine corresponding data de-identification techniques and configuration options to be applied to the data. The automatically determined data de-identification techniques are applied to the data to address all of the identified privacy vulnerabilities, and the resulting de-identified data is migrated from the common data model back to the dataset. Embodiments of the present invention further include a computer-implemented method and program product for migrating and de-identifying data in substantially the same manner described above.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventor: Aris Gkoulalas-Divanis
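
As an illustration of one idea in this listing, the abstract of publication 20210173903 describes generating a watermark that contains a hash over two sorted lists of a relational data set and later verifying integrity against it. The Python sketch below is only a rough reading of that abstract, not the method claimed in the application. The helper names (sorted_list_digest, make_watermark, verify), the use of SHA-256, the choice of sort columns, and the way the two digests are combined are all assumptions made for the example. In this sketch, "resilient" means only that reordering rows does not change the watermark.

```python
import hashlib


def sorted_list_digest(rows, key_index):
    """Hash the rows after sorting them by one column (hypothetical helper)."""
    # Tie-break on the full row so the order never depends on the input order.
    ordered = sorted(rows, key=lambda row: (str(row[key_index]), repr(row)))
    digest = hashlib.sha256()
    for row in ordered:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\x1e")  # record separator between rows
    return digest.hexdigest()


def make_watermark(rows, first_key=0, second_key=1):
    """Combine the digests of two differently sorted lists (assumed scheme)."""
    first = sorted_list_digest(rows, first_key)
    second = sorted_list_digest(rows, second_key)
    return hashlib.sha256((first + second).encode("ascii")).hexdigest()


def verify(rows, watermark, first_key=0, second_key=1):
    """Recompute the watermark on access and compare it with the stored one."""
    return make_watermark(rows, first_key, second_key) == watermark


records = [(1, "alice", 34), (2, "bob", 29), (3, "carol", 41)]
stored = make_watermark(records)

# Reordering rows leaves the watermark unchanged; editing a value does not.
reordered_ok = verify(list(reversed(records)), stored)
tampered_ok = verify([(1, "alice", 34), (2, "bob", 30), (3, "carol", 41)], stored)
```

Here reordered_ok and tampered_ok hold the verification results for a reordered copy and an edited copy of the records. A real system would also need to protect the stored watermark itself, which this sketch does not attempt.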