Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9760615
    Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: September 12, 2017
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Patent number: 9760593
    Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: September 12, 2017
    Assignee: International Business Machines Corporation
    Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
  • Publication number: 20170242878
    Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Application
    Filed: March 8, 2017
    Publication date: August 24, 2017
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
  • Publication number: 20170242877
    Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Application
    Filed: February 18, 2016
    Publication date: August 24, 2017
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
  • Patent number: 9720971
    Abstract: Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 1, 2017
    Assignee: International Business Machines Corporation
    Inventors: Torsten Bittner, Holger Kache, Mary Ann Roth, Yannick Saillet
  • Publication number: 20170212953
    Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Application
    Filed: April 6, 2017
    Publication date: July 27, 2017
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Patent number: 9716700
    Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
    Type: Grant
    Filed: February 19, 2015
    Date of Patent: July 25, 2017
    Assignee: International Business Machines Corporation
    Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
  • Patent number: 9716704
    Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
    Type: Grant
    Filed: February 26, 2016
    Date of Patent: July 25, 2017
    Assignee: International Business Machines Corporation
    Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
  • Publication number: 20170139746
    Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; executing (304) a first one of the agents; updating (306) the annotations (115) of the first data set by the first agent; and executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.
    Type: Application
    Filed: February 18, 2015
    Publication date: May 18, 2017
    Inventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
  • Publication number: 20170109424
    Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Application
    Filed: October 14, 2015
    Publication date: April 20, 2017
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Patent number: 9594797
    Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above.
    Type: Grant
    Filed: September 9, 2014
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Thomas Hollifield, Yannick Saillet
  • Patent number: 9558230
    Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above.
    Type: Grant
    Filed: February 12, 2013
    Date of Patent: January 31, 2017
    Assignee: International Business Machines Corporation
    Inventors: Thomas Hollifield, Yannick Saillet
  • Publication number: 20170017705
    Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
    Type: Application
    Filed: September 30, 2016
    Publication date: January 19, 2017
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Patent number: 9542656
    Abstract: Methods and apparatus, including computer program products, implementing and using techniques for integrating data activities in a process flow. A data transformation activity is invoked through local or remote invocation. The data transformation activity is part of a process flow defined in a standard business process execution language format and is invoked from within the process flow. A system for executing a process flow including one or more control activities and one or more data transformation activities is also described. The system includes a process control engine for executing activities included in the process flow, a data transformation subsystem for storing domain specific definitions of data transformation processes of data in one or more databases, and a control data repository for storing domain specific activity information related to the process flow.
    Type: Grant
    Filed: November 13, 2006
    Date of Patent: January 10, 2017
    Assignee: International Business Machines Corporation
    Inventors: Marion Behnen, Qi Jin, Yannick Saillet, Sriram Srinivasan, Muthukumar Thirunavukkarasu, Hoi J. Yoo
  • Patent number: 9465825
    Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
    Type: Grant
    Filed: October 21, 2014
    Date of Patent: October 11, 2016
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Publication number: 20160248743
    Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
    Type: Application
    Filed: February 19, 2015
    Publication date: August 25, 2016
    Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
  • Publication number: 20160246986
    Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
    Type: Application
    Filed: February 26, 2016
    Publication date: August 25, 2016
    Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
  • Patent number: 9311278
    Abstract: Methods implementing and using techniques for providing a visual editor allowing graphical editing of expressions in an expression language. A graphical user interface is displayed. A first user input of an expression is received. The expression is defined in a logical or textual form, and each component of the expression is represented by a graphical element on the graphical user interface. A syntax of the first user input is verified and an alert is provided to the user in response to detecting a syntax error or an inconsistency of the first user input when verifying the syntax.
    Type: Grant
    Filed: February 23, 2012
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Frederick Charles Ernest Briden, Yannick Saillet
  • Publication number: 20160092479
    Abstract: A method, executed by a computer, for de-duplicating data includes receiving a dataset, pivoting the dataset along a set of columns that have a common domain to provide a pivoted dataset, de-duplicating the pivoted dataset to provide a de-duplicated dataset, and using the de-duplicated dataset. De-duplicating the pivoted dataset may include computing similarity scores for records that have different primary keys and merging records that have a similarity score that exceeds a selected threshold value. The method may include determining the set of columns having a common domain by referencing a business catalog and/or conducting a data classification operation on some or all of the columns of the dataset. The method may also include pivoting the dataset along another set of columns that have a different common domain. A computer system and computer program product corresponding to the method are also disclosed herein.
    Type: Application
    Filed: May 20, 2015
    Publication date: March 31, 2016
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20160092497
    Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
    Type: Application
    Filed: June 8, 2015
    Publication date: March 31, 2016
    Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
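
Illustrative sketch: the abstracts for patents 9594797 and 9558230 above describe assigning a pre-defined domain to a column based on a validity condition, applying that condition to the column's values to compute a data quality metric, and then computing a metric for a group of columns. The Python below is only a rough reading of that idea, not the patented implementation. The domain names, the regular expressions, the 0.5 assignment threshold, and the use of a plain average for the group metric are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical pre-defined domains: each maps a name to a validity condition.
DOMAINS: Dict[str, Callable[[str], bool]] = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "us_zip": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def valid_fraction(values: List[str], condition: Callable[[str], bool]) -> float:
    """Fraction of values that satisfy the validity condition."""
    if not values:
        return 0.0
    return sum(1 for v in values if condition(v)) / len(values)


def assign_domain(values: List[str], threshold: float = 0.5) -> Optional[str]:
    """Pick the domain whose condition matches the most values.

    Returns None when no domain reaches the (assumed) threshold.
    """
    best_name, best_score = None, 0.0
    for name, condition in DOMAINS.items():
        score = valid_fraction(values, condition)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def column_quality(table: Dict[str, List[str]]) -> Dict[str, float]:
    """Per-column quality metric for every column that received a domain."""
    metrics: Dict[str, float] = {}
    for column, values in table.items():
        domain = assign_domain(values)
        if domain is not None:
            metrics[column] = valid_fraction(values, DOMAINS[domain])
    return metrics


def group_quality(metrics: Dict[str, float], columns: List[str]) -> Optional[float]:
    """Group metric: here simply the mean of the available column metrics."""
    scores = [metrics[c] for c in columns if c in metrics]
    return sum(scores) / len(scores) if scores else None


table = {
    "contact": ["a@b.com", "c@d.org", "oops", "e@f.net"],
    "zip": ["12345", "12345-6789", "1234", "abcde"],
}
metrics = column_quality(table)
overall = group_quality(metrics, ["contact", "zip"])
```

With this sample table, `metrics` holds one score per column and `overall` combines them. A real system would presumably use a richer domain catalog and a weighted or otherwise configurable group metric.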