Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Data quality monitoring

Patent number: 9760615

Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. Measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.

Type: Grant

Filed: September 30, 2016

Date of Patent: September 12, 2017

Assignee: International Business Machines Corporation

Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
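
The delta-based measurement described in the abstract above can be illustrated with a minimal sketch. It assumes a single rule expressed as a Python predicate and rows identified by an integer id; the class and method names are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, object]
Rule = Callable[[Row], bool]


class IncrementalQualityMonitor:
    """Caches per-row rule outcomes and re-evaluates only what changed."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        self.results: Dict[int, bool] = {}  # row id -> last rule outcome

    def apply_delta(self, upserts: Dict[int, Row],
                    deletes: Iterable[int] = ()) -> int:
        """Apply the rule to inserted or updated rows only.

        Returns the number of rows that were actually evaluated.
        """
        for row_id in deletes:
            self.results.pop(row_id, None)
        for row_id, row in upserts.items():
            self.results[row_id] = self.rule(row)
        return len(upserts)

    def set_rule(self, rule: Rule, rows: Dict[int, Row]) -> None:
        """A changed rule is also a delta: every loaded row is re-checked."""
        self.rule = rule
        self.results = {row_id: rule(row) for row_id, row in rows.items()}

    def metric(self) -> float:
        """Fraction of loaded rows that currently satisfy the rule."""
        if not self.results:
            return 1.0
        return sum(self.results.values()) / len(self.results)
```

After an initial load, a later call such as `apply_delta({2: {"age": 5}})` evaluates one row instead of rescanning the whole table, which is the saving the abstract points at.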
Data dictionary with a reduced need for rebuilding

Patent number: 9760593

Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.

Type: Grant

Filed: September 30, 2014

Date of Patent: September 12, 2017

Assignee: International Business Machines Corporation

Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
DATA SAMPLING IN A STORAGE SYSTEM

Publication number: 20170242878

Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.

Type: Application

Filed: March 8, 2017

Publication date: August 24, 2017

Inventors: Albert Maier, Yannick Saillet, Damir Spisic
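
The scan described in the abstract above can be sketched as follows. Several choices here are assumptions, since the abstract does not specify them: the random sample is a Bernoulli draw with rate `sample_rate`, the buffer is a min-heap, and freeing space means evicting the lowest-scoring entry. The function name `subsample` and the `score` callback are hypothetical.

```python
import heapq
import random


def subsample(records, sample_rate, buffer_size, score, seed=0):
    """Single-pass sampling with a score-ordered side buffer.

    records     -- iterable of records (any objects)
    sample_rate -- probability that a record joins the random sample
    buffer_size -- maximum number of non-sampled records kept in the buffer
    score       -- function mapping a record to a numeric storage score
    """
    rng = random.Random(seed)
    first_set = []   # records chosen by the random sample
    buffer = []      # min-heap of (score, position, record)

    for position, record in enumerate(records):
        if rng.random() < sample_rate:
            first_set.append(record)
            continue
        # The position breaks ties so records themselves are never compared.
        entry = (score(record), position, record)
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)
        else:
            # Assumption: free space by dropping the lowest-scoring entry,
            # which may be the new record itself.
            heapq.heappushpop(buffer, entry)

    # Merge the random sample with the buffered records.
    return first_set + [record for _, _, record in buffer]
```

A score that favors unusual attribute values lets the buffer retain records that a purely random sample would likely miss.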
DATA SAMPLING IN A STORAGE SYSTEM

Publication number: 20170242877

Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.

Type: Application

Filed: February 18, 2016

Publication date: August 24, 2017

Inventors: Albert Maier, Yannick Saillet, Damir Spisic
Discovering transformations applied to a source table to generate a target table

Patent number: 9720971

Abstract: Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output.

Type: Grant

Filed: June 30, 2008

Date of Patent: August 1, 2017

Assignee: International Business Machines Corporation

Inventors: Torsten Bittner, Holger Kache, Mary Ann Roth, Yannick Saillet
METHOD FOR CLASSIFYING AN UNMANAGED DATASET

Publication number: 20170212953

Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.

Type: Application

Filed: April 6, 2017

Publication date: July 27, 2017

Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
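
One way to picture the similarity score from the abstract above is as attribute coverage: the share of the source dataset's attributes that also appear in a reference table once it is joined with its related tables. The sketch below assumes matching by attribute name, directional relationships, and a fixed threshold; none of these details come from the patent, and all function names are hypothetical.

```python
def joined_attributes(table, schema, relations):
    """Attributes of `table` plus those of the tables it is joined with."""
    attributes = set(schema[table])
    for related in relations.get(table, ()):
        attributes |= set(schema[related])
    return attributes


def similarity(source_attributes, table, schema, relations):
    """Fraction of source attributes found in the joined reference table."""
    source = set(source_attributes)
    if not source:
        return 0.0
    common = source & joined_attributes(table, schema, relations)
    return len(common) / len(source)


def classify(source_attributes, schema, relations, threshold=0.6):
    """Return the best-matching reference table (or None) and all scores."""
    scores = {
        table: similarity(source_attributes, table, schema, relations)
        for table in schema
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


# Hypothetical reference model: customer references address.
schema = {
    "customer": ["customer_id", "name", "address_id"],
    "address": ["address_id", "street", "city"],
    "product": ["product_id", "title", "price"],
}
relations = {"customer": ["address"]}

label, scores = classify(["customer_id", "name", "street", "city"],
                         schema, relations)
```

Here the denormalized source matches `customer` only because the join with `address` contributes `street` and `city`; scored against `customer` alone, half of the source attributes would go unmatched.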
Code analysis for providing data privacy in ETL systems

Patent number: 9716700

Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.

Type: Grant

Filed: February 19, 2015

Date of Patent: July 25, 2017

Assignee: International Business Machines Corporation

Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
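
The compile-time rewrite described in the abstract above can be sketched on a toy data flow graph. The representation is an assumption made for illustration: the job is a list of `(upstream, downstream)` edges, each node lists the fields it outputs, and targets carry a trusted flag. The function name and the `mask[...]` node naming are hypothetical.

```python
def add_masking(edges, targets, output_fields, sensitive):
    """Insert a masking operator before each exit point that would
    deliver sensitive fields to a non-trusted target.

    edges         -- list of (upstream, downstream) node names
    targets       -- dict: target name -> True if trusted
    output_fields -- dict: node name -> set of field names it emits
    sensitive     -- set of field names classified as sensitive
    """
    rewritten = []
    for upstream, downstream in edges:
        is_exit_point = downstream in targets
        leaked = output_fields.get(upstream, set()) & sensitive
        if is_exit_point and not targets[downstream] and leaked:
            mask = f"mask[{downstream}]"
            # The mask sits directly before the exit point.
            rewritten.append((upstream, mask))
            rewritten.append((mask, downstream))
        else:
            rewritten.append((upstream, downstream))
    return rewritten


job = [("src", "join"), ("join", "warehouse"), ("join", "partner_feed")]
targets = {"warehouse": True, "partner_feed": False}
output_fields = {"src": {"ssn", "name"}, "join": {"ssn", "name", "total"}}

masked_job = add_masking(job, targets, output_fields, {"ssn"})
```

Placing the mask at the exit point rather than at the source leaves the sensitive values intact for trusted targets and for operators, such as joins, that may need them upstream.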
Code analysis for providing data privacy in ETL systems

Patent number: 9716704

Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.

Type: Grant

Filed: February 26, 2016

Date of Patent: July 25, 2017

Assignee: International Business Machines Corporation

Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
PROCESSING DATA SETS IN A BIG DATA REPOSITORY

Publication number: 20170139746

Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; executing (304) a first one of the agents; updating (306) the annotations (115) of the first data set by the first agent; and executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.

Type: Application

Filed: February 18, 2015

Publication date: May 18, 2017

Inventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
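
The annotation-driven chaining of agents in the abstract above can be illustrated with a small loop. This is a sketch under assumptions: annotations are a plain dictionary, each agent is a dictionary with a `condition` and an `action`, and the first ready agent runs at each step. The agent names and annotation keys are hypothetical.

```python
def run_agents(annotations, agents, max_steps=100):
    """Run agents until no agent's condition holds; return the firing order."""
    log = []
    for _ in range(max_steps):
        ready = [agent for agent in agents if agent["condition"](annotations)]
        if not ready:
            break
        agent = ready[0]
        agent["action"](annotations)  # the agent updates the annotations
        log.append(agent["name"])
    return log


# The profiling agent is listed first, but it cannot run until the
# format-detection agent has annotated the data set.
agents = [
    {
        "name": "profile",
        "condition": lambda a: a.get("format") == "csv" and "row_count" not in a,
        "action": lambda a: a.update(row_count=3),
    },
    {
        "name": "detect_format",
        "condition": lambda a: "format" not in a,
        "action": lambda a: a.update(format="csv"),
    },
]

annotations = {}
order = run_agents(annotations, agents)
```

The execution order follows from the annotations rather than from the order in which the agents are registered, which mirrors how the second agent in the abstract is triggered by the first agent's update.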
METHOD FOR CLASSIFYING AN UNMANAGED DATASET

Publication number: 20170109424

Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.

Type: Application

Filed: October 14, 2015

Publication date: April 20, 2017

Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
Data quality assessment

Patent number: 9594797

Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manner described above.

Type: Grant

Filed: September 9, 2014

Date of Patent: March 14, 2017

Assignee: International Business Machines Corporation

Inventors: Thomas Hollifield, Yannick Saillet
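
The three steps in the abstract above (assign a domain, compute a column metric, roll up a group metric) can be sketched with regular expressions standing in for validity conditions. The two domains, the `min_fit` cutoff for assigning a domain, and the use of a plain average for the group metric are all assumptions made for this example.

```python
import re

# Hypothetical pre-defined domains, each with a validity condition.
DOMAINS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+").fullmatch,
    "us_zip": re.compile(r"\d{5}(-\d{4})?").fullmatch,
}


def validity(values, condition):
    """Fraction of values that satisfy the validity condition."""
    if not values:
        return 0.0
    return sum(1 for value in values if condition(value)) / len(values)


def assign_domain(values, min_fit=0.5):
    """Pick the domain whose condition fits the column best, if any fits."""
    best = max(DOMAINS, key=lambda name: validity(values, DOMAINS[name]))
    return best if validity(values, DOMAINS[best]) >= min_fit else None


def assess(table):
    """Return per-column quality and the average over assessed columns."""
    column_quality = {}
    for column, values in table.items():
        domain = assign_domain(values)
        if domain is not None:
            column_quality[column] = validity(values, DOMAINS[domain])
    if not column_quality:
        return column_quality, None
    return column_quality, sum(column_quality.values()) / len(column_quality)


table = {
    "contact": ["a@x.org", "b@y.com", "not-an-email", "c@z.net"],
    "zip": ["12345", "9876", "12345-6789", "00000"],
}
column_quality, group_quality = assess(table)
```

Columns that fit no domain are left out of the group metric in this sketch; a real system might instead report them separately.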
Data quality assessment

Patent number: 9558230

Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manner described above.

Type: Grant

Filed: February 12, 2013

Date of Patent: January 31, 2017

Assignee: International Business Machines Corporation

Inventors: Thomas Hollifield, Yannick Saillet
DATA QUALITY MONITORING

Publication number: 20170017705

Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. Measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.

Type: Application

Filed: September 30, 2016

Publication date: January 19, 2017

Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
Supporting ETL processing in BPEL-based processes

Patent number: 9542656

Abstract: Methods and apparatus, including computer program products, implementing and using techniques for integrating data activities in a process flow. A data transformation activity is invoked through local or remote invocation. The data transformation activity is part of a process flow defined in a standard business process execution language format and is invoked from within the process flow. A system for executing a process flow including one or more control activities and one or more data transformation activities is also described. The system includes a process control engine for executing activities included in the process flow, a data transformation subsystem for storing domain specific definitions of data transformation processes of data in one or more databases, and a control data repository for storing domain specific activity information related to the process flow.

Type: Grant

Filed: November 13, 2006

Date of Patent: January 10, 2017

Assignee: International Business Machines Corporation

Inventors: Marion Behnen, Qi Jin, Yannick Saillet, Sriram Srinivasan, Muthukumar Thirunavukkarasu, Hoi J. Yoo
Data quality monitoring

Patent number: 9465825

Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. Measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.

Type: Grant

Filed: October 21, 2014

Date of Patent: October 11, 2016

Assignee: International Business Machines Corporation

Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
CODE ANALYSIS FOR PROVIDING DATA PRIVACY IN ETL SYSTEMS

Publication number: 20160248743

Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.

Type: Application

Filed: February 19, 2015

Publication date: August 25, 2016

Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
CODE ANALYSIS FOR PROVIDING DATA PRIVACY IN ETL SYSTEMS

Publication number: 20160246986

Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.

Type: Application

Filed: February 26, 2016

Publication date: August 25, 2016

Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
Visual editor for editing complex expressions

Patent number: 9311278

Abstract: Methods implementing and using techniques for providing a visual editor allowing graphical editing of expressions in an expression language. A graphical user interface is displayed. A first user input of an expression is received. The expression is defined in a logical or textual form, and each component of the expression is represented by a graphical element on the graphical user interface. A syntax of the first user input is verified and an alert is provided to the user in response to detecting a syntax error or an inconsistency of the first user input when verifying the syntax.

Type: Grant

Filed: February 23, 2012

Date of Patent: April 12, 2016

Assignee: International Business Machines Corporation

Inventors: Frederick Charles Ernest Briden, Yannick Saillet
DATA DE-DUPLICATION

Publication number: 20160092479

Abstract: A method, executed by a computer, for de-duplicating data includes receiving a dataset, pivoting the dataset along a set of columns that have a common domain to provide a pivoted dataset, de-duplicating the pivoted dataset to provide a de-duplicated dataset, and using the de-duplicated dataset. De-duplicating the pivoted dataset may include computing similarity scores for records that have different primary keys and merging records that have a similarity score that exceeds a selected threshold value. The method may include determining the set of columns having a common domain by referencing a business catalog and/or conducting a data classification operation on some or all of the columns of the dataset. The method may also include pivoting the dataset along another set of columns that have a different common domain. A computer system and computer program product corresponding to the method are also disclosed herein.

Type: Application

Filed: May 20, 2015

Publication date: March 31, 2016

Inventors: Namit Kabra, Yannick Saillet
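
The pivot-then-compare idea in the abstract above can be sketched as follows. The similarity measure (Jaccard overlap of pivoted values), the threshold, and the union-find grouping are assumptions chosen to keep the example short; the function names are hypothetical.

```python
from itertools import combinations


def pivot(rows, key, columns):
    """One (key, value) pair per non-empty cell of the common-domain columns."""
    return {(row[key], row[c]) for row in rows for c in columns if row.get(c)}


def dedupe(rows, key, columns, threshold=0.5):
    """Group primary keys whose pivoted values overlap beyond the threshold."""
    values = {}
    for record_key, value in pivot(rows, key, columns):
        values.setdefault(record_key, set()).add(value)

    parent = {record_key: record_key for record_key in values}

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    # Compare only records with different primary keys.
    for a, b in combinations(sorted(values), 2):
        overlap = len(values[a] & values[b]) / len(values[a] | values[b])
        if overlap > threshold:
            parent[find(b)] = find(a)

    groups = {}
    for record_key in values:
        groups.setdefault(find(record_key), []).append(record_key)
    return sorted(sorted(group) for group in groups.values())


rows = [
    {"id": 1, "phone_home": "555-0100", "phone_work": "555-0199"},
    {"id": 2, "phone_home": "555-0199", "phone_work": "555-0100"},
    {"id": 3, "phone_home": "555-0142", "phone_work": None},
]
duplicates = dedupe(rows, "id", ["phone_home", "phone_work"])
```

Records 1 and 2 carry the same two numbers in swapped columns, so a column-by-column comparison would not match them, while the pivoted form does. Rows with no value in any of the pivoted columns are not grouped at all in this sketch.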
DATA DICTIONARY WITH A REDUCED NEED FOR REBUILDING

Publication number: 20160092497

Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.

Type: Application

Filed: June 8, 2015

Publication date: March 31, 2016

Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
