Patents by Inventor Yannick Saillet
Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9760615
Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
Type: Grant
Filed: September 30, 2016
Date of Patent: September 12, 2017
Assignee: International Business Machines Corporation
Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
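As a rough illustration of the delta idea in this abstract, the sketch below re-evaluates a rule only for records that changed since the previous measurement. It is a minimal sketch, not the patented method: the `DeltaQualityMonitor` class, the `apply_delta` method, the pass-fraction metric, and the sample rule are all hypothetical, and changes to the rules themselves (which the abstract also covers) are not handled.

```python
from typing import Callable, Dict, Iterable, Set

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # hypothetical: returns True when a record passes


class DeltaQualityMonitor:
    """Tracks failing record keys and updates them from deltas only."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        self.records: Dict[str, Record] = {}
        self.failing: Set[str] = set()

    def apply_delta(self, upserts: Dict[str, Record],
                    deletes: Iterable[str] = ()) -> float:
        # Only records that changed since the last measurement are re-checked.
        for key in deletes:
            self.records.pop(key, None)
            self.failing.discard(key)
        for key, record in upserts.items():
            self.records[key] = record
            if self.rule(record):
                self.failing.discard(key)
            else:
                self.failing.add(key)
        return self.quality()

    def quality(self) -> float:
        # Assumed metric: fraction of loaded records that pass the rule.
        if not self.records:
            return 1.0
        return 1.0 - len(self.failing) / len(self.records)


monitor = DeltaQualityMonitor(lambda r: r.get("age") is not None and r["age"] >= 0)
first = monitor.apply_delta({"a": {"age": 30}, "b": {"age": -1}})
second = monitor.apply_delta({"b": {"age": 41}})  # only record "b" is re-evaluated
```

Keeping the set of failing keys between runs is what lets the second call touch a single record instead of rescanning everything that was loaded.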
-
Patent number: 9760593
Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
Type: Grant
Filed: September 30, 2014
Date of Patent: September 12, 2017
Assignee: International Business Machines Corporation
Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
-
Publication number: 20170242878
Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
Type: Application
Filed: March 8, 2017
Publication date: August 24, 2017
Inventors: Albert Maier, Yannick Saillet, Damir Spisic
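The scan described in this abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed method: the `subsample` function, its parameters, and the eviction policy (drop the lowest-scoring buffered record, or the incoming record if it scores lower) are hypothetical choices made here; the abstract only says that part of the buffer may be freed up.

```python
import heapq
import random
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def subsample(records: Iterable[Record], sample_rate: float, buffer_size: int,
              score: Callable[[Record], float], seed: int = 0) -> List[Record]:
    rng = random.Random(seed)
    sampled: List[Record] = []                     # the "first set of records"
    buffer: List[Tuple[float, int, Record]] = []   # min-heap ordered by score
    for index, record in enumerate(records):
        if rng.random() < sample_rate:
            # The record belongs to the random sample.
            sampled.append(record)
            continue
        # Otherwise compute a storage score from the record's attribute values.
        entry = (score(record), index, record)     # index breaks ties between scores
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)          # the buffer still has room
        elif buffer and entry[0] > buffer[0][0]:
            heapq.heapreplace(buffer, entry)       # free the lowest-scoring slot
    # Merge the random sample with the buffered records.
    return sampled + [record for _, _, record in buffer]


rows = [{"v": i} for i in range(10)]
kept = subsample(rows, sample_rate=0.0, buffer_size=3, score=lambda r: r["v"])
```

With `sample_rate=0.0` no record is drawn into the random sample, so the result consists only of the three highest-scoring records that survived in the buffer.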
-
Publication number: 20170242877
Abstract: A method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
Type: Application
Filed: February 18, 2016
Publication date: August 24, 2017
Inventors: Albert Maier, Yannick Saillet, Damir Spisic
-
Patent number: 9720971
Abstract: Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output.
Type: Grant
Filed: June 30, 2008
Date of Patent: August 1, 2017
Assignee: International Business Machines Corporation
Inventors: Torsten Bittner, Holger Kache, Mary Ann Roth, Yannick Saillet
-
Publication number: 20170212953
Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance with the reference storage model.
Type: Application
Filed: April 6, 2017
Publication date: July 27, 2017
Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Patent number: 9716700
Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
Type: Grant
Filed: February 19, 2015
Date of Patent: July 25, 2017
Assignee: International Business Machines Corporation
Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
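The job rewrite described in this abstract can be pictured with a small sketch. Everything named here is hypothetical and chosen for illustration only: the `ExitPoint` dataclass, the `add_masking` function, the string form of operators, and the example targets and fields. A real integration job would be a full data flow graph rather than a flat operator list per exit point.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ExitPoint:
    target: str                      # target entity that receives the data
    fields: List[str]                # data fields delivered at this exit point
    operators: List[str] = field(default_factory=list)  # applied in order before exit


def add_masking(exit_points: List[ExitPoint], trusted_targets: Set[str],
                sensitive_fields: Set[str]) -> List[ExitPoint]:
    for exit_point in exit_points:
        if exit_point.target in trusted_targets:
            continue  # trusted targets receive the data unchanged
        exposed = sorted(set(exit_point.fields) & sensitive_fields)
        if exposed:
            # Appending places the masking operator directly before the exit point.
            exit_point.operators.append("mask(" + ",".join(exposed) + ")")
    return exit_points


exits = [
    ExitPoint("warehouse", ["name", "ssn"], ["join"]),
    ExitPoint("partner_feed", ["name", "ssn"], ["join"]),
    ExitPoint("public_report", ["region"], ["aggregate"]),
]
add_masking(exits, trusted_targets={"warehouse"}, sensitive_fields={"ssn"})
```

Only `partner_feed` is changed in this example: it is non-trusted and carries a sensitive field, whereas `public_report` is non-trusted but exposes nothing classified as sensitive.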
-
Patent number: 9716704
Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
Type: Grant
Filed: February 26, 2016
Date of Patent: July 25, 2017
Assignee: International Business Machines Corporation
Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
-
Publication number: 20170139746
Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; executing (304) a first one of the agents; updating (306) the annotations (115) of the first data set by the first agent; and executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.
Type: Application
Filed: February 18, 2015
Publication date: May 18, 2017
Inventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
-
Publication number: 20170109424
Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance with the reference storage model.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Patent number: 9594797
Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above.
Type: Grant
Filed: September 9, 2014
Date of Patent: March 14, 2017
Assignee: International Business Machines Corporation
Inventors: Thomas Hollifield, Yannick Saillet
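A small sketch may help picture the three steps in this abstract: assigning a domain, scoring a column, and rolling column scores up to a group. It rests on assumptions made here and is not the patented system: the `DOMAINS` table, the two regular expressions, the best-fit assignment in `assign_domain`, and the plain average in `group_quality` are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical pre-defined domains, each with a validity condition.
DOMAINS: Dict[str, Callable[[object], bool]] = {
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "us_zip": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}", v) is not None,
}


def column_quality(values: List[object], domain: str) -> float:
    """Fraction of values that satisfy the domain's validity condition."""
    if not values:
        return 1.0
    is_valid = DOMAINS[domain]
    return sum(1 for value in values if is_valid(value)) / len(values)


def assign_domain(values: List[object]) -> str:
    """Assumed policy: pick the domain whose condition most values satisfy."""
    return max(DOMAINS, key=lambda domain: column_quality(values, domain))


def group_quality(table: Dict[str, List[object]]) -> float:
    """Assumed group metric: mean of the per-column quality scores."""
    scores = [column_quality(values, assign_domain(values)) for values in table.values()]
    return sum(scores) / len(scores) if scores else 1.0


table = {
    "contact": ["a@x.org", "bad"],
    "postal": ["12345", "1234", "99999", "abcde"],
}
```

In this example `assign_domain` maps `contact` to `email` and `postal` to `us_zip`, and half of the values in each column satisfy the assigned condition.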
-
Patent number: 9558230
Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above.
Type: Grant
Filed: February 12, 2013
Date of Patent: January 31, 2017
Assignee: International Business Machines Corporation
Inventors: Thomas Hollifield, Yannick Saillet
-
Publication number: 20170017705
Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
Type: Application
Filed: September 30, 2016
Publication date: January 19, 2017
Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
-
Patent number: 9542656
Abstract: Methods and apparatus, including computer program products, implementing and using techniques for integrating and data activities in a process flow. A data transformation activity is invoked through local or remote invocation. The data transformation activity is part of a process flow defined in a standard business process execution language format and is invoked from within the process flow. A system for executing a process flow including one or more control activities and one or more data transformation activities is also described. The system includes a process control engine for executing activities included in the process flow, a data transformation subsystem for storing domain specific definitions of data transformation processes of data in one or more databases, and a control data repository for storing domain specific activity information related to the process flow.
Type: Grant
Filed: November 13, 2006
Date of Patent: January 10, 2017
Assignee: International Business Machines Corporation
Inventors: Marion Behnen, Qi Jin, Yannick Saillet, Sriram Srinivasan, Muthukumar Thirunavukkarasu, Hoi J. Yoo
-
Patent number: 9465825
Abstract: A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes.
Type: Grant
Filed: October 21, 2014
Date of Patent: October 11, 2016
Assignee: International Business Machines Corporation
Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
-
Publication number: 20160248743
Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
Type: Application
Filed: February 19, 2015
Publication date: August 25, 2016
Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
-
Publication number: 20160246986
Abstract: In an approach for providing data privacy in information integration systems, a method performed during compilation of an information integration job receives information regarding a data flow structure of the job to be executed, said data flow structure comprising at least one source system, one or more target entities, and at least one operator for modifying output data provided by the source system. The method determines data exit points at which output data are provided to the target entities and determines at least one non-trusted target entity. The method determines, for each non-trusted target entity, if at least one data field included in the output data provided to the non-trusted target entity is classified as sensitive information, and, if so, modifies the information integration job by including a masking operator directly before a data exit point associated with the non-trusted target entity in order to mask said sensitive information.
Type: Application
Filed: February 26, 2016
Publication date: August 25, 2016
Inventors: Ivan M. Milman, Martin Oberhofer, Yannick Saillet
-
Patent number: 9311278
Abstract: Methods implementing and using techniques for providing a visual editor allowing graphical editing of expressions in an expression language. A graphical user interface is displayed. A first user input of an expression is received. The expression is defined in a logical or textual form, and each component of the expression is represented by a graphical element on the graphical user interface. A syntax of the first user input is verified and an alert is provided to the user in response to detecting a syntax error or an inconsistency of the first user input when verifying the syntax.
Type: Grant
Filed: February 23, 2012
Date of Patent: April 12, 2016
Assignee: International Business Machines Corporation
Inventors: Frederick Charles Ernest Briden, Yannick Saillet
-
Publication number: 20160092479
Abstract: A method, executed by a computer, for de-duplicating data includes receiving a dataset, pivoting the dataset along a set of columns that have a common domain to provide a pivoted dataset, de-duplicating the pivoted dataset to provide a de-duplicated dataset, and using the de-duplicated dataset. De-duplicating the pivoted dataset may include computing similarity scores for records that have different primary keys and merging records that have a similarity score that exceeds a selected threshold value. The method may include determining the set of columns having a common domain by referencing a business catalog and/or conducting a data classification operation on some or all of the columns of the dataset. The method may also include pivoting the dataset along another set of columns that have a different common domain. A computer system and computer program product corresponding to the method are also disclosed herein.
Type: Application
Filed: May 20, 2015
Publication date: March 31, 2016
Inventors: Namit Kabra, Yannick Saillet
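To show why pivoting helps before de-duplication, the sketch below turns two phone columns into one and then compares records that have different primary keys. It is a hedged illustration, not the claimed method: the `pivot`, `similarity`, and `deduplicate` functions, the use of `difflib.SequenceMatcher` as the similarity score, the 0.95 threshold, and the choice to "merge" by keeping the first matching record are all assumptions made here.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Row = Dict[str, object]


def pivot(rows: List[Row], columns: List[str], value_name: str) -> List[Row]:
    """Emit one row per non-empty value in the columns that share a domain."""
    pivoted = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in columns}
        for column in columns:
            if row.get(column):
                pivoted.append({**rest, value_name: row[column]})
    return pivoted


def similarity(a: Row, b: Row, fields: List[str]) -> float:
    """Mean string similarity over the compared fields (assumed scoring)."""
    ratios = [SequenceMatcher(None, str(a[f]), str(b[f])).ratio() for f in fields]
    return sum(ratios) / len(ratios)


def deduplicate(rows: List[Row], key: str, fields: List[str], threshold: float) -> List[Row]:
    kept: List[Row] = []
    for row in rows:
        # Only records with different primary keys are compared.
        is_duplicate = any(
            row[key] != other[key] and similarity(row, other, fields) > threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(row)  # "merging" here keeps the first matching record
    return kept


people = [
    {"id": 1, "name": "Ann Lee", "home": "555-0100", "work": "555-0199"},
    {"id": 2, "name": "Ann Lee", "home": "555-0199", "work": None},
]
pivoted = pivot(people, ["home", "work"], "phone")
deduped = deduplicate(pivoted, "id", ["name", "phone"], threshold=0.95)
```

Compared column by column, the two records never share a phone number in the same column; after the pivot, record 2's home number lines up with record 1's work number and the match is found.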
-
Publication number: 20160092497
Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
Type: Application
Filed: June 8, 2015
Publication date: March 31, 2016
Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert