Patents by Inventor Austin Clifford
Austin Clifford has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11841878
Abstract: In an approach for automatic vertical partitioning of fact tables in a distributed query engine a processor analyzes a sample end-user workload of queries to extract filter predicates associated with each of multiple fact tables relating to a big data store. A processor, for each fact table, and for each column in the fact table to which a filter predicate is applied and where coarsification is required, generates a candidate partitioning expression incorporating an adjustment to a coarsification function based on a data distribution of values in the column. A processor scores the candidate partitioning expressions for each fact table based on cost data relating to the sample end-user workload and selects one or more candidate partitioning expressions to optimize partitioning of each fact table with each partition data being placed in a separate directory in a distributed file system.
Type: Grant
Filed: August 30, 2021
Date of Patent: December 12, 2023
Assignee: International Business Machines Corporation
Inventors: Austin Clifford, Hemant Asandas Bhatia, Ilker Ender, Mara Elisa de Paiva Fernandes Matias
-
Patent number: 11809424
Abstract: Aspects of the present invention disclose a method, computer program product, and system for auto-scaling a query engine. The method includes one or more processors monitoring query traffic at the query engine. The method further includes one or more processors classifying queries by a plurality of service classes based on a level of complexity of a query. The method further includes one or more processors comparing query traffic for each service class with a concurrency threshold of a maximum number of queries of the service class allowed to be concurrently processed. The method further includes one or more processors instructing auto-scaling of a cluster of worker nodes to change a number of worker nodes available in the cluster based on the comparison, over a defined period of time, of the query traffic relative to a defined upscaling threshold and a defined downscaling threshold.
Type: Grant
Filed: October 23, 2020
Date of Patent: November 7, 2023
Assignee: International Business Machines Corporation
Inventors: Austin Clifford, Ilker Ender, Mara Matias
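The threshold comparison described in this abstract can be illustrated with a short Python sketch. It is only a reading of the abstract under stated assumptions, not the patented method: the names `ServiceClass`, `classify`, and `scaling_decision`, the use of an estimated query cost as the complexity measure, and the ratio thresholds `up_ratio` and `down_ratio` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    max_cost: float         # hypothetical upper bound on an estimated query cost
    concurrency_limit: int  # max queries of this class allowed to run concurrently


def classify(cost, classes):
    """Return the name of the first class whose max_cost covers the cost estimate."""
    for service_class in classes:
        if cost <= service_class.max_cost:
            return service_class.name
    return classes[-1].name  # fall back to the most complex class


def scaling_decision(window, classes, up_ratio=1.0, down_ratio=0.5):
    """Decide 'up', 'down', or 'hold' from per-class traffic samples.

    window: list of dicts mapping class name -> concurrent query count,
    one dict per sample taken during the defined period (assumed layout).
    """
    if not window:
        return "hold"
    ratios = [
        {sc.name: sample.get(sc.name, 0) / sc.concurrency_limit for sc in classes}
        for sample in window
    ]
    # Scale up when some class stayed at or above its threshold for the whole period.
    if any(all(r[sc.name] >= up_ratio for r in ratios) for sc in classes):
        return "up"
    # Scale down when every class stayed at or below the lower threshold throughout.
    if all(r[sc.name] <= down_ratio for r in ratios for sc in classes):
        return "down"
    return "hold"


# Hypothetical service classes used only for illustration.
CLASSES = [
    ServiceClass("simple", max_cost=10.0, concurrency_limit=8),
    ServiceClass("complex", max_cost=float("inf"), concurrency_limit=2),
]
```

Requiring the condition to hold for every sample in the window is one plausible way to model "over a defined period of time"; an average over the window would be an equally reasonable choice.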
-
Publication number: 20230333971
Abstract: A computer-implemented method, system and computer program product for optimally performing stress testing against big data management systems. A set of random test queries is generated and compiled to determine the data points of the features (e.g., table type being queried) of the set of random test queries. A distance (e.g., Mahalanobis distance) is then measured between the data points of the features and the mean of a distribution of data points corresponding to each same feature of an extracted feature set. Each random test query whose distance exceeds a threshold distance is then ranked. The ranked random test queries are then executed in order of rank. Those executed random test queries which resulted in an error (e.g., system failure) are added to a log, which is used to identify those queries to perform a stress test against the big data management system.
Type: Application
Filed: June 21, 2023
Publication date: October 19, 2023
Inventors: Ilker Ender, Austin Clifford, Pedro Miguel Barbas, Mara Elisa de Paiva Fernandes Matias, Hemant Asandas Bhatia
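The distance-and-rank step in this abstract can be sketched in Python as follows. This is a simplified illustration, not the patented method: it assumes the features are numeric and uncorrelated, in which case the Mahalanobis distance reduces to a variance-scaled Euclidean distance, and it assumes that "ranked" means farthest first. The function names and the data layout are hypothetical.

```python
import math
from statistics import mean, pvariance


def diagonal_mahalanobis(point, means, variances):
    """Mahalanobis distance under the assumption of a diagonal covariance matrix."""
    return math.sqrt(
        sum((x - m) ** 2 / v for x, m, v in zip(point, means, variances))
    )


def rank_outlier_queries(candidates, reference, threshold):
    """Return ids of candidate queries beyond the threshold, farthest first.

    candidates: dict mapping query id -> numeric feature vector (assumed layout)
    reference:  list of feature vectors from the extracted feature set
    """
    columns = list(zip(*reference))
    means = [mean(column) for column in columns]
    # Guard against zero variance so the division above stays defined.
    variances = [pvariance(column) or 1e-9 for column in columns]
    scored = [
        (diagonal_mahalanobis(vector, means, variances), query_id)
        for query_id, vector in candidates.items()
    ]
    return [query_id for distance, query_id in sorted(scored, reverse=True)
            if distance > threshold]
```

With correlated features, a full covariance matrix and its inverse would be needed instead of the per-feature variances used here.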
-
Patent number: 11741001
Abstract: A computer-implemented method, system and computer program product for optimally performing stress testing against big data management systems. A set of random test queries is generated and compiled to determine the data points of the features (e.g., table type being queried) of the set of random test queries. A distance (e.g., Mahalanobis distance) is then measured between the data points of the features and the mean of a distribution of data points corresponding to each same feature of an extracted feature set. Each random test query whose distance exceeds a threshold distance is then ranked. The ranked random test queries are then executed in order of rank. Those executed random test queries which resulted in an error (e.g., system failure) are added to a log, which is used to identify those queries to perform a stress test against the big data management system.
Type: Grant
Filed: October 1, 2021
Date of Patent: August 29, 2023
Assignee: International Business Machines Corporation
Inventors: Ilker Ender, Austin Clifford, Pedro Miguel Barbas, Mara Elisa de Paiva Fernandes Matias, Hemant Asandas Bhatia
-
Publication number: 20230103856
Abstract: A computer-implemented method, system and computer program product for optimally performing stress testing against big data management systems. A set of random test queries is generated and compiled to determine the data points of the features (e.g., table type being queried) of the set of random test queries. A distance (e.g., Mahalanobis distance) is then measured between the data points of the features and the mean of a distribution of data points corresponding to each same feature of an extracted feature set. Each random test query whose distance exceeds a threshold distance is then ranked. The ranked random test queries are then executed in order of rank. Those executed random test queries which resulted in an error (e.g., system failure) are added to a log, which is used to identify those queries to perform a stress test against the big data management system.
Type: Application
Filed: October 1, 2021
Publication date: April 6, 2023
Inventors: Ilker Ender, Austin Clifford, Pedro Miguel Barbas, Mara Elisa de Paiva Fernandes Matias, Hemant Asandas Bhatia
-
Publication number: 20230082010
Abstract: In an approach for automatic vertical partitioning of fact tables in a distributed query engine a processor analyzes a sample end-user workload of queries to extract filter predicates associated with each of multiple fact tables relating to a big data store. A processor, for each fact table, and for each column in the fact table to which a filter predicate is applied and where coarsification is required, generates a candidate partitioning expression incorporating an adjustment to a coarsification function based on a data distribution of values in the column. A processor scores the candidate partitioning expressions for each fact table based on cost data relating to the sample end-user workload and selects one or more candidate partitioning expressions to optimize partitioning of each fact table with each partition data being placed in a separate directory in a distributed file system.
Type: Application
Filed: August 30, 2021
Publication date: March 16, 2023
Inventors: Austin Clifford, Hemant Asandas Bhatia, Ilker Ender, Mara Elisa de Paiva Fernandes Matias
-
Patent number: 11468192
Abstract: A computer-implemented method, computer program product and system for identifying pseudonymized data within data sources. One or more data repositories within one or more of the data sources are selected. One or more privacy data models are provided, where each of the privacy data models includes pattern(s) and/or parameter(s). One or more of the one or more privacy data models are selected. Data identification information is generated, where the data identification information indicates a presence or absence of pseudonymized data and of non-pseudonymized data within the one or more of the data sources. The data identification information is generated utilizing the pattern(s) and/or the parameter(s) to determine pseudonymized data.
Type: Grant
Filed: March 25, 2020
Date of Patent: October 11, 2022
Inventors: Pedro Barbas, Austin Clifford, Konrad Emanowicz, Patrick G. O'Sullivan
-
Patent number: 11416180
Abstract: Proposed are concepts for providing resilience (i.e., fault tolerance) for the temporary data needs of a distributed file system. Such concepts may, for instance, provide a virtual storage layer in a data node of a distributed file system. The virtual storage layer may provide resilience for the temporary data needs of a Massively Parallel Processing (MPP) SQL on Hadoop engine.
Type: Grant
Filed: November 5, 2020
Date of Patent: August 16, 2022
Assignee: International Business Machines Corporation
Inventors: Austin Clifford, Mara Matias, Ilker Ender
-
Patent number: 11341139
Abstract: Provided are a system, method and computer program product for redistribution of data in an online shared nothing database, said shared nothing database comprising a plurality of original partitions and at least one new partition.
Type: Grant
Filed: July 2, 2019
Date of Patent: May 24, 2022
Assignee: International Business Machines Corporation
Inventors: Enzo Cialini, Austin Clifford, Garrett Fitzsimons
-
Publication number: 20220137884
Abstract: Proposed are concepts for providing resilience (i.e., fault tolerance) for the temporary data needs of a distributed file system. Such concepts may, for instance, provide a virtual storage layer in a data node of a distributed file system. The virtual storage layer may provide resilience for the temporary data needs of a Massively Parallel Processing (MPP) SQL on Hadoop engine.
Type: Application
Filed: November 5, 2020
Publication date: May 5, 2022
Inventors: Austin Clifford, Mara Matias, Ilker Ender
-
Publication number: 20220129460
Abstract: Aspects of the present invention disclose a method, computer program product, and system for auto-scaling a query engine. The method includes one or more processors monitoring query traffic at the query engine. The method further includes one or more processors classifying queries by a plurality of service classes based on a level of complexity of a query. The method further includes one or more processors comparing query traffic for each service class with a concurrency threshold of a maximum number of queries of the service class allowed to be concurrently processed. The method further includes one or more processors instructing auto-scaling of a cluster of worker nodes to change a number of worker nodes available in the cluster based on the comparison, over a defined period of time, of the query traffic relative to a defined upscaling threshold and a defined downscaling threshold.
Type: Application
Filed: October 23, 2020
Publication date: April 28, 2022
Inventors: Austin Clifford, Ilker Ender, Mara Matias
-
Patent number: 11163744
Abstract: Embodiments of the present invention provide a method, system and computer program product for test data generation using unique common factor sequencing. In an embodiment of the invention, a method for test data generation using unique common factor sequencing is provided. The method includes loading a table for population with test data in a test data generation tool executing in memory of a computer. A column set of multiple columns in the table associated with a key to the table can be selected for processing and different cardinality sequence values are assigned to the columns in the set such that the cardinality sequence values do not share a common factor except for unity as in the case of prime numbers.
Type: Grant
Filed: July 8, 2019
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Austin Clifford, Konrad Emanowicz, Enda McCallig, Gary Murtagh, Clare Scally
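The number-theoretic idea behind cardinalities that share no common factor can be sketched in Python. This illustrates the general principle only and is not taken from the patent: if each key column cycles through its values with a period equal to its cardinality, and the cardinalities are pairwise coprime, the combined key tuple does not repeat until the product of the cardinalities is reached (a consequence of the Chinese remainder theorem). The function names and the modulo-based value generation are hypothetical.

```python
from itertools import combinations
from math import gcd, prod


def pairwise_coprime(cardinalities):
    """True when no two cardinalities share a common factor other than 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(cardinalities, 2))


def generate_key_rows(cardinalities, row_count):
    """Generate row_count composite-key tuples that are guaranteed distinct.

    Column k of row i holds i modulo cardinalities[k] (an assumed scheme).
    """
    if not pairwise_coprime(cardinalities):
        raise ValueError("cardinalities must be pairwise coprime")
    if row_count > prod(cardinalities):
        raise ValueError("row_count exceeds the number of distinct key tuples")
    return [tuple(i % c for c in cardinalities) for i in range(row_count)]
```

For example, cardinalities 3, 4, and 5 yield 60 distinct key tuples, whereas 4 and 6 share the factor 2 and would start repeating after only 12 rows.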
-
Patent number: 11036684
Abstract: Disclosed is an approach comprising a column partitioned into a plurality of partitions including an empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters having parameter values, the data entries compressed in accordance with a compression dictionary. The approach comprises receiving forecasted parameter values for an expected set of data entries to be stored in an empty partition; predicting a recurrence frequency of the data entries in the expected set using the forecasted parameter values by evaluating the respective compression dictionaries of the filled partitions with a machine learning algorithm; generating a predictive compression dictionary for the expected set of data entries based on the predicted recurrence frequency of the data entries in the expected set; receiving the expected set of data entries; and compressing at least part of the received expected set of data entries using the predictive compression dictionary.
Type: Grant
Filed: November 29, 2018
Date of Patent: June 15, 2021
Assignee: International Business Machines Corporation
Inventors: Sami Abed, Pedro Barbas, Austin Clifford, Konrad Emanowicz
-
Patent number: 10747903
Abstract: A computer-implemented method, computer program product and system for identifying pseudonymized data within data sources. One or more data repositories within one or more of the data sources are selected. One or more privacy data models are provided, where each of the privacy data models includes pattern(s) and/or parameter(s). One or more of the one or more privacy data models are selected. Data identification information is generated, where the data identification information indicates a presence or absence of pseudonymized data and of non-pseudonymized data within the one or more of the data sources. The data identification information is generated utilizing the pattern(s) and/or the parameter(s) to determine pseudonymized data.
Type: Grant
Filed: April 25, 2019
Date of Patent: August 18, 2020
Assignee: International Business Machines Corporation
Inventors: Pedro Barbas, Austin Clifford, Konrad Emanowicz, Patrick G. O'Sullivan
-
Publication number: 20200226289
Abstract: A computer-implemented method, computer program product and system for identifying pseudonymized data within data sources. One or more data repositories within one or more of the data sources are selected. One or more privacy data models are provided, where each of the privacy data models includes pattern(s) and/or parameter(s). One or more of the one or more privacy data models are selected. Data identification information is generated, where the data identification information indicates a presence or absence of pseudonymized data and of non-pseudonymized data within the one or more of the data sources. The data identification information is generated utilizing the pattern(s) and/or the parameter(s) to determine pseudonymized data.
Type: Application
Filed: March 25, 2020
Publication date: July 16, 2020
Inventors: Pedro Barbas, Austin Clifford, Konrad Emanowicz, Patrick G. O'Sullivan
-
Patent number: 10657287
Abstract: A computer-implemented method, computer program product and system for identifying pseudonymized data within data sources. One or more data repositories within one or more of the data sources are selected. One or more privacy data models are provided, where each of the privacy data models includes pattern(s) and/or parameter(s). One or more of the one or more privacy data models are selected. Data identification information is generated, where the data identification information indicates a presence or absence of pseudonymized data and of non-pseudonymized data within the one or more of the data sources. The data identification information is generated utilizing the pattern(s) and/or the parameter(s) to determine pseudonymized data.
Type: Grant
Filed: November 1, 2017
Date of Patent: May 19, 2020
Assignee: International Business Machines Corporation
Inventors: Pedro Barbas, Austin Clifford, Konrad Emanowicz, Patrick G. O'Sullivan
-
Publication number: 20190377737
Abstract: Provided are a system, method and computer program product for redistribution of data in an online shared nothing database, said shared nothing database comprising a plurality of original partitions and at least one new partition.
Type: Application
Filed: July 2, 2019
Publication date: December 12, 2019
Inventors: Enzo Cialini, Austin Clifford, Garrett Fitzsimons
-
Publication number: 20190332592
Abstract: Embodiments of the present invention provide a method, system and computer program product for test data generation using unique common factor sequencing. In an embodiment of the invention, a method for test data generation using unique common factor sequencing is provided. The method includes loading a table for population with test data in a test data generation tool executing in memory of a computer. A column set of multiple columns in the table associated with a key to the table can be selected for processing and different cardinality sequence values are assigned to the columns in the set such that the cardinality sequence values do not share a common factor except for unity as in the case of prime numbers.
Type: Application
Filed: July 8, 2019
Publication date: October 31, 2019
Inventors: Austin Clifford, Konrad Emanowicz, Enda McCallig, Gary Murtagh, Clare Scally
-
Patent number: 10387422
Abstract: Provided are a system, method and computer program product for redistribution of data in an online shared nothing database, said shared nothing database comprising a plurality of original partitions and at least one new partition.
Type: Grant
Filed: December 9, 2014
Date of Patent: August 20, 2019
Assignee: International Business Machines Corporation
Inventors: Enzo Cialini, Austin Clifford, Garrett Fitzsimons
-
Publication number: 20190251292
Abstract: A computer-implemented method, computer program product and system for identifying pseudonymized data within data sources. One or more data repositories within one or more of the data sources are selected. One or more privacy data models are provided, where each of the privacy data models includes pattern(s) and/or parameter(s). One or more of the one or more privacy data models are selected. Data identification information is generated, where the data identification information indicates a presence or absence of pseudonymized data and of non-pseudonymized data within the one or more of the data sources. The data identification information is generated utilizing the pattern(s) and/or the parameter(s) to determine pseudonymized data.
Type: Application
Filed: April 25, 2019
Publication date: August 15, 2019
Inventors: Pedro Barbas, Austin Clifford, Konrad Emanowicz, Patrick G. O'Sullivan