Patents by Inventor Yea J. Chu

Yea J. Chu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9443194
    Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.
    Type: Grant
    Filed: April 12, 2012
    Date of Patent: September 13, 2016
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
  • Patent number: 9361274
    Abstract: Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: June 7, 2016
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr
  • Patent number: 9159028
    Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.
    Type: Grant
    Filed: January 11, 2013
    Date of Patent: October 13, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
  • Patent number: 9152921
    Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: October 6, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
  • Patent number: 9053170
    Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: June 9, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Patent number: 8965895
    Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: February 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Publication number: 20150006605
    Abstract: Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level.
    Type: Application
    Filed: September 15, 2014
    Publication date: January 1, 2015
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr
  • Patent number: 8868573
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
  • Patent number: 8843423
    Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.
    Type: Grant
    Filed: February 23, 2012
    Date of Patent: September 23, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
  • Publication number: 20140258355
    Abstract: Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr
  • Publication number: 20140207722
    Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.
    Type: Application
    Filed: March 21, 2014
    Publication date: July 24, 2014
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
  • Publication number: 20140201744
    Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.
    Type: Application
    Filed: January 11, 2013
    Publication date: July 17, 2014
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
  • Publication number: 20140032553
    Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Application
    Filed: July 30, 2012
    Publication date: January 30, 2014
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Publication number: 20130226838
    Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.
    Type: Application
    Filed: February 23, 2012
    Publication date: August 29, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
  • Publication number: 20130226842
    Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.
    Type: Application
    Filed: April 12, 2012
    Publication date: August 29, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
  • Publication number: 20130218908
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Application
    Filed: February 17, 2012
    Publication date: August 22, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
  • Publication number: 20130218909
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Application
    Filed: April 11, 2012
    Publication date: August 22, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
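
Illustrative sketch (not taken from the patent text): the order-statistics abstracts above describe binning each distributed source in a single pass, sorting the bins, accumulating counts to find an approximate order statistic, and reporting lower and upper error bounds. The Python below is one minimal way to picture that flow. It assumes equal-width bins shared by all sources and a within-bin linear interpolation for the estimate; the function names summarize, merge, and approx_quantile and the parameters lo and width are hypothetical and do not come from the filings.

```python
import math


def summarize(values, lo, width):
    """Single pass over one source: per-bin (count, min, max).

    Assumption: every source uses the same equal-width bins,
    starting at `lo` with size `width`.
    """
    bins = {}
    for v in values:
        k = math.floor((v - lo) / width)
        if k in bins:
            c, mn, mx = bins[k]
            bins[k] = (c + 1, min(mn, v), max(mx, v))
        else:
            bins[k] = (1, v, v)
    return bins


def merge(summaries):
    """Combine per-source bin summaries and sort them by bin index."""
    merged = {}
    for s in summaries:
        for k, (c, mn, mx) in s.items():
            if k in merged:
                c0, mn0, mx0 = merged[k]
                merged[k] = (c0 + c, min(mn0, mn), max(mx0, mx))
            else:
                merged[k] = (c, mn, mx)
    return sorted(merged.items())


def approx_quantile(sorted_bins, q):
    """Return (estimate, lower_bound, upper_bound) for quantile q.

    Counts are accumulated over the sorted bins until the target rank
    is reached. The true order statistic lies in that bin, so the bin's
    observed min and max delimit an interval that contains it.
    """
    n = sum(c for _, (c, _, _) in sorted_bins)
    rank = max(1, math.ceil(q * n))  # 1-based rank of the order statistic
    seen = 0
    for _, (c, mn, mx) in sorted_bins:
        if seen + c >= rank:
            # Assumed estimator: linear interpolation inside the bin.
            frac = (rank - seen - 0.5) / c
            return mn + frac * (mx - mn), mn, mx
        seen += c
    raise ValueError("no data in summaries")


if __name__ == "__main__":
    source_a = summarize([1, 2, 3, 4], lo=0, width=5)
    source_b = summarize([5, 6, 7, 8, 9, 10], lo=0, width=5)
    bins = merge([source_a, source_b])
    print(approx_quantile(bins, 0.5))
```

Narrower bins tighten the interval between the lower and upper bounds at the cost of larger summaries to ship from each source. The raw values never need to be gathered in one place.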