Patents by Inventor Jing-Yun Shyr

Jing-Yun Shyr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CONDENSING HIERARCHICAL DATA

Publication number: 20150186500

Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.

Type: Application

Filed: June 19, 2014

Publication date: July 2, 2015

Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
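The condensing step described in the abstract above can be sketched roughly as follows. The abstract does not say how siblings are grouped, so the rule used here (siblings form one group when their values differ by at most `tolerance`) is an assumption made purely for illustration, as are the dictionary layout and the names `condense` and `tolerance`.

```python
def condense(node, tolerance):
    """Return a copy of `node` in which leaf sibling sets forming a single group are removed."""
    children = [condense(child, tolerance) for child in node.get("children", [])]
    values = [child["value"] for child in children]
    # Hypothetical grouping rule: one group when the spread of values is within tolerance.
    single_group = bool(values) and max(values) - min(values) <= tolerance
    # Only drop siblings that are leaves, so distinct grandchildren are never lost.
    all_leaves = all(not child["children"] for child in children)
    kept = [] if (single_group and all_leaves) else children
    return {"name": node["name"], "value": node["value"], "children": kept}


tree = {
    "name": "root", "value": 10.0, "children": [
        {"name": "A", "value": 5.0, "children": [
            {"name": "A1", "value": 2.4},
            {"name": "A2", "value": 2.6},
        ]},
        {"name": "B", "value": 9.0, "children": [
            {"name": "B1", "value": 1.0},
            {"name": "B2", "value": 8.0},
        ]},
    ],
}

condensed = condense(tree, tolerance=0.5)
```

In this toy tree, A1 and A2 are similar enough to be represented by their parent A and are removed, while B1 and B2 remain.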
CONDENSING HIERARCHICAL DATA

Publication number: 20150186529

Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.

Type: Application

Filed: December 27, 2013

Publication date: July 2, 2015

Applicant: International Business Machines Corporation

Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
Relationship discovery in business analytics

Patent number: 9053170

Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Grant

Filed: March 8, 2013

Date of Patent: June 9, 2015

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
Generating a predictive model from multiple data sources

Patent number: 8996452

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Grant

Filed: July 10, 2012

Date of Patent: March 31, 2015

Assignee: International Business Machines Corporation

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
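As a minimal sketch of the selection idea in the abstract above: score each base model on a shared validation sample, keep the most accurate ones, and combine them. The abstract does not state how the selected models are combined, so the majority vote below is an assumption. The names `accuracy`, `build_ensemble`, and `top_k` are hypothetical, and models are represented as plain functions.

```python
from collections import Counter


def accuracy(model, samples):
    """Fraction of (features, label) pairs that `model` classifies correctly."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def build_ensemble(base_models, validation, top_k):
    """Keep the top_k most accurate base models; combine them by majority vote (assumed)."""
    ranked = sorted(base_models, key=lambda m: accuracy(m, validation), reverse=True)
    selected = ranked[:top_k]

    def ensemble(x):
        votes = Counter(m(x) for m in selected)
        return votes.most_common(1)[0][0]

    return ensemble, selected


# Toy task: the label is the parity of an integer.
validation = [(0, 0), (1, 1), (2, 0), (3, 1)]
holdout = [(4, 0), (5, 1), (6, 0), (7, 1)]

def parity(x): return x % 2
def always_zero(x): return 0
def always_one(x): return 1
def inverted(x): return 1 - x % 2

ensemble, selected = build_ensemble(
    [inverted, always_zero, parity, always_one], validation, top_k=3
)
holdout_accuracy = accuracy(ensemble, holdout)
```

Here the holdout sample is used only to check the finished ensemble, which is one plausible reading of its role in the abstract.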
Generating a predictive model from multiple data sources

Patent number: 8990149

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Grant

Filed: March 15, 2011

Date of Patent: March 24, 2015

Assignee: International Business Machines Corporation

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
Relationship discovery in business analytics

Patent number: 8965895

Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Grant

Filed: July 30, 2012

Date of Patent: February 24, 2015

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
DECISION TREE INSIGHT DISCOVERY

Publication number: 20150039624

Abstract: Techniques for presenting insight into classification trees may include performing a grouping analysis to group leaf nodes of a classification tree into a significant group and an insignificant group, performing influential target category analysis to identify one or more influential target categories for the leaf nodes of the classification tree in the significant group, and presenting one or more insights into the classification tree based on the grouping analysis and the influential target category analysis.

Type: Application

Filed: September 17, 2014

Publication date: February 5, 2015

Inventors: Jane Y. Chu, Jing-Yun Shyr, Weicai Zhong
INTERACTION DETECTION FOR GENERALIZED LINEAR MODELS

Publication number: 20150006605

Abstract: Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level.

Type: Application

Filed: September 15, 2014

Publication date: January 1, 2015

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr
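For reference, the likelihood ratio test named in the abstract above has the following standard textbook form. The symbols are not taken from the application: $\ell_{\text{full}}$ and $\ell_{\text{reduced}}$ denote the two log-likelihood values, $\alpha$ the significance level, and $I$ and $J$ the assumed numbers of categories of the two predictors.

$$
G = 2\left(\ell_{\text{full}} - \ell_{\text{reduced}}\right), \qquad
p = P\left(\chi^2_{d} \ge G\right), \qquad
d = (I - 1)(J - 1)
$$

The interaction is declared significant when $p < \alpha$. The degrees of freedom $d$ shown here assume that the full model adds exactly the interaction term of the two categorical predictors to the reduced model.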
Interestingness of data

Patent number: 8880532

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Grant

Filed: June 29, 2011

Date of Patent: November 4, 2014

Assignee: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
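A rough illustration of the indexing step in the abstract above: each field carries several standardized sub-indexes, a combination function aggregates them into one general index, and fields above a cutoff are flagged. The sub-index names, the 0 to 1 scale, the choice of `max` as the combination function, and the `0.5` cutoff are all assumptions for the sake of the example.

```python
def general_index(sub_indexes, combine=max):
    """Aggregate a field's standardized sub-indexes (assumed to lie in [0, 1])."""
    return combine(sub_indexes.values())


# Hypothetical standardized sub-indexes per field.
fields = {
    "income": {"skewness": 0.9, "missing": 0.1, "outliers": 0.4},
    "region": {"skewness": 0.0, "missing": 0.2, "outliers": 0.0},
}

scores = {name: general_index(subs) for name, subs in fields.items()}

# Flag fields whose general index reaches the (assumed) cutoff.
interesting = [name for name, score in scores.items() if score >= 0.5]
```

Using `max` flags a field as soon as any one sub-index is high; a mean or weighted sum would instead reward fields that are moderately unusual on several measures.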
Computing and applying order statistics for data preparation

Patent number: 8868573

Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.

Type: Grant

Filed: April 11, 2012

Date of Patent: October 21, 2014

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
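The approach in the abstract above can be sketched as follows, under simplifying assumptions: every source uses the same fixed bin edges over a known range `[lo, hi]`, each bin keeps only a count, minimum, and maximum, and the estimate is the midpoint of the observed range in the selected bin. The function names are hypothetical and the abstract does not prescribe this particular estimate.

```python
import math


def summarize(values, lo, hi, n_bins):
    """Single pass over one source: per-bin count, minimum, and maximum."""
    width = (hi - lo) / n_bins
    bins = [{"count": 0, "min": math.inf, "max": -math.inf} for _ in range(n_bins)]
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into the last bin
        b = bins[i]
        b["count"] += 1
        b["min"] = min(b["min"], v)
        b["max"] = max(b["max"], v)
    return bins


def merge(summaries):
    """Combine per-source summaries bin by bin (bins share edges, so they stay ordered)."""
    return [
        {
            "count": sum(b["count"] for b in group),
            "min": min(b["min"] for b in group),
            "max": max(b["max"] for b in group),
        }
        for group in zip(*summaries)
    ]


def approx_order_statistic(bins, rank):
    """Return (estimate, lower, upper) for the rank-th smallest value, 1-based."""
    seen = 0
    for b in bins:
        if seen + b["count"] >= rank:
            # The true value lies in this bin, so the bin's min and max bound it.
            return (b["min"] + b["max"]) / 2, b["min"], b["max"]
        seen += b["count"]
    raise ValueError("rank exceeds the number of values")


source_a = [1, 4, 7, 9]
source_b = [2, 3, 8, 10]
merged = merge([summarize(s, lo=0, hi=10, n_bins=5) for s in (source_a, source_b)])
estimate, lower, upper = approx_order_statistic(merged, rank=6)
```

For these two sources the true sixth-smallest value is 8, which falls inside the reported interval from `lower` to `upper`. Narrower bins tighten the interval at the cost of larger summaries.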
Missing value imputation for predictive models

Patent number: 8843423

Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.

Type: Grant

Filed: February 23, 2012

Date of Patent: September 23, 2014

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
Interestingness of data

Patent number: 8843498

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Grant

Filed: September 13, 2012

Date of Patent: September 23, 2014

Assignee: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
DECISION TREE INSIGHT DISCOVERY

Publication number: 20140279775

Abstract: Techniques for presenting insight into classification trees may include performing a grouping analysis to group leaf nodes of a classification tree into a significant group and an insignificant group, performing influential target category analysis to identify one or more influential target categories for the leaf nodes of the classification tree in the significant group, and presenting one or more insights into the classification tree based on the grouping analysis and the influential target category analysis.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Jane Y. Chu, Jing-Yun Shyr, Weicai Zhong
INTERACTION DETECTION FOR GENERALIZED LINEAR MODELS

Publication number: 20140258355

Abstract: Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level.

Type: Application

Filed: March 11, 2013

Publication date: September 11, 2014

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr
COMPUTING REGRESSION MODELS

Publication number: 20140207722

Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.

Type: Application

Filed: March 21, 2014

Publication date: July 24, 2014

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
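The record layout and the mapper and reducer roles in the abstract above can be sketched in plain Python, without a real MapReduce framework. The sub-task chosen here (fitting a one-predictor least-squares line) and the reducer's rule (keep the fit with the smallest residual sum of squares) are assumptions for illustration. `SHARED`, `make_records`, `mapper`, and `reducer` are hypothetical names.

```python
# Data shared by every sub-task, referenced by key rather than copied into each record.
SHARED = {
    "dataset": {
        "y": [2.0, 4.0, 6.0, 8.0],
        "x1": [1.0, 2.0, 3.0, 4.0],
        "x2": [1.0, 0.0, 1.0, 0.0],
    }
}


def make_records(predictors):
    """One record per sub-task: sub-task-specific data plus a reference to shared data."""
    return [{"predictor": p, "shared_ref": "dataset"} for p in predictors]


def mapper(record):
    """Fit y = a + b*x by least squares for this record's predictor."""
    data = SHARED[record["shared_ref"]]
    x, y = data[record["predictor"]], data["y"]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"predictor": record["predictor"], "slope": slope,
            "intercept": intercept, "sse": sse}


def reducer(outputs):
    """Single reducer: keep the sub-task result with the smallest error (assumed rule)."""
    return min(outputs, key=lambda o: o["sse"])


records = make_records(["x1", "x2"])
outputs = [mapper(r) for r in records]  # on a cluster, each call would run on its own mapper
result = reducer(outputs)
```

The number of records equals the number of sub-tasks, which matches the one-record-per-mapper arrangement the abstract describes.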
COMPUTING REGRESSION MODELS

Publication number: 20140201744

Abstract: Provided are techniques for computing a task result. A processing data set of records is created, wherein each of the records contains data specific to a sub-task from a set of actual sub-tasks and contains a reference to data shared by the set of actual sub-tasks, and wherein a number of the records is equivalent to a number of the actual sub-tasks in the set of actual sub-tasks. With each mapper in a set of mappers, one of the records of the processing data set is received and an assigned sub-task is executed using the received one of the records to generate output. With a single reducer, the output from each mapper in the set of mappers is reduced to determine a task result.

Type: Application

Filed: January 11, 2013

Publication date: July 17, 2014

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Dong Liang, Jing-Yun Shyr
INTERPRETATION OF STATISTICAL RESULTS

Publication number: 20140114707

Abstract: Provided are techniques for summarizing statistical results. Multiple sets of statistical results are received, wherein each of the sets of statistical results are ordered according to interestingness. Insights are generated based on the multiple sets of statistical results. Relationships between the generated insights are identified. An executive summary is generated with a set of findings based on the identified relationships. An interactive visualization is provided with the generated insights based on the executive summary.

Type: Application

Filed: October 19, 2012

Publication date: April 24, 2014

Applicant: International Business Machines Corporation

Inventors: Daniel J. Rope, Jing-Yun Shyr, Margaret J. Vais, Michael D. Woods
RELATIONSHIP DISCOVERY IN BUSINESS ANALYTICS

Publication number: 20140032553

Abstract: A subset of (k−1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Application

Filed: July 30, 2012

Publication date: January 30, 2014

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
MISSING VALUE IMPUTATION FOR PREDICTIVE MODELS

Publication number: 20130226842

Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.

Type: Application

Filed: April 12, 2012

Publication date: August 29, 2013

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
MISSING VALUE IMPUTATION FOR PREDICTIVE MODELS

Publication number: 20130226838

Abstract: Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.

Type: Application

Filed: February 23, 2012

Publication date: August 29, 2013

Applicant: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Jing Xu
