Patents by Inventor Damir Spisic
Damir Spisic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20150286704Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.Type: ApplicationFiled: November 26, 2014Publication date: October 8, 2015Inventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Publication number: 20150286703Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors, generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster, and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.Type: ApplicationFiled: November 25, 2014Publication date: October 8, 2015Inventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Publication number: 20150286707Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors, generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster, and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.Type: ApplicationFiled: April 8, 2014Publication date: October 8, 2015Applicant: International Business Machines CorporationInventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Publication number: 20150286702Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.Type: ApplicationFiled: April 8, 2014Publication date: October 8, 2015Applicant: International Business Machines CorporationInventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Publication number: 20150186500Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.Type: ApplicationFiled: June 19, 2014Publication date: July 2, 2015Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
-
Publication number: 20150186529Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.Type: ApplicationFiled: December 27, 2013Publication date: July 2, 2015Applicant: International Business Machines CorporationInventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
-
Patent number: 9053170Abstract: A subset of (k?1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k?1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.Type: GrantFiled: March 8, 2013Date of Patent: June 9, 2015Assignee: International Business Machines CorporationInventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
-
Patent number: 8996452Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.Type: GrantFiled: July 10, 2012Date of Patent: March 31, 2015Assignee: International Business Machines CorporationInventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Patent number: 8990149Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.Type: GrantFiled: March 15, 2011Date of Patent: March 24, 2015Assignee: International Business Machines CorporationInventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Patent number: 8965895Abstract: A subset of (k?1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k?1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.Type: GrantFiled: July 30, 2012Date of Patent: February 24, 2015Assignee: International Business Machines CorporationInventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
-
Patent number: 8880532Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.Type: GrantFiled: June 29, 2011Date of Patent: November 4, 2014Assignee: International Business Machines CorporationInventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
-
Patent number: 8868573Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.Type: GrantFiled: April 11, 2012Date of Patent: October 21, 2014Assignee: International Business Machines CorporationInventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
-
Patent number: 8843498Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.Type: GrantFiled: September 13, 2012Date of Patent: September 23, 2014Assignee: International Business Machines CorporationInventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
-
Publication number: 20140032553Abstract: A subset of (k?1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k?1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.Type: ApplicationFiled: July 30, 2012Publication date: January 30, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
-
Publication number: 20130218909Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.Type: ApplicationFiled: April 11, 2012Publication date: August 22, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
-
Publication number: 20130218908Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.Type: ApplicationFiled: February 17, 2012Publication date: August 22, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
-
Publication number: 20130006998Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.Type: ApplicationFiled: June 29, 2011Publication date: January 3, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
-
Publication number: 20130007003Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.Type: ApplicationFiled: September 13, 2012Publication date: January 3, 2013Applicant: International Business Machines CorporationInventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
-
Publication number: 20120278275Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.Type: ApplicationFiled: July 10, 2012Publication date: November 1, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
-
Publication number: 20120239613Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.Type: ApplicationFiled: March 15, 2011Publication date: September 20, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu