Patents by Inventor Damir Spisic

Damir Spisic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ADAPTIVE VARIABLE SELECTION FOR DATA CLUSTERING

Publication number: 20150286704

Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.

Type: Application

Filed: November 26, 2014

Publication date: October 8, 2015

Inventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
ADAPTIVE VARIABLE SELECTION FOR DATA CLUSTERING

Publication number: 20150286703

Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster, and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.

Type: Application

Filed: November 25, 2014

Publication date: October 8, 2015

Inventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
DISTRIBUTED CLUSTERING WITH OUTLIER DETECTION

Publication number: 20150286707

Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster, and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.

Type: Application

Filed: April 8, 2014

Publication date: October 8, 2015

Applicant: International Business Machines Corporation

Inventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
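
The final step of the abstract above (flagging a candidate sub-cluster as an outlier when its distance to the closest regular cluster exceeds that cluster's threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: clusters are reduced to centroid tuples, distance is Euclidean, and the function name `flag_outliers` and the per-cluster `thresholds` mapping are hypothetical. The abstract does not specify the distance measure or how thresholds are derived.

```python
import math

def flag_outliers(candidates, regular, thresholds):
    """Return names of candidate sub-clusters judged to be outliers.

    candidates: name -> centroid tuple of a candidate outlier sub-cluster
    regular:    name -> centroid tuple of an initial regular cluster
    thresholds: regular cluster name -> distance threshold (assumed given)
    """
    outliers = []
    for name, center in candidates.items():
        # Find the closest initial regular cluster (Euclidean distance is an assumption).
        closest = min(regular, key=lambda r: math.dist(center, regular[r]))
        distance = math.dist(center, regular[closest])
        # Compare against the threshold associated with that closest cluster.
        if distance > thresholds[closest]:
            outliers.append(name)
    return outliers

regular = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
thresholds = {"A": 2.0, "B": 5.0}
candidates = {"c1": (1.0, 1.0), "c2": (0.0, 4.0), "c3": (14.0, 0.0)}
result = flag_outliers(candidates, regular, thresholds)
```

In this toy input only `c2` is flagged: it lies 4 units from cluster A, whose threshold is 2, while `c3` lies 4 units from cluster B, whose threshold is 5.
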
ADAPTIVE VARIABLE SELECTION FOR DATA CLUSTERING

Publication number: 20150286702

Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.

Type: Application

Filed: April 8, 2014

Publication date: October 8, 2015

Applicant: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
CONDENSING HIERARCHICAL DATA

Publication number: 20150186500

Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.

Type: Application

Filed: June 19, 2014

Publication date: July 2, 2015

Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
CONDENSING HIERARCHICAL DATA

Publication number: 20150186529

Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.

Type: Application

Filed: December 27, 2013

Publication date: July 2, 2015

Applicant: International Business Machines Corporation

Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
Relationship discovery in business analytics

Patent number: 9053170

Abstract: A subset of (k-1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Grant

Filed: March 8, 2013

Date of Patent: June 9, 2015

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
Generating a predictive model from multiple data sources

Patent number: 8996452

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Grant

Filed: July 10, 2012

Date of Patent: March 31, 2015

Assignee: International Business Machines Corporation

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
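
As a rough illustration of the selection step described in the abstract above, the following sketch scores each base model on a shared validation sample, keeps the most accurate ones, and evaluates the result on a holdout sample. The names `accuracy` and `build_ensemble`, the `keep` parameter, and the majority-vote combination are all assumptions for illustration; the abstract does not say how the selected base models are combined.

```python
from collections import Counter

def accuracy(model, sample):
    """Fraction of (x, y) pairs in sample for which model(x) == y."""
    return sum(model(x) == y for x, y in sample) / len(sample)

def build_ensemble(base_models, validation, keep):
    """Select the `keep` most accurate base models on the validation sample."""
    ranked = sorted(base_models, key=lambda m: accuracy(m, validation), reverse=True)
    selected = ranked[:keep]

    def ensemble(x):
        # Majority vote over the selected base models (an assumed combination rule).
        votes = Counter(m(x) for m in selected)
        return votes.most_common(1)[0][0]

    return ensemble, selected

# Toy base models, standing in for models trained on separate data sources.
def m1(x): return x > 0
def m2(x): return x > 1
def m3(x): return True

validation = [(-1, False), (0.5, True), (2, True), (-2, False)]
holdout = [(3, True), (-3, False)]

ensemble, selected = build_ensemble([m3, m2, m1], validation, keep=2)
holdout_accuracy = accuracy(ensemble, holdout)
```

Here `m1` and `m2` score 1.0 and 0.75 on the validation sample and are kept, while `m3` (0.5) is dropped. The holdout sample is used only to assess the finished ensemble.
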
Generating a predictive model from multiple data sources

Patent number: 8990149

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Grant

Filed: March 15, 2011

Date of Patent: March 24, 2015

Assignee: International Business Machines Corporation

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
Relationship discovery in business analytics

Patent number: 8965895

Abstract: A subset of (k-1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Grant

Filed: July 30, 2012

Date of Patent: February 24, 2015

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
Interestingness of data

Patent number: 8880532

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Grant

Filed: June 29, 2011

Date of Patent: November 4, 2014

Assignee: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
Computing and applying order statistics for data preparation

Patent number: 8868573

Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.

Type: Grant

Filed: April 11, 2012

Date of Patent: October 21, 2014

Assignee: International Business Machines Corporation

Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
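
The idea in the abstract above (bin each source in a single pass, merge the bins, then accumulate counts to bracket an order statistic) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: all sources share fixed bin edges, each bin summary is just a count with the observed minimum and maximum, and the function names are hypothetical.

```python
import math
from bisect import bisect_right

def bin_summaries(values, edges):
    """Single pass over one source: bin index -> (count, min, max)."""
    bins = {}
    for v in values:
        i = bisect_right(edges, v)  # index of the bin that contains v
        if i in bins:
            count, lo, hi = bins[i]
            bins[i] = (count + 1, min(lo, v), max(hi, v))
        else:
            bins[i] = (1, v, v)
    return bins

def merge_summaries(summaries):
    """Combine per-source bin summaries that share the same edges."""
    merged = {}
    for summary in summaries:
        for i, (count, lo, hi) in summary.items():
            if i in merged:
                mcount, mlo, mhi = merged[i]
                merged[i] = (mcount + count, min(mlo, lo), max(mhi, hi))
            else:
                merged[i] = (count, lo, hi)
    return merged

def approx_order_statistic(merged, q):
    """Return (lower, upper) bounds on the value at rank ceil(q * n)."""
    n = sum(count for count, _, _ in merged.values())
    if n == 0:
        raise ValueError("no data")
    rank = max(1, math.ceil(q * n))
    cumulative = 0
    for i in sorted(merged):  # bins in increasing value order
        count, lo, hi = merged[i]
        cumulative += count
        if cumulative >= rank:
            return lo, hi  # the true order statistic lies in this bin
    raise AssertionError("unreachable when 0 <= q <= 1")

edges = [4, 8]
source_a = bin_summaries([1, 3, 5, 7, 9], edges)
source_b = bin_summaries([2, 4, 6, 8, 10], edges)
merged = merge_summaries([source_a, source_b])
lower, upper = approx_order_statistic(merged, 0.5)
```

For the two toy sources, the fifth smallest of the ten values is 5, and the sketch brackets it with the bounds (4, 7) without ever sorting the raw data. Narrower bins would tighten the interval at the cost of larger summaries.
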
Interestingness of data

Patent number: 8843498

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Grant

Filed: September 13, 2012

Date of Patent: September 23, 2014

Assignee: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
RELATIONSHIP DISCOVERY IN BUSINESS ANALYTICS

Publication number: 20140032553

Abstract: A subset of (k-1)-dimensional tables are received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size is computed for the created set of k-dimensional tables to determine dimension and measure interactions.

Type: Application

Filed: July 30, 2012

Publication date: January 30, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION

Publication number: 20130218909

Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.

Type: Application

Filed: April 11, 2012

Publication date: August 22, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION

Publication number: 20130218908

Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.

Type: Application

Filed: February 17, 2012

Publication date: August 22, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
INTERESTINGNESS OF DATA

Publication number: 20130006998

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Application

Filed: June 29, 2011

Publication date: January 3, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
INTERESTINGNESS OF DATA

Publication number: 20130007003

Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.

Type: Application

Filed: September 13, 2012

Publication date: January 3, 2013

Applicant: International Business Machines Corporation

Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
GENERATING A PREDICTIVE MODEL FROM MULTIPLE DATA SOURCES

Publication number: 20120278275

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Application

Filed: July 10, 2012

Publication date: November 1, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
GENERATING A PREDICTIVE MODEL FROM MULTIPLE DATA SOURCES

Publication number: 20120239613

Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.

Type: Application

Filed: March 15, 2011

Publication date: September 20, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
