Patents by Inventor Damir Spisic

Damir Spisic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150286704
    Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.
    Type: Application
    Filed: November 26, 2014
    Publication date: October 8, 2015
    Inventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Publication number: 20150286703
    Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.
    Type: Application
    Filed: November 25, 2014
    Publication date: October 8, 2015
    Inventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Publication number: 20150286707
    Abstract: One or more processors initiate cluster feature (CF)-tree based hierarchical clustering on leaf entries of CF-trees included in a plurality of subsets. One or more processors generate respective partial clustering solutions for the subsets. A partial clustering solution includes a set of regular sub-clusters and candidate outlier sub-clusters. One or more processors generate initial regular clusters by performing hierarchical clustering using the regular sub-clusters. For a candidate outlier sub-cluster, one or more processors determine a closest initial regular cluster and a distance separating the candidate outlier sub-cluster and the closest initial regular cluster. One or more processors determine which candidate outlier sub-clusters are outlier clusters based on which candidate outlier sub-clusters have a computed distance to their respective closest initial regular cluster that is greater than a corresponding distance threshold associated with their respective closest initial regular cluster.
    Type: Application
    Filed: April 8, 2014
    Publication date: October 8, 2015
    Applicant: International Business Machines Corporation
    Inventors: Svetlana Levitan, Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Publication number: 20150286702
    Abstract: One or more processors generate subsets of cluster feature (CF)-trees, which represent respective sets of local data as leaf entries. One or more processors collect variables that were used to generate the CF-trees included in the subsets. One or more processors generate respective approximate clustering solutions for the subsets by applying hierarchical agglomerative clustering to the collected variables and leaf entries of the plurality of CF-trees. One or more processors select candidate sets of variables with maximal goodness that are locally optimal for respective subsets based on the approximate clustering solutions. One or more processors select a set of variables, which produce an overall clustering solution, from the candidate sets of variables.
    Type: Application
    Filed: April 8, 2014
    Publication date: October 8, 2015
    Applicant: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Publication number: 20150186500
    Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.
    Type: Application
    Filed: June 19, 2014
    Publication date: July 2, 2015
    Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
  • Publication number: 20150186529
    Abstract: A computing device includes at least one processor, and at least one module operable by the at least one processor to receive data representing a hierarchy, wherein the hierarchy comprises at least one set of sibling nodes and a respective parent node, generate a condensed hierarchy by determining a grouping for the at least one set of sibling nodes, determine whether the at least one set of sibling nodes can be represented by the respective parent node, based at least in part on the grouping for the at least one set of sibling nodes, and responsive to determining that the at least one set of sibling nodes can be represented by the respective parent node, remove the at least one set of sibling nodes from the condensed hierarchy. The at least one module may further be operable by the at least one processor to output the condensed hierarchy for display.
    Type: Application
    Filed: December 27, 2013
    Publication date: July 2, 2015
    Applicant: International Business Machines Corporation
    Inventors: Daniel J. Rope, Jing-Yun Shyr, Damir Spisic
  • Patent number: 9053170
    Abstract: A subset of (k-1)-dimensional tables is received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size are computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: June 9, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Patent number: 8996452
    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: March 31, 2015
    Assignee: International Business Machines Corporation
    Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Patent number: 8990149
    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.
    Type: Grant
    Filed: March 15, 2011
    Date of Patent: March 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Patent number: 8965895
    Abstract: A subset of (k-1)-dimensional tables is received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size are computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: February 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Patent number: 8880532
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Grant
    Filed: June 29, 2011
    Date of Patent: November 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Patent number: 8868573
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
  • Patent number: 8843498
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: September 23, 2014
    Assignee: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Publication number: 20140032553
    Abstract: A subset of (k-1)-dimensional tables is received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k-1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size are computed for the created set of k-dimensional tables to determine dimension and measure interactions.
    Type: Application
    Filed: July 30, 2012
    Publication date: January 30, 2014
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Jing-Yun Shyr, Damir Spisic, Xueying Zhang
  • Publication number: 20130218909
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Application
    Filed: April 11, 2012
    Publication date: August 22, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
  • Publication number: 20130218908
    Abstract: Provided are techniques for generating order statistics and error bounds. For each of multiple distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
    Type: Application
    Filed: February 17, 2012
    Publication date: August 22, 2013
    Applicant: International Business Machines Corporation
    Inventors: Yea J. Chu, Sier Han, Fan Li, Jing-Yun Shyr, Damir Spisic, Graham J. Wills, Jing Xu
  • Publication number: 20130006998
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Application
    Filed: June 29, 2011
    Publication date: January 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Publication number: 20130007003
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Publication number: 20120278275
    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.
    Type: Application
    Filed: July 10, 2012
    Publication date: November 1, 2012
    Applicant: International Business Machines Corporation
    Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
  • Publication number: 20120239613
    Abstract: Techniques are disclosed for generating an ensemble model from multiple data sources. In one embodiment, the ensemble model is generated using a global validation sample, a global holdout sample and base models generated from the multiple data sources. An accuracy value may be determined for each base model, on the basis of the global validation dataset. The ensemble model may be generated from a subset of the base models, where the subset is selected on the basis of the determined accuracy values.
    Type: Application
    Filed: March 15, 2011
    Publication date: September 20, 2012
    Applicant: International Business Machines Corporation
    Inventors: Marius I. Danciu, Fan Li, Michael McRoberts, Jing-Yun Shyr, Damir Spisic, Jing Xu
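The order-statistics filings (8868573, 20130218909, 20130218908) describe binning each field in a single pass per source, sorting the bins, accumulating counts to reach a target rank, and reporting lower and upper bounds on the true value. The Python sketch below illustrates that idea under assumptions that are not taken from the abstracts: equal-width bins keyed identically across all sources, a bin summary made of a count plus the observed minimum and maximum, and bounds taken as the minimum and maximum of the bin that contains the target rank. The names `BinSummary`, `summarize`, `merge_sources`, and `order_statistic_bounds` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class BinSummary:
    """Hypothetical per-bin summary: value count and observed min/max."""
    count: int = 0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, x: float) -> None:
        self.count += 1
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other: "BinSummary") -> None:
        self.count += other.count
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)


def summarize(values: Sequence[float], width: float) -> Dict[int, BinSummary]:
    """Single pass over one source; assumes equal-width bins shared by all sources."""
    bins: Dict[int, BinSummary] = {}
    for x in values:
        bins.setdefault(int(x // width), BinSummary()).add(x)
    return bins


def merge_sources(sources: List[Dict[int, BinSummary]]) -> List[BinSummary]:
    """Combine same-key bins across sources and sort them by value range."""
    merged: Dict[int, BinSummary] = {}
    for bins in sources:
        for key, summary in bins.items():
            merged.setdefault(key, BinSummary()).merge(summary)
    return [merged[key] for key in sorted(merged)]


def order_statistic_bounds(bins: List[BinSummary], rank: int) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the rank-th smallest value (1-based)."""
    total = sum(b.count for b in bins)
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be in [1, {total}]")
    seen = 0
    for b in bins:
        seen += b.count  # accumulate counts over the sorted bins
        if seen >= rank:
            # Bins cover disjoint value ranges, so the true value lies in this bin.
            return b.lo, b.hi
    raise AssertionError("unreachable: rank was validated above")


# Usage: two sources, seven values in total; rank 4 is the median.
a = summarize([1.2, 3.7, 5.1, 9.9], width=2.0)
b = summarize([0.4, 4.4, 6.0], width=2.0)
print(order_statistic_bounds(merge_sources([a, b]), 4))  # (4.4, 5.1)
```

The exact median of the seven values is 4.4, which falls inside the reported interval; narrower bins would tighten the bounds at the cost of more summaries per field.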
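The final step in the outlier-cluster filings (20150286703, 20150286707) compares each candidate outlier sub-cluster with its closest initial regular cluster and flags it when the distance exceeds that cluster's threshold. The following Python sketch shows only that comparison, assuming each cluster is represented by a centroid, that distance is Euclidean, and that per-cluster thresholds are already known; the abstracts do not say how clusters are summarized, which distance is used, or how thresholds are derived. `find_outlier_clusters` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def find_outlier_clusters(
    candidates: Dict[str, Point],
    regular: Dict[str, Point],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return candidate sub-clusters farther from their closest regular cluster
    than that cluster's distance threshold (assumes centroids, Euclidean distance)."""
    outliers: List[str] = []
    for name, centroid in candidates.items():
        # Closest initial regular cluster and the distance to it.
        closest, dist = min(
            ((label, math.dist(centroid, center)) for label, center in regular.items()),
            key=lambda pair: pair[1],
        )
        if dist > thresholds[closest]:
            outliers.append(name)
    return outliers


regular = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
thresholds = {"A": 2.0, "B": 3.0}
candidates = {"c1": (1.0, 1.0), "c2": (10.0, 5.0), "c3": (4.0, 0.0)}
print(find_outlier_clusters(candidates, regular, thresholds))  # ['c2', 'c3']
```

Here `c1` lies within A's threshold, while `c2` (5.0 from B) and `c3` (4.0 from A) exceed the thresholds of their closest clusters.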
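The ensemble-model filings (8996452, 8990149, 20120278275, 20120239613) score each base model on a global validation sample and build the ensemble from a subset chosen by accuracy, with a separate global holdout sample. The Python sketch below assumes the subset is the `top_k` most accurate models, that the ensemble combines them by majority vote, and that the holdout sample is used only to check the final ensemble; the abstracts state none of these details. `accuracy` and `build_ensemble` are hypothetical names, and the toy base models are illustrative.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Model = Callable[[int], Hashable]


def accuracy(model: Model, xs: Sequence[int], ys: Sequence[Hashable]) -> float:
    """Fraction of examples the model labels correctly."""
    correct = sum(1 for x, y in zip(xs, ys) if model(x) == y)
    return correct / len(ys)


def build_ensemble(models: List[Model], val_x: Sequence[int],
                   val_y: Sequence[Hashable], top_k: int) -> Model:
    """Keep the top_k base models by validation accuracy; combine by majority vote."""
    ranked = sorted(models, key=lambda m: accuracy(m, val_x, val_y), reverse=True)
    chosen = ranked[:top_k]

    def ensemble(x: int) -> Hashable:
        votes = Counter(m(x) for m in chosen)
        return votes.most_common(1)[0][0]

    return ensemble


val_x = list(range(6))
val_y = [x % 2 for x in val_x]
holdout_x = [6, 7, 8, 9]
holdout_y = [x % 2 for x in holdout_x]
base_models = [
    lambda x: x % 2,                   # correct on every example
    lambda x: 0 if x == 1 else x % 2,  # one validation mistake
    lambda x: 0 if x == 3 else x % 2,  # one validation mistake
    lambda x: 1 - x % 2,               # always wrong, should be dropped
]
ensemble = build_ensemble(base_models, val_x, val_y, top_k=3)
print(accuracy(ensemble, holdout_x, holdout_y))  # 1.0
```

Majority voting lets the two imperfect models outvote each other's individual mistakes, while the always-wrong model is excluded by the accuracy ranking.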
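The table-interaction filings (9053170, 8965895, 20140032553) build k-dimensional tables by combining each (k-1)-dimensional table with a dimension it does not yet include. The Python sketch below covers only that combination step, identifying each table by the set of dimensions it crosses; it does not show how the subset of (k-1)-dimensional tables is selected or how interaction significance and effect size are computed (a chi-square test would be one possible choice, but the abstracts do not say). `extend_tables` and the dimension names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Table = FrozenSet[str]  # a table is identified by the set of dimensions it crosses


def extend_tables(prev: List[Table], dims: Sequence[str]) -> List[Table]:
    """Build k-dimensional tables from (k-1)-dimensional ones by adding one
    dimension the table does not already include; duplicates are dropped."""
    seen = set()
    result: List[Table] = []
    for table in prev:
        for d in dims:
            if d in table:
                continue
            combo = table | {d}
            if combo not in seen:
                seen.add(combo)
                result.append(combo)
    return result


dims = ["region", "product", "channel"]
one_d = [frozenset({d}) for d in dims]
two_d = extend_tables(one_d, dims)
three_d = extend_tables(two_d, dims)
print(len(two_d), len(three_d))  # 3 1
```

Passing only a filtered subset of `two_d` into the next call is where pruning by significance would keep the number of higher-order tables manageable.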
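The field-analysis filings (8880532, 8843498, 20130006998, 20130007003) aggregate standardized interestingness sub-indexes into a general interestingness index per field and flag fields for further analysis. The Python sketch below assumes z-score standardization across fields, the arithmetic mean as the combination function, and a fixed cutoff; the sub-index names (`skewness`, `outlier_rate`) and values are hypothetical, and the expert transformation recommendations are not shown. `standardize`, `interestingness`, and `interesting_fields` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Dict, List

Metrics = Dict[str, Dict[str, float]]  # field -> sub-index name -> raw value


def standardize(values: Dict[str, float]) -> Dict[str, float]:
    """z-score one sub-index across fields; a constant sub-index maps to 0."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {f: 0.0 if sd == 0 else (v - mu) / sd for f, v in values.items()}


def interestingness(metrics: Metrics) -> Dict[str, float]:
    """General index per field: mean of its standardized sub-indexes (assumed)."""
    sub_names = list(next(iter(metrics.values())).keys())
    z = {s: standardize({f: m[s] for f, m in metrics.items()}) for s in sub_names}
    return {f: mean(z[s][f] for s in sub_names) for f in metrics}


def interesting_fields(metrics: Metrics, threshold: float) -> List[str]:
    """Fields whose general index exceeds the (hypothetical) threshold."""
    return [f for f, score in interestingness(metrics).items() if score > threshold]


metrics = {
    "age": {"skewness": 0.1, "outlier_rate": 0.01},
    "income": {"skewness": 3.0, "outlier_rate": 0.2},
    "zip_code": {"skewness": 0.2, "outlier_rate": 0.0},
}
print(interesting_fields(metrics, threshold=1.0))  # ['income']
```

Standardizing first puts sub-indexes measured on different scales on a common footing before they are combined.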