Patents by Inventor Charu C. Aggarwal

Charu C. Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7716154
    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20090319526
    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.
    Type: Application
    Filed: May 13, 2008
    Publication date: December 24, 2009
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20090292979
    Abstract: A technique for monitoring a primary data stream comprising a plurality of secondary data streams for abnormalities is provided. A deviation value for each of two or more of the plurality of secondary data streams is determined. The two or more deviation values of the two or more secondary data streams are combined to form a combined deviation value. An abnormality signal is generated based at least in part on the combined deviation value.
    Type: Application
    Filed: July 23, 2009
    Publication date: November 26, 2009
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Publication number: 20090281971
    Abstract: Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added.
    Type: Application
    Filed: May 9, 2008
    Publication date: November 12, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Publication number: 20090222410
    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
    Type: Application
    Filed: February 28, 2008
    Publication date: September 3, 2009
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20090222472
    Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.
    Type: Application
    Filed: February 28, 2008
    Publication date: September 3, 2009
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7487167
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Grant
    Filed: May 31, 2007
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7475085
    Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.
    Type: Grant
    Filed: April 4, 2006
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Nagui Halim
  • Publication number: 20080281626
    Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.
    Type: Application
    Filed: July 24, 2008
    Publication date: November 13, 2008
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Murray Scott Campbell, Yuan-Chi Chang, Matthew Leon Hill, Chung-Sheng Li, Milind R. Naphade, Sriram K. Padmanabhan, John R. Smith, Min Wang, Kun-Lung Wu, Philip Shilung Yu
  • Publication number: 20080243742
    Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.
    Type: Application
    Filed: June 10, 2008
    Publication date: October 2, 2008
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Publication number: 20080234977
    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
    Type: Application
    Filed: June 6, 2008
    Publication date: September 25, 2008
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7421452
    Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.
    Type: Grant
    Filed: June 14, 2006
    Date of Patent: September 2, 2008
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 7395250
    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
    Type: Grant
    Filed: October 11, 2000
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20080133438
    Abstract: A system and method for feature based load shedding in classification. The system includes a plurality of data sources. The plurality of data sources being configured to render independent streams of input data, such data being selectively grouped together to form a particular classification task. The system further includes a central classification server configured to analyze and execute multiple tasks, each task consisting of multiple input data. The central classification server further configured to analyze the data for knowledge-based decision-making. The central classification server being communicatively engaged via a network to the plurality of data sources. The method includes rendering independent streams of input data, such data being selectively grouped together to form a particular task. The method further includes analyzing and handling multiple tasks, each task consisting of multiple input data. The method also includes analyzing the data for knowledge-based decision-making.
    Type: Application
    Filed: November 30, 2006
    Publication date: June 5, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Haixun Wang
  • Patent number: 7379939
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20080082475
    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.
    Type: Application
    Filed: September 12, 2006
    Publication date: April 3, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Philip Shi-lung Yu
  • Publication number: 20080082566
    Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
    Type: Application
    Filed: September 30, 2006
    Publication date: April 3, 2008
    Applicant: IBM Corporation
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 7353218
    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
    Type: Grant
    Filed: August 14, 2003
    Date of Patent: April 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20070294216
    Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.
    Type: Application
    Filed: June 14, 2006
    Publication date: December 20, 2007
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 7310624
    Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: December 18, 2007
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
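
Illustrative sketch (not taken from the patent text): the decision-tree abstract listed above for patents 7310624 and 7716154 describes recursively splitting training data on combinations of variables so that the class values are well separated. The Python below is one possible reading of that idea, under loud assumptions: only two classes labelled 0 and 1, Fisher's two-class linear discriminant used as a stand-in for the split rule, a small ridge term added for numerical stability, and a midpoint threshold. The names build_tree, classify, TRAIN_ROWS, and TRAIN_LABELS are hypothetical, and the abstract does not specify any of these details.

```python
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, mu):
    """Scatter matrix: sum of outer products of (row - mu)."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split on a linear combination of the feature variables."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    if len(counts) < 2 or depth >= max_depth:
        return {"leaf": majority}
    a = [r for r, y in zip(rows, labels) if y == 0]
    b = [r for r, y in zip(rows, labels) if y == 1]
    mu_a, mu_b = mean(a), mean(b)
    sa, sb = scatter(a, mu_a), scatter(b, mu_b)
    d = len(mu_a)
    # Within-class scatter plus a small ridge (assumption) to keep it invertible.
    sw = [[sa[i][j] + sb[i][j] + (1e-6 if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    # Fisher direction: w = Sw^-1 (mu_b - mu_a).
    w = solve(sw, [q - p for p, q in zip(mu_a, mu_b)])
    # Threshold at the midpoint of the projected class means (assumption).
    t = (dot(w, mu_a) + dot(w, mu_b)) / 2.0
    left = [(r, y) for r, y in zip(rows, labels) if dot(w, r) <= t]
    right = [(r, y) for r, y in zip(rows, labels) if dot(w, r) > t]
    if not left or not right:
        return {"leaf": majority}
    return {
        "w": w,
        "t": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def classify(tree, row):
    """Traverse the tree using feature variables only."""
    while "leaf" not in tree:
        go_right = dot(tree["w"], row) > tree["t"]
        tree = tree["right"] if go_right else tree["left"]
    return tree["leaf"]


# Hypothetical training data: neither feature alone separates the classes.
TRAIN_ROWS = [[1.0, 0.0], [2.0, 1.2], [3.0, 1.8], [4.0, 3.0],
              [0.0, 1.0], [1.0, 2.2], [2.0, 2.8], [3.0, 4.0]]
TRAIN_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

tree = build_tree(TRAIN_ROWS, TRAIN_LABELS)
print(classify(tree, [2.5, 1.0]), classify(tree, [1.0, 3.0]))
```

In this toy data set the two classes overlap on each individual feature, so a split on a single variable cannot separate them, while a single split on a linear combination can. That is the motivation the abstract gives for combining variables; the specific discriminant and threshold used here are illustrative choices only.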