Patents by Inventor Charu Aggarwal

Charu Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070288417
    Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
    Type: Application
    Filed: August 20, 2007
    Publication date: December 13, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070239982
    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.
    Type: Application
    Filed: October 13, 2005
    Publication date: October 11, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070233711
    Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.
    Type: Application
    Filed: April 4, 2006
    Publication date: October 4, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Nagui Halim
  • Publication number: 20070226212
    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
    Type: Application
    Filed: May 24, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070226209
    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
    Type: Application
    Filed: May 30, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070226216
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Application
    Filed: May 31, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070043565
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
    Type: Application
    Filed: August 22, 2005
    Publication date: February 22, 2007
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20060282425
    Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
    Type: Application
    Filed: April 20, 2005
    Publication date: December 14, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20060242610
    Abstract: Systems and methods are provided for density-based traffic generation. Data are clustered to create partitions, and transforms of clustered data are constructed in a transformed space. Data points are generated by employing grid discretization in the transformed space, and density estimates of the generated data points are employed to generate synthetic pseudo-points.
    Type: Application
    Filed: March 29, 2005
    Publication date: October 26, 2006
    Applicant: IBM Corporation
    Inventor: Charu Aggarwal
  • Publication number: 20060064438
    Abstract: A technique for monitoring a primary data stream comprising one or more secondary data streams for abnormalities is provided. A deviation value is determined for each of the one or more secondary data streams. The determined deviation values of the one or more secondary data streams are combined to form a combined deviation value. The combined deviation value is used to generate an abnormality signal.
    Type: Application
    Filed: September 17, 2004
    Publication date: March 23, 2006
    Applicant: International Business Machines Corporation
    Inventor: Charu Aggarwal
  • Publication number: 20060026175
    Abstract: The present invention is directed to the use of an evolutionary algorithm to locate optimal solution subspaces. The evolutionary algorithm uses a point-based coding of the subspace determination problem and searches selectively over the space of possible coded solutions. Each feasible solution to the problem, or individual in the population of feasible solutions, is coded as a string, which facilitates use of the evolutionary algorithm to determine the optimal solution to the fitness function. The fitness of each string is determined by solving the objective function for that string. The resulting fitness value can then be converted to a rank, and all of the members of the population of solutions can be evaluated using selection, crossover, and mutation processes that are applied sequentially and iteratively to the individuals in the population of solutions.
    Type: Application
    Filed: July 28, 2004
    Publication date: February 2, 2006
    Inventor: Charu Aggarwal
  • Publication number: 20060015474
    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved; the summary information associated with an entity relates to the data stored at that entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
    Type: Application
    Filed: July 16, 2004
    Publication date: January 19, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20060004754
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 5, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050246262
    Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.
    Type: Application
    Filed: April 29, 2004
    Publication date: November 3, 2005
    Inventors: Charu Aggarwal, Murray Campbell, Yuan-Chi Chang, Matthew Hill, Chung-Sheng Li, Milind Naphade, Sriram Padmanabhan, John Smith, Min Wang, Kun-Lung Wu, Philip Yu
  • Publication number: 20050210027
    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
    Type: Application
    Filed: March 16, 2004
    Publication date: September 22, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050049991
    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
    Type: Application
    Filed: August 14, 2003
    Publication date: March 3, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050038769
    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
    Type: Application
    Filed: August 14, 2003
    Publication date: February 17, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20030018623
    Abstract: The present invention provides a method for query processing of time variant objects. In order to achieve this, we create an efficient index structure on a parametric representation of the relevant attributes of objects. The method particularly relates to resolving different kinds of queries such as nearest-neighbor queries and range queries. Such a technique can be used to efficiently retrieve objects in a very large database of objects whose attributes are both complex and varying with time. The technique can handle complex objects which have multiple attributes evolving possibly nonlinearly with time. Such a method can be used in applications that track mobile objects or it can be used in supermarket applications which track the evolution of consumer traits.
    Type: Application
    Filed: July 18, 2001
    Publication date: January 23, 2003
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Dakshi Agrawal
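
The stream-clustering abstract shared by publications 20070226209 and 20050038769 separates an online phase, which maintains statistics for data groups as points arrive, from an offline phase that reclusters those groups around sampled points. The Python sketch below is one minimal way to picture that split; it is not the patented method. It assumes additive per-group statistics (count, per-dimension linear sum and sum of squares), a fixed distance threshold `max_radius` for absorbing a point, a cap `max_clusters` enforced by merging the two closest groups, and caller-supplied seed points for the offline pass. All of these names and choices are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of a data group (hypothetical structure)."""
    n: int            # number of points absorbed
    ls: list[float]   # per-dimension linear sum
    ss: list[float]   # per-dimension sum of squares

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), [v * v for v in x])

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        # The statistics are additive, so merging is element-wise addition.
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(self.ss[i] / self.n - (self.ls[i] / self.n) ** 2
                  for i in range(len(self.ls)))
        return math.sqrt(max(var, 0.0))


def online_update(clusters, x, max_clusters, max_radius):
    """Online phase: absorb x into the nearest group or start a new one."""
    if clusters:
        nearest = min(clusters, key=lambda c: math.dist(c.centroid(), x))
        if math.dist(nearest.centroid(), x) <= max_radius:
            nearest.add(x)
            return
    clusters.append(MicroCluster.from_point(x))
    if len(clusters) > max_clusters:
        # Stay within budget by merging the two closest groups (i < j).
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: math.dist(clusters[p[0]].centroid(),
                                                   clusters[p[1]].centroid()))
        clusters[i].merge(clusters[j])
        del clusters[j]


def offline_recluster(clusters, seeds):
    """Offline phase: merge each group into the nearest sampled seed point."""
    groups = [None] * len(seeds)
    for c in clusters:
        k = min(range(len(seeds)), key=lambda s: math.dist(seeds[s], c.centroid()))
        if groups[k] is None:
            groups[k] = MicroCluster(c.n, list(c.ls), list(c.ss))  # copy, not alias
        else:
            groups[k].merge(c)
    return [g for g in groups if g is not None]
```

Feeding the points (0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 10) through `online_update` with `max_clusters=2` and `max_radius=1.0` leaves two groups holding 2 and 3 points, because the third group started by (10, 10) is merged into its nearest neighbour. Keeping only additive statistics is what lets the online phase run in constant memory per group while still supporting merges and the later offline pass.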
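
Publication 20070239982 describes assigning privacy levels to records, building condensed groups based on those levels, keeping summary statistics per group, and releasing pseudo-data generated from those statistics. A minimal Python sketch follows, under stated assumptions: a record's privacy level is read as the minimum size of the group it may be hidden in, groups are formed greedily around the highest-level unassigned record, only per-dimension means and variances are kept, and pseudo-data is drawn from a uniform distribution matching each group's per-dimension mean and variance. That last choice ignores correlations between dimensions, so it is a simplification rather than the disclosed technique; the function names are hypothetical.

```python
import math
import random


def condense(records, levels):
    """Greedy grouping so that every group holds at least as many records as
    the highest privacy level of any of its members (assumed interpretation)."""
    remaining = sorted(range(len(records)), key=lambda i: levels[i], reverse=True)
    groups = []
    while remaining:
        seed = remaining[0]
        k = max(levels[seed], 1)
        if len(remaining) < k and groups:
            # Too few records left for a new group. Each leftover has a level no
            # higher than any existing group's seed, so attaching it to the group
            # with the closest seed keeps the size guarantee.
            for i in remaining:
                best = min(groups, key=lambda g: math.dist(records[g[0]], records[i]))
                best.append(i)
            break
        # The seed plus its k - 1 nearest unassigned neighbours (if the whole
        # data set is smaller than k, this single group is necessarily undersized).
        rest = sorted(remaining[1:], key=lambda i: math.dist(records[seed], records[i]))
        group = [seed] + rest[:k - 1]
        groups.append(group)
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
    return groups


def summarize(records, group):
    """Summary statistics kept per group: size, per-dimension mean and variance."""
    n = len(group)
    dims = len(records[group[0]])
    mean = [sum(records[i][d] for i in group) / n for d in range(dims)]
    var = [sum((records[i][d] - mean[d]) ** 2 for i in group) / n for d in range(dims)]
    return n, mean, var


def pseudo_data(stats, rng):
    """Draw n pseudo-records per group; Uniform(m - h, m + h) has variance h**2 / 3."""
    out = []
    for n, mean, var in stats:
        half = [math.sqrt(3 * v) for v in var]
        for _ in range(n):
            out.append([rng.uniform(m - h, m + h) for m, h in zip(mean, half)])
    return out
```

Calling `pseudo_data([summarize(records, g) for g in condense(records, levels)], random.Random(0))` returns as many pseudo-records as there were originals, each drawn from a distribution with its group's per-dimension mean and variance, so only the group summaries, never the original records, feed the released data.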
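
Publication 20060064438 computes a deviation value for each secondary stream, combines those values, and uses the combined value to generate an abnormality signal. The abstract does not fix how deviations are measured or combined, so the Python sketch below assumes, purely for illustration, a z-score of each stream's newest value against that stream's own history, a root-mean-square combination, and a fixed threshold of 3.0. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def deviation(history, value):
    """Assumed deviation measure: |value - mean| in units of the history's
    population standard deviation (a z-score)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is treated as unbounded.
        return 0.0 if value == mu else math.inf
    return abs(value - mu) / sigma


def abnormality_signal(histories, latest, threshold=3.0):
    """Combine per-stream deviations by root mean square (assumed) and
    compare the result with a threshold (assumed value)."""
    devs = [deviation(h, v) for h, v in zip(histories, latest)]
    combined = math.sqrt(sum(d * d for d in devs) / len(devs))
    return combined, combined > threshold
```

With histories [10, 12, 11, 9, 13] and [100, 101, 99, 100, 100], a new reading of (11, 110) gives a combined value of sqrt(125), about 11.2, and raises the signal, while (11, 100) gives 0.0 and does not. Squaring before averaging lets one strongly deviating secondary stream dominate the combined value, which a plain average of deviations would dilute.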
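
Publications 20070226216 and 20060004754 share an abstract in which class-specific clusters are built from a labeled training stream and then used to classify test instances. One way to picture this, offered only as a sketch: keep a separate set of clusters per class label, absorb each training point into its own class's nearest cluster when it lies within an assumed `radius`, and label a test point by the class of the nearest cluster centroid. The class name, the radius rule, and the nearest-centroid decision are hypothetical choices, not details taken from the abstract.

```python
import math
from collections import defaultdict


class ClassClusterModel:
    """Hypothetical model: per-class clusters stored as [count, per-dimension sums]."""

    def __init__(self, radius):
        self.radius = radius               # assumed absorb threshold
        self.clusters = defaultdict(list)  # label -> list of [count, sums]

    def train(self, x, label):
        # Find the nearest cluster of the same class only.
        best, best_d = None, math.inf
        for c in self.clusters[label]:
            d = math.dist([s / c[0] for s in c[1]], x)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius:
            best[0] += 1
            best[1] = [s + v for s, v in zip(best[1], x)]
        else:
            self.clusters[label].append([1, list(x)])

    def classify(self, x):
        # Label of the nearest cluster centroid across all classes.
        best_label, best_d = None, math.inf
        for label, cs in self.clusters.items():
            for n, sums in cs:
                d = math.dist([s / n for s in sums], x)
                if d < best_d:
                    best_label, best_d = label, d
        return best_label
```

After training on (0, 0) and (1, 0) labeled `a` and on (10, 10) and (9, 10) labeled `b` with `radius=2.0`, each class holds one cluster, and the test point (1, 1) is labeled `a`. Because training points only ever join clusters of their own class, each cluster stays pure, which is what makes a single nearest-centroid lookup sufficient at test time.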