Patents by Inventor Charu Aggarwal

Charu Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and Apparatus for Generating Decision Trees with Discriminants and Employing Same in Data Classification

Publication number: 20070288417

Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.

Type: Application

Filed: August 20, 2007

Publication date: December 13, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Method and apparatus for analyzing community evolution in graph data streams

Publication number: 20070288465

Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.

Type: Application

Filed: October 5, 2005

Publication date: December 13, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Method and apparatus for variable privacy preservation in data mining

Publication number: 20070239982

Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.

Type: Application

Filed: October 13, 2005

Publication date: October 11, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
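
The condensation steps in the abstract of publication 20070239982 can be illustrated with a minimal sketch. The abstract does not say how groups are formed or which statistics are kept, so everything below is an assumption: records are one-dimensional, a privacy level is read as a minimum group size, groups are packed greedily over sorted values, and pseudo-data is drawn from a Gaussian fitted to each group. The names `condense` and `pseudo_data` are hypothetical.

```python
import random
import statistics


def condense(records, levels):
    """Greedily pack sorted 1-D records into groups (assumed scheme).

    A group closes once its size reaches the largest privacy level among
    its members. Returns one (count, mean, std) summary per group.
    """
    order = sorted(range(len(records)), key=lambda i: records[i])
    groups, current, needed = [], [], 0
    for i in order:
        current.append(records[i])
        needed = max(needed, levels[i])
        if len(current) >= needed:
            groups.append(current)
            current, needed = [], 0
    # Leftover records absorb earlier groups until they are large enough.
    while current and groups and len(current) < needed:
        current = groups.pop() + current
    if current:
        # May still be too small if there are fewer records than a level asks for.
        groups.append(current)
    return [(len(g), statistics.fmean(g), statistics.pstdev(g)) for g in groups]


def pseudo_data(summaries, rng=None):
    """Draw as many synthetic values per group as the group had records."""
    rng = rng or random.Random(0)
    out = []
    for count, mean, std in summaries:
        out.extend(rng.gauss(mean, std) for _ in range(count))
    return out


records = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
levels = [1, 2, 2, 3, 3, 3, 3]
summaries = condense(records, levels)
synthetic = pseudo_data(summaries)
```

Only the summaries are needed to produce `synthetic`, so the original records could be discarded once the groups are condensed. A per-dimension Gaussian ignores correlations between attributes, which a fuller treatment would need to preserve.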
Method and apparatus for privacy preserving data mining by restricting attribute choice

Publication number: 20070233711

Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.

Type: Application

Filed: April 4, 2006

Publication date: October 4, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Nagui Halim
Methods and Apparatus for Dynamic Classification of Data in Evolving Data Stream

Publication number: 20070226216

Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

Type: Application

Filed: May 31, 2007

Publication date: September 27, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
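
As a rough illustration of the abstract of publication 20070226216, the sketch below keeps class-specific clusters from a labeled training stream and classifies a test instance by its nearest cluster. The additive cluster summary, the `radius` threshold for opening a new cluster, and the nearest-centroid rule are all assumptions; the abstract does not specify them. The class names are hypothetical.

```python
import math


class ClassCluster:
    """Additive summary of training points that share one class label."""

    def __init__(self, label, point):
        self.label = label
        self.count = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.count for s in self.linear_sum]

    def absorb(self, point):
        self.count += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


class StreamClassifier:
    def __init__(self, radius=1.0):
        self.radius = radius  # assumed threshold for opening a new cluster
        self.clusters = []

    def train(self, point, label):
        """Fold a labeled point into the nearest cluster of the same class."""
        same = [c for c in self.clusters if c.label == label]
        if same:
            nearest = min(same, key=lambda c: math.dist(c.centroid(), point))
            if math.dist(nearest.centroid(), point) <= self.radius:
                nearest.absorb(point)
                return
        self.clusters.append(ClassCluster(label, point))

    def classify(self, point):
        """Label a test instance with the class of the closest stored cluster."""
        if not self.clusters:
            raise ValueError("no training data seen yet")
        nearest = min(self.clusters, key=lambda c: math.dist(c.centroid(), point))
        return nearest.label


clf = StreamClassifier(radius=1.0)
training = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((5.0, 5.0), "b"),
            ((5.1, 4.9), "b"), ((0.0, 3.0), "a")]
for point, label in training:
    clf.train(point, label)
```

Because each cluster stores only a count and a per-dimension sum, training points can be dropped as soon as they are absorbed, which suits a stream that cannot be replayed.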
Methods and Apparatus for Data Stream Clustering for Abnormality Monitoring

Publication number: 20070226212

Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

Type: Application

Filed: May 24, 2007

Publication date: September 27, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Methods and Apparatus for Clustering Evolving Data Streams Through Online and Offline Components

Publication number: 20070226209

Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.

Type: Application

Filed: May 30, 2007

Publication date: September 27, 2007

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
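
The online/offline split described in the abstract of publication 20070226209 might look roughly like the following. This is a sketch under stated assumptions, not the patented method: the online statistics are taken to be a count and per-dimension sum for each data group, a hypothetical `radius` decides when a point starts a new group, and the offline step samples group centroids as seeds and assigns every group to its nearest seed.

```python
import math
import random


class Group:
    """Online summary of a data group: point count and per-dimension sum."""

    def __init__(self, point):
        self.count = 1
        self.linear_sum = list(point)

    def add(self, point):
        self.count += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def centroid(self):
        return [s / self.count for s in self.linear_sum]


def online_update(groups, point, radius=1.0):
    """Online step: update the nearest group, or start a new one."""
    if groups:
        nearest = min(groups, key=lambda g: math.dist(g.centroid(), point))
        if math.dist(nearest.centroid(), point) <= radius:
            nearest.add(point)
            return
    groups.append(Group(point))


def offline_recluster(groups, k, rng=None):
    """Offline step: sample k group centroids as seeds, assign each group to
    its nearest seed, and report count-weighted cluster centers."""
    rng = rng or random.Random(0)
    seeds = [g.centroid() for g in rng.sample(groups, k)]
    members = [[] for _ in seeds]
    for g in groups:
        idx = min(range(k), key=lambda i: math.dist(seeds[i], g.centroid()))
        members[idx].append(g)
    centers = []
    for assigned in members:
        if not assigned:  # possible when two seeds coincide
            continue
        total = sum(g.count for g in assigned)
        dims = len(assigned[0].linear_sum)
        centers.append([sum(g.linear_sum[d] for g in assigned) / total
                        for d in range(dims)])
    return centers


groups = []
for p in [(0.0, 0.0), (0.4, 0.0), (10.0, 10.0), (10.0, 10.4)]:
    online_update(groups, p, radius=1.0)
centers = sorted(offline_recluster(groups, k=2))
```

The offline step reads only the stored summaries, so it can be run on demand without touching the stream itself.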
Systems and methods for providing real-time classification of continuous data streams

Publication number: 20070043565

Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.

Type: Application

Filed: August 22, 2005

Publication date: February 22, 2007

Inventors: Charu Aggarwal, Philip Yu
Method and apparatus for processing data streams

Publication number: 20060282425

Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.

Type: Application

Filed: April 20, 2005

Publication date: December 14, 2006

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Systems and methods of data traffic generation via density estimation

Publication number: 20060242610

Abstract: Systems and methods for providing density-based traffic generation. Data are clustered to create partitions, and transforms of clustered data are constructed in a transformed space. Data points are generated via employing grid discretization in the transformed space, and density estimates of the generated data points are employed to generate synthetic pseudo-points.

Type: Application

Filed: March 29, 2005

Publication date: October 26, 2006

Applicant: IBM Corporation

Inventor: Charu Aggarwal
Methods and apparatus for monitoring abnormalities in data stream

Publication number: 20060064438

Abstract: A technique for monitoring a primary data stream comprising one or more secondary data streams for abnormalities is provided. A deviation value is determined for each of the one or more secondary data streams. The determined deviation values of the one or more secondary data streams are combined to form a combined deviation value. The combined deviation value is used to generate an abnormality signal.

Type: Application

Filed: September 17, 2004

Publication date: March 23, 2006

Applicant: International Business Machines Corporation

Inventor: Charu Aggarwal
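
The three steps in the abstract of publication 20060064438 (a deviation value per secondary stream, a combined deviation value, an abnormality signal) can be sketched as follows. The abstract does not define how deviations are measured or combined, so the choices here are assumptions: each deviation is an absolute z-score against a sliding window, the combination is a root mean square, and the signal fires above a fixed `threshold`. All names are hypothetical.

```python
import math
from collections import deque


class StreamDeviation:
    """Tracks one secondary stream and scores how far a new value deviates."""

    def __init__(self, window=50):
        self.history = deque(maxlen=window)  # recent values only

    def deviation(self, value):
        """Return the absolute z-score of value against the window, then record it."""
        if len(self.history) < 2:
            self.history.append(value)
            return 0.0  # not enough history to judge
        mean = sum(self.history) / len(self.history)
        var = sum((x - mean) ** 2 for x in self.history) / (len(self.history) - 1)
        std = math.sqrt(var)
        self.history.append(value)
        if std == 0:
            return 0.0 if value == mean else float("inf")
        return abs(value - mean) / std


def combined_deviation(deviations):
    """Combine per-stream deviations as a root mean square (assumed choice)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def abnormality_signal(trackers, values, threshold=3.0):
    """Return True when the combined deviation exceeds the threshold."""
    devs = [t.deviation(v) for t, v in zip(trackers, values)]
    return combined_deviation(devs) > threshold


trackers = [StreamDeviation(), StreamDeviation()]
for i in range(20):
    abnormality_signal(trackers, [10 + i % 3, 5 + i % 2])
normal = abnormality_signal(trackers, [11, 5])
abnormal = abnormality_signal(trackers, [100, 50])
```

Combining the deviations before thresholding lets several mildly unusual secondary streams raise a signal together even when none of them would do so alone.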
System and method of flexible data reduction for arbitrary applications

Publication number: 20060026175

Abstract: The present invention is directed to the use of an evolutionary algorithm to locate optimal solution subspaces. The evolutionary algorithm uses a point-based coding of the subspace determination problem and searches selectively over the space of possible coded solutions. Each feasible solution to the problem, or individual in the population of feasible solutions, is coded as a string, which facilitates use of the evolutionary algorithm to determine the optimal solution to the fitness function. The fitness of each string is determined by solving the objective function for that string. The resulting fitness value can then be converted to a rank, and all of the members of the population of solutions can be evaluated using selection, crossover, and mutation processes that are applied sequentially and iteratively to the individuals in the population of solutions.

Type: Application

Filed: July 28, 2004

Publication date: February 2, 2006

Inventor: Charu Aggarwal
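
The loop described in the abstract of publication 20060026175 (string coding, fitness, ranking, then selection, crossover, and mutation applied iteratively) can be sketched with a generic genetic algorithm. The abstract's point-based coding is not specified, so this sketch swaps in a plain bit-mask coding where bit i marks whether dimension i belongs to the subspace. The function `evolve`, its parameters, and the toy objective are hypothetical.

```python
import random


def evolve(fitness, length, pop_size=20, generations=40, mutation_rate=0.05, seed=0):
    """Search bit strings of the given length for one that maximizes fitness."""
    if pop_size % 2 or length < 2:
        raise ValueError("pop_size must be even and length at least 2")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Rank: the worst string gets rank 1, the best gets rank pop_size.
        ranked = sorted(population, key=fitness)
        weights = list(range(1, pop_size + 1))
        # Selection: draw parents with probability proportional to rank.
        parents = rng.choices(ranked, weights=weights, k=pop_size)
        # Crossover: splice consecutive parents at a random cut point.
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randint(1, length - 1)
            children.append(a[:cut] + b[cut:])
            children.append(b[:cut] + a[cut:])
        # Mutation: flip each bit with a small probability.
        population = [[bit ^ (rng.random() < mutation_rate) for bit in child]
                      for child in children]
        # Remember the best string seen so far across all generations.
        best = max(population + [best], key=fitness)
    return best


# Toy objective (an assumption): prefer subspaces that keep more dimensions.
best_mask = evolve(sum, length=8)
```

Selecting by rank rather than by raw fitness keeps selection pressure steady even when fitness values differ by orders of magnitude.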
System and method for distributed privacy preserving data mining

Publication number: 20060015474

Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.

Type: Application

Filed: July 16, 2004

Publication date: January 19, 2006

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Methods and apparatus for dynamic classification of data in evolving data stream

Publication number: 20060004754

Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

Type: Application

Filed: June 30, 2004

Publication date: January 5, 2006

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Enabling interoperability between participants in a network

Publication number: 20050246262

Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.

Type: Application

Filed: April 29, 2004

Publication date: November 3, 2005

Inventors: Charu Aggarwal, Murray Campbell, Yuan-Chi Chang, Matthew Hill, Chung-Sheng Li, Milind Naphade, Sriram Padmanabhan, John Smith, Min Wang, Kun-Lung Wu, Philip Yu
Methods and apparatus for data stream clustering for abnormality monitoring

Publication number: 20050210027

Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

Type: Application

Filed: March 16, 2004

Publication date: September 22, 2005

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Methods and apparatus for privacy preserving data mining using statistical condensing approach

Publication number: 20050049991

Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.

Type: Application

Filed: August 14, 2003

Publication date: March 3, 2005

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
Methods and apparatus for clustering evolving data streams through online and offline components

Publication number: 20050038769

Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.

Type: Application

Filed: August 14, 2003

Publication date: February 17, 2005

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
System and method of query processing of time variant objects

Publication number: 20030018623

Abstract: The present invention provides a method for query processing of time variant objects. In order to achieve this, we create an efficient index structure on a parametric representation of the relevant attributes of objects. The method particularly relates to resolving different kinds of queries such as nearest neighbor query and range query. Such a technique can be used to efficiently retrieve objects in a very large database of objects whose attributes are both complex and varying with time. The technique can handle complex objects which have multiple attributes evolving possibly nonlinearly with time. Such a method can be used in applications that track mobile objects or it can be used in supermarket applications which track the evolution of consumer traits.

Type: Application

Filed: July 18, 2001

Publication date: January 23, 2003

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Dakshi Agrawal
