Patents by Inventor Charu Aggarwal
Charu Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20070288417
Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
Type: Application
Filed: August 20, 2007
Publication date: December 13, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20070288465
Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
Type: Application
Filed: October 5, 2005
Publication date: December 13, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20070239982
Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.
Type: Application
Filed: October 13, 2005
Publication date: October 11, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20070233711
Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.
Type: Application
Filed: April 4, 2006
Publication date: October 4, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Nagui Halim
-
Publication number: 20070226216
Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
Type: Application
Filed: May 31, 2007
Publication date: September 27, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20070226212
Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
Type: Application
Filed: May 24, 2007
Publication date: September 27, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20070226209
Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
Type: Application
Filed: May 30, 2007
Publication date: September 27, 2007
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
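Illustrative sketch (not taken from the application): the online/offline split described in the abstract of publication 20070226209 might look roughly like the Python below. The class name DataGroup, the functions update_online and recluster_offline, the max_distance threshold, and the choice of additive count/sum statistics are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


class DataGroup:
    """Additive summary of the points absorbed so far (hypothetical layout)."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)
        self.square_sum = [x * x for x in point]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def add(self, point):
        # Statistics are additive, so one update costs O(dimensions).
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x


def update_online(groups, point, max_distance=1.0):
    """Online step: absorb the point into the nearest group or open a new one."""
    if groups:
        nearest = min(groups, key=lambda g: math.dist(g.centroid(), point))
        if math.dist(nearest.centroid(), point) <= max_distance:
            nearest.add(point)
            return
    groups.append(DataGroup(point))


def recluster_offline(groups, k, seed=0):
    """Offline step: sample k group centroids as seeds, attach every group
    to its nearest seed, and return the resulting clusters of groups."""
    rng = random.Random(seed)
    seeds = [g.centroid() for g in rng.sample(groups, k)]
    clusters = [[] for _ in seeds]
    for g in groups:
        idx = min(range(k), key=lambda j: math.dist(seeds[j], g.centroid()))
        clusters[idx].append(g)
    return clusters
```

In this sketch the stream is read once by update_online, and recluster_offline works only on the stored group summaries, so it can be run on demand without revisiting the raw data points.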
-
Publication number: 20070043565
Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
Type: Application
Filed: August 22, 2005
Publication date: February 22, 2007
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20060282425
Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
Type: Application
Filed: April 20, 2005
Publication date: December 14, 2006
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20060242610
Abstract: Systems and methods for providing density-based traffic generation. Data are clustered to create partitions, and transforms of clustered data are constructed in a transformed space. Data points are generated via employing grid discretization in the transformed space, and density estimates of the generated data points are employed to generate synthetic pseudo-points.
Type: Application
Filed: March 29, 2005
Publication date: October 26, 2006
Applicant: IBM Corporation
Inventor: Charu Aggarwal
-
Publication number: 20060064438
Abstract: A technique for monitoring a primary data stream comprising one or more secondary data streams for abnormalities is provided. A deviation value is determined for each of the one or more secondary data streams. The determined deviation values of the one or more secondary data streams are combined to form a combined deviation value. The combined deviation value is used to generate an abnormality signal.
Type: Application
Filed: September 17, 2004
Publication date: March 23, 2006
Applicant: International Business Machines Corporation
Inventor: Charu Aggarwal
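Illustrative sketch (not taken from the application): the three steps named in the abstract of publication 20060064438 (per-stream deviation, combination, signal) could be wired together as in the Python below. The abstract does not say how a deviation value is computed or combined, so the z-score measure, the simple average, the threshold of 3.0, and all function names here are assumptions chosen only to make the flow concrete.

```python
from statistics import mean, pstdev


def deviation_value(history, latest):
    """Assumed deviation measure: absolute z-score of the latest reading
    relative to that secondary stream's own history."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0 if latest == mu else float("inf")
    return abs(latest - mu) / sigma


def combined_deviation(deviations):
    """Assumed combination rule: plain average of the per-stream deviations."""
    return sum(deviations) / len(deviations)


def abnormality_signal(secondary_streams, threshold=3.0):
    """secondary_streams is a list of (history, latest) pairs, one per
    secondary stream. Returns True when the combined deviation reaches
    the (hypothetical) threshold."""
    deviations = [deviation_value(h, x) for h, x in secondary_streams]
    return combined_deviation(deviations) >= threshold
```

For example, calling abnormality_signal with one (history, latest) pair per secondary stream yields a single boolean for the primary stream; other combination rules, such as a maximum or a weighted sum, would slot into combined_deviation without changing the rest.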
-
Publication number: 20060026175
Abstract: The present invention is directed to the use of an evolutionary algorithm to locate optimal solution subspaces. The evolutionary algorithm uses a point-based coding of the subspace determination problem and searches selectively over the space of possible coded solutions. Each feasible solution to the problem, or individual in the population of feasible solutions, is coded as a string, which facilitates use of the evolutionary algorithm to determine the optimal solution to the fitness function. The fitness of each string is determined by solving the objective function for that string. The resulting fitness value can then be converted to a rank, and all of the members of the population of solutions can be evaluated using selection, crossover, and mutation processes that are applied sequentially and iteratively to the individuals in the population of solutions.
Type: Application
Filed: July 28, 2004
Publication date: February 2, 2006
Inventor: Charu Aggarwal
-
Publication number: 20060015474
Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
Type: Application
Filed: July 16, 2004
Publication date: January 19, 2006
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20060004754
Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
Type: Application
Filed: June 30, 2004
Publication date: January 5, 2006
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20050246262
Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.
Type: Application
Filed: April 29, 2004
Publication date: November 3, 2005
Inventors: Charu Aggarwal, Murray Campbell, Yuan-Chi Chang, Matthew Hill, Chung-Sheng Li, Milind Naphade, Sriram Padmanabhan, John Smith, Min Wang, Kun-Lung Wu, Philip Yu
-
Publication number: 20050210027
Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
Type: Application
Filed: March 16, 2004
Publication date: September 22, 2005
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20050049991
Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
Type: Application
Filed: August 14, 2003
Publication date: March 3, 2005
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20050038769
Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
Type: Application
Filed: August 14, 2003
Publication date: February 17, 2005
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Philip Yu
-
Publication number: 20030018623
Abstract: The present invention provides a method for query processing of time variant objects. In order to achieve this, we create an efficient index structure on a parametric representation of the relevant attributes of objects. The method particularly relates to resolving different kinds of queries such as nearest neighbor query and range query. Such a technique can be used to efficiently retrieve objects in a very large database of objects whose attributes are both complex and varying with time. The technique can handle complex objects which have multiple attributes evolving possibly nonlinearly with time. Such a method can be used in applications that track mobile objects or it can be used in supermarket applications which track the evolution of consumer traits.
Type: Application
Filed: July 18, 2001
Publication date: January 23, 2003
Applicant: International Business Machines Corporation
Inventors: Charu Aggarwal, Dakshi Agrawal