Patents by Inventor Charu C. Aggarwal

Charu C. Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatus for generating decision trees with discriminants and employing same in data classification

Patent number: 7716154

Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.

Type: Grant

Filed: August 20, 2007

Date of Patent: May 11, 2010

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Method and Apparatus for Variable Privacy Preservation in Data Mining

Publication number: 20090319526

Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.

Type: Application

Filed: May 13, 2008

Publication date: December 24, 2009

Applicant: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Methods and Apparatus for Monitoring Abnormalities in Data Stream

Publication number: 20090292979

Abstract: A technique for monitoring a primary data stream comprising a plurality of secondary data streams for abnormalities is provided. A deviation value for each of two or more of the plurality of secondary data streams is determined. The two or more deviation values of the two or more secondary data streams are combined to form a combined deviation value. An abnormality signal is generated based at least in part on the combined deviation value.

Type: Application

Filed: July 23, 2009

Publication date: November 26, 2009

Applicant: International Business Machines Corporation

Inventor: Charu C. Aggarwal
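
The abstract above (publication 20090292979) describes three steps: a deviation value per secondary stream, a combined deviation value, and an abnormality signal. The following is a minimal sketch of that flow, not the patented method. The z-score deviation measure, the averaging rule, the threshold of 3.0, and the names `deviation` and `abnormality_signal` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def deviation(history, latest):
    """Absolute z-score of the latest reading against its history.

    The z-score is an assumed deviation measure; the abstract does not name one.
    """
    center = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A constant history: any change at all is treated as infinitely unusual.
        return 0.0 if latest == center else float("inf")
    return abs(latest - center) / sigma


def abnormality_signal(streams, threshold=3.0):
    """Combine per-stream deviations and compare against a threshold.

    `streams` maps a secondary-stream name to a (history, latest) pair.
    Averaging is an assumed combination rule.
    Returns (is_abnormal, combined_deviation).
    """
    deviations = [deviation(history, latest) for history, latest in streams.values()]
    combined = sum(deviations) / len(deviations)
    return combined > threshold, combined


if __name__ == "__main__":
    quiet = {"cpu": ([1, 2, 3, 4, 5], 3), "disk": ([10, 10, 10, 10], 10)}
    noisy = {"cpu": ([1, 2, 3, 4, 5], 20), "disk": ([2, 4, 6, 8], 30)}
    print(abnormality_signal(quiet))
    print(abnormality_signal(noisy))
```

Combining deviations before thresholding lets several mildly unusual secondary streams raise a signal together even when none would do so alone.
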
SYSTEM AND METHOD FOR CLASSIFYING DATA STREAMS WITH VERY LARGE CARDINALITY

Publication number: 20090281971

Abstract: Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added.

Type: Application

Filed: May 9, 2008

Publication date: November 12, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu C. Aggarwal, Philip S. Yu
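
The per-class sketch table in the abstract above (publication 20090281971), with parallel hash tables and a lowest-value lookup, resembles a count-min sketch. The following is a rough sketch under that assumption, not the patented implementation. The class name `SketchTable`, the SHA-256-based hashing, the table dimensions, and the `discriminatory_power` formula (share of the largest class estimate) are hypothetical choices for illustration.

```python
import hashlib


class SketchTable:
    """One table per class: `depth` parallel hash rows of `width` counters each."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, pattern):
        # Salting with the row number yields a different hash function per row.
        digest = hashlib.sha256(f"{row}:{pattern}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern):
        """Train: count one occurrence of an attribute pattern in every row."""
        for row in range(self.depth):
            self.rows[row][self._index(row, pattern)] += 1

    def estimate(self, pattern):
        """Query: take the lowest counter across rows, which limits collision noise."""
        return min(self.rows[row][self._index(row, pattern)] for row in range(self.depth))


def discriminatory_power(tables, pattern):
    """Assumed measure: fraction of the total estimate held by the dominant class."""
    estimates = [table.estimate(pattern) for table in tables.values()]
    total = sum(estimates)
    return max(estimates) / total if total else 0.0


if __name__ == "__main__":
    tables = {"spam": SketchTable(), "ham": SketchTable()}
    for _ in range(3):
        tables["spam"].add("free+offer")
    print(tables["spam"].estimate("free+offer"))
    print(discriminatory_power(tables, "free+offer"))
```

Because counters are only incremented, the lowest value across rows never undercounts a pattern, which is why the minimum is the natural choice when hash collisions inflate some rows.
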
Method and Apparatus for Query Processing of Uncertain Data

Publication number: 20090222410

Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.

Type: Application

Filed: February 28, 2008

Publication date: September 3, 2009

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Method and Apparatus for Aggregation in Uncertain Data

Publication number: 20090222472

Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.

Type: Application

Filed: February 28, 2008

Publication date: September 3, 2009

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Apparatus for dynamic classification of data in evolving data stream

Patent number: 7487167

Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

Type: Grant

Filed: May 31, 2007

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
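
The classification flow in the abstract above (patent 7487167) can be illustrated with a deliberately simplified sketch: one running centroid per class stands in for the "class-specific clusters", and a test instance takes the label of the nearest centroid. This is an assumption-laden illustration, not the patented technique. The class name `ClassClusterClassifier`, the single-cluster-per-class simplification, and the Euclidean distance are all hypothetical choices.

```python
import math


class ClassClusterClassifier:
    """Keeps one running centroid per class label (a simplifying assumption)."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension running sums
        self.counts = {}  # label -> number of training points seen

    def train(self, point, label):
        """Fold one labeled training point from the stream into its class summary."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(point)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], point)]
        self.counts[label] += 1

    def classify(self, point):
        """Return the label whose centroid is closest to the test instance."""
        def distance(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(centroid, point)

        return min(self.sums, key=distance)


if __name__ == "__main__":
    clf = ClassClusterClassifier()
    for point, label in [((0, 0), "a"), ((1, 1), "a"), ((10, 10), "b"), ((11, 11), "b")]:
        clf.train(point, label)
    print(clf.classify((2, 2)))
    print(clf.classify((9, 9)))
```

Storing only sums and counts means each training point can be discarded after it is seen, which suits a stream that cannot be replayed.
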
Method and apparatus for privacy preserving data mining by restricting attribute choice

Patent number: 7475085

Abstract: Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.

Type: Grant

Filed: April 4, 2006

Date of Patent: January 6, 2009

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Nagui Halim
Enabling Interoperability Between Participants in a Network

Publication number: 20080281626

Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.

Type: Application

Filed: July 24, 2008

Publication date: November 13, 2008

Applicant: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Murray Scott Campbell, Yuan-Chi Chang, Matthew Leon Hill, Chung-Sheng Li, Milind R. Naphade, Sriram K. Padmanabhan, John R. Smith, Min Wang, Kun-Lung Wu, Philip Shilung Yu
Method and Apparatus for Predicting Future Behavior of Data Streams

Publication number: 20080243742

Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.

Type: Application

Filed: June 10, 2008

Publication date: October 2, 2008

Applicant: International Business Machines Corporation

Inventor: Charu C. Aggarwal
Methods and Apparatus for Outlier Detection for High Dimensional Data Sets

Publication number: 20080234977

Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.

Type: Application

Filed: June 6, 2008

Publication date: September 25, 2008

Applicant: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Method and apparatus for predicting future behavior of data streams

Patent number: 7421452

Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.

Type: Grant

Filed: June 14, 2006

Date of Patent: September 2, 2008

Assignee: International Business Machines Corporation

Inventor: Charu C. Aggarwal
Methods and apparatus for outlier detection for high dimensional data sets

Patent number: 7395250

Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.

Type: Grant

Filed: October 11, 2000

Date of Patent: July 1, 2008

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
SYSTEM AND METHOD FOR FEATURE BASED LOAD SHEDDING IN CLASSIFICATION

Publication number: 20080133438

Abstract: A system and method for feature based load shedding in classification. The system includes a plurality of data sources. The plurality of data sources being configured to render independent streams of input data, such data being selectively grouped together to form a particular classification task. The system further includes a central classification server configured to analyze and execute multiple tasks, each task consisting of multiple input data. The central classification server further configured to analyze the data for knowledge-based decision-making. The central classification server being communicatively engaged via a network to the plurality of data sources. The method includes rendering independent streams of input data, such data being selectively grouped together to form a particular task. The method further includes analyzing and handling multiple tasks, each task consisting of multiple input data. The method also includes analyzing the data for knowledge-based decision-making.

Type: Application

Filed: November 30, 2006

Publication date: June 5, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu C. Aggarwal, Haixun Wang
Methods for dynamic classification of data in evolving data stream

Patent number: 7379939

Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

Type: Grant

Filed: June 30, 2004

Date of Patent: May 27, 2008

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
SYSTEM AND METHOD FOR RESOURCE ADAPTIVE CLASSIFICATION OF DATA STREAMS

Publication number: 20080082475

Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.

Type: Application

Filed: September 12, 2006

Publication date: April 3, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu C. Aggarwal, Philip Shi-lung Yu
Systems and methods for condensation-based privacy in strings

Publication number: 20080082566

Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.

Type: Application

Filed: September 30, 2006

Publication date: April 3, 2008

Applicant: IBM Corporation

Inventors: Charu C. Aggarwal, Philip S. Yu
Methods and apparatus for clustering evolving data streams through online and offline components

Patent number: 7353218

Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.

Type: Grant

Filed: August 14, 2003

Date of Patent: April 1, 2008

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
Method and apparatus for predicting future behavior of data streams

Publication number: 20070294216

Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.

Type: Application

Filed: June 14, 2006

Publication date: December 20, 2007

Applicant: International Business Machines Corporation

Inventor: Charu C. Aggarwal
Methods and apparatus for generating decision trees with discriminants and employing same in data classification

Patent number: 7310624

Abstract: Methods and apparatus are provided for generating decision trees using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.

Type: Grant

Filed: May 2, 2000

Date of Patent: December 18, 2007

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
