Patents by Inventor Charu C. Aggarwal

Charu C. Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7305378
    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: December 4, 2007
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7302420
    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
    Type: Grant
    Filed: August 14, 2003
    Date of Patent: November 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7277893
    Abstract: The present invention is directed to the use of an evolutionary algorithm to locate optimal solution subspaces. The evolutionary algorithm uses a point-based coding of the subspace determination problem and searches selectively over the space of possible coded solutions. Each feasible solution to the problem, or individual in the population of feasible solutions, is coded as a string, which facilitates use of the evolutionary algorithm to determine the optimal solution to the fitness function. The fitness of each string is determined by solving the objective function for that string. The resulting fitness value can then be converted to a rank, and all of the members of the population of solutions can be evaluated using selection, crossover, and mutation processes that are applied sequentially and iteratively to the individuals in the population of solutions.
    Type: Grant
    Filed: July 28, 2004
    Date of Patent: October 2, 2007
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 7236638
    Abstract: Data compression techniques particularly applicable to high dimensional data. The invention uses a hierarchical partitioning approach in conjunction with a subspace sampling methodology which is sensitive to a subject data set. The dual nature of this hierarchical partitioning and subspace sampling approach makes the overall data compression process very effective. While the data compression process provides a much more compact representation than traditional dimensionality reduction techniques, the process also provides hard bounds on the error of the approximation. Also, the data compression process of the invention realizes a compression factor that improves with increasing database size.
    Type: Grant
    Filed: July 30, 2002
    Date of Patent: June 26, 2007
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 7139688
    Abstract: A technique for structurally classifying substructures of at least one unmarked string utilizing at least one training data set with inserted markers identifying labeled substructures. A model of class labels and substructures within strings of the training data set is first constructed. Markers are then inserted into the unmarked string, identifying substructures similar to substructures within strings of the training data set by using the model. Finally, class labels of the substructures in the unmarked string similar to substructures within strings of the training data set are predicted using the model.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 21, 2006
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7085981
    Abstract: Techniques for processing data sets and, more particularly, constructing a synthetic data set (test data set) from real data sets (input data sets) in accordance with user feedback. The technique mimics real data sets effectively to generate the corresponding synthetic ones. Multiple real data sets may be used to create a test data set which combines the characteristics of these multiple data sets. Users of the technique have the ability to modify the characteristics of the data sets to create a new data set which has features that a user may desire. For example, a user may change the shape or size of, or distort the different patterns in the data to create a new data set. A user may also choose to inject noise into the system.
    Type: Grant
    Filed: June 9, 2003
    Date of Patent: August 1, 2006
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6970884
    Abstract: Techniques are provided for incorporating human or user interaction in accordance with the design and/or performance of data mining applications such as similarity determination and classification. Such user-centered techniques permit the mining of interesting characteristics of data in a data or feature space. For example, such interesting characteristics that may be determined in accordance with the user-centered mining techniques of the invention may include a determination of similarity among different data objects, as well as the determination of individual class labels. These techniques allow effective data mining applications to be performed in accordance with high dimensional data.
    Type: Grant
    Filed: August 14, 2001
    Date of Patent: November 29, 2005
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6871165
    Abstract: A technique for effective classification of time series data using a rule-based wavelet decomposition approach. This method is effective in classification of a wide variety of time series data sets. The process uses a combination of wavelet decomposition, discretization and rule generation of training time series data to classify various instances of test time series data. The wavelet decomposition can effectively explore the data at varying levels of granularity to classify instances of the test time series data.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: March 22, 2005
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6847955
    Abstract: A method for mining incomplete data sets that avoids the process of having to extrapolate the attributes, and instead concentrates on the use of conceptual representations in order to mine the data sets. The idea in using conceptual representations is that even though many attributes may be missing, it is possible to accurately guess the behavior of the data along certain pre-specified directions, i.e., the conceptual directions of the data set.
    Type: Grant
    Filed: April 10, 2001
    Date of Patent: January 25, 2005
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Publication number: 20040260663
    Abstract: A technique for structurally classifying substructures of at least one unmarked string utilizing at least one training data set with inserted markers identifying labeled substructures. A model of class labels and substructures within strings of the training data set is first constructed. Markers are then inserted into the unmarked string, identifying substructures similar to substructures within strings of the training data set by using the model. Finally, class labels of the substructures in the unmarked string similar to substructures within strings of the training data set are predicted using the model.
    Type: Application
    Filed: June 20, 2003
    Publication date: December 23, 2004
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20040260521
    Abstract: A technique for effective classification of time series data using a rule-based wavelet decomposition approach. This method is effective in classification of a wide variety of time series data sets. The process uses a combination of wavelet decomposition, discretization and rule generation of training time series data to classify various instances of test time series data. The wavelet decomposition can effectively explore the data at varying levels of granularity to classify instances of the test time series data.
    Type: Application
    Filed: June 20, 2003
    Publication date: December 23, 2004
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Publication number: 20040250188
    Abstract: Techniques for processing data sets and, more particularly, constructing a synthetic data set (test data set) from real data sets (input data sets) in accordance with user feedback. The technique mimics real data sets effectively to generate the corresponding synthetic ones. Multiple real data sets may be used to create a test data set which combines the characteristics of these multiple data sets. Users of the technique have the ability to modify the characteristics of the data sets to create a new data set which has features that a user may desire. For example, a user may change the shape or size of, or distort the different patterns in the data to create a new data set. A user may also choose to inject noise into the system.
    Type: Application
    Filed: June 9, 2003
    Publication date: December 9, 2004
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Publication number: 20040205049
    Abstract: Techniques are provided for user-centered search and crawling on an information network such as the world wide web. The techniques identify the nature of the web pages which are most relevant to a given predicate. The behavior of users is used to identify and determine the web pages which are most relevant to a specific crawl. Thus, the techniques are implemented in a web crawling system which can obtain the web pages specific to a given topic by leveraging the nature of the interests of the users in different topics.
    Type: Application
    Filed: April 10, 2003
    Publication date: October 14, 2004
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6804669
    Abstract: Techniques are provided for incorporating human or user interaction in accordance with the design and/or performance of data mining applications such as similarity determination and classification. Such user-centered techniques permit the mining of interesting characteristics of data in a data or feature space. For example, such interesting characteristics that may be determined in accordance with the user-centered mining techniques of the invention may include a determination of similarity among different data objects, as well as the determination of individual class labels. These techniques allow effective data mining applications to be performed in accordance with high dimensional data.
    Type: Grant
    Filed: August 14, 2001
    Date of Patent: October 12, 2004
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6799175
    Abstract: Techniques are provided for finding query responses from database queries using an interactive process between a user (e.g., a person entering a query to a database) and a computer system (e.g., a computing system upon which the database resides or which has access to the database). The interactive process comprises providing the user with one or more visual perspectives as feedback on the distribution of points in the database. These visual perspectives may be considered by the user in order for the user to provide feedback to the computer system. The computer system may then use the user-provided feedback to determine the best response to the query.
    Type: Grant
    Filed: April 23, 2001
    Date of Patent: September 28, 2004
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6785669
    Abstract: A method of performing a flexible similarity search is provided. In one embodiment, the method comprises the steps of: (i) constructing an indexed representation of one or more documents to be used in the similarity search; (ii) specifying a target document and a similarity function to be used for the search of the one or more documents for which the indexed representation is constructed; and (iii) finding a document among the one or more documents for which the indexed representation is constructed which is similar to the target document, based on the specified similarity function. Thus, the invention creates a universal index for text similarity searches so that a user can specify a function at the time of query.
    Type: Grant
    Filed: March 8, 2000
    Date of Patent: August 31, 2004
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 6721719
    Abstract: System and method for generating classification using time sequences comprises inputting a set of time-dependent feature variable graphs along with a set of time-dependent category variable graphs; finding frequent shapes in the time-dependent feature variable graphs; utilizing the frequent shapes to generate combinations of frequent shapes; generating rules relating one or more patterns of combinations of frequent shapes to a category variable; and performing a categorization utilizing the rules generated.
    Type: Grant
    Filed: July 26, 1999
    Date of Patent: April 13, 2004
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20040022445
    Abstract: Data compression techniques particularly applicable to high dimensional data. The invention uses a hierarchical partitioning approach in conjunction with a subspace sampling methodology which is sensitive to a subject data set. The dual nature of this hierarchical partitioning and subspace sampling approach makes the overall data compression process very effective. While the data compression process provides a much more compact representation than traditional dimensionality reduction techniques, the process also provides hard bounds on the error of the approximation. Also, the data compression process of the invention realizes a compression factor that improves with increasing database size.
    Type: Application
    Filed: July 30, 2002
    Publication date: February 5, 2004
    Applicant: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 6631413
    Abstract: In accordance with the present invention, a method for selecting a channel and delivery time for digital objects for a broadcast delivery service including multiple channels of varying bandwidths includes the steps of selecting digital objects to be sent over the multiple channels, generating a schedule and pricing for the digital objects based on the digital objects selected and existing delivery commitments, and manipulating the schedule and pricing to provide a profitable delivery of the digital objects. A system is also included.
    Type: Grant
    Filed: January 28, 1999
    Date of Patent: October 7, 2003
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu
  • Patent number: 6587848
    Abstract: Methodologies are provided which use affinity lists in order to perform query retrieval more effectively and efficiently. The invention comprises a two phase method. In the first phase, we find a threshold number k of candidate documents which are retrieved by the method. In the second phase, we calculate the affinity value to each of these k documents and report them in ranked order of affinities. The first phase of finding the k most valuable candidates is accomplished using an iterative technique on the affinity lists. Once these candidates have been found, the affinity to each document in the set is obtained, and the resulting documents are rank ordered by affinity to the target document.
    Type: Grant
    Filed: March 8, 2000
    Date of Patent: July 1, 2003
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
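
The two-phase retrieval described in the abstract of patent 6587848 can be illustrated with a minimal sketch. This is not the patented method: the abstract does not define how affinity lists are built or how affinity is computed, so the sketch assumes a hypothetical per-term list of (document, weight) pairs and uses cosine similarity over term counts as a stand-in affinity. All function names (`vectorize`, `build_affinity_lists`, `retrieve`) are illustrative.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Return a unit-length term-count vector as a dict (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def build_affinity_lists(docs):
    """Hypothetical affinity lists: each term maps to (doc_id, weight) pairs,
    sorted by descending weight."""
    lists = defaultdict(list)
    for doc_id, text in docs.items():
        for term, weight in vectorize(text).items():
            lists[term].append((doc_id, weight))
    for term in lists:
        lists[term].sort(key=lambda pair: (-pair[1], pair[0]))
    return lists


def retrieve(target, docs, lists, k):
    """Phase 1 collects up to k candidates; phase 2 ranks them by affinity."""
    target_vec = vectorize(target)

    # Phase 1: walk the lists of the target's terms one depth level at a
    # time until k distinct candidates are found or the lists run out.
    term_lists = [lists.get(term, []) for term in target_vec]
    candidates, seen, depth = [], set(), 0
    while len(candidates) < k and any(depth < len(l) for l in term_lists):
        for l in term_lists:
            if depth < len(l) and l[depth][0] not in seen:
                seen.add(l[depth][0])
                candidates.append(l[depth][0])
                if len(candidates) == k:
                    break
        depth += 1

    # Phase 2: compute the (assumed) cosine affinity of the target to each
    # candidate and report the candidates in ranked order.
    scored = []
    for doc_id in candidates:
        doc_vec = vectorize(docs[doc_id])
        score = sum(w * doc_vec.get(t, 0.0) for t, w in target_vec.items())
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    docs = {
        "a": "data mining privacy",
        "b": "data compression",
        "c": "wavelet time series",
    }
    lists = build_affinity_lists(docs)
    for doc_id, score in retrieve("privacy data mining", docs, lists, k=2):
        print(doc_id, round(score, 3))
```

Separating candidate generation from scoring means the full affinity computation runs only on the k candidates rather than on every document, which is the efficiency argument the abstract makes.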