Patents by Inventor Charu C. Aggarwal

Charu C. Aggarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120166382
    Abstract: An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class, a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to the associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
    Type: Application
    Filed: February 21, 2012
    Publication date: June 28, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 8165975
    Abstract: A technique for monitoring a primary data stream comprising a plurality of secondary data streams for abnormalities is provided. A deviation value for each of two or more of the plurality of secondary data streams is determined. The two or more deviation values of the two or more secondary data streams are combined to form a combined deviation value. An abnormality signal is generated based at least in part on the combined deviation value.
    Type: Grant
    Filed: July 23, 2009
    Date of Patent: April 24, 2012
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
  • Patent number: 8165979
    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.
    Type: Grant
    Filed: April 1, 2011
    Date of Patent: April 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 8140448
    Abstract: An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class, a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns.
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charu C Aggarwal, Philip S Yu
  • Publication number: 20110295832
    Abstract: Techniques for identifying one or more communities in an information network are provided. The techniques include collecting one or more nodes and one or more edges from an information network, performing a random walk on the one or more nodes to produce a sequence of one or more nodes, creating a sequence database from one or more sequences produced via random walk, and mining the sequence database to determine one or more patterns in the network, wherein the one or more patterns identify one or more communities in the information network.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Rajesh R. Bordawekar
  • Patent number: 8060816
    Abstract: Methods and apparatus for performing intelligent crawling are provided. Particularly, the intelligent crawling techniques of the invention provide a crawler mechanism which is capable of learning as it crawls in order to focus the search for documents on the information network being explored, e.g., the World Wide Web. This crawler mechanism stores information about the crawled documents as it retrieves the documents, and then uses the information to further focus its search appropriately. The inventive techniques result in the crawling of a small percentage of the documents on the World Wide Web.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: November 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20110213740
    Abstract: A system and method for resource adaptive classification of data streams. Embodiments of systems and methods provide classifying data received in a computer, including discretizing the received data, constructing an intermediate data structure from said received data as training instances, performing subspace sampling on said received data as test instances and adaptively classifying said received data based on statistics of said subspace sampling.
    Type: Application
    Filed: April 1, 2011
    Publication date: September 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 8010541
    Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
    Type: Grant
    Filed: September 30, 2006
    Date of Patent: August 30, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 8005839
    Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7970772
    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
    Type: Grant
    Filed: May 24, 2007
    Date of Patent: June 28, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20110074786
    Abstract: Mechanisms are provided for transforming an original graph data set into a representative form having a smaller number of dimensions than the original graph data set. The mechanisms generate a graph transformation basis structure based on an input graph data structure. The mechanisms further transform an original graph data set based on an intersection of the graph transformation basis structure and the input graph data structure to thereby generate a transformed graph data set data structure. The transformed graph data set data structure has a reduced dimensionality from that of the input graph data structure but represents characteristics of the original graph data set. Moreover, the mechanisms perform an application specific operation on the transformed graph data set data structure to generate an output of a closest similarity record in the transformed graph data set to a target component.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 31, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Charu C. Aggarwal
  • Publication number: 20110078143
    Abstract: Mechanisms are provided for anonymizing data comprising a plurality of graph data sets. The mechanisms receive input data comprising a plurality of graph data sets. Each graph data set comprises data for generating a separate graph from graphs associated with other graph data sets. The mechanisms perform clustering on the graph data sets to generate a plurality of clusters. At least one cluster of the plurality of clusters comprises a plurality of graph data sets. Other clusters in the plurality of clusters comprise one or more graph data sets. The mechanisms also determine, for each cluster in the plurality of clusters, aggregate properties of the cluster. Moreover, the mechanisms generate, for each cluster in the plurality of clusters, pseudo-synthetic data representing the cluster, from the determined aggregate properties of the clusters.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 31, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Charu C. Aggarwal
  • Patent number: 7917517
    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: March 29, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7890510
    Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
    Type: Grant
    Filed: October 5, 2005
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7885941
    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
    Type: Grant
    Filed: October 15, 2007
    Date of Patent: February 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20110029571
    Abstract: An illustrative embodiment includes a method for executing a query on a graph data stream. The graph stream comprises data representing edges that connect vertices of a graph. The method comprises constructing a plurality of synopsis data structures based on at least a subset of the graph data stream. Each vertex connected to an edge represented within the subset of the graph data stream is assigned to a synopsis data structure such that each synopsis data structure represents a corresponding section of the graph. The method further comprises mapping each received edge represented within the graph data stream onto the synopsis data structure which corresponds to the section of the graph which includes that edge, and using the plurality of synopsis data structures to execute the query on the graph data stream.
    Type: Application
    Filed: July 29, 2009
    Publication date: February 3, 2011
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Min Wang, Peixiang Zhao
  • Patent number: 7865456
    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: January 4, 2011
    Assignee: Trend Micro Incorporated
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20100268734
    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
    Type: Application
    Filed: May 23, 2007
    Publication date: October 21, 2010
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7739284
    Abstract: A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
    Type: Grant
    Filed: April 20, 2005
    Date of Patent: June 15, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7716155
    Abstract: Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream.
    Type: Grant
    Filed: June 10, 2008
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventor: Charu C. Aggarwal
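
As a rough illustration of the abstract of patent 8165975 listed above (a deviation value per secondary data stream, a combined deviation value, and an abnormality signal), the following Python sketch may help. It is not the patented method: the abstract does not say how deviations are measured or combined, so the absolute z-score, the simple average, the threshold of 3.0, and all function names here are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def deviation(history, latest):
    """Absolute z-score of `latest` against `history` (an assumed deviation measure)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A constant history: any change at all is treated as an unbounded deviation.
        return 0.0 if latest == mu else float("inf")
    return abs(latest - mu) / sigma


def combined_deviation(histories, latest_values):
    """Average the per-stream deviations (an assumed way of combining them)."""
    return mean(deviation(h, x) for h, x in zip(histories, latest_values))


def abnormality_signal(histories, latest_values, threshold=3.0):
    """Return True when the combined deviation exceeds a hypothetical threshold."""
    return combined_deviation(histories, latest_values) > threshold


# Two hypothetical secondary streams that together make up a primary stream.
histories = [[1, 2, 3, 4, 5], [10, 10, 10, 10, 12]]
print(abnormality_signal(histories, [3, 10.4]))  # readings close to each stream's mean
print(abnormality_signal(histories, [10, 14]))   # both streams far from their means
```

Averaging lets several moderately unusual streams raise a signal together even when none would cross the threshold alone; taking the maximum or a weighted sum would be equally plausible readings of "combined".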