Patents by Inventor Philip Yu

Philip Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatus for performing structural joins for answering containment queries

Publication number: 20060101056

Abstract: Techniques are provided for performing structural joins for answering containment queries. Such inventive techniques may be used to perform efficient structural joins of two interval lists which are neither sorted nor pre-indexed. For example, in an illustrative aspect of the invention, a technique for performing structural joins of two element sets of a tree-structured document, wherein one of the two element sets is an ancestor element set and the other of the two element sets is a descendant element set, and further wherein each element is represented as an interval representing a start position and an end position of the element in the document, comprises the following steps/operations. An index is dynamically built for the ancestor element set. Then, one or more structural joins are performed by searching the index with the interval start position of each element in the descendant element set.

Type: Application

Filed: November 5, 2004

Publication date: May 11, 2006

Applicant: International Business Machines Corporation

Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
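
To illustrate the structural-join abstract above (publication 20060101056), here is a minimal Python sketch. It assumes each element is a (start, end) tuple and uses a sorted list with binary search as a stand-in for the dynamically built index. The abstract does not say which index structure the application uses, and the names build_index and structural_join are hypothetical.

```python
from bisect import bisect_right

def build_index(ancestors):
    """Build the index at join time: ancestor intervals sorted by start."""
    ordered = sorted(ancestors)
    starts = [start for start, _ in ordered]
    return starts, ordered

def structural_join(ancestors, descendants):
    """Return (ancestor, descendant) pairs where the ancestor contains the descendant."""
    starts, ordered = build_index(ancestors)
    pairs = []
    for d_start, d_end in descendants:
        # Search the index with the descendant's start position: only
        # ancestors that start at or before it can contain it.
        hi = bisect_right(starts, d_start)
        for a_start, a_end in ordered[:hi]:
            if a_start < d_start and d_end < a_end:
                pairs.append(((a_start, a_end), (d_start, d_end)))
    return pairs
```

For example, structural_join([(1, 10), (12, 20)], [(2, 3), (13, 14), (21, 22)]) pairs (2, 3) with (1, 10) and (13, 14) with (12, 20), and leaves (21, 22) unmatched. Neither input list needs to be sorted beforehand.
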
Methods and apparatus for interval query indexing

Publication number: 20060101045

Abstract: Interval query indexing techniques for use in accordance with data stream processing systems are disclosed. For example, in an illustrative aspect of the invention, a technique for use in processing a data stream comprises the following steps/operations. First, an attribute range of query intervals associated with the data stream is partitioned into one or more segments. Then, a set of virtual intervals is defined for each of the one or more segments. A query interval index is then built using the set of virtual intervals. The query interval index may be built by decomposing each query interval into one or more of the virtual intervals, and associating a query identifier with the decomposed virtual intervals.

Type: Application

Filed: November 5, 2004

Publication date: May 11, 2006

Applicant: International Business Machines Corporation

Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
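
The steps in the interval-query-indexing abstract above (publication 20060101045) can be sketched as follows. This is only an illustration under stated assumptions: attribute values are non-negative integers, segments have a fixed power-of-two length (SEGMENT), and the virtual intervals in each segment are the aligned blocks of size 1, 2, 4, ... up to the segment length. The abstract does not specify how virtual intervals are laid out, so that layout and all names here are hypothetical.

```python
from collections import defaultdict

SEGMENT = 8  # assumed segment length; must be a power of two

def decompose(lo, hi):
    """Split the query interval [lo, hi) into aligned virtual intervals."""
    pieces = []
    while lo < hi:
        size = SEGMENT
        # Shrink until the block is aligned at lo and fits inside [lo, hi).
        while lo % size != 0 or lo + size > hi:
            size //= 2
        pieces.append((lo, size))
        lo += size
    return pieces

def build_index(queries):
    """Associate each query identifier with its decomposed virtual intervals."""
    index = defaultdict(set)
    for query_id, (lo, hi) in queries.items():
        for piece in decompose(lo, hi):
            index[piece].add(query_id)
    return index

def lookup(index, x):
    """Find all queries whose interval contains the stream value x."""
    hits = set()
    size = 1
    while size <= SEGMENT:
        # The only virtual interval of this size that can contain x.
        hits |= index.get((x - x % size, size), set())
        size *= 2
    return hits
```

With this layout a lookup inspects one virtual interval per block size, so its cost depends on the segment length rather than on the number of registered queries.
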
System and method for graph indexing

Publication number: 20060036564

Abstract: Techniques for graph indexing are provided. In one aspect, a method for indexing graphs in a database, the graphs comprising graphic data, comprises the following steps. Frequent subgraphs among one or more of the graphs in the database are identified, the frequent subgraphs appearing in at least a threshold number of the graphs in the database. One or more of the frequent subgraphs are used to create an index of the graphs in the database.

Type: Application

Filed: April 30, 2004

Publication date: February 16, 2006

Applicant: International Business Machines Corporation

Inventors: Xifeng Yan, Philip Yu
Systems and methods for sequential modeling in less than one sequential scan

Publication number: 20060026110

Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.

Type: Application

Filed: July 30, 2004

Publication date: February 2, 2006

Applicant: IBM Corporation

Inventors: Wei Fan, Haixun Wang, Philip Yu
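
The abstract above (publication 20060026110) relies on Hoeffding's inequality to stop before a full scan. The sketch below shows only the inequality itself, not the application's framework: for n independent observations of a quantity whose values span a range of width value_range, the sample mean falls short of the true mean by more than eps with probability at most delta. The function names are hypothetical.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """One-sided Hoeffding half-width eps after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def examples_needed(value_range, delta, eps):
    """Smallest n for which the Hoeffding half-width is at most eps."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * eps ** 2))
```

For a quantity in [0, 1] with delta = 0.05, examples_needed(1.0, 0.05, 0.05) is 600, which suggests how a learner could justify stopping after a fraction of a very large dataset.
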
System and method for distributed privacy preserving data mining

Publication number: 20060015474

Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.

Type: Application

Filed: July 16, 2004

Publication date: January 19, 2006

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
System and method for continuous diagnosis of data streams

Publication number: 20060010093

Abstract: In connection with the mining of time-evolving data streams, there is provided a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, a small number of true labels are preferably re-sampled to reconstruct the decision tree from the leaf node level.

Type: Application

Filed: June 30, 2004

Publication date: January 12, 2006

Applicant: IBM Corporation

Inventors: Wei Fan, Haixun Wang, Philip Yu
Methods and apparatus for dynamic classification of data in evolving data stream

Publication number: 20060004754

Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.

Type: Application

Filed: June 30, 2004

Publication date: January 5, 2006

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
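
A minimal sketch of the idea in the abstract above (publication 20060004754): labeled training points update class-specific cluster summaries, and a test instance receives the label of the closest cluster. As a simplifying assumption, this sketch keeps exactly one cluster per class and stores only a count and a coordinate sum. The abstract allows one or more clusters per class and does not describe the stored statistics, so the class ClassClusters and its methods are hypothetical.

```python
import math
from collections import defaultdict

class ClassClusters:
    """One running centroid per class label (a deliberate simplification)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = {}

    def train(self, point, label):
        """Fold one labeled training point into its class's cluster."""
        if label not in self.total:
            self.total[label] = [0.0] * len(point)
        self.count[label] += 1
        self.total[label] = [t + x for t, x in zip(self.total[label], point)]

    def centroid(self, label):
        n = self.count[label]
        return [t / n for t in self.total[label]]

    def classify(self, point):
        """Label a test instance with the class of the nearest centroid."""
        return min(self.total, key=lambda lb: math.dist(point, self.centroid(lb)))
```

Because only summaries are stored, each training point can be discarded after it is processed, which suits a stream setting.
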
Cross-feature analysis

Publication number: 20050283511

Abstract: Disclosed is a method of automatically identifying anomalous situations during computerized system operations that records actions performed by the computerized system as features in a history file, automatically creates a model for each feature only from normal data in the history file, performs training by calculating anomaly scores of the features, establishes a threshold to evaluate whether features are abnormal, automatically identifies abnormal actions of the computerized system based on the anomaly scores and said threshold, and periodically repeats the training process.

Type: Application

Filed: September 9, 2003

Publication date: December 22, 2005

Inventors: Wei Fan, Philip Yu
Systems and methods for subspace clustering

Publication number: 20050278324

Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis), etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences.

Type: Application

Filed: May 31, 2004

Publication date: December 15, 2005

Applicant: IBM Corporation

Inventors: Wei Fan, Haixun Wang, Philip Yu
System and method for mining time-changing data streams

Publication number: 20050278322

Abstract: A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. An empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.

Type: Application

Filed: May 28, 2004

Publication date: December 15, 2005

Applicant: IBM Corporation

Inventors: Wei Fan, Haixun Wang, Philip Yu
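
The weighted-ensemble idea in the abstract above (publication 20050278322) can be sketched as follows. The sketch makes several assumptions that are not in the abstract: the data are (x, y) pairs with a single numeric feature and labels 0 or 1, every chunk contains both labels, the base learner is a toy threshold rule (train_stump) rather than C4.5 or RIPPER, and each model's weight is simply its accuracy on the most recent chunk, used as a proxy for expected accuracy on upcoming data.

```python
from collections import defaultdict

def train_stump(chunk):
    """Toy 1-D learner: threshold halfway between the two class means."""
    mean = {}
    for cls in (0, 1):
        xs = [x for x, y in chunk if y == cls]
        mean[cls] = sum(xs) / len(xs)
    cut = (mean[0] + mean[1]) / 2
    high = 1 if mean[1] > mean[0] else 0  # label predicted above the cut
    return lambda x: high if x > cut else 1 - high

def accuracy(model, chunk):
    return sum(model(x) == y for x, y in chunk) / len(chunk)

def build_ensemble(chunks):
    """Train one model per chunk and weight it by accuracy on the latest chunk."""
    models = [train_stump(chunk) for chunk in chunks]
    latest = chunks[-1]
    return [(model, accuracy(model, latest)) for model in models]

def predict(ensemble, x):
    """Weighted vote over all models in the ensemble."""
    votes = defaultdict(float)
    for model, weight in ensemble:
        votes[model(x)] += weight
    return max(votes, key=votes.get)
```

When the concept drifts, models trained on outdated chunks score poorly on the latest chunk and so contribute little to the vote, without any explicit step that deletes them.
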
Enabling interoperability between participants in a network

Publication number: 20050246262

Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.

Type: Application

Filed: April 29, 2004

Publication date: November 3, 2005

Inventors: Charu Aggarwal, Murray Campbell, Yuan-Chi Chang, Matthew Hill, Chung-Sheng Li, Milind Naphade, Sriram Padmanabhan, John Smith, Min Wang, Kun-Lung Wu, Philip Yu
System and method for searching using a temporal dimension

Publication number: 20050234877

Abstract: The present invention is directed to a system and a method for generating a temporally ranked set of search results in response to a query. Each result in the set of search results can be ranked temporally or based on the reputation associated with authors of each result and the reputation associated with the repository where each result is located. Temporal ranking takes into account a present importance weight and a future importance weight that are assigned to each result. The present importance of each result uses creation date, publication date, in-link dates and search frequency, and the future importance uses an aging factor based on the elapsed time from publication for each search result and a rate at which each search result decreases in importance. Temporal ranking can be applied as a modification of existing and common search engine algorithms, including PageRank and HITS.

Type: Application

Filed: April 8, 2004

Publication date: October 20, 2005

Inventor: Philip Yu
Methods and apparatus for data stream clustering for abnormality monitoring

Publication number: 20050210027

Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.

Type: Application

Filed: March 16, 2004

Publication date: September 22, 2005

Applicant: International Business Machines Corporation

Inventors: Charu Aggarwal, Philip Yu
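
As a rough illustration of the abstract above (publication 20050210027), the sketch below clusters a stream of one-dimensional values, keeps only summary statistics per cluster, and reports clusters that stay small as possible abnormalities. The one-dimensional data, the fixed radius, the min_size rule, and all names are assumptions made for this example. The abstract does not say which statistics are kept or how abnormality is decided.

```python
class Cluster:
    """Summary statistics only: point count and running sum."""

    def __init__(self, x):
        self.n = 1
        self.total = x

    def add(self, x):
        self.n += 1
        self.total += x

    @property
    def mean(self):
        return self.total / self.n

def monitor(stream, radius=1.0, min_size=3):
    """Cluster the stream, then return the means of sparsely populated clusters."""
    clusters = []
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(x - c.mean), default=None)
        if nearest is not None and abs(x - nearest.mean) <= radius:
            nearest.add(x)                # absorb into the closest cluster
        else:
            clusters.append(Cluster(x))   # start a new cluster
    return [c.mean for c in clusters if c.n < min_size]
```

For the stream [1.0, 1.1, 0.9, 1.2, 9.0, 1.0] with the default settings, the value 9.0 ends up alone in its own cluster and is the only one reported.
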
Method and apparatus for hierarchical selective personalization

Publication number: 20050193110

Abstract: Techniques are provided for improved serving of content in a distributed data network. In one aspect of the invention, a technique for delivering content in a client-server system based on a request from a client comprises the following steps/operations. The request is obtained. A performance characteristic of at least one server or at least one cache of the client-server system is determined. Then, a level of data accuracy to be delivered to the client in response to the request is determined. The data accuracy determination is based on: (i) the determined performance characteristic of the at least one server or the at least one cache; and (ii) at least one preference associated with the client. The performance characteristic may comprise a load of the at least one server or the at least one cache. The level of data accuracy may comprise a level of personalization to be delivered to the client in response to the request.

Type: Application

Filed: February 27, 2004

Publication date: September 1, 2005

Applicant: International Business Machines Corporation

Inventors: Paul Dantzig, Daniel Dias, Arun Iyengar, Philip Yu
Method and apparatus for representing and managing service level agreement management data and relationships thereof

Publication number: 20050177545

Abstract: Techniques are provided for representing and managing data and associated relationships. In one aspect of the invention, a technique for managing data associated with a given domain comprises the following steps. A specification of data attributes representing one or more types of data to be managed is maintained. Further, a specification of algorithms representing one or more types of operations performable in accordance with the data attributes is maintained. Still further, a specification of relationships representing relationships between the data attributes and the algorithms is maintained. The data attribute specification, the algorithm specification and the relationship specification are maintained in a storage framework having multiple levels, the multiple levels being specified based on the given domain with which the data being managed is associated. The techniques may be provided in support of service level management.

Type: Application

Filed: February 11, 2004

Publication date: August 11, 2005

Applicant: International Business Machines Corporation

Inventors: Melissa Buco, Rong Chang, Laura Luan, Zon-Yin Shae, Christopher Ward, Joel Wolf, Philip Yu
System and method for adaptive pruning

Publication number: 20050131873

Abstract: Disclosed is a method and structure for searching data in databases using an ensemble of models. First, the invention performs training. This training orders models within the ensemble in order of prediction accuracy and joins different numbers of models together to form sub-ensembles. The models are joined together in the sub-ensemble in the order of prediction accuracy. Next in the training process, the invention calculates confidence values of each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each of the sub-ensembles is variable depending upon the level of confidence, while, to the contrary, the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence.

Type: Application

Filed: December 16, 2003

Publication date: June 16, 2005

Inventors: Wei Fan, Haixun Wang, Philip Yu
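
The training and prediction steps in the abstract above (publication 20050131873) can be sketched as follows. The sketch assumes that models are plain callables returning a label, that the ensemble combines them by majority vote, and that confidence is the fraction of validation examples on which a sub-ensemble agrees with the full ensemble. The abstract does not define the combination rule or the exact confidence measure, so these choices and the function names are hypothetical.

```python
from collections import Counter

def vote(models, x):
    """Majority vote; ties go to the label produced by the earliest model."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def train(models, validation):
    """Order models by accuracy and score each prefix sub-ensemble."""
    def acc(model):
        return sum(model(x) == y for x, y in validation) / len(validation)

    ordered = sorted(models, key=acc, reverse=True)
    full = [vote(ordered, x) for x, _ in validation]
    confidence = []
    for k in range(1, len(ordered) + 1):
        sub = [vote(ordered[:k], x) for x, _ in validation]
        agree = sum(s == f for s, f in zip(sub, full))
        confidence.append(agree / len(full))
    return ordered, confidence

def predict(ordered, confidence, x, required):
    """Use the smallest sub-ensemble whose confidence meets the requirement."""
    for k, conf in enumerate(confidence, start=1):
        if conf >= required:
            return vote(ordered[:k], x), k
    return vote(ordered, x), len(ordered)
```

The second value returned by predict is the number of models actually consulted, which shows how much work the pruning saved for a given confidence requirement.
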
System and method for scalable cost-sensitive learning

Publication number: 20050125434

Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset into N subsets of data and developing an estimated learning model for the dataset by developing a learning model for a first subset of the N subsets.

Type: Application

Filed: December 3, 2003

Publication date: June 9, 2005

Applicant: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip Yu
Near-neighbor search in pattern distance spaces

Publication number: 20050114331

Abstract: Similarity searching techniques are provided. In one aspect, a method for use in finding near-neighbors in a set of objects comprises the following steps. Subspace pattern similarities that the objects in the set exhibit in multi-dimensional spaces are identified. Subspace correlations are defined between two or more of the objects in the set based on the identified subspace pattern similarities for use in identifying near-neighbor objects. A pattern distance index may be created. A method of performing a near-neighbor search of one or more query objects against a set of objects is also provided.

Type: Application

Filed: November 26, 2003

Publication date: May 26, 2005

Applicant: International Business Machines Corporation

Inventors: Haixun Wang, Philip Yu
Index structure for supporting structural XML queries

Publication number: 20050114314

Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure.

Type: Application

Filed: November 26, 2003

Publication date: May 26, 2005

Inventors: Wei Fan, Haixun Wang, Philip Yu
System and method for indexing weighted-sequences in large databases

Publication number: 20050114298

Abstract: The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables the efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.

Type: Application

Filed: November 26, 2003

Publication date: May 26, 2005

Inventors: Wei Fan, Chang-Shing Perng, Haixun Wang, Philip Yu
