Patents by Inventor Philip Yu

Philip Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070288417
    Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
    Type: Application
    Filed: August 20, 2007
    Publication date: December 13, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070288465
    Abstract: Improved techniques are disclosed for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment. By way of example, a technique for processing data from a data stream includes the following steps/operations. A data point of the data stream representing an interaction event is obtained. An interaction graph is updated on-line based on the data point representing the interaction event. The updated interaction graph is stored in a nonvolatile memory. An interaction evolution is determined off-line from the updated interaction graph stored in the nonvolatile memory.
    Type: Application
    Filed: October 5, 2005
    Publication date: December 13, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070288635
    Abstract: A computer-implemented method, apparatus, and computer-usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
    Type: Application
    Filed: May 4, 2006
    Publication date: December 13, 2007
    Applicant: International Business Machines Corporation
    Inventors: Xiaohui Gu, Haixun Wang, Philip Yu
  • Publication number: 20070271243
    Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure.
    Type: Application
    Filed: July 19, 2007
    Publication date: November 22, 2007
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20070260563
    Abstract: The method trains an inductive model to output multiple models from the inductive model and trains an error correlation model to estimate an average output of predictions made by the multiple models. Then the method can determine an error estimation of each of the multiple models using the error correlation model.
    Type: Application
    Filed: April 17, 2006
    Publication date: November 8, 2007
    Applicant: International Business Machines Corporation
    Inventors: Wei Fan, Philip Yu
  • Publication number: 20070239982
    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.
    Type: Application
    Filed: October 13, 2005
    Publication date: October 11, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070230488
    Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
    Type: Application
    Filed: March 31, 2006
    Publication date: October 4, 2007
    Applicant: International Business Machines Corporation
    Inventors: Philip Yu, Haixun Wang, Hao He
  • Publication number: 20070226212
    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
    Type: Application
    Filed: May 24, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070226216
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Application
    Filed: May 31, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070226209
    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
    Type: Application
    Filed: May 30, 2007
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070223598
    Abstract: Streaming environments typically dictate incomplete or approximate algorithm execution, in order to cope with sudden surges in the data rate. Such limitations are even more accentuated in mobile environments (such as sensor networks) where computational and memory resources are typically limited. Introduced herein is a novel “resource adaptive” algorithm for spectrum and periodicity estimation on a continuous stream of data. The formulation is based on the derivation of a closed-form incremental computation of the spectrum, augmented by an intelligent load-shedding scheme that can adapt to available CPU resources. Experimentation indicates that the proposed technique can be a viable and resource efficient solution for real-time spectrum estimation.
    Type: Application
    Filed: March 24, 2006
    Publication date: September 27, 2007
    Applicant: IBM Corporation
    Inventors: Deepak Turaga, Michail Vlachos, Philip Yu
  • Publication number: 20070226264
    Abstract: There are provided a method, a computer program product, and a system for maintaining a materialized view defined on a relation of a relational database. The method includes the step of performing content-based filtering on the relation to identify an update to the relation as being irrelevant with respect to the materialized view.
    Type: Application
    Filed: March 22, 2006
    Publication date: September 27, 2007
    Inventors: Gang Luo, Philip Yu
  • Publication number: 20070220219
    Abstract: A method (and system) of storing data in a value-based storage system, includes optimizing a value of data stored in the value-based storage system.
    Type: Application
    Filed: March 16, 2006
    Publication date: September 20, 2007
    Applicant: International Business Machines Corporation
    Inventors: Nikhil Bansal, Frederick Douglis, Lisa Fleischer, Kirsten Hildrum, Akshay Kumar Katta, John Palmer, Elizabeth Richards, David Tao, William Tetzlaff, Joel Wolf, Philip Yu
  • Publication number: 20070214163
    Abstract: There are provided methods, computer program products, and systems for indexing a data stream. A method for indexing a data stream having attribute values includes the steps of parsing the data stream, and forming an index of tuples for a subset of attribute values of the data stream. The index is configured for retrieving the top-K tuples that optimize linearly weighted sums of at least some of the attribute values in the subset.
    Type: Application
    Filed: March 10, 2006
    Publication date: September 13, 2007
    Inventors: Gang Luo, Kun-Lung Wu, Philip Yu
  • Publication number: 20070211703
    Abstract: A system, method, and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.
    Type: Application
    Filed: March 10, 2006
    Publication date: September 13, 2007
    Applicant: International Business Machines Corporation
    Inventors: Xiaohui Gu, Zon-Yin Shae, Zhen Wen, Philip Yu
  • Publication number: 20070118539
    Abstract: Techniques for community discovery in a network are disclosed. For example, a technique for discovering a community around a given entity in an interaction graph, wherein nodes in the graph represent entities and edges connecting nodes in the graph represent interactions between connected nodes, comprises the following steps/operations. Nodes in the interaction graph are partitioned into different sets of nodes based on interaction information associated with each node to minimize a number of interaction pairs that need to be considered. An objective function is minimized by moving entities between the different sets such that the community is discovered once a measure associated with the objective function is minimized.
    Type: Application
    Filed: November 18, 2005
    Publication date: May 24, 2007
    Applicant: International Business Machines Corporation
    Inventors: Kirsten Hildrum, Philip Yu
  • Publication number: 20070043565
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
    Type: Application
    Filed: August 22, 2005
    Publication date: February 22, 2007
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20070016560
    Abstract: A computer-implemented method, apparatus, and computer-usable program code for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions to meet correlation constraints for correlating data in the data stream pair to form a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.
    Type: Application
    Filed: July 15, 2005
    Publication date: January 18, 2007
    Applicant: International Business Machines Corporation
    Inventors: Xiaohui Gu, Philip Yu
  • Publication number: 20060287984
    Abstract: Range query techniques are disclosed for use in accordance with data stream processing systems. In one aspect of the invention, a technique is provided for indexing continual range queries for use in data stream processing. For example, a technique for use in processing a data stream comprises obtaining at least one range query to be associated with the data stream, and building a range query index based on the at least one range query using one or more virtual constructs such that the query index is adaptive to one or more changes in a distribution of range query sizes. The step/operation of building the range query index may further comprise building the range query index such that the range query index accommodates one or more changes in query positions outside a monitoring area of the at least one range query. In another aspect of the invention, a technique is provided for incrementally processing continual range queries against moving objects.
    Type: Application
    Filed: June 17, 2005
    Publication date: December 21, 2006
    Applicant: International Business Machines Corporation
    Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
  • Publication number: 20060282425
    Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
    Type: Application
    Filed: April 20, 2005
    Publication date: December 14, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
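The dual-labeling idea in publication 20070230488 (interval labels on a spanning tree plus labels for non-tree links) can be illustrated with a simplified sketch. It assumes nodes are numbered 0..n-1, builds the interval labels from a depth-first spanning forest, and stores the transitive closure of the non-tree links as a plain table. The function name build_dual_labels is hypothetical, and queries here scan that table rather than reproducing the compact labels the patent describes.

```python
def build_dual_labels(n, edges):
    """Return a reachable(u, w) function for a directed graph on nodes 0..n-1.

    Simplified sketch (hypothetical helper): DFS interval labels on a
    spanning forest, plus a transitively closed table of non-tree links.
    """
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
    start, end = [-1] * n, [-1] * n
    tree_edges = set()
    clock = 0
    for root in range(n):
        if start[root] != -1:
            continue
        start[root] = clock
        clock += 1
        stack = [(root, iter(adj[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if start[child] == -1:
                    tree_edges.add((node, child))
                    start[child] = clock
                    clock += 1
                    stack.append((child, iter(adj[child])))
                    break
            else:
                # All children explored: close this node's interval.
                end[node] = clock
                stack.pop()
    # Edges not used by the spanning forest become non-tree links.
    links = sorted({e for e in edges if e not in tree_edges})

    def in_subtree(u, w):
        # w is u or a tree descendant of u iff its start lies in u's interval.
        return start[u] <= start[w] < end[u]

    # Link i leads to link j if j's tail lies in the subtree of i's head.
    closure = []
    for i in range(len(links)):
        reached, frontier = {i}, [i]
        while frontier:
            cur = frontier.pop()
            for j, (tail, _) in enumerate(links):
                if j not in reached and in_subtree(links[cur][1], tail):
                    reached.add(j)
                    frontier.append(j)
        closure.append(reached)

    def reachable(u, w):
        if in_subtree(u, w):
            return True
        # Leave u's subtree through some link, chain links, land above w.
        return any(
            in_subtree(links[j][1], w)
            for i, (tail, _) in enumerate(links) if in_subtree(u, tail)
            for j in closure[i]
        )

    return reachable


reachable = build_dual_labels(6, [(0, 1), (1, 2), (3, 4), (4, 1), (2, 5)])
print(reachable(3, 5), reachable(0, 3))  # True False
```

Any path decomposes into downward tree segments joined by non-tree links, which is why intervals plus the link closure are enough to answer a query in this sketch.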
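Publication 20070214163 describes an index for retrieving the top-K tuples that optimize a linearly weighted sum of attribute values. The patent's index structure is not reproduced here; as a stand-in, the sketch below uses Fagin's Threshold Algorithm, a well-known method for answering the same kind of query over one sorted list per attribute. It assumes all weights are non-negative, and the function name top_k_weighted is hypothetical.

```python
import heapq


def top_k_weighted(rows, weights, k):
    """Return indices of the k rows with the largest weighted sum.

    Fagin's Threshold Algorithm over per-attribute sorted lists.
    Assumes every weight is non-negative, so the score is monotone.
    """
    if k <= 0 or not rows:
        return []
    m = len(weights)
    # One access path per attribute: row ids sorted by that attribute, descending.
    lists = [sorted(range(len(rows)), key=lambda r, a=a: rows[r][a], reverse=True)
             for a in range(m)]

    def score(r):
        return sum(w * v for w, v in zip(weights, rows[r]))

    seen, best = set(), []  # best is a min-heap of (score, row id)
    for depth in range(len(rows)):
        for a in range(m):
            r = lists[a][depth]
            if r in seen:
                continue
            seen.add(r)
            s = score(r)
            if len(best) < k:
                heapq.heappush(best, (s, r))
            elif s > best[0][0]:
                heapq.heapreplace(best, (s, r))
        # No unseen row can score above the weighted values at this depth.
        threshold = sum(w * rows[lists[a][depth]][a] for a, w in enumerate(weights))
        if len(best) == k and best[0][0] >= threshold:
            break
    return [r for _, r in sorted(best, reverse=True)]


print(top_k_weighted([(1, 9), (8, 2), (5, 5), (3, 3)], (1, 2), 2))  # [0, 2]
```

The early stop is the point of the technique: once the k-th best score reaches the threshold, the remaining rows need not be scored at all.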
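Several of the stream abstracts above (for example 20070226209 and 20070043565) rely on compact per-group statistics that are created online and processed later. One common form of such a statistic is an additive cluster feature vector (count, linear sum, squared sum), sketched below in the spirit of the CluStream micro-clustering approach. This is an assumption about the general technique, not the patented method; the class MicroCluster, the function update, and the parameters boundary and default_radius are hypothetical.

```python
import math


class MicroCluster:
    """Additive summary of a group of points: count, linear sum, squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        # Summaries are additive, so two groups merge by summing their fields.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of member points from the centroid.
        c = self.centroid()
        var = sum(ss / self.n - ci * ci for ss, ci in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))


def update(clusters, point, max_clusters, boundary=2.0, default_radius=1.0):
    """Online step: absorb point into the nearest group or start a new one."""
    if clusters:
        nearest = min(clusters, key=lambda mc: math.dist(mc.centroid(), point))
        limit = boundary * (nearest.radius() if nearest.n > 1 else default_radius)
        if math.dist(nearest.centroid(), point) <= limit:
            nearest.add(point)
            return
    clusters.append(MicroCluster(point))
    if len(clusters) > max_clusters:
        # Stay within the memory budget by merging the two closest groups.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: math.dist(clusters[p[0]].centroid(), clusters[p[1]].centroid()),
        )
        clusters[i].merge(clusters.pop(j))


clusters = []
for p in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (20.0, 0.0)]:
    update(clusters, p, max_clusters=2)
print(sorted(mc.n for mc in clusters))  # [2, 3]
```

Because the summaries are additive, an offline phase can recluster or subtract them without revisiting the raw stream, which is the usual motivation for this design (requires Python 3.8+ for `math.dist`).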