Patents by Inventor Philip Yu

Philip Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20060282425
    Abstract: Techniques are disclosed for clustering and classifying stream data. By way of example, a technique for processing a data stream comprises the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
    Type: Application
    Filed: April 20, 2005
    Publication date: December 14, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20060271304
    Abstract: A method which identifies different types of substructures within a graph and encodes them using techniques suitable to the characteristics of each of them. The method is embodied by an efficient two-phase algorithm, where the first phase identifies and encodes strongly connected components as well as tree substructures, and the second phase encodes the remaining reachability relationships by compressing dense rectangular submatrices in the transitive closure matrix.
    Type: Application
    Filed: May 31, 2005
    Publication date: November 30, 2006
    Applicant: IBM Corporation
    Inventors: Hao He, Haixun Wang, Philip Yu
  • Publication number: 20060224356
    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
    Type: Application
    Filed: March 31, 2005
    Publication date: October 5, 2006
    Applicant: IBM Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip Yu
  • Publication number: 20060224562
    Abstract: Techniques for similarity searching are provided. In one aspect, a method of searching structural data in a database against one or more structural queries comprises the following steps. A desired minimum degree of similarity between the one or more queries and the structural data in the database is first specified. One or more indices are then used to exclude from consideration any structural data in the database that does not share the minimum degree of similarity with one or more of the queries.
    Type: Application
    Filed: March 31, 2005
    Publication date: October 5, 2006
    Applicant: International Business Machines Corporation
    Inventors: Xifeng Yan, Philip Yu
  • Publication number: 20060212337
    Abstract: A method (and system) of assigning a sales opportunity, includes creating an assignment model based on clustering historical sales opportunities, and providing a scoring mechanism on a plurality of sales agents for automatically optimizing an assignment of at least one sales opportunity to at least one of the plurality of sales agents.
    Type: Application
    Filed: March 16, 2005
    Publication date: September 21, 2006
    Applicant: International Business Machines Corporation
    Inventors: Jamshid Vayghan, Philip Yu
  • Publication number: 20060200251
    Abstract: A system and method are provided for optimizing component composition in a distributed stream-processing environment having a plurality of nodes capable of being associated with one or more of a plurality of stream processing components. The system includes an adaptive composition probing (ACP) module and a hierarchical state manager. The ACP module probes a subset of the plurality of stream processing components to determine the optimal component composition in response to a stream processing request. The hierarchical state manager manages local and global information for use by said ACP module in determining the optimal component composition.
    Type: Application
    Filed: March 1, 2005
    Publication date: September 7, 2006
    Inventors: Xiaohui Gu, Philip Yu
  • Publication number: 20060195599
    Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.
    Type: Application
    Filed: February 28, 2005
    Publication date: August 31, 2006
    Inventors: Bugra Gedik, Kun-Lung Wu, Philip Yu
  • Publication number: 20060190430
    Abstract: Systems and methods are provided for resource adaptive workload management. In a method thereof, at least one execution objective is received for at least one of a plurality of queries under execution. A progress status of, and an amount of resource consumed by, each of the plurality of queries are monitored. A remaining resource requirement for each of the plurality of queries is estimated, based on the progress status of, and the amount of resource consumed by, each of the plurality of queries. Resource allocation is adjusted based on the at least one execution objective and the estimates of the remaining resource requirements.
    Type: Application
    Filed: February 22, 2005
    Publication date: August 24, 2006
    Inventors: Gang Luo, Philip Yu
  • Publication number: 20060184527
    Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
    Type: Application
    Filed: February 16, 2005
    Publication date: August 17, 2006
    Applicant: IBM Corporation
    Inventors: Yun Chi, Haixun Wang, Philip Yu
  • Publication number: 20060174024
    Abstract: Towards mining closed frequent itemsets over a sliding window using limited memory space, a synopsis data structure is used to monitor transactions in the sliding window so that one can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets, but monitoring only frequent itemsets makes it difficult to detect new itemsets when they become frequent. Herein, there is introduced a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets include a boundary between closed frequent itemsets and the rest of the itemsets. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET.
    Type: Application
    Filed: January 31, 2005
    Publication date: August 3, 2006
    Applicant: IBM Corporation
    Inventors: Yun Chi, Haixun Wang, Philip Yu
  • Publication number: 20060161575
    Abstract: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addressed the problem of query equivalence with respect to this transformation, and there is introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observes query equivalence.
    Type: Application
    Filed: January 14, 2005
    Publication date: July 20, 2006
    Applicant: IBM Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20060132326
    Abstract: An improved universal remote control unit (URC) for controlling electronic appliance units. The URC unit has the typical remote controller module for controlling appliances such as TV, stereo, VCR or DVD. Additionally, the URC has a scratch pad memory for storing telephone numbers and web site information entered through the URC unit's alphanumeric keys. When activated, the key pad entries are stored in the memory, instead of being used to control the appliance. The URC unit further has a digital recorder module that can be implemented with a microphone, a voice recorder chip and a speaker, all integrated with the URC unit. The digital recorder module can even use the battery that is typically used by the URC unit. The URC unit further has a display screen to display the information stored in and recalled from the memory.
    Type: Application
    Filed: December 23, 2005
    Publication date: June 22, 2006
    Inventors: Calvin Fang, Philip Yu
  • Publication number: 20060106666
    Abstract: A method for implementing a multi-stage, multi-classification sales opportunity modeling system. The method includes receiving operational data relating to past sales activities and receiving parameters identified as being relevant in determining a likelihood of whether exploitation of a sales opportunity will be successful. The method also includes generating a multi-stage model by applying the operational data and the parameters to an analytic engine for evaluating different factors affecting success of the sales opportunity.
    Type: Application
    Filed: November 15, 2004
    Publication date: May 18, 2006
    Applicant: International Business Machines Corporation
    Inventors: Jamshid Vayghan, Philip Yu
  • Publication number: 20060101056
    Abstract: Techniques are provided for performing structural joins for answering containment queries. Such inventive techniques may be used to perform efficient structural joins of two interval lists which are neither sorted nor pre-indexed. For example, in an illustrative aspect of the invention, a technique for performing structural joins of two element sets of a tree-structured document, wherein one of the two element sets is an ancestor element set and the other of the two element sets is a descendant element set, and further wherein each element is represented as an interval representing a start position and an end position of the element in the document, comprises the following steps/operations. An index is dynamically built for the ancestor element set. Then, one or more structural joins are performed by searching the index with the interval start position of each element in the descendant element set.
    Type: Application
    Filed: November 5, 2004
    Publication date: May 11, 2006
    Applicant: International Business Machines Corporation
    Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
  • Publication number: 20060101045
    Abstract: Interval query indexing techniques for use in accordance with data stream processing systems are disclosed. For example, in an illustrative aspect of the invention, a technique for use in processing a data stream comprises the following steps/operations. First, an attribute range of query intervals associated with the data stream is partitioned into one or more segments. Then, a set of virtual intervals is defined for each of the one or more segments. A query interval index is then built using the set of virtual intervals. The query interval index may be built by decomposing each query interval into one or more of the virtual intervals, and associating a query identifier with the decomposed virtual intervals.
    Type: Application
    Filed: November 5, 2004
    Publication date: May 11, 2006
    Applicant: International Business Machines Corporation
    Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
  • Publication number: 20060036564
    Abstract: Techniques for graph indexing are provided. In one aspect, a method for indexing graphs in a database, the graphs comprising graphic data, comprises the following steps. Frequent subgraphs among one or more of the graphs in the database are identified, the frequent subgraphs appearing in at least a threshold number of the graphs in the database. One or more of the frequent subgraphs are used to create an index of the graphs in the database.
    Type: Application
    Filed: April 30, 2004
    Publication date: February 16, 2006
    Applicant: International Business Machines Corporation
    Inventors: Xifeng Yan, Philip Yu
  • Publication number: 20060026110
    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the dataset and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
    Type: Application
    Filed: July 30, 2004
    Publication date: February 2, 2006
    Applicant: IBM Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20060015474
    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
    Type: Application
    Filed: July 16, 2004
    Publication date: January 19, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20060010093
    Abstract: In connection with the mining of time-evolving data streams, a general framework is provided that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there is preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 12, 2006
    Applicant: IBM Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20060004754
    Abstract: A technique for classifying data from a test data stream is provided. A stream of training data having class labels is received. One or more class-specific clusters of the training data are determined and stored. At least one test instance of the test data stream is classified using the one or more class-specific clusters.
    Type: Application
    Filed: June 30, 2004
    Publication date: January 5, 2006
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
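
The abstract of the last entry above (publication number 20060004754) describes receiving labeled training data, maintaining class-specific clusters, and classifying test instances against those clusters. The following is a minimal illustrative sketch of that general idea only, not the method claimed in the application. The class name ClassClusters, the radius parameter, the running-centroid summaries, and the nearest-centroid decision rule are all assumptions introduced here for illustration.

```python
import math
from collections import defaultdict


class ClassClusters:
    """Hypothetical sketch: per-class running centroids over a labeled stream."""

    def __init__(self, radius):
        # Assumed parameter: maximum distance at which a point joins a cluster.
        self.radius = radius
        # label -> list of clusters, each stored as [count, coordinate_sums]
        self.clusters = defaultdict(list)

    @staticmethod
    def _distance(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    @staticmethod
    def _centroid(cluster):
        count, sums = cluster
        return [s / count for s in sums]

    def _nearest(self, clusters, point):
        """Return (distance, cluster) for the closest cluster, or None."""
        best = None
        for cluster in clusters:
            d = self._distance(self._centroid(cluster), point)
            if best is None or d < best[0]:
                best = (d, cluster)
        return best

    def train(self, point, label):
        """Absorb one labeled point into the clusters of its own class."""
        near = self._nearest(self.clusters[label], point)
        if near is not None and near[0] <= self.radius:
            cluster = near[1]
            cluster[0] += 1
            cluster[1] = [s + x for s, x in zip(cluster[1], point)]
        else:
            # No cluster of this class is close enough: start a new one.
            self.clusters[label].append([1, list(point)])

    def classify(self, point):
        """Return the label whose nearest cluster centroid is closest."""
        best_label, best_d = None, math.inf
        for label, clusters in self.clusters.items():
            near = self._nearest(clusters, point)
            if near is not None and near[0] < best_d:
                best_label, best_d = label, near[0]
        return best_label


cc = ClassClusters(radius=1.0)
for point, label in [((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
                     ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]:
    cc.train(point, label)
print(cc.classify((0.1, 0.0)), cc.classify((4.8, 5.2)))
```

Each cluster is kept as a count and a coordinate sum so that it can be updated in one pass without storing the points themselves, which is one common way to summarize a stream under limited memory.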