Patents by Inventor Philip S. Yu

Philip S. Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7475070
    Abstract: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addressed the problem of query equivalence with respect to this transformation, and thereis introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observe query equivalence. Also introduced is a performance-oriented principle to guide the sequencing of tree structures.
    Type: Grant
    Filed: January 14, 2005
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu
  • Patent number: 7464068
    Abstract: In connection with the mining of time-evolving data streams, a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labelled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: December 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu
  • Publication number: 20080275671
    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
    Type: Application
    Filed: May 6, 2008
    Publication date: November 6, 2008
    Applicant: International Business Machines Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
  • Publication number: 20080270640
    Abstract: One embodiment of the present method and apparatus adaptive in-operator load shedding includes receiving at least two data streams (each comprising a plurality of tuples, or data items) into respective sliding windows of memory. A throttling fraction is then calculated based on input rates associated with the data streams and on currently available processing resources. Tuples are then selected for processing from the data streams in accordance with the throttling fraction, where the selected tuples represent a subset of all tuples contained within the sliding window.
    Type: Application
    Filed: June 30, 2008
    Publication date: October 30, 2008
    Inventors: BUGRA GEDIK, Kun-Lung Wu, Philip S. Yu
  • Publication number: 20080243811
    Abstract: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques highly rely on graph traversal in running time. The previous lack of more knowledge about graphs also resulted in great difficulties in applying pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.
    Type: Application
    Filed: March 29, 2007
    Publication date: October 2, 2008
    Applicant: IBM Corporation
    Inventors: Hao He, Philip S. Yu, Haixun Wang
  • Publication number: 20080215419
    Abstract: A method for implementing a multi-stage, multi-classification sales opportunity modeling system. The method includes receiving operational data relating to past sales activities and receiving parameters identified as being relevant in determining a likelihood of whether exploitation of a sales opportunity will be successful. The method also includes generating a multi-stage model by applying the operational data and the parameters to an analytic engine for evaluating different factors affecting success of the sales opportunity.
    Type: Application
    Filed: May 14, 2008
    Publication date: September 4, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jamshid A. Vayghan, Philip S. Yu
  • Publication number: 20080205641
    Abstract: A method, information processing system, and computer readable medium are provided for preserving privacy of one-dimensional nonstationary data streams. The method includes receiving a one-dimensional nonstationary data stream. A set of first-moment statistical values are calculated, for a given instant of sub-space of time, for the data. The first moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment of statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.
    Type: Application
    Filed: February 26, 2007
    Publication date: August 28, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu
  • Publication number: 20080209568
    Abstract: Disclosed is a method, information processing system, and computer readable medium for preserving privacy of nonstationary data streams. The method includes receiving at least one nonstationary data stream with time dependent data. Calculating, for a given instant of sub-space of time, A set of first-moment statistical values is calculated, for a given instant of sub-space of time, for the data. The first moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment of statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.
    Type: Application
    Filed: February 26, 2007
    Publication date: August 28, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu
  • Publication number: 20080188987
    Abstract: A system and method are provided for optimizing component composition in a distributed stream-processing environment having a plurality of nodes capable of being associated with one or more of a plurality of stream processing components. The system includes an adaptive composition probing (ACP) module and a hierarchical state manager. The ACP module probes a subset of the plurality of stream processing components to determine the optimal component composition in response to a stream processing request. The hierarchical state manager manages local and global information for use by said ACP module in determining the optimal component composition.
    Type: Application
    Filed: April 2, 2008
    Publication date: August 7, 2008
    Inventors: Xiaohui Gu, Philip S. Yu
  • Publication number: 20080183656
    Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.
    Type: Application
    Filed: January 25, 2007
    Publication date: July 31, 2008
    Inventors: Chang-Shing Perng, Haixun Wang, Jian Yin, Philip S. Yu
  • Publication number: 20080177833
    Abstract: A system and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.
    Type: Application
    Filed: February 27, 2008
    Publication date: July 24, 2008
    Applicant: International Business Machines Corp.
    Inventors: XIAOHUI GU, Zon-Yin Shae, Zhen Wen, Philip S. Yu
  • Publication number: 20080168179
    Abstract: A computer implemented method, apparatus, and computer usable program code for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions to meet correlation constraints for correlating data in the data stream pair to form a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.
    Type: Application
    Filed: March 24, 2008
    Publication date: July 10, 2008
    Inventors: XIAOHUI GU, Philip S. Yu
  • Publication number: 20080168375
    Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to data cube in On-Line Analytical Processing (OLAP). However, our present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.
    Type: Application
    Filed: January 7, 2007
    Publication date: July 10, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Spyridon Papadimitriou, Jimeng Sun, Philip S. Yu
  • Patent number: 7383299
    Abstract: A method for searching for a partially specified Uniform Resource Locator (URL) addresses includes receiving a user request, from a user, including a partially specified URL address. A URL search request handler is invoked to search for the partially specified URL address within an inverted index of web site URLs. A web search request handler is invoked to rank the search results of the search for the partially specified URL address based on one or more keywords specified in the user request, a list of recently accessed URLs, and a user profile. Search results are returned to the user comprising a list of URL addresses based on the search for the partially specified URL and ranked based on the user search data.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: June 3, 2008
    Assignee: International Business Machines Corporation
    Inventors: Brent Hailpern, Philip S. Yu
  • Patent number: 7379450
    Abstract: A system, method, and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.
    Type: Grant
    Filed: March 10, 2006
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: Xiaohui Gu, Zon-Yin Shae, Zhen Wen, Philip S. Yu
  • Patent number: 7369961
    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: May 6, 2008
    Assignee: International Business Machines Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
  • Publication number: 20080086469
    Abstract: Disclosed are a method, information processing system, and computer readable medium for managing data collection in a distributed processing system. The method includes dynamically collecting at least one statistical query pattern associated with a selected group of information processing nodes. The statistical query pattern is dynamically collected from a plurality of information processing nodes in a distributed processing system. At least one operating attribute distribution associated with an operating attribute that has been queried for the selected group is dynamically monitored. The selected group is dynamically configured, based on the query pattern and the operating attribute distribution, to periodically push a set of attributes associated with the each information processing node in the selected group.
    Type: Application
    Filed: October 4, 2006
    Publication date: April 10, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Xiaohui Gu, Philip S. Yu, Shu-Ping Chang
  • Patent number: 7356592
    Abstract: Disclosed is a method for controlling a web farm having a plurality of websites and servers, the method comprising categorizing customer requests received from said websites into a plurality of categories, said categories comprising a shareable customer requests and unshareable customer requests, routing said shareable customer requests such that any of said servers may process shareable customer requests received from different said websites, and routing said unshareable customer requests from specific said websites only to specific servers to which said specific websites have been assigned.
    Type: Grant
    Filed: January 24, 2002
    Date of Patent: April 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Joel L. Wolf, Philip S. Yu
  • Publication number: 20080082566
    Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and use these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification are not affected by the anonymization process.
    Type: Application
    Filed: September 30, 2006
    Publication date: April 3, 2008
    Applicant: IBM Corporation
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 7337161
    Abstract: Most recent research of scalable inductive learning on very large streaming dataset focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
    Type: Grant
    Filed: July 30, 2004
    Date of Patent: February 26, 2008
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu