Patents by Inventor Philip S. Yu

Philip S. Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for tree structure indexing that provides at least one constraint sequence to preserve query-equivalence between xml document structure match and subsequence match

Patent number: 7475070

Abstract: Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addressed the problem of query equivalence with respect to this transformation, and there is introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observes query equivalence. Also introduced is a performance-oriented principle to guide the sequencing of tree structures.

Type: Grant

Filed: January 14, 2005

Date of Patent: January 6, 2009

Assignee: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip S. Yu
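As a rough illustration of the subsequence-matching idea in the abstract above, the following sketch encodes a small tree as a sequence of root-to-node paths and answers a structural query with an ordinary (non-contiguous) subsequence test. The path encoding, the function names, and the sample document are assumptions made for illustration and are not taken from the patent. In particular, this naive test does not apply the constraint sequences that the patent uses to rule out false alarms.

```python
# A tree node is modeled as (label, [children]); this shape is an assumption.

def to_path_sequence(tree, prefix=""):
    """Depth-first encoding: emit one root-to-node path string per node."""
    label, children = tree
    path = f"{prefix}/{label}"
    sequence = [path]
    for child in children:
        sequence.extend(to_path_sequence(child, path))
    return sequence


def is_subsequence(query, data):
    """Return True if the items of `query` occur in `data` in order (gaps allowed)."""
    remaining = iter(data)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in query)


# Hypothetical document and query trees.
doc = ("book", [("title", []), ("author", [("name", [])])])
query = ("book", [("author", [("name", [])])])

doc_seq = to_path_sequence(doc)
query_seq = to_path_sequence(query)
matched = is_subsequence(query_seq, doc_seq)
```

Here the query is answered in a single pass over the encoded document, with no join between separately matched nodes, which is the property the abstract emphasizes.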

System and method for continuous diagnosis of data streams

Patent number: 7464068

Abstract: In connection with the mining of time-evolving data streams, there is provided a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labeled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there is preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.

Type: Grant

Filed: June 30, 2004

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip S. Yu

SYSTEMS AND METHODS FOR STRUCTURAL CLUSTERING OF TIME SEQUENCES

Publication number: 20080275671

Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.

Type: Application

Filed: May 6, 2008

Publication date: November 6, 2008

Applicant: International Business Machines Corporation

Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu

METHOD AND APPARATUS FOR ADAPTIVE IN-OPERATOR LOAD SHEDDING

Publication number: 20080270640

Abstract: One embodiment of the present method and apparatus for adaptive in-operator load shedding includes receiving at least two data streams (each comprising a plurality of tuples, or data items) into respective sliding windows of memory. A throttling fraction is then calculated based on input rates associated with the data streams and on currently available processing resources. Tuples are then selected for processing from the data streams in accordance with the throttling fraction, where the selected tuples represent a subset of all tuples contained within the sliding window.

Type: Application

Filed: June 30, 2008

Publication date: October 30, 2008

Inventors: Bugra Gedik, Kun-Lung Wu, Philip S. Yu
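The throttling step described in the abstract above can be sketched as follows. The abstract does not give a formula, so both the ratio used for the throttling fraction (available capacity divided by total input rate, capped at 1) and the policy of keeping the most recent tuples are assumptions chosen for illustration. All names are hypothetical.

```python
import math
from collections import deque


def throttling_fraction(input_rates, capacity):
    """Assumed formula: share of incoming tuples the operator can afford.

    input_rates: tuples per second arriving on each stream.
    capacity: tuples per second the operator can currently process.
    """
    total = sum(input_rates)
    if total <= 0:
        return 1.0
    return min(1.0, capacity / total)


def select_tuples(window, fraction):
    """Assumed policy: keep the newest ceil(fraction * len(window)) tuples."""
    keep = math.ceil(fraction * len(window))
    items = list(window)
    return items[len(items) - keep:]


# Two hypothetical streams, each buffered in a sliding window of size 4.
window_a = deque([1, 2, 3, 4], maxlen=4)
window_b = deque([10, 20, 30, 40], maxlen=4)

z = throttling_fraction([300.0, 100.0], capacity=200.0)
selected_a = select_tuples(window_a, z)
selected_b = select_tuples(window_b, z)
```

With a throttling fraction of 0.5, only half of each window is passed on for processing, so the selected tuples are a subset of the window contents, as the abstract describes.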

SYSTEM AND METHOD FOR RANKED KEYWORD SEARCH ON GRAPHS

Publication number: 20080243811

Abstract: Arrangements and methods for providing for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to directly build indexes for general schemaless graphs, conventional techniques rely heavily on graph traversal at run time. The previous lack of knowledge about graphs also made it very difficult to apply pruning techniques. To address these problems, there is introduced herein a new scoring function while the block is used as an intermediate access level; the result is an opportunity to create sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm to conduct a backward search, which provides a good theoretical guarantee in terms of the search cost.

Type: Application

Filed: March 29, 2007

Publication date: October 2, 2008

Applicant: IBM Corporation

Inventors: Hao He, Philip S. Yu, Haixun Wang

METHOD, SYSTEM, AND STORAGE MEDIUM FOR IMPLEMENTING A MULTI-STAGE, MULTI-CLASSIFICATION SALES OPPORTUNITY MODELING SYSTEM

Publication number: 20080215419

Abstract: A method for implementing a multi-stage, multi-classification sales opportunity modeling system. The method includes receiving operational data relating to past sales activities and receiving parameters identified as being relevant in determining a likelihood of whether exploitation of a sales opportunity will be successful. The method also includes generating a multi-stage model by applying the operational data and the parameters to an analytic engine for evaluating different factors affecting success of the sales opportunity.

Type: Application

Filed: May 14, 2008

Publication date: September 4, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jamshid A. Vayghan, Philip S. Yu

PRESERVING PRIVACY OF ONE-DIMENSIONAL DATA STREAMS USING DYNAMIC AUTOCORRELATION

Publication number: 20080205641

Abstract: A method, information processing system, and computer readable medium are provided for preserving privacy of one-dimensional nonstationary data streams. The method includes receiving a one-dimensional nonstationary data stream. A set of first-moment statistical values is calculated, for a given instant of sub-space of time, for the data. The first-moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment of statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.

Type: Application

Filed: February 26, 2007

Publication date: August 28, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu

PRESERVING PRIVACY OF DATA STREAMS USING DYNAMIC CORRELATIONS

Publication number: 20080209568

Abstract: Disclosed is a method, information processing system, and computer readable medium for preserving privacy of nonstationary data streams. The method includes receiving at least one nonstationary data stream with time dependent data. A set of first-moment statistical values is calculated, for a given instant of sub-space of time, for the data. The first-moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment of statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.

Type: Application

Filed: February 26, 2007

Publication date: August 28, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu

SYSTEMS AND METHODS FOR OPTIMAL COMPONENT COMPOSITION IN A STREAM PROCESSING SYSTEM

Publication number: 20080188987

Abstract: A system and method are provided for optimizing component composition in a distributed stream-processing environment having a plurality of nodes capable of being associated with one or more of a plurality of stream processing components. The system includes an adaptive composition probing (ACP) module and a hierarchical state manager. The ACP module probes a subset of the plurality of stream processing components to determine the optimal component composition in response to a stream processing request. The hierarchical state manager manages local and global information for use by said ACP module in determining the optimal component composition.

Type: Application

Filed: April 2, 2008

Publication date: August 7, 2008

Inventors: Xiaohui Gu, Philip S. Yu

Query integrity assurance in database outsourcing

Publication number: 20080183656

Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set encrypted using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data satisfying the substantive query included in an unencrypted form of the secondary data result is not contained in an unencrypted form of the primary data result.

Type: Application

Filed: January 25, 2007

Publication date: July 31, 2008

Inventors: Chang-Shing Perng, Haixun Wang, Jian Yin, Philip S. Yu

PEER-TO-PEER MULTI-PARTY VOICE-OVER-IP SERVICES

Publication number: 20080177833

Abstract: A system and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.

Type: Application

Filed: February 27, 2008

Publication date: July 24, 2008

Applicant: International Business Machines Corp.

Inventors: Xiaohui Gu, Zon-Yin Shae, Zhen Wen, Philip S. Yu

METHOD AND APPARATUS FOR PROVIDING LOAD DIFFUSION IN DATA STREAM CORRELATIONS

Publication number: 20080168179

Abstract: A computer implemented method, apparatus, and computer usable program code for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions to meet correlation constraints for correlating data in the data stream pair to form a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.

Type: Application

Filed: March 24, 2008

Publication date: July 10, 2008

Inventors: Xiaohui Gu, Philip S. Yu

SYSTEMS AND METHODS FOR SIMULTANEOUS SUMMARIZATION OF DATA CUBE STREAMS

Publication number: 20080168375

Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, the present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.

Type: Application

Filed: January 7, 2007

Publication date: July 10, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Spyridon Papadimitriou, Jimeng Sun, Philip S. Yu

System and method for providing service for searching web site addresses

Patent number: 7383299

Abstract: A method for searching for partially specified Uniform Resource Locator (URL) addresses includes receiving a user request, from a user, including a partially specified URL address. A URL search request handler is invoked to search for the partially specified URL address within an inverted index of web site URLs. A web search request handler is invoked to rank the search results of the search for the partially specified URL address based on one or more keywords specified in the user request, a list of recently accessed URLs, and a user profile. Search results are returned to the user comprising a list of URL addresses based on the search for the partially specified URL and ranked based on the user search data.

Type: Grant

Filed: May 5, 2000

Date of Patent: June 3, 2008

Assignee: International Business Machines Corporation

Inventors: Brent Hailpern, Philip S. Yu

System and method for peer-to-peer multi-party voice-over-IP services

Patent number: 7379450

Abstract: A system, method, and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.

Type: Grant

Filed: March 10, 2006

Date of Patent: May 27, 2008

Assignee: International Business Machines Corporation

Inventors: Xiaohui Gu, Zon-Yin Shae, Zhen Wen, Philip S. Yu

Systems and methods for structural clustering of time sequences

Patent number: 7369961

Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.

Type: Grant

Filed: March 31, 2005

Date of Patent: May 6, 2008

Assignee: International Business Machines Corporation

Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu

MODEL-BASED SELF-OPTIMIZING DISTRIBUTED INFORMATION MANAGEMENT

Publication number: 20080086469

Abstract: Disclosed are a method, information processing system, and computer readable medium for managing data collection in a distributed processing system. The method includes dynamically collecting at least one statistical query pattern associated with a selected group of information processing nodes. The statistical query pattern is dynamically collected from a plurality of information processing nodes in a distributed processing system. At least one operating attribute distribution associated with an operating attribute that has been queried for the selected group is dynamically monitored. The selected group is dynamically configured, based on the query pattern and the operating attribute distribution, to periodically push a set of attributes associated with each information processing node in the selected group.

Type: Application

Filed: October 4, 2006

Publication date: April 10, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xiaohui Gu, Philip S. Yu, Shu-Ping Chang

Method and apparatus for web farm traffic control

Patent number: 7356592

Abstract: Disclosed is a method for controlling a web farm having a plurality of websites and servers, the method comprising categorizing customer requests received from said websites into a plurality of categories, said categories comprising shareable customer requests and unshareable customer requests, routing said shareable customer requests such that any of said servers may process shareable customer requests received from different said websites, and routing said unshareable customer requests from specific said websites only to specific servers to which said specific websites have been assigned.

Type: Grant

Filed: January 24, 2002

Date of Patent: April 8, 2008

Assignee: International Business Machines Corporation

Inventors: Joel L. Wolf, Philip S. Yu
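The routing rule in the abstract above can be sketched as follows: shareable requests may go to any server, while unshareable requests go only to servers assigned to the originating website. The abstract does not say how a server is chosen among the eligible ones, so the least-loaded choice below is an assumption. The site names, server names, and load counters are hypothetical.

```python
def route_request(site, shareable, servers, assigned, load):
    """Pick a server for one request and record the added load.

    shareable requests: any server in `servers` is eligible.
    unshareable requests: only the servers in `assigned[site]` are eligible.
    Among eligible servers the least loaded one is chosen (assumed policy).
    """
    candidates = servers if shareable else assigned[site]
    chosen = min(candidates, key=lambda server: load[server])
    load[chosen] += 1
    return chosen


# Hypothetical web farm: three servers hosting two websites.
servers = ["s1", "s2", "s3"]
assigned = {"shop.example": ["s1"], "news.example": ["s2", "s3"]}
load = {"s1": 5, "s2": 1, "s3": 0}

first = route_request("shop.example", True, servers, assigned, load)
second = route_request("shop.example", False, servers, assigned, load)
```

The shareable request from shop.example lands on a server that normally serves another site, while the unshareable request stays on the site's own server even though that server is busier.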

Systems and methods for condensation-based privacy in strings

Publication number: 20080082566

Abstract: Novel methods and systems for the privacy preserving mining of string data with the use of simple template based models. Such template based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of a new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.

Type: Application

Filed: September 30, 2006

Publication date: April 3, 2008

Applicant: IBM Corporation

Inventors: Charu C. Aggarwal, Philip S. Yu

Systems and methods for sequential modeling in less than one sequential scan

Patent number: 7337161

Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.

Type: Grant

Filed: July 30, 2004

Date of Patent: February 26, 2008

Assignee: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip S. Yu
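Hoeffding's inequality, mentioned in the abstract above, bounds how far a sample mean can stray from the true mean. In its one-sided form, for n independent observations confined to a range of width R, the sample mean overestimates the true mean by more than epsilon = sqrt(R^2 ln(1/delta) / (2n)) with probability at most delta. The sketch below only computes this standard bound and the sample size it implies. How the patented framework applies the bound to stop scanning early is not stated in the abstract, and the function names are hypothetical.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """One-sided Hoeffding error bound after n observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range, delta, epsilon):
    """Smallest n for which the bound is at most `epsilon`."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Example: a statistic in [0, 1], 95% confidence, tolerance 0.05.
n_required = samples_needed(1.0, 0.05, 0.05)
bound_at_n = hoeffding_epsilon(1.0, 0.05, n_required)
```

Because the required sample size depends only on the range, the confidence level, and the tolerance, it can be far smaller than the dataset, which is what makes a scan of less than the full data plausible.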
