Patents by Inventor Philip S. Yu

Philip S. Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8010541
    Abstract: Novel methods and systems for the privacy-preserving mining of string data with the use of simple template-based models. Such template-based models are effective in practice and preserve important statistical characteristics of the strings, such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of the new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications that depend heavily on the computation of such distances, and it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
    Type: Grant
    Filed: September 30, 2006
    Date of Patent: August 30, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip S. Yu
  • Patent number: 7945570
    Abstract: A computer-implemented method, system, and a computer readable article of manufacture identify local patterns in at least one time series data stream. A data stream is received that comprises at least one set of time series data. The at least one set of time series data is formed into a set of multiple ordered levels of time series data. Multiple ordered levels of hierarchical approximation functions are generated directly from the multiple ordered levels of time series data. A set of approximating functions is created for each level. A current window with a current window length is selected from a set of varying window lengths. The set of approximating functions created at one level in the multiple ordered levels is passed to a subsequent level as a set of time series data. The multiple ordered levels of hierarchical approximation functions are stored into memory after being generated.
    Type: Grant
    Filed: August 31, 2009
    Date of Patent: May 17, 2011
    Assignee: International Business Machines Corporation
    Inventors: Spyridon Papadimitriou, Philip S. Yu
  • Patent number: 7941387
    Abstract: A method is provided for generating a resource function estimate of resource usage by an instance of a processing element configured to consume zero or more input data streams in a stream processing system having a set of available resources that comprises receiving at least one specified performance metric for the zero or more input data streams and a processing power of the set of available resources, wherein one specified performance metric is stream rate; generating a multi-part signature of executable-specific information for the processing element and a multi-part signature of context-specific information for the instance; accessing a database of resource functions to identify a static resource function corresponding to the executable-specific information and a context-dependent resource function corresponding to the context-specific information; combining the static resource function and the context-dependent resource function to form a composite resource function for the instance; and applying the res
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: May 10, 2011
    Assignee: International Business Machines Corporation
    Inventors: Lisa Amini, Henrique Andrade, Wei Fan, James R. Giles, Kirsten W. Hildrum, Deepak Rajan, Deepak S. Turaga, Rohit Wagle, Joel L. Wolf, Philip S. Yu
  • Patent number: 7933740
    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query may be returned based on the at least one distance.
    Type: Grant
    Filed: August 31, 2009
    Date of Patent: April 26, 2011
    Assignee: International Business Machines Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
  • Patent number: 7904397
    Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for the entire dataset, the ensemble model thereby providing an evolving estimate of the learning model that would be obtained for the entire dataset if all the subsets were processed. Generating the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of the adequacy of the current stage of the ensemble model.
    Type: Grant
    Filed: January 20, 2010
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu
  • Patent number: 7904471
    Abstract: Privacy in data mining of sparse high-dimensional data records is preserved by transforming the data records into anonymized data records. This transformation involves creating a sketch-based private representation of each data record, where each data record contains only a small number of non-zero attribute values relative to the high dimensionality of the data records.
    Type: Grant
    Filed: August 9, 2007
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip S. Yu
  • Patent number: 7900147
    Abstract: A system and method for supporting offline Web browsing. A user interests profile comprising content and attribute preferences for Web pages the user may be interested in is provided. Based on the user's profile, an interestingness value is generated for each candidate Web page. In response to a hoard request initiated by the user, one or more Web pages are selected and downloaded based on their respective interestingness values. These Web pages are stored for later viewing by the user when offline. The candidate Web pages include base Web pages, which are supplied by the user in the hoard request, and linked Web pages, which are reachable from the base pages. Thus, an interestingness value may be computed as the interestingness of a hyperlink associated with a Web page reachable from a base Web page, the interestingness value of a hyperlink being based upon the similarity of the linked Web page to the base Web page and/or to the user's interests profile.
    Type: Grant
    Filed: July 22, 2002
    Date of Patent: March 1, 2011
    Assignee: International Business Machines Corporation
    Inventors: Hui Lei, Yiming Ye, Philip S. Yu
  • Patent number: 7890294
    Abstract: Arrangements are provided for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query may be returned based on the at least one distance.
    Type: Grant
    Filed: May 5, 2008
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
  • Patent number: 7870398
    Abstract: A method, system and computer program product for confirming the validity of data returned from a data store. A data store contains a primary data set encrypted using a first encryption and a secondary data set encrypted using a second encryption. The secondary data set is a subset of the primary data set. A client issues a substantive query against the data store to retrieve a primary data result belonging to the primary data set. A query interface issues at least one validating query against the data store. Each validating query returns a secondary data result belonging to the secondary data set. The query interface receives the secondary data result and provides a data invalid notification if data that satisfy the substantive query and are included in an unencrypted form of the secondary data result are not contained in an unencrypted form of the primary data result.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: January 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Chang-shing Perng, Haixun Wang, Jian Yin, Philip S. Yu
  • Patent number: 7853545
    Abstract: Disclosed is a method, information processing system, and computer readable medium for preserving privacy of nonstationary data streams. The method includes receiving at least one nonstationary data stream with time-dependent data. A set of first-moment statistical values is calculated for the data for a given instant of sub-space of time. The first-moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.
    Type: Grant
    Filed: February 26, 2007
    Date of Patent: December 14, 2010
    Assignee: International Business Machines Corporation
    Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu
  • Patent number: 7849138
    Abstract: A system and computer program product for establishing multi-party VoIP conference audio calls in a distributed, peer-to-peer network where any number of nodes are able to arbitrarily and asynchronously start or stop producing audio output to be mixed into a single composite audio stream that is distributed to all nodes. A single distribution tree is used that has optimal communications characteristics to distribute the composite audio signal to all nodes. An audio mixing tree is established and maintained by adaptively and dynamically adding and merging intermediate mixing nodes operating between user nodes and the root of the single distribution tree. The intermediate mixing nodes and the root of the single distribution tree are all hosted, in an exemplary embodiment, on user nodes that are endpoints of the distribution tree.
    Type: Grant
    Filed: February 27, 2008
    Date of Patent: December 7, 2010
    Assignee: International Business Machines Corporation
    Inventors: Xiaohui Gu, Zon-Yin Shae, Zhen Wen, Philip S. Yu
  • Patent number: 7840516
    Abstract: A method, information processing system, and computer readable medium are provided for preserving privacy of one-dimensional nonstationary data streams. The method includes receiving a one-dimensional nonstationary data stream. A set of first-moment statistical values is calculated for the data for a given instant of sub-space of time. The first-moment statistical values include a principal component for the sub-space of time. The data is perturbed with noise along the principal component in proportion to the first-moment statistical values so that at least part of a set of second-moment statistical values for the data is perturbed by the noise only within a predetermined variance.
    Type: Grant
    Filed: February 26, 2007
    Date of Patent: November 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Yuan-Chi Chang, Feifei Li, Spyridon Papadimitriou, George A. Mihaila, Ioana Stanoi, Jimeng Sun, Philip S. Yu
  • Patent number: 7822730
    Abstract: Most recent research on scalable inductive learning over very large streaming datasets focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the dataset and use sophisticated control mechanisms and data structures. Discussed herein is a general inductive learning framework that scans the dataset exactly once. An extension based on Hoeffding's inequality is then proposed that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: October 26, 2010
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu
  • Publication number: 20100169252
    Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for the entire dataset, the ensemble model thereby providing an evolving estimate of the learning model that would be obtained for the entire dataset if all the subsets were processed. Generating the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of the adequacy of the current stage of the ensemble model.
    Type: Application
    Filed: January 20, 2010
    Publication date: July 1, 2010
    Applicant: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip S. Yu
  • Patent number: 7739331
    Abstract: A computer-implemented method, apparatus, and computer usable program code for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions that meet the correlation constraints for correlating its data, forming a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: June 15, 2010
    Assignee: International Business Machines Corporation
    Inventors: Xiaohui Gu, Philip S. Yu
  • Patent number: 7720841
    Abstract: Disclosed are a method, information processing system, and computer readable medium for managing data collection in a distributed processing system. The method includes dynamically collecting at least one statistical query pattern associated with a selected group of information processing nodes. The statistical query pattern is dynamically collected from a plurality of information processing nodes in a distributed processing system. At least one operating attribute distribution associated with an operating attribute that has been queried for the selected group is dynamically monitored. The selected group is dynamically configured, based on the query pattern and the operating attribute distribution, to periodically push a set of attributes associated with each information processing node in the selected group.
    Type: Grant
    Filed: October 4, 2006
    Date of Patent: May 18, 2010
    Assignee: International Business Machines Corporation
    Inventors: Xiaohui Gu, Philip S. Yu, Shu-Ping Chang
  • Patent number: 7702620
    Abstract: Arrangements and methods for the efficient implementation of ranked keyword searches on graph-structured data. Since it is difficult to build indexes directly for general schemaless graphs, conventional techniques rely heavily on graph traversal at run time. The lack of additional knowledge about the graphs also made pruning techniques difficult to apply. To address these problems, a new scoring function is introduced herein, and the block is used as an intermediate access level, which creates an opportunity to build sophisticated indexes for keyword search. Also proposed herein is a cost-balanced expansion algorithm that conducts a backward search and provides a good theoretical guarantee in terms of search cost.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: April 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Hao He, Philip S. Yu, Haixun Wang
  • Publication number: 20100063974
    Abstract: A computer-implemented method, system, and a computer readable article of manufacture identify local patterns in at least one time series data stream. A data stream is received that comprises at least one set of time series data. The at least one set of time series data is formed into a set of multiple ordered levels of time series data. Multiple ordered levels of hierarchical approximation functions are generated directly from the multiple ordered levels of time series data. A set of approximating functions is created for each level. A current window with a current window length is selected from a set of varying window lengths. The set of approximating functions created at one level in the multiple ordered levels is passed to a subsequent level as a set of time series data. The multiple ordered levels of hierarchical approximation functions are stored into memory after being generated.
    Type: Application
    Filed: August 31, 2009
    Publication date: March 11, 2010
    Applicant: International Business Machines
    Inventors: Spyridon Papadimitriou, Philip S. Yu
  • Patent number: 7676458
    Abstract: A method of querying a hierarchically organized sensor network, said network being a sensor network with a global coordinator node at a top level which receives data from lower-level intermediate nodes which are either leader nodes for lower-level nodes or sensor nodes, wherein a sensor node $i$ at a lowest level receives a signal $Y(i,t)$ at time $t$, said method including constructing a sketch $S_{wkt} = (S_{wkt}^{1}, \ldots, S_{wkt}^{n})$ for an internal node $k$ from $S_{wkt}^{j} = \sum_{i \in \mathrm{LeafDescendents}(k)} \sum_{q=1}^{t} b_{wiq} \cdot r_{iq}^{j}$, wherein component $S_{wkt}^{j}$ is a sketch of a descendent of node $k$, $r_{it}^{j}$ is a random variable associated with each sensor node $i$ and time instant $t$ wherein index $j$ refers to independently drawn instantiations of the random variable, bit $b_{wit}$ represents a state of sensor node $i$ for signal value $w = Y(i,t)$ at time $t$, and $\mathrm{LeafDescendents}(k)$ are the lowest-level sensor nodes under node $k$, wherein said sketch is adapted for responding to queries regarding a state of said network.
    Type: Grant
    Filed: August 28, 2007
    Date of Patent: March 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip S. Yu
  • Publication number: 20100057399
    Abstract: Arrangements and methods for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query may be returned based on the at least one distance.
    Type: Application
    Filed: August 31, 2009
    Publication date: March 4, 2010
    Applicant: International Business Machines Corporation
    Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
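
The hyperlink interestingness described in the abstract of patent 7900147 combines two similarities: that of a linked page to its base page and that of the linked page to the user's interests profile. The Python sketch below illustrates one way such a score could be computed and used to pick pages for hoarding. It assumes bag-of-words cosine similarity and a hypothetical weight `alpha` for mixing the two scores; the abstract specifies neither choice, and the helper names are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def term_vector(text: str) -> Counter:
    """Lower-cased bag of words (an illustrative stand-in for a real page model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Indexing a Counter with a missing term returns 0 without inserting it.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_interestingness(linked_text: str, base_text: str, profile_text: str,
                         alpha: float = 0.5) -> float:
    """Hypothetical score: weighted mix of similarity to the base page and to the profile."""
    linked = term_vector(linked_text)
    return (alpha * cosine(linked, term_vector(base_text))
            + (1 - alpha) * cosine(linked, term_vector(profile_text)))


def select_for_hoarding(candidates: Dict[str, str], base_text: str,
                        profile_text: str, budget: int, alpha: float = 0.5) -> List[str]:
    """Return the keys of the `budget` highest-scoring candidate pages."""
    ranked = sorted(
        candidates,
        key=lambda url: link_interestingness(candidates[url], base_text,
                                             profile_text, alpha),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    base = "stream processing systems and data mining"
    profile = "data mining privacy"
    pages = {
        "http://example.com/a": "privacy preserving data mining",
        "http://example.com/b": "gardening tips for spring",
    }
    print(select_for_hoarding(pages, base, profile, budget=1))  # ['http://example.com/a']
```

Setting `alpha` closer to 1 favors pages that resemble the page the user started from; closer to 0, it favors pages that match the long-term profile.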
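
The abstract of patent 7822730 mentions an extension based on Hoeffding's inequality that scans the dataset less than once. In its textbook form, the bound says that for $n$ independent observations of a quantity with range $R$, the sample mean lies within $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$ of the true mean with probability at least $1 - \delta$. The Python sketch below uses that bound to decide how many examples to read before stopping early. The stopping rule, the function names, and the choice of estimating a mean (for example, a model's accuracy) are illustrative assumptions, not the patented procedure.

```python
import math
from typing import Iterable, Tuple


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Half-width of the Hoeffding confidence interval after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n whose Hoeffding half-width is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


def partial_scan_mean(stream: Iterable[float], value_range: float,
                      delta: float, epsilon: float) -> Tuple[float, int]:
    """Read the stream only until the bound guarantees the requested precision.

    Hypothetical helper: returns (estimated mean, number of examples read).
    The stream might carry, e.g., per-example 0/1 correctness of a model.
    """
    limit = examples_needed(value_range, delta, epsilon)
    total, count = 0.0, 0
    for x in stream:
        total += x
        count += 1
        if count >= limit:
            break  # stop before the end of the data: a "partial" scan
    if count == 0:
        raise ValueError("empty stream")
    return total / count, count


if __name__ == "__main__":
    # R = 1, delta = 0.05, epsilon = 0.05: ln(20) / 0.005 is about 599.1, so 600.
    print(examples_needed(1.0, 0.05, 0.05))  # 600
    stream = (1.0 if k % 4 else 0.0 for k in range(10_000))
    mean, read = partial_scan_mean(stream, 1.0, 0.05, 0.05)
    print(mean, read)  # 0.75 600
```

In this toy setting only 600 of 10,000 examples are read; a learner would apply a bound of this kind to whichever statistics drive its decisions.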
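
The sketch construction in the abstract of patent 7676458, $S_{wkt}^{j} = \sum_{i \in \mathrm{LeafDescendents}(k)} \sum_{q=1}^{t} b_{wiq} \cdot r_{iq}^{j}$, can be illustrated in a few lines of Python. This is a minimal sketch that assumes, purely for illustration, that each random variable $r_{iq}^{j}$ takes the values -1 and +1 with equal probability and that the readings $Y(i,q)$ are held in a dictionary keyed by (sensor, time). The helper names `draw_random_signs` and `node_sketch` are hypothetical and do not come from the patent.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: readings[(i, q)] holds the signal Y(i, q) seen by sensor i at time q.
Readings = Dict[Tuple[int, int], int]
Signs = Dict[Tuple[int, int], List[int]]


def draw_random_signs(sensor_ids: Iterable[int], num_times: int, n: int,
                      seed: int = 0) -> Signs:
    """Draw n independent instantiations r_iq^j for every (sensor i, time q).

    The +/-1 distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    return {
        (i, q): [rng.choice((-1, 1)) for _ in range(n)]
        for i in sensor_ids
        for q in range(1, num_times + 1)
    }


def node_sketch(w: int, t: int, leaf_descendants: Iterable[int],
                readings: Readings, signs: Signs, n: int) -> List[int]:
    """Compute S_wkt^j = sum over leaves i and times q <= t of b_wiq * r_iq^j."""
    sketch = [0] * n
    for i in leaf_descendants:
        for q in range(1, t + 1):
            # b_wiq is 1 when sensor i observed value w at time q, else 0.
            if readings[(i, q)] == w:
                for j in range(n):
                    sketch[j] += signs[(i, q)][j]
    return sketch


if __name__ == "__main__":
    sensors = [1, 2, 3]
    signs = draw_random_signs(sensors, num_times=4, n=8, seed=1)
    readings = {(i, q): (i + q) % 2 for i in sensors for q in range(1, 5)}
    parent = node_sketch(0, 4, [1, 2], readings, signs, 8)
    children = [a + b for a, b in zip(node_sketch(0, 4, [1], readings, signs, 8),
                                       node_sketch(0, 4, [2], readings, signs, 8))]
    print(parent == children)  # True
```

Because each component is a plain sum, a leader node can obtain its sketch by adding the sketches of its children rather than contacting every sensor below it, which is what makes a construction of this form convenient in a hierarchy.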