Patents by Inventor Philip S. Yu
Philip S. Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20090327185
Abstract: Arrangements are provided for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series via employing the structural features is determined. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
Type: Application
Filed: May 5, 2008
Publication date: December 31, 2009
Applicant: International Business Machines Corporation
Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu
-
Publication number: 20090281971
Abstract: Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added.
Type: Application
Filed: May 9, 2008
Publication date: November 12, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charu C. Aggarwal, Philip S. Yu
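The per-class sketch table described in this abstract (several parallel hash tables, with the lowest value taken across them) resembles a count-min sketch. The following is a minimal sketch under that assumption; the class name `SketchTable`, the `width` and `depth` parameters, and the SHA-256 based hashing are illustrative choices, not details taken from the application.

```python
import hashlib


class SketchTable:
    """Hypothetical per-class sketch: `depth` parallel hash tables of `width` cells."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cell(self, row, pattern):
        # One hash function per row, derived by salting the pattern with the row index.
        digest = hashlib.sha256(f"{row}:{pattern}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern, count=1):
        # Training step: bump one cell in every parallel table.
        for row in range(self.depth):
            self.rows[row][self._cell(row, pattern)] += count

    def estimate(self, pattern):
        # Query step: the lowest value across the tables is the least
        # inflated by hash collisions, so it never undercounts.
        return min(self.rows[row][self._cell(row, pattern)]
                   for row in range(self.depth))


# Usage: one table per class, trained on attribute patterns of known objects.
spam = SketchTable()
for pattern in ["free+offer", "free+offer", "free+offer", "click+here"]:
    spam.add(pattern)
```

Comparing `estimate` for the same attribute pattern across the tables of different classes gives a rough view of how unevenly that pattern is distributed, which is the kind of signal the abstract calls discriminatory power.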
-
Patent number: 7610397
Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.
Type: Grant
Filed: February 28, 2005
Date of Patent: October 27, 2009
Assignee: International Business Machines Corporation
Inventors: Bugra Gedik, Kun-Lung Wu, Philip S. Yu
-
Publication number: 20090226056
Abstract: Systems and methods for embedding metadata such as personal patient information within actual medical data signals obtained from a patient are provided wherein two watermarks, a robust watermark and a fragile watermark, are embedded in a given medical data signal. The robust watermark includes a binary coded representation of the metadata that is incorporated into the frequency domain of the medical data signal using discrete Fourier transformations and additive embedding. Error correcting code can also be added to the binary representation of the metadata using Hamming coding. A given robust watermark can be incorporated multiple times in the medical data signal. The fragile watermark is added on top of the modified medical signal containing the robust watermark in the spatial domain of the modified medical signal. The fragile watermark utilizes a hash function to generate random sequences that are incorporated through the medical data signal.
Type: Application
Filed: March 5, 2008
Publication date: September 10, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michail Vlachos, Philip S. Yu
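To illustrate the error-correction step mentioned in this abstract, here is a small example of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. The abstract does not say which Hamming code parameters are used, so the (7,4) variant and the function names below are assumptions made for illustration.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome spells out the 1-based position of the error (0 means none).
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Usage: one corrupted bit in the embedded metadata is repaired on decode.
sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # simulate a single-bit error introduced by signal distortion
recovered = hamming74_decode(received)
```

Metadata longer than 4 bits would be split into 4-bit groups and encoded group by group before embedding.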
-
Publication number: 20090187914
Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
Type: Application
Filed: February 17, 2009
Publication date: July 23, 2009
Applicant: International Business Machines Corporation
Inventors: Yun Chi, Haixun Wang, Philip S. Yu
-
Patent number: 7565346
Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis), etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences.
Type: Grant
Filed: May 31, 2004
Date of Patent: July 21, 2009
Assignee: International Business Machines Corporation
Inventors: Wei Fan, Haixun Wang, Philip S. Yu
-
Patent number: 7565369
Abstract: A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. An empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
Type: Grant
Filed: May 28, 2004
Date of Patent: July 21, 2009
Assignee: International Business Machines Corporation
Inventors: Wei Fan, Haixun Wang, Philip S. Yu
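The weighting idea in this abstract can be sketched as a weighted vote in which each classifier's weight tracks how well it does on recent data. In the example below, the weight is simply the accuracy on the most recent labeled chunk, which is an assumption standing in for the patent's "expected classification accuracy"; the function names and the toy threshold models are likewise hypothetical.

```python
from collections import defaultdict


def accuracy(model, chunk):
    """Fraction of (x, label) pairs in the chunk that the model gets right."""
    correct = sum(1 for x, label in chunk if model(x) == label)
    return correct / len(chunk)


def weight_models(models, recent_chunk):
    # Assumption: accuracy on the newest chunk approximates expected accuracy
    # under the current concept.
    return [accuracy(model, recent_chunk) for model in models]


def ensemble_predict(models, weights, x):
    """Weighted vote: each model adds its weight to the label it predicts."""
    votes = defaultdict(float)
    for model, weight in zip(models, weights):
        votes[model(x)] += weight
    return max(votes, key=votes.get)


# Usage: the concept drifted from "x > 5" to "x > 2"; the stale model loses weight.
stale_model = lambda x: x > 5   # trained on an older chunk
fresh_model = lambda x: x > 2   # trained on a newer chunk
recent_chunk = [(1, False), (3, True), (4, True), (6, True)]
models = [stale_model, fresh_model]
weights = weight_models(models, recent_chunk)
prediction = ensemble_predict(models, weights, 3)
```

Because the stale model is down-weighted rather than discarded, it can regain influence if the stream drifts back toward the older concept.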
-
Patent number: 7562355
Abstract: A system and method are provided for optimizing component composition in a distributed stream-processing environment having a plurality of nodes capable of being associated with one or more of a plurality of stream processing components. The system includes an adaptive composition probing (ACP) module and a hierarchical state manager. The ACP module probes a subset of the plurality of stream processing components to determine the optimal component composition in response to a stream processing request. The hierarchical state manager manages local and global information for use by said ACP module in determining the optimal component composition.
Type: Grant
Filed: March 1, 2005
Date of Patent: July 14, 2009
Assignee: International Business Machines Corporation
Inventors: Xiaohui Gu, Philip S. Yu
-
Publication number: 20090119238
Abstract: A method is provided for generating a resource function estimate of resource usage by an instance of a processing element configured to consume zero or more input data streams in a stream processing system having a set of available resources that comprises receiving at least one specified performance metric for the zero or more input data streams and a processing power of the set of available resources, wherein one specified performance metric is stream rate; generating a multi-part signature of executable-specific information for the processing element and a multi-part signature of context-specific information for the instance; accessing a database of resource functions to identify a static resource function corresponding to the executable-specific information and a context-dependent resource function corresponding to the context-specific information; combining the static resource function and the context-dependent resource function to form a composite resource function for the instance; and applying the res
Type: Application
Filed: November 5, 2007
Publication date: May 7, 2009
Applicant: International Business Machines Corporation
Inventors: Lisa Amini, Henrique Andrade, Wei Fan, James R. Giles, Kirsten W. Hildrum, Deepak Rajan, Deepak S. Turaga, Rohit Wagle, Joel L. Wolf, Philip S. Yu
-
Publication number: 20090074043
Abstract: Streaming environments typically dictate incomplete or approximate algorithm execution, in order to cope with sudden surges in the data rate. Such limitations are even more accentuated in mobile environments (such as sensor networks) where computational and memory resources are typically limited. Introduced herein is a novel “resource adaptive” algorithm for spectrum and periodicity estimation on a continuous stream of data. The formulation is based on the derivation of a closed-form incremental computation of the spectrum, augmented by an intelligent load-shedding scheme that can adapt to available CPU resources. Experimentation indicates that the proposed technique can be a viable and resource efficient solution for real-time spectrum estimation.
Type: Application
Filed: July 22, 2008
Publication date: March 19, 2009
Applicant: International Business Machines Corporation
Inventors: Deepak Srinivac Turaga, Michail Vlachos, Philip S. Yu
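One well-known closed-form incremental spectrum update is the sliding DFT, shown here as an illustration of what "incremental computation of the spectrum" can look like. This is a standard textbook technique and is not claimed to be the formulation in the application; the load-shedding part is not modeled, and the function names are hypothetical.

```python
import cmath


def dft(window):
    """Direct O(n^2) discrete Fourier transform, used here as the reference."""
    n = len(window)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window))
            for k in range(n)]


def slide_dft(spectrum, oldest, newest):
    """Update every coefficient in O(1) when the window advances by one sample.

    If X_k is the DFT of [x_0, ..., x_{n-1}], then the DFT of
    [x_1, ..., x_{n-1}, newest] is (X_k - x_0 + newest) * exp(2*pi*i*k/n).
    """
    n = len(spectrum)
    return [(coeff - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
            for k, coeff in enumerate(spectrum)]


# Usage: advance the window [1, 2, 3, 4] to [2, 3, 4, 5] without recomputing.
spectrum = dft([1.0, 2.0, 3.0, 4.0])
updated = slide_dft(spectrum, oldest=1.0, newest=5.0)
```

Each new sample costs O(n) work for the full spectrum instead of the O(n log n) of a fresh FFT, which is why incremental forms are attractive on constrained devices.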
-
Patent number: 7505876
Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to data cube in On-Line Analytical Processing (OLAP). However, our present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.
Type: Grant
Filed: January 7, 2007
Date of Patent: March 17, 2009
Assignee: International Business Machines Corporation
Inventors: Spyridon Papadimitriou, Jimeng Sun, Philip S. Yu
-
Publication number: 20090063432
Abstract: A method of querying a hierarchically organized sensor network, said network being sensor network with a global coordinator node at a top level which receives data from lower level intermediate nodes which are either leader nodes for lower level nodes or sensor nodes, wherein a sensor node $i$ at a lowest level receives a signal $Y(i,t)$ at time $t$, said method including constructing a sketch $S_{wkt} = (S_{wkt}^{1}, \ldots, S_{wkt}^{n})$ for an internal node $k$ from $S_{wkt}^{j} = \sum_{i \in \mathrm{LeafDescendents}(k)} \sum_{q=1}^{t} b_{wiq} \cdot r_{iq}^{j}$, wherein component $S_{wkt}^{j}$ is a sketch of a descendent of node $k$, $r_{it}^{j}$ is a random variable associated with each sensor node $i$ and time instant $t$ wherein index $j$ refers to independently drawn instantiations of the random variable, bit $b_{wit}$ represents a state of sensor node $i$ for signal value $w = Y(i,t)$ at time $t$, and $\mathrm{LeafDescendents}(k)$ are the lowest level sensor nodes under node $k$, wherein said sketch is adapted for responding to queries regarding a state of said network.
Type: Application
Filed: August 28, 2007
Publication date: March 5, 2009
Inventors: Charu Chandra Aggarwal, Philip S. Yu
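The sketch in this abstract is a double sum, over the leaf sensors under a node and over time instants, of a state bit multiplied by a random variable. The example below computes it directly, under two assumptions that the abstract does not state: the inner sum runs over time instants 1 to t, and the random variables are drawn uniformly from {-1, +1}. All function names are hypothetical.

```python
import random


def make_random_vars(sensors, horizon, n_components, seed=0):
    """Draw r[(i, q)][j] for every sensor i, time q in 1..horizon, component j."""
    rng = random.Random(seed)
    return {(i, q): [rng.choice((-1, 1)) for _ in range(n_components)]
            for i in sensors for q in range(1, horizon + 1)}


def leaf_sketch(sensor, bits, r, n_components):
    """Sketch of one sensor; bits[q - 1] is its state bit at time q."""
    return [sum(bit * r[(sensor, q)][j] for q, bit in enumerate(bits, start=1))
            for j in range(n_components)]


def node_sketch(leaves, bits_by_leaf, r, n_components):
    """Sketch of an internal node: component-wise sum over its leaf descendants."""
    total = [0] * n_components
    for sensor in leaves:
        for j, value in enumerate(leaf_sketch(sensor, bits_by_leaf[sensor], r, n_components)):
            total[j] += value
    return total


# Usage: four sensors, three time instants, five sketch components.
bits_by_leaf = {0: [1, 0, 1], 1: [0, 0, 1], 2: [1, 1, 1], 3: [0, 1, 0]}
r = make_random_vars(sensors=range(4), horizon=3, n_components=5)
left = node_sketch([0, 1], bits_by_leaf, r, 5)
right = node_sketch([2, 3], bits_by_leaf, r, 5)
root = node_sketch([0, 1, 2, 3], bits_by_leaf, r, 5)
```

Because the sum is linear, a leader node can build its sketch by adding the sketches received from its children, so raw signals never need to travel up the hierarchy.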
-
Publication number: 20090060095
Abstract: Uncertain data is classified by constructing an error adjusted probability density estimate for the data, and applying a subspace exploration process to the probability density estimate to classify the data.
Type: Application
Filed: August 28, 2007
Publication date: March 5, 2009
Applicant: INTERNATIONAL BUSINESS MACHINE CORPORATION
Inventors: Charu Aggarwal, Philip S. Yu
-
Patent number: 7496592
Abstract: Towards mining closed frequent itemsets over a sliding window using limited memory space, a synopsis data structure to monitor transactions in the sliding window so that one can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets, but monitoring only frequent itemsets makes it difficult to detect new itemsets when they become frequent. Herein, there is introduced a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets include a boundary between closed frequent itemsets and the rest of the itemsets. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET.
Type: Grant
Filed: January 31, 2005
Date of Patent: February 24, 2009
Assignee: International Business Machines Corporation
Inventors: Yun Chi, Haixun Wang, Philip S. Yu
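For readers unfamiliar with the term, an itemset is frequent when enough transactions in the window contain it, and closed when no proper superset has the same support. The brute-force example below only illustrates that definition on a tiny sliding window; it is not the closed enumeration tree from the patent, and it recomputes everything from scratch on each slide, which is exactly the cost the CET is meant to avoid. The function names are hypothetical.

```python
from itertools import combinations


def support(itemset, window):
    """Number of transactions in the window that contain every item of the itemset."""
    return sum(1 for transaction in window if itemset <= transaction)


def closed_frequent_itemsets(window, min_support):
    """Exhaustively enumerate itemsets; only practical for tiny examples."""
    items = sorted(set().union(*window))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = support(itemset, window)
            if count >= min_support:
                frequent[itemset] = count
    # Closed: no proper superset reaches the same support.
    return {itemset: count for itemset, count in frequent.items()
            if not any(itemset < other and count == other_count
                       for other, other_count in frequent.items())}


# Usage: a window of three transactions, then slide by one transaction.
window = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
before = closed_frequent_itemsets(window, min_support=2)
window = window[1:] + [{"b", "c"}]
after = closed_frequent_itemsets(window, min_support=2)
```

In the first window, {b} is frequent but not closed because {a, b} occurs just as often; after the slide, {a, b} drops below the threshold and the three single items become the closed frequent itemsets.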
-
Publication number: 20090049187
Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.
Type: Application
Filed: June 30, 2008
Publication date: February 19, 2009
Inventors: BUGRA GEDIK, Kun-Lung Wu, Philip S. Yu
-
Publication number: 20090049069
Abstract: Privacy in data mining of sparse high dimensional data records is preserved by transforming the data records into anonymized data records. This transformation involves creating a sketch-based private representation of each data record, each data record containing only a small number of non-zero attribute values in relation to the high dimensionality of the data records.
Type: Application
Filed: August 9, 2007
Publication date: February 19, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charu Aggarwal, Philip S. Yu
-
Patent number: 7493346
Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
Type: Grant
Filed: February 16, 2005
Date of Patent: February 17, 2009
Assignee: International Business Machines Corporation
Inventors: Yun Chi, Haixun Wang, Philip S. Yu
-
Patent number: 7492727
Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency from an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
Type: Grant
Filed: March 31, 2006
Date of Patent: February 17, 2009
Assignee: International Business Machines Corporation
Inventors: Philip S. Yu, Haixun Wang, Hao He
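The two kinds of labels in this abstract can be illustrated with a simplified sketch. Each tree node gets a half-open interval from a depth-first numbering, so that a descendant's start number falls inside its ancestor's interval; non-tree links are kept as a plain list and followed at query time. Searching the link list on every query is a simplification chosen for brevity and is an assumption of this example, not the patent's non-tree labeling scheme. All names are hypothetical.

```python
def interval_labels(tree_children, root):
    """Assign each node a half-open interval [start, end) by depth-first numbering."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree_children.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels


def tree_reaches(labels, u, v):
    """v lies in u's subtree exactly when v's start number is inside u's interval."""
    start, end = labels[u]
    return start <= labels[v][0] < end


def reaches(labels, non_tree_links, u, v, _seen=None):
    """Reachability using tree intervals plus a search over non-tree links."""
    if tree_reaches(labels, u, v):
        return True
    seen = set() if _seen is None else _seen
    for tail, head in non_tree_links:
        # A link is usable if its tail lies in u's subtree; then continue from its head.
        if (tail, head) not in seen and tree_reaches(labels, u, tail):
            seen.add((tail, head))
            if reaches(labels, non_tree_links, head, v, seen):
                return True
    return False


# Usage: spanning tree r -> {a, b}, a -> c, b -> d, plus one non-tree link c -> b.
tree_children = {"r": ["a", "b"], "a": ["c"], "b": ["d"]}
labels = interval_labels(tree_children, "r")
non_tree_links = [("c", "b")]
```

Here `reaches(labels, non_tree_links, "a", "d")` succeeds by going down the tree to `c`, across the non-tree link to `b`, and down the tree again to `d`, while the reverse query from `d` to `a` fails.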
-
Publication number: 20090043715
Abstract: The method trains an inductive model to output multiple models from the inductive model and trains an error correlation model to estimate an average output of predictions made by the multiple models. Then the method can determine an error estimation of each of the multiple models using the error correlation model.
Type: Application
Filed: April 2, 2008
Publication date: February 12, 2009
Applicant: International Business Machines Corporation
Inventors: Wei Fan, Philip S. Yu
-
Patent number: 7487206
Abstract: A computer implemented method for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions to meet correlation constraints for correlating data in the data stream pair to form a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.
Type: Grant
Filed: July 15, 2005
Date of Patent: February 3, 2009
Assignee: International Business Machines Corporation
Inventors: Xiaohui Gu, Philip S. Yu