Patents by Inventor Philip S. Yu

Philip S. Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEMS FOR STRUCTURAL CLUSTERING OF TIME SEQUENCES

Publication number: 20090327185

Abstract: Arrangements are provided for performing structural clustering between different time series. Time series data relating to a plurality of time series is accepted, structural features relating to the time series data are ascertained, and at least one distance between different time series is determined using the structural features. The different time series may be partitioned into clusters based on the at least one distance, and/or the k closest matches to a given time series query based on the at least one distance may be returned.
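A minimal sketch of the k-closest-matches query, assuming a choice of structural features that the abstract does not specify: here, normalized DFT magnitudes of the mean-removed series, so that series with the same periodic shape match regardless of offset, scale or phase. The function names (`structural_features`, `structural_distance`, `k_closest`) are hypothetical, and the same distance could feed any standard clustering routine for the partitioning variant.

```python
import cmath
import math

def structural_features(series, num_coeffs=4):
    """Hypothetical structural features: normalized DFT magnitudes.

    The mean is removed and the magnitudes are scaled to unit length, so the
    features reflect periodic shape rather than level, amplitude or phase.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    mags = []
    for k in range(1, num_coeffs + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(coeff))
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]

def structural_distance(a, b, num_coeffs=4):
    """Euclidean distance between the structural features of two series."""
    return math.dist(structural_features(a, num_coeffs), structural_features(b, num_coeffs))

def k_closest(query, database, k, num_coeffs=4):
    """Indices of the k series in `database` structurally closest to `query`."""
    order = sorted(range(len(database)),
                   key=lambda i: structural_distance(query, database[i], num_coeffs))
    return order[:k]

# Usage: a scaled, offset, phase-shifted copy of the query's pattern ranks first.
n = 32
query = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
database = [
    [5 + 3 * math.sin(2 * math.pi * 2 * t / n + 1.0) for t in range(n)],
    [math.sin(2 * math.pi * 3 * t / n) for t in range(n)],
]
print(k_closest(query, database, 1))  # [0]
```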

Type: Application

Filed: May 5, 2008

Publication date: December 31, 2009

Applicant: International Business Machines Corporation

Inventors: Vittorio Castelli, Michail Vlachos, Philip S. Yu

SYSTEM AND METHOD FOR CLASSIFYING DATA STREAMS WITH VERY LARGE CARDINALITY

Publication number: 20090281971

Abstract: Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added.
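The training and classification steps above can be sketched with count-min style tables. This is an illustration under assumptions, not the patented method: attribute patterns are taken to be pairs of (attribute, value) items, each table row is a salted SHA-256 hash, the discriminatory power of a pattern is approximated as the share of its summed estimates held by the dominant class, and the final decision simply picks the class with the largest total. All names are hypothetical.

```python
import hashlib
from itertools import combinations

class SketchTable:
    """Count-min style sketch: `depth` parallel hash tables of `width` counters."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _bucket(self, row, pattern):
        # Each row acts as an independent hash function by salting with its index.
        digest = hashlib.sha256(f"{row}:{pattern!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pattern, count=1):
        for row in range(self.depth):
            self.tables[row][self._bucket(row, pattern)] += count

    def estimate(self, pattern):
        # Keep the lowest value across all hash functions, as the abstract describes.
        return min(self.tables[row][self._bucket(row, pattern)] for row in range(self.depth))

def attribute_patterns(obj, size=2):
    """Hypothetical grouping: every combination of `size` (attribute, value) pairs."""
    return list(combinations(sorted(obj.items()), size))

def train(examples, classes, width=64, depth=4):
    """Build one sketch table per class from objects with known labels."""
    sketches = {c: SketchTable(width, depth) for c in classes}
    for obj, label in examples:
        for pattern in attribute_patterns(obj):
            sketches[label].add(pattern)
    return sketches

def classify_object(obj, sketches, threshold=0.6):
    """Sum sketch values of discriminative patterns per class; return the top class."""
    scores = {c: 0 for c in sketches}
    for pattern in attribute_patterns(obj):
        estimates = {c: s.estimate(pattern) for c, s in sketches.items()}
        total = sum(estimates.values())
        if total == 0:
            continue
        # Hypothetical discriminatory power: share held by the dominant class.
        if max(estimates.values()) / total > threshold:
            for c, value in estimates.items():
                scores[c] += value
    return max(scores, key=scores.get)

web = {"src": "a", "port": 80, "proto": "tcp"}
dns = {"src": "b", "port": 53, "proto": "udp"}
sketches = train([(web, "web")] * 5 + [(dns, "dns")] * 5, ["web", "dns"])
print(classify_object(web, sketches), classify_object(dns, sketches))
```

Taking the minimum over rows means a count can only be overestimated (by hash collisions), never underestimated.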

Type: Application

Filed: May 9, 2008

Publication date: November 12, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu C. Aggarwal, Philip S. Yu

Method and apparatus for adaptive load shedding

Patent number: 7610397

Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.

Type: Grant

Filed: February 28, 2005

Date of Patent: October 27, 2009

Assignee: International Business Machines Corporation

Inventors: Bugra Gedik, Kun-Lung Wu, Philip S. Yu

Systems and Methods for Metadata Embedding in Streaming Medical Data

Publication number: 20090226056

Abstract: Systems and methods are provided for embedding metadata, such as personal patient information, within actual medical data signals obtained from a patient, wherein two watermarks, a robust watermark and a fragile watermark, are embedded in a given medical data signal. The robust watermark includes a binary coded representation of the metadata that is incorporated into the frequency domain of the medical data signal using discrete Fourier transformations and additive embedding. Error correcting code can also be added to the binary representation of the metadata using Hamming coding. A given robust watermark can be incorporated multiple times in the medical data signal. The fragile watermark is added on top of the modified medical signal containing the robust watermark in the spatial domain of the modified medical signal. The fragile watermark uses a hash function to generate random sequences that are incorporated throughout the medical data signal.
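The two robust-watermark ingredients named in the abstract, Hamming coding and additive embedding in the DFT domain, can be illustrated as follows. This is a hedged sketch, not the patented scheme: it uses Hamming(7,4), adds a fixed real offset `alpha` to consecutive low-frequency coefficients (mirrored so the signal stays real), and assumes non-blind extraction against the original signal. The fragile watermark and the repeated embedding are omitted, and all function names are hypothetical.

```python
import cmath
import math

def hamming74_encode(d):
    """Hamming(7,4): parity bits at positions 1, 2, 4 (1-indexed)."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit using the syndrome, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def embed_bits(signal, bits, alpha=5.0, first_coeff=1):
    """Additively embed one bit per DFT coefficient (+alpha for 1, -alpha for 0)."""
    X = dft(signal)
    n = len(X)
    assert first_coeff + len(bits) <= n // 2, "signal too short for this many bits"
    for i, bit in enumerate(bits):
        k = first_coeff + i
        delta = alpha if bit else -alpha
        X[k] += delta
        X[n - k] += delta  # mirror the change so the inverse DFT stays real
    return idft(X)

def extract_bits(original, marked, count, first_coeff=1):
    """Non-blind extraction: the sign of the change in each coefficient gives the bit."""
    X0, X1 = dft(original), dft(marked)
    return [1 if (X1[first_coeff + i] - X0[first_coeff + i]).real > 0 else 0
            for i in range(count)]

metadata = [1, 0, 1, 1]                      # e.g. 4 bits of a patient identifier
codeword = hamming74_encode(metadata)
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
marked = embed_bits(signal, codeword)
recovered = extract_bits(signal, marked, len(codeword))
recovered[4] ^= 1                            # simulate a single-bit error
print(hamming74_decode(recovered))           # [1, 0, 1, 1]
```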

Type: Application

Filed: March 5, 2008

Publication date: September 10, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michail Vlachos, Philip S. Yu

SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA

Publication number: 20090187914

Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
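A small sketch of the prediction and uncertainty steps, under assumptions: features are discrete states of a first-order Markov chain with a known transition matrix, class posteriors per feature state are given, and QoD is approximated as the margin between the two best expected posteriors (the abstract does not define it this way). The load-shedding rule then spends its observation budget on the streams whose predicted decisions are least certain. All names are hypothetical.

```python
def step(dist, transition):
    """One Markov step: new[j] = sum_i dist[i] * transition[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def predict(dist, transition, steps):
    """Feature distribution `steps` time units after the last observation."""
    for _ in range(steps):
        dist = step(dist, transition)
    return dist

def decide(feature_dist, class_given_feature):
    """Class with the highest expected posterior, plus a hypothetical QoD score.

    QoD here is the margin between the best and second-best expected posteriors;
    a small margin means an uncertain decision.
    """
    num_classes = len(class_given_feature[0])
    expected = [sum(p * class_given_feature[f][c] for f, p in enumerate(feature_dist))
                for c in range(num_classes)]
    best, second = sorted(range(num_classes), key=lambda c: expected[c], reverse=True)[:2]
    return best, expected[best] - expected[second]

def streams_to_observe(streams, transition, class_given_feature, budget):
    """Spend the observation budget on the streams with the least certain decisions.

    `streams` maps a name to (last observed feature distribution, steps since then).
    """
    qod = {}
    for name, (dist, steps) in streams.items():
        _, qod[name] = decide(predict(dist, transition, steps), class_given_feature)
    return sorted(qod, key=qod.get)[:budget]

transition = [[0.9, 0.1], [0.2, 0.8]]
class_given_feature = [[0.8, 0.2], [0.3, 0.7]]
streams = {"s1": ([1.0, 0.0], 1), "s2": ([0.5, 0.5], 1)}
print(streams_to_observe(streams, transition, class_given_feature, budget=1))  # ['s2']
```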

Type: Application

Filed: February 17, 2009

Publication date: July 23, 2009

Applicant: International Business Machines Corporation

Inventors: Yun Chi, Haixun Wang, Philip S. Yu

System and method for mining time-changing data streams

Patent number: 7565369

Abstract: A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. An empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
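The weighting idea can be sketched with a toy base learner. As simplifying assumptions, each chunk trains a one-dimensional decision stump (standing in for C4.5, RIPPER, etc.), every member is weighted by its plain accuracy on the newest chunk, and only the `capacity` best members are kept; the patented weighting may differ. All names are hypothetical.

```python
from collections import defaultdict

def train_stump(chunk):
    """Fit a 1-D stump: predict `polarity` if x >= threshold, else 1 - polarity."""
    best = None
    for threshold in sorted({x for x, _ in chunk}):
        for polarity in (1, 0):
            correct = sum(1 for x, y in chunk
                          if (polarity if x >= threshold else 1 - polarity) == y)
            if best is None or correct > best[0]:
                best = (correct, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity

def accuracy(model, chunk):
    return sum(1 for x, y in chunk if model(x) == y) / len(chunk)

class WeightedEnsemble:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.members = []  # list of (weight, model)

    def update(self, chunk):
        models = [m for _, m in self.members] + [train_stump(chunk)]
        # Re-weight every member on the newest chunk, which best reflects the current concept.
        scored = sorted(((accuracy(m, chunk), m) for m in models),
                        key=lambda pair: pair[0], reverse=True)
        self.members = scored[: self.capacity]

    def predict(self, x):
        votes = defaultdict(float)
        for weight, model in self.members:
            votes[model(x)] += weight
        return max(votes, key=votes.get)

rising = [(x, 1 if x >= 5 else 0) for x in range(10)]
falling = [(x, 1 if x < 5 else 0) for x in range(10)]
ens = WeightedEnsemble(capacity=3)
ens.update(rising)
ens.update(rising)
print(ens.predict(8))  # 1
ens.update(falling)    # concept drift: old members now get weight 0
print(ens.predict(8))  # 0
```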

Type: Grant

Filed: May 28, 2004

Date of Patent: July 21, 2009

Assignee: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip S. Yu

System and method for sequence-based subspace pattern clustering

Patent number: 7565346

Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and autonomic computing (web usage analysis). However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences.
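For context, the pCluster coherence criterion that the abstract refers to can be checked directly: for objects x, y and attributes a, b, pScore = |(d_xa - d_xb) - (d_ya - d_yb)|, and a submatrix is a delta-pCluster when every 2x2 pScore is at most delta. This sketch shows only that baseline definition, not the patented sequence-based method; the function names are illustrative.

```python
from itertools import combinations

def pscore(data, x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y and attributes a, b."""
    return abs((data[x][a] - data[x][b]) - (data[y][a] - data[y][b]))

def is_delta_pcluster(data, objects, attributes, delta):
    """True if every pair of objects rises and falls coherently on `attributes`."""
    for x, y in combinations(objects, 2):
        for a, b in combinations(attributes, 2):
            if pscore(data, x, y, a, b) > delta:
                return False
    return True

data = {
    "g1": {"c1": 1, "c2": 4, "c3": 2},
    "g2": {"c1": 11, "c2": 14, "c3": 12},  # g1 shifted by 10: same rise-and-fall shape
    "g3": {"c1": 5, "c2": 1, "c3": 9},
}
print(is_delta_pcluster(data, ["g1", "g2"], ["c1", "c2", "c3"], delta=0))        # True
print(is_delta_pcluster(data, ["g1", "g2", "g3"], ["c1", "c2", "c3"], delta=1))  # False
```

The check compares every object pair against every attribute pair, which hints at why exact pattern-based clustering scales poorly to large data sets.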

Type: Grant

Filed: May 31, 2004

Date of Patent: July 21, 2009

Assignee: International Business Machines Corporation

Inventors: Wei Fan, Haixun Wang, Philip S. Yu

Systems and methods for optimal component composition in a stream processing system

Patent number: 7562355

Abstract: A system and method are provided for optimizing component composition in a distributed stream-processing environment having a plurality of nodes capable of being associated with one or more of a plurality of stream processing components. The system includes an adaptive composition probing (ACP) module and a hierarchical state manager. The ACP module probes a subset of the plurality of stream processing components to determine the optimal component composition in response to a stream processing request. The hierarchical state manager manages local and global information for use by said ACP module in determining the optimal component composition.

Type: Grant

Filed: March 1, 2005

Date of Patent: July 14, 2009

Assignee: International Business Machines Corporation

Inventors: Xiaohui Gu, Philip S. Yu

METHOD AND SYSTEM FOR PREDICTING RESOURCE USAGE OF REUSABLE STREAM PROCESSING ELEMENTS

Publication number: 20090119238

Abstract: A method is provided for generating a resource function estimate of resource usage by an instance of a processing element configured to consume zero or more input data streams in a stream processing system having a set of available resources that comprises receiving at least one specified performance metric for the zero or more input data streams and a processing power of the set of available resources, wherein one specified performance metric is stream rate; generating a multi-part signature of executable-specific information for the processing element and a multi-part signature of context-specific information for the instance; accessing a database of resource functions to identify a static resource function corresponding to the executable-specific information and a context-dependent resource function corresponding to the context-specific information; combining the static resource function and the context-dependent resource function to form a composite resource function for the instance; and applying the res

Type: Application

Filed: November 5, 2007

Publication date: May 7, 2009

Applicant: International Business Machines Corporation

Inventors: Lisa Amini, Henrique Andrade, Wei Fan, James R. Giles, Kirsten W. Hildrum, Deepak Rajan, Deepak S. Turaga, Rohit Wagle, Joel L. Wolf, Philip S. Yu

RESOURCE ADAPTIVE SPECTRUM ESTIMATION OF STREAMING DATA

Publication number: 20090074043

Abstract: Streaming environments typically dictate incomplete or approximate algorithm execution, in order to cope with sudden surges in the data rate. Such limitations are even more accentuated in mobile environments (such as sensor networks) where computational and memory resources are typically limited. Introduced herein is a novel “resource adaptive” algorithm for spectrum and periodicity estimation on a continuous stream of data. The formulation is based on the derivation of a closed-form incremental computation of the spectrum, augmented by an intelligent load-shedding scheme that can adapt to available CPU resources. Experimentation indicates that the proposed technique can be a viable and resource efficient solution for real-time spectrum estimation.
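A closed-form incremental spectrum update of the kind the abstract mentions is the classic sliding DFT: when sample x_old leaves a window of length N and x_new enters, each coefficient X_k becomes (X_k - x_old + x_new) * exp(2*pi*i*k/N), an O(1) update per coefficient. The sketch below assumes that formulation, which may differ from the patent's, and uses the number of tracked coefficients `num_coeffs` as a hypothetical stand-in for the adaptive load-shedding knob. The class name is hypothetical.

```python
import cmath
import math
from collections import deque

class SlidingDFT:
    """Incrementally maintained DFT of the last `window` samples (sliding DFT)."""

    def __init__(self, window, num_coeffs):
        self.n = window
        self.num_coeffs = num_coeffs  # hypothetical load-shedding knob: fewer = cheaper
        self.buffer = deque([0.0] * window, maxlen=window)
        self.coeffs = [0j] * num_coeffs  # DFT of the all-zero initial window
        self.twiddle = [cmath.exp(2j * math.pi * k / window) for k in range(num_coeffs)]

    def push(self, x):
        oldest = self.buffer[0]
        self.buffer.append(x)  # the full deque drops `oldest` automatically
        for k in range(self.num_coeffs):
            self.coeffs[k] = (self.coeffs[k] - oldest + x) * self.twiddle[k]

    def dominant_period(self):
        """Period (in samples) of the strongest non-DC tracked coefficient."""
        k = max(range(1, self.num_coeffs), key=lambda k: abs(self.coeffs[k]))
        return self.n / k

sdft = SlidingDFT(window=32, num_coeffs=16)
for t in range(100):
    sdft.push(math.sin(2 * math.pi * t / 8))
print(sdft.dominant_period())  # 8.0
```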

Type: Application

Filed: July 22, 2008

Publication date: March 19, 2009

Applicant: International Business Machines Corporation

Inventors: Deepak Srinivac Turaga, Michail Vlachos, Philip S. Yu

Systems and methods for simultaneous summarization of data cube streams

Patent number: 7505876

Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, our present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.

Type: Grant

Filed: January 7, 2007

Date of Patent: March 17, 2009

Assignee: International Business Machines Corporation

Inventors: Spyridon Papadimitriou, Jimeng Sun, Philip S. Yu

METHODS, APPARATUSES, AND COMPUTER PROGRAM PRODUCTS FOR CLASSIFYING UNCERTAIN DATA

Publication number: 20090060095

Abstract: Uncertain data is classified by constructing an error adjusted probability density estimate for the data, and applying a subspace exploration process to the probability density estimate to classify the data.
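A minimal sketch of an error-adjusted density estimate, assuming each training point comes with a known per-dimension error standard deviation that widens its Gaussian kernel to sqrt(h^2 + error^2). Classification picks the label with the largest prior-weighted density over the full space; the subspace exploration step is omitted, and all names are hypothetical.

```python
import math

def gaussian(u, h):
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def error_adjusted_density(x, points, errors, h):
    """Average of product kernels; each point's bandwidth is widened by its error.

    errors[i][d] is the (assumed known) standard deviation of point i in dimension d.
    """
    total = 0.0
    for p, e in zip(points, errors):
        k = 1.0
        for d in range(len(x)):
            k *= gaussian(x[d] - p[d], math.sqrt(h ** 2 + e[d] ** 2))
        total += k
    return total / len(points)

def classify_uncertain(x, training, h=0.5):
    """training maps label -> (points, errors); pick the label maximising prior * density."""
    n = sum(len(pts) for pts, _ in training.values())

    def score(label):
        pts, errs = training[label]
        return len(pts) / n * error_adjusted_density(x, pts, errs, h)

    return max(training, key=score)

training = {
    "A": ([[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]], [[0.1, 0.1], [0.3, 0.3], [0.2, 0.2]]),
    "B": ([[5.0, 5.0], [5.5, 4.5], [4.5, 5.5]], [[0.1, 0.1], [0.3, 0.3], [0.2, 0.2]]),
}
print(classify_uncertain([1.0, 1.0], training))  # A
print(classify_uncertain([4.0, 4.5], training))  # B
```

Widening the kernel for noisy points spreads their probability mass, so an imprecise measurement has less influence on any single location.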

Type: Application

Filed: August 28, 2007

Publication date: March 5, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu Aggarwal, Philip S. Yu

System and Method for Historical Diagnosis of Sensor Networks

Publication number: 20090063432

Abstract: A method of querying a hierarchically organized sensor network, said network being a sensor network with a global coordinator node at a top level which receives data from lower level intermediate nodes which are either leader nodes for lower level nodes or sensor nodes, wherein a sensor node i at a lowest level receives a signal Y(i,t) at time t, said method including constructing a sketch $S_{wkt} = (S_{wkt}^{1}, \ldots, S_{wkt}^{n})$ for an internal node k from $S_{wkt}^{j} = \sum_{i \in \mathrm{LeafDescendents}(k)} \sum_{q=1}^{t} b_{wiq} \cdot r_{iq}^{j}$, wherein component $S_{wkt}^{j}$ is a sketch of a descendent of node k, $r_{it}^{j}$ is a random variable associated with each sensor node i and time instant t wherein index j refers to independently drawn instantiations of the random variable, bit $b_{wit}$ represents a state of sensor node i for signal value $w = Y(i,t)$ at time t, and LeafDescendents(k) are the lowest level sensor nodes under node k, wherein said sketch is adapted for responding to queries regarding a state of said network.
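Because the sketch in the formula is a sum over leaf sensors, an internal node can obtain it by adding its children's sketches. The code below illustrates this under assumptions: the random variables are independent +1/-1 draws, b is 1 when the sensor's reading equals state w, time is 0-indexed (q < t instead of q = 1..t), and counts are estimated AMS-style as the mean of the squared components. All names are hypothetical.

```python
import random

def random_signs(num_sensors, num_times, n, seed=0):
    """r[j][i][q] in {-1, +1}: independent draws for each sketch component j."""
    rng = random.Random(seed)
    return [[[rng.choice((-1, 1)) for _ in range(num_times)]
             for _ in range(num_sensors)] for _ in range(n)]

def leaf_sketch(readings, i, w, t, r):
    """Component j is the sum over q < t of b_wiq * r[j][i][q], with b_wiq = [Y(i,q) == w]."""
    return [sum(r[j][i][q] for q in range(t) if readings[i][q] == w) for j in range(len(r))]

def merge(child_sketches):
    """An internal node's sketch is the component-wise sum of its children's sketches."""
    return [sum(parts) for parts in zip(*child_sketches)]

def estimate_count(sketch):
    """AMS-style estimate of how many (sensor, time) pairs were in state w."""
    return sum(s * s for s in sketch) / len(sketch)

num_sensors, num_times, n = 4, 10, 400
readings = [["ok" if (i + q) % 2 == 0 else "fault" for q in range(num_times)]
            for i in range(num_sensors)]
r = random_signs(num_sensors, num_times, n)
leaves = [leaf_sketch(readings, i, "fault", num_times, r) for i in range(num_sensors)]
node_k = merge(leaves)
print(estimate_count(node_k))  # should be close to 20, the true number of "fault" readings
```

Since the expected value of each squared component equals the number of ones in the underlying 0/1 vector, averaging over many independent components gives a count estimate without shipping raw readings up the hierarchy.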

Type: Application

Filed: August 28, 2007

Publication date: March 5, 2009

Inventors: Charu Chandra Aggarwal, Philip S. Yu

Systems and methods for maintaining closed frequent itemsets over a data stream sliding window

Patent number: 7496592

Abstract: To mine closed frequent itemsets over a sliding window using limited memory space, a synopsis data structure is used to monitor transactions in the sliding window so that the current closed frequent itemsets can be output at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets, but monitoring only frequent itemsets makes it difficult to detect new itemsets when they become frequent. Herein, there is introduced a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets include a boundary between closed frequent itemsets and the rest of the itemsets. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET.
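As a reference point for what the CET maintains, the code below recomputes the closed frequent itemsets of the current window from scratch: an itemset is closed if no proper superset has the same support. It is a brute-force baseline (exponential in the number of distinct items), not the incremental CET algorithm, and the class and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def closed_frequent_itemsets(window, min_support):
    """Brute force: all frequent itemsets, then keep those with no equal-support superset."""
    items = sorted({item for tx in window for item in tx})
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for tx in window if itemset <= tx)
            if count >= min_support:
                support[itemset] = count
    # A superset with equal support is itself frequent, so checking `support` suffices.
    return {s: c for s, c in support.items()
            if not any(s < other and c == support[other] for other in support)}

class SlidingWindowMiner:
    def __init__(self, size, min_support):
        self.window = deque(maxlen=size)  # the oldest transaction drops out automatically
        self.min_support = min_support

    def add(self, transaction):
        self.window.append(frozenset(transaction))
        return closed_frequent_itemsets(self.window, self.min_support)

miner = SlidingWindowMiner(size=3, min_support=2)
for tx in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]:
    result = miner.add(tx)
print(sorted("".join(sorted(s)) for s in result))  # ['a', 'ab', 'ac']
result = miner.add({"b", "c"})  # {"a", "b"} slides out of the window
print(sorted("".join(sorted(s)) for s in result))  # ['ac', 'bc', 'c']
```

Each new transaction here triggers a full recomputation, which is exactly the cost the CET avoids by only revisiting itemsets near the closed/non-closed boundary.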

Type: Grant

Filed: January 31, 2005

Date of Patent: February 24, 2009

Assignee: International Business Machines Corporation

Inventors: Yun Chi, Haixun Wang, Philip S. Yu

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PRESERVING PRIVACY IN DATA MINING

Publication number: 20090049069

Abstract: Privacy in data mining of sparse high dimensional data records is preserved by transforming the data records into anonymized data records. This transformation involves creating a sketch-based private representation of each data record, each data record containing only a small number of non-zero attribute values in relation to the high dimensionality of the data records.

Type: Application

Filed: August 9, 2007

Publication date: February 19, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charu Aggarwal, Philip S. Yu

METHOD AND APPARATUS FOR ADAPTIVE LOAD SHEDDING

Publication number: 20090049187

Abstract: One embodiment of the present method and apparatus for adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.

Type: Application

Filed: June 30, 2008

Publication date: February 19, 2009

Inventors: Bugra Gedik, Kun-Lung Wu, Philip S. Yu

Space and time efficient XML graph labeling

Patent number: 7492727

Abstract: There is provided a method for determining reachability between any two nodes within a graph. The inventive method utilizes a dual-labeling scheme. Initially, a spanning tree is defined for a group of nodes within a graph. Each node in the spanning tree is assigned a unique interval-based label that describes its dependency on an ancestor node. Non-tree labels are then assigned to each node in the spanning tree that is connected to another node in the spanning tree by a non-tree link. From these labels, reachability of any two nodes in the spanning tree is determined by using only the interval-based labels and the non-tree labels.
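The labeling and query steps can be sketched as follows. Intervals come from a depth-first traversal of the spanning tree, so v descends from u exactly when v's start number lies in u's interval; non-tree links are then followed with a visited set. This assumes a spanning forest given as child lists and is simpler than the patented scheme, which determines reachability from the interval and non-tree labels alone, whereas this version traverses the link list at query time. All names are hypothetical.

```python
def interval_labels(tree_children, roots):
    """DFS over the spanning forest; each node gets a [start, end) interval."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree_children.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    for root in roots:
        visit(root)
    return labels

def in_subtree(labels, ancestor, node):
    """True when `node` is `ancestor` or one of its spanning-tree descendants."""
    start, end = labels[ancestor]
    return start <= labels[node][0] < end

def reachable(u, v, labels, non_tree_links):
    """u reaches v if v is in u's subtree, or via non-tree links leaving that subtree."""
    seen = {u}
    frontier = [u]
    while frontier:
        x = frontier.pop()
        if in_subtree(labels, x, v):
            return True
        for src, dst in non_tree_links:
            if in_subtree(labels, x, src) and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return False

tree = {"r": ["a", "b"], "a": ["c"], "b": ["d"]}
labels = interval_labels(tree, ["r"])
links = [("c", "b")]
print(reachable("a", "d", labels, links))  # True: a -> c (tree), c -> b (link), b -> d (tree)
print(reachable("b", "c", labels, links))  # False
```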

Type: Grant

Filed: March 31, 2006

Date of Patent: February 17, 2009

Assignee: International Business Machines Corporation

Inventors: Philip S. Yu, Haixun Wang, Hao He

System and method for load shedding in data mining and knowledge discovery from stream data

Patent number: 7493346

Abstract: Load shedding schemes for mining data streams. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. In the context of not knowing the exact feature values of a data stream, the use of a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, one can make classification decisions to maximize the expected benefits. In addition, there is proposed herein the employment of a quality of decision (QoD) metric to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.

Type: Grant

Filed: February 16, 2005

Date of Patent: February 17, 2009

Assignee: International Business Machines Corporation

Inventors: Yun Chi, Haixun Wang, Philip S. Yu

Method to Continuously Diagnose and Model Changes of Real-Valued Streaming Variables

Publication number: 20090043715

Abstract: The method trains an inductive model to output multiple models from the inductive model and trains an error correlation model to estimate an average output of predictions made by the multiple models. Then the method can determine an error estimation of each of the multiple models using the error correlation model.

Type: Application

Filed: April 2, 2008

Publication date: February 12, 2009

Applicant: International Business Machines Corporation

Inventors: Wei Fan, Philip S. Yu

Method for providing load diffusion in data stream correlations

Patent number: 7487206

Abstract: A computer implemented method for performing load diffusion to process data stream pairs. A data stream pair is received for correlation. The data stream pair is partitioned into portions to meet correlation constraints for correlating data in the data stream pair to form a partitioned data stream pair. The partitioned data stream pair is sent to a set of nodes for correlation processing to perform the load diffusion.
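One simple way to satisfy a correlation constraint for an equi-join is to hash-partition both streams on the join key, so tuples that must be correlated always land on the same node. This is an assumed strategy for illustration, not necessarily the patented partitioning, and it ignores time windows; all names are hypothetical.

```python
import zlib
from collections import defaultdict

def node_for(key, num_nodes):
    """Stable hash partitioning (crc32), so placement is identical across runs."""
    return zlib.crc32(str(key).encode()) % num_nodes

def partition_pair(stream_a, stream_b, num_nodes, key):
    """Route tuples of both streams so equal join keys always meet on one node."""
    parts = [([], []) for _ in range(num_nodes)]
    for tup in stream_a:
        parts[node_for(key(tup), num_nodes)][0].append(tup)
    for tup in stream_b:
        parts[node_for(key(tup), num_nodes)][1].append(tup)
    return parts

def local_join(part_a, part_b, key):
    """Equi-join performed independently on a single node."""
    index = defaultdict(list)
    for tup in part_b:
        index[key(tup)].append(tup)
    return [(ta, tb) for ta in part_a for tb in index[key(ta)]]

def key(tup):
    return tup[0]

stream_a = [("x", 1), ("y", 2), ("z", 3), ("x", 4)]
stream_b = [("x", 10), ("z", 30), ("w", 40)]
parts = partition_pair(stream_a, stream_b, num_nodes=3, key=key)
distributed = sorted(pair for a, b in parts for pair in local_join(a, b, key))
print(distributed)  # [(('x', 1), ('x', 10)), (('x', 4), ('x', 10)), (('z', 3), ('z', 30))]
```

Because every key maps to exactly one node, the union of the per-node joins equals the centralized join, while the work is spread across nodes.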

Type: Grant

Filed: July 15, 2005

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Xiaohui Gu, Philip S. Yu
