Patents by Inventor Philip Shi-Lung Yu

Philip Shi-Lung Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7882126
    Abstract: There are provided a method and a system for computation of optimal distance bounds on compressed time-series data. In a method for similarity search, the method includes the step of transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data. The method further includes the step of computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint. The first constraint is that a sum of squares of the omitted coefficients is less than a sum of the energy of the omitted coefficients. The second constraint is that the energy of the omitted coefficients is less than the energy of a lowest energy one of the top-k coefficients.
    Type: Grant
    Filed: February 7, 2008
    Date of Patent: February 1, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michail Vlachos, Philip Shi-Lung Yu
  • Patent number: 7865456
    Abstract: Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: January 4, 2011
    Assignee: Trend Micro Incorporated
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20100329563
    Abstract: Techniques are disclosed for detecting new events in a video stream that yield improved detection efficiency in real time. For example, a method determines whether a given event is a new event in a video stream. The video stream includes a plurality of events. A first step extracts a first set of features (e.g., text features) from the given event. The first set of features is computationally less expensive to process as compared to a second set of features (e.g., image features) associated with the given event. A second step computes one or more first dissimilarity values between the given event and one or more previous events in the video stream using only the first set of features when one or more first dissimilarity criteria exist. A third step determines whether the given event is a new event based on the one or more computed first dissimilarity values.
    Type: Application
    Filed: November 1, 2007
    Publication date: December 30, 2010
    Inventors: Gang Luo, Rong Yan, Philip Shi-Lung Yu
  • Patent number: 7844634
    Abstract: Techniques for community discovery in a network are disclosed. For example, a technique for discovering a community around a given entity in an interaction graph, wherein nodes in the graph represent entities and edges connecting nodes in the graph represent interactions between connected nodes, comprises the following steps/operations. Nodes in the interaction graph are partitioned into different sets of nodes based on interaction information associated with each node to minimize a number of interaction pairs that need to be considered. An objective function is minimized by moving entities between the different sets such that the community is discovered once a measure associated with the objective function is minimized.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: November 30, 2010
    Assignee: International Business Machines Corporation
    Inventors: Kirsten Weale Hildrum, Philip Shi-Lung Yu
  • Patent number: 7835953
    Abstract: A method and structure for monitoring continual queries over moving objects, including identifying a query region in a digital format. Each query region is strictly covered by at least one shingle such that each query region is completely covered by the at least one shingle and no section of any of the at least one shingle falls outside the query region.
    Type: Grant
    Filed: September 29, 2003
    Date of Patent: November 16, 2010
    Assignee: International Business Machines Corporation
    Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Shi-lung Yu
  • Publication number: 20100281028
    Abstract: There are provided methods, computer program products, and systems for indexing a data stream. A method for indexing a data stream having attribute values includes the steps of parsing the data stream, and forming an index of tuples for a subset of attribute values of the data stream. The index is configured for retrieving the top-K tuples that optimize linearly weighted sums of at least some of the attribute values in the subset.
    Type: Application
    Filed: April 2, 2008
    Publication date: November 4, 2010
    Inventors: Gang Luo, Kun-Lung Wu, Philip Shi-lung Yu
  • Publication number: 20100268734
    Abstract: Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied.
    Type: Application
    Filed: May 23, 2007
    Publication date: October 21, 2010
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7739284
    Abstract: A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
    Type: Grant
    Filed: April 20, 2005
    Date of Patent: June 15, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7730364
    Abstract: A system and method for using continuous failure predictions for proactive failure management in distributed cluster systems includes a sampling subsystem configured to continuously monitor and collect operation states of different system components. An analysis subsystem is configured to build classification models to perform on-line failure predictions. A failure prevention subsystem is configured to take preventive actions on failing components based on failure warnings generated by the analysis subsystem.
    Type: Grant
    Filed: April 5, 2007
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventors: Shu-Ping Chang, Xiaohui Gu, Spyridon Papadimitriou, Philip Shi-lung Yu
  • Patent number: 7716154
    Abstract: Methods and apparatus are provided for generating a decision tree using linear discriminant analysis and implementing such a decision tree in the classification (also referred to as categorization) of data. The data is preferably in the form of multidimensional objects, e.g., data records including feature variables and class variables in a decision tree generation mode, and data records including only feature variables in a decision tree traversal mode. Such an inventive approach, for example, creates more effective supervised classification systems. In general, the present invention comprises splitting a decision tree, recursively, such that the greatest amount of separation among the class values of the training data is achieved. This is accomplished by finding effective combinations of variables in order to recursively split the training data and create the decision tree. The decision tree is then used to classify input testing data.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20100011123
    Abstract: A technique for delivering content in a client-server system based on a request received at a computing device from a client includes determining a current load on a next-level computing device of a hierarchy. When this current load is such that a response time for delivery of the request from the next-level computing device would increase above a given threshold, a client type associated with the request is checked and, when the client type indicates that the client is below a given priority level, content to be delivered to the client in response to the request is personalized at the receiving computing device. When the current load is such that the response time would not increase above a given threshold, the request is sent from the receiving computing device to the next-level computing device and the content to be delivered is personalized at the next-level computing device.
    Type: Application
    Filed: September 18, 2009
    Publication date: January 14, 2010
    Applicant: International Business Machines Corporation
    Inventors: Paul M. Dantzig, Daniel M. Dias, Arun Kwangil Iyengar, Philip Shi-Lung Yu
  • Publication number: 20090319457
    Abstract: Techniques for classifying structural data with skewed distribution are disclosed. By way of example, a method of classifying structural input data comprises a computer system performing the following steps. Multiple classifiers are constructed, wherein each classifier is constructed on a subset of training data, using one or more selected composite features from the subset of training data. A consensus among the multiple classifiers is computed in accordance with a voting scheme such that at least a portion of the structural input data is assigned to a particular class in accordance with the computed consensus. Such techniques for structured data classification are capable of handling skewed class distribution and partial feature coverage issues.
    Type: Application
    Filed: June 18, 2008
    Publication date: December 24, 2009
    Inventors: Hong Cheng, Wei Fan, Xifeng Yan, Philip Shi-lung Yu
  • Publication number: 20090319526
    Abstract: Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application.
    Type: Application
    Filed: May 13, 2008
    Publication date: December 24, 2009
    Applicant: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7631081
    Abstract: Techniques are provided for improved serving of content in a distributed data network. In one aspect of the invention, a technique for delivering content in a client-server system based on a request from a client comprises the following steps/operations. The request is obtained. A performance characteristic of at least one server or at least one cache of the client-server system is determined. Then, a level of data accuracy to be delivered to the client in response to the request is determined. The data accuracy determination is based on: (i) the determined performance characteristic of the at least one server or the at least one cache; and (ii) at least one preference associated with the client. The performance characteristic may comprise a load of the at least one server or the at least one cache. The level of data accuracy may comprise a level of personalization to be delivered to the client in response to the request.
    Type: Grant
    Filed: February 27, 2004
    Date of Patent: December 8, 2009
    Assignee: International Business Machines Corporation
    Inventors: Paul M. Dantzig, Daniel M. Dias, Arun Kwangil Iyengar, Philip Shi-Lung Yu
  • Patent number: 7630950
    Abstract: A system and method for learning models from scarce and/or skewed training data includes partitioning a data stream into a sequence of time windows. A most likely current class distribution to classify portions of the data stream is determined based on observing training data in a current time window and based on concept drift probability patterns using historical information.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: December 8, 2009
    Assignee: International Business Machines Corporation
    Inventors: Haixun Wang, Jian Yin, Philip Shi-lung Yu
  • Publication number: 20090248749
    Abstract: A computer implemented method, apparatus, and computer usable program code for processing multi-way stream correlations. Stream data are received for correlation. A task is formed for continuously partitioning a multi-way stream correlation workload into smaller workload pieces. Each of the smaller workload pieces may be processed by a single host. The stream data are sent to different hosts for correlation processing.
    Type: Application
    Filed: June 4, 2009
    Publication date: October 1, 2009
    Applicant: International Business Machines Corporation
    Inventors: Xiaohui Gu, Haixun Wang, Philip Shi-lung Yu
  • Publication number: 20090222472
    Abstract: Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results.
    Type: Application
    Filed: February 28, 2008
    Publication date: September 3, 2009
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20090222410
    Abstract: Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set.
    Type: Application
    Filed: February 28, 2008
    Publication date: September 3, 2009
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Publication number: 20090204574
    Abstract: There are provided a method and a system for computation of optimal distance bounds on compressed time-series data. In a method for similarity search, the method includes the step of transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data. The method further includes the step of computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint. The first constraint is that a sum of squares of the omitted coefficients is less than a sum of the energy of the omitted coefficients. The second constraint is that the energy of the omitted coefficients is less than the energy of a lowest energy one of the top-k coefficients.
    Type: Application
    Filed: February 7, 2008
    Publication date: August 13, 2009
    Inventors: Michail Vlachos, Philip Shi-Lung Yu
  • Patent number: 7552099
    Abstract: There are provided methods, computer program products, and systems for indexing a data stream. A method for indexing a data stream having attribute values includes the steps of parsing the data stream, and forming an index of tuples for a subset of attribute values of the data stream. The index is configured for retrieving the top-K tuples that optimize linearly weighted sums of at least some of the attribute values in the subset.
    Type: Grant
    Filed: March 10, 2006
    Date of Patent: June 23, 2009
    Assignee: International Business Machines Corporation
    Inventors: Gang Luo, Kun-Lung Wu, Philip Shi-lung Yu
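
For readers unfamiliar with the query described in the abstract of patent 7552099 above, the following minimal Python sketch shows what "retrieving the top-K tuples that optimize linearly weighted sums" of attribute values means. It is a brute-force scan over an in-memory list, not the index claimed in the patent; the function name `top_k_weighted`, the sample tuples, and the weights are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_weighted(
    tuples: Sequence[Tuple[float, ...]],
    weights: Sequence[float],
    k: int,
) -> List[Tuple[float, ...]]:
    """Return the k tuples with the largest linearly weighted sum.

    Brute-force baseline: every tuple is scored on each query. An index
    would aim to avoid this full scan; none is implemented here.
    """
    def score(t: Tuple[float, ...]) -> float:
        # Linearly weighted sum of the tuple's attribute values.
        return sum(w * v for w, v in zip(weights, t))

    return heapq.nlargest(k, tuples, key=score)


# Hypothetical two-attribute tuples taken from a stream.
stream = [(1.0, 5.0), (4.0, 1.0), (3.0, 3.0), (0.5, 0.5)]
result = top_k_weighted(stream, weights=(0.7, 0.3), k=2)
print(result)
```

Because the weights are supplied at query time, a different weight vector can change which tuples rank highest; that is why a precomputed sort on any single attribute is not sufficient for this kind of query.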