Patents by Inventor Philip Yu

Philip Yu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20050278324
    Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-Commerce target marketing, bioinformatics (large scale scientific data analysis), and automatic computing (web usage analysis). However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences.
    Type: Application
    Filed: May 31, 2004
    Publication date: December 15, 2005
    Applicant: IBM Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20050246262
    Abstract: Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified.
    Type: Application
    Filed: April 29, 2004
    Publication date: November 3, 2005
    Inventors: Charu Aggarwal, Murray Campbell, Yuan-Chi Chang, Matthew Hill, Chung-Sheng Li, Milind Naphade, Sriram Padmanabhan, John Smith, Min Wang, Kun-Lung Wu, Philip Yu
  • Publication number: 20050234877
    Abstract: The present invention is directed to a system and a method for generating a temporally ranked set of search results in response to a query. Each result in the set of search results can be ranked temporally or based on the reputation associated with authors of each result and the reputation associated with the repository where each result is located. Temporal ranking takes into account a present importance weight and a future importance weight that are assigned to each result. The present importance of each result uses creation date, publication date, in-link dates and search frequency, and the future importance uses an aging factor based on the elapsed time from publication for each search result and a rate at which each search result decreases in importance. Temporal ranking can be applied as a modification of existing and common search engine algorithms, including PageRank and HITS.
    Type: Application
    Filed: April 8, 2004
    Publication date: October 20, 2005
    Inventor: Philip Yu
  • Publication number: 20050210027
    Abstract: Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
    Type: Application
    Filed: March 16, 2004
    Publication date: September 22, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050193110
    Abstract: Techniques are provided for improved serving of content in a distributed data network. In one aspect of the invention, a technique for delivering content in a client-server system based on a request from a client comprises the following steps/operations. The request is obtained. A performance characteristic of at least one server or at least one cache of the client-server system is determined. Then, a level of data accuracy to be delivered to the client in response to the request is determined. The data accuracy determination is based on: (i) the determined performance characteristic of the at least one server or the at least one cache; and (ii) at least one preference associated with the client. The performance characteristic may comprise a load of the at least one server or the at least one cache. The level of data accuracy may comprise a level of personalization to be delivered to the client in response to the request.
    Type: Application
    Filed: February 27, 2004
    Publication date: September 1, 2005
    Applicant: International Business Machines Corporation
    Inventors: Paul Dantzig, Daniel Dias, Arun Iyengar, Philip Yu
  • Publication number: 20050177545
    Abstract: Techniques are provided for representing and managing data and associated relationships. In one aspect of the invention, a technique for managing data associated with a given domain comprises the following steps. A specification of data attributes representing one or more types of data to be managed is maintained. Further, a specification of algorithms representing one or more types of operations performable in accordance with the data attributes is maintained. Still further, a specification of relationships representing relationships between the data attributes and the algorithms is maintained. The data attribute specification, the algorithm specification and the relationship specification are maintained in a storage framework having multiple levels, the multiple levels being specified based on the given domain with which the data being managed is associated. The techniques may be provided in support of service level management.
    Type: Application
    Filed: February 11, 2004
    Publication date: August 11, 2005
    Applicant: International Business Machines Corporation
    Inventors: Melissa Buco, Rong Chang, Laura Luan, Zon-Yin Shae, Christopher Ward, Joel Wolf, Philip Yu
  • Publication number: 20050131873
    Abstract: Disclosed is a method and structure for searching data in databases using an ensemble of models. First the invention performs training. This training orders models within the ensemble in order of prediction accuracy and joins different numbers of models together to form sub-ensembles. The models are joined together in the sub-ensemble in the order of prediction accuracy. Next in the training process, the invention calculates confidence values of each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the ensemble. The size of each of the sub-ensembles is variable depending upon the level of confidence, while, to the contrary, the size of the ensemble is fixed. After the training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence.
    Type: Application
    Filed: December 16, 2003
    Publication date: June 16, 2005
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20050125434
    Abstract: A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset into N subsets of data and developing an estimated learning model for the dataset by developing a learning model for a first subset of the N subsets.
    Type: Application
    Filed: December 3, 2003
    Publication date: June 9, 2005
    Applicant: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20050114331
    Abstract: Similarity searching techniques are provided. In one aspect, a method for use in finding near-neighbors in a set of objects comprises the following steps. Subspace pattern similarities that the objects in the set exhibit in multi-dimensional spaces are identified. Subspace correlations are defined between two or more of the objects in the set based on the identified subspace pattern similarities for use in identifying near-neighbor objects. A pattern distance index may be created. A method of performing a near-neighbor search of one or more query objects against a set of objects is also provided.
    Type: Application
    Filed: November 26, 2003
    Publication date: May 26, 2005
    Applicant: International Business Machines Corporation
    Inventors: Haixun Wang, Philip Yu
  • Publication number: 20050114314
    Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure.
    Type: Application
    Filed: November 26, 2003
    Publication date: May 26, 2005
    Inventors: Wei Fan, Haixun Wang, Philip Yu
  • Publication number: 20050114298
    Abstract: The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables the efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.
    Type: Application
    Filed: November 26, 2003
    Publication date: May 26, 2005
    Inventors: Wei Fan, Chang-Shing Perng, Haixun Wang, Philip Yu
  • Publication number: 20050108170
    Abstract: A method for distributing and utilizing software is provided. In the method of distribution, a software application is provided on a hardware device by a manufacturer of the software application, wherein the software application is executable on the hardware device. The hardware device is enclosed within a box and distributed. The manufacturer provides continued services for the software application, wherein the hardware device is connectable between at least one end user's computer and the manufacturer. The hardware device is adapted to provide the continued services via a communication link between the hardware device and the manufacturer.
    Type: Application
    Filed: November 17, 2003
    Publication date: May 19, 2005
    Inventors: Brent Hailpern, John Turek, Philip Yu
  • Publication number: 20050096841
    Abstract: The present invention is directed to a system and a method for evaluating a plurality of moving queries over moving objects. The method, which can be embodied in a computer readable medium containing computer readable code, constructs motion-adaptive bounding boxes around the objects and queries and indexes the objects and queries based upon the bounding boxes. Predictive query results are used to optimize the evaluation of the moving queries. The bounding boxes vary in size and shape depending on the speed and motion direction of the objects and queries. The system of the present invention includes the moving objects and queries, each having an associated motion-adaptive bounding box. The system also provides for a monitoring system capable of monitoring the location and motion of the moving objects and moving queries and of evaluating the moving queries. The monitoring system includes a motion-adaptive query index and a motion-adaptive object index.
    Type: Application
    Filed: November 3, 2003
    Publication date: May 5, 2005
    Inventors: Bugra Gedik, Kun-Lung Wu, Philip Yu
  • Publication number: 20050091524
    Abstract: Various embodiments for maintaining security and confidentiality of data and operations within a fraud detection system. Each of these embodiments utilizes a secure architecture in which: (1) access to data is limited to only approved or authorized entities; (2) confidential details in received data can be readily identified and concealed; and (3) confidential details that have become non-confidential can be identified and exposed.
    Type: Application
    Filed: October 22, 2003
    Publication date: April 28, 2005
    Applicant: International Business Machines Corporation
    Inventors: Naoki Abe, Carl Abrams, Chidanand Apte, Bishwaranjan Bhattacharjee, Kenneth Goldman, Matthias Gruetzner, Matthew Hilbert, John Langford, Sriram Padmanabhan, Charles Tresser, Kathleen Troidle, Philip Yu
  • Publication number: 20050071083
    Abstract: A method and structure for monitoring continual queries over moving objects, including identifying a query region in a digital format. Each query region is strictly covered by at least one shingle such that each query region is completely covered by the at least one shingle and no section of any of the at least one shingle falls outside the query region.
    Type: Application
    Filed: September 29, 2003
    Publication date: March 31, 2005
    Applicant: International Business Machines Corporation
    Inventors: Shyh-Kwei Chen, Kun-Lung Wu, Philip Yu
  • Publication number: 20050071322
    Abstract: This invention introduces a new concept called virtual construct intervals (VCI), where each predicate interval is decomposed into one or more of these construct intervals. These VCIs strictly cover the predicate interval. Namely, every attribute value covered by the predicate interval is also covered by at least one of the decomposed VCIs, and vice versa. Each construct interval has a unique ID or interval coordinate and a set of endpoints. A construct interval is considered activated when a predicate interval using it in its decomposition is added to the system. The predicate ID is then inserted into the ID lists associated with the decomposed VCIs. To facilitate fast search, a bitmap vector is used to indicate the activation of VCIs that cover an event value. The challenge is to find an appropriate set of construct intervals to make predicate decomposition simple and, more importantly, to build efficient bitmap indexes.
    Type: Application
    Filed: September 29, 2003
    Publication date: March 31, 2005
    Inventors: Shyh-Kwei Chen, Mark Mei, Kun-Lung Wu, Philip Yu
  • Publication number: 20050055697
    Abstract: The present invention relates to the problem of scheduling work for employees and/or other resources in a help desk or similar environment. The employees have different levels of training and availabilities. The jobs, which occur as a result of dynamically occurring events, consist of multiple tasks ordered by chain precedence. Each job and/or task carries with it a penalty which is a step function of the time taken to complete it, the deadlines and penalties having been negotiated as part of one or more service level agreement contracts. The goal is to minimize the total amount of penalties paid. The invention consists of a pair of heuristic schemes for this difficult scheduling problem, one greedy and one randomized. The greedy scheme is used to provide a quick initial solution, while the greedy and randomized schemes are combined in order to think more deeply about particular problem instances.
    Type: Application
    Filed: September 9, 2003
    Publication date: March 10, 2005
    Applicant: International Business Machines Corporation
    Inventors: Melissa Buco, Rong Chang, Laura Luan, Christopher Ward, Joel Wolf, Philip Yu
  • Publication number: 20050049991
    Abstract: Methods and apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process are provided. First, data statistics are constructed from the at least one input data set. Then, an output data set is generated from the data statistics. The output data set differs from the input data set but maintains one or more correlations from within the input data set. The correlations may be the inherent correlations between different dimensions of a multidimensional input data set. A significant amount of information from the input data set may be hidden so that the privacy level of the data mining process may be increased.
    Type: Application
    Filed: August 14, 2003
    Publication date: March 3, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050038769
    Abstract: A technique of clustering data of a data stream is provided. Online statistics are first created from the data stream. Offline processing of the online statistics is then performed when offline processing is either required or desired. Online statistics may be created through the reception of data points from the data stream and the formation and updating of data groups. Offline processing may be performed by reclustering groups of data points around sampled data points and reporting the newly formed clusters.
    Type: Application
    Filed: August 14, 2003
    Publication date: February 17, 2005
    Applicant: International Business Machines Corporation
    Inventors: Charu Aggarwal, Philip Yu
  • Publication number: 20050027710
    Abstract: Attribute association discovery techniques that support relational-based data mining are disclosed. In one aspect of the invention, a technique for mining attribute associations in a relational data set comprises the following steps/operations. Multiple items are obtained from the relational data set. Then, attribute associations are discovered using: (i) multi-attribute mining templates formed from at least a portion of the multiple items; and (ii) one or more mining preferences specified by a user. The invention provides a novel architecture for the mining search space so as to exploit the inter-relationships among patterns of different templates. The framework is relational-sensitive and supports interactive and online mining.
    Type: Application
    Filed: July 30, 2003
    Publication date: February 3, 2005
    Applicant: International Business Machines Corporation
    Inventors: Sheng Ma, Chang-Shing Perng, Haixun Wang, Philip Yu
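
The abstract for publication number 20050071322 above describes decomposing each predicate interval into construct intervals, keeping a list of predicate IDs per construct interval, and checking only the activated construct intervals that cover an event value. The Python sketch below is one possible illustration of that idea, not the patented method. It assumes integer attribute values and aligned power-of-two construct intervals, which the abstract does not specify. The class name `VCIIndex`, its methods, and the `levels` parameter are hypothetical, and a Python set stands in for the bitmap vector.

```python
from collections import defaultdict


class VCIIndex:
    """Illustrative index over integer predicate intervals (hypothetical design)."""

    def __init__(self, levels=4):
        # Assumption: construct intervals have lengths 1, 2, ..., 2**(levels-1)
        # and start at multiples of their own length.
        self.levels = levels
        # Construct interval (level, start) -> IDs of predicates that use it.
        self.id_lists = defaultdict(set)
        # Activated construct intervals; a set stands in for the bitmap vector.
        self.active = set()

    def decompose(self, lo, hi):
        """Split the inclusive interval [lo, hi] into aligned construct intervals."""
        pieces = []
        while lo <= hi:
            level = self.levels - 1
            # Shrink until the block is aligned at lo and does not overrun hi.
            while level > 0 and (lo % (1 << level) != 0
                                 or lo + (1 << level) - 1 > hi):
                level -= 1
            pieces.append((level, lo))
            lo += 1 << level
        return pieces

    def add_predicate(self, pred_id, lo, hi):
        """Activate the construct intervals of [lo, hi] and record the predicate ID."""
        for vci in self.decompose(lo, hi):
            self.id_lists[vci].add(pred_id)
            self.active.add(vci)

    def match(self, value):
        """Return the IDs of all predicates whose interval contains value."""
        hits = set()
        for level in range(self.levels):
            # Exactly one construct interval per level covers the value.
            start = value - value % (1 << level)
            vci = (level, start)
            if vci in self.active:
                hits |= self.id_lists[vci]
        return hits


index = VCIIndex(levels=3)
index.add_predicate("p1", 3, 9)
index.add_predicate("p2", 6, 6)
print(index.decompose(3, 9))
print(sorted(index.match(6)))
```

With this layout a lookup inspects at most `levels` construct intervals per event value, regardless of how many predicates are registered. The pieces cover the predicate interval exactly, so no value outside it can match.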