Patents by Inventor Taifeng Wang

Taifeng Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8229968
    Abstract: Embodiments for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing a first subgraph of the plurality of subgraphs from the distributed storage system in the computing device. The first subgraph being processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph in a cache of the device. The second subgraph being a duplicate of the first subgraph. Moreover, the method also includes processing the second subgraph with the first output values to generate second output values if the device is to process the first subgraph in each of one or more subsequent iterations.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: July 24, 2012
    Assignee: Microsoft Corporation
    Inventors: Taifeng Wang, Tie-Yan Liu, Minghao Liu, Zhi Chen
  • Patent number: 8224825
    Abstract: Systems, methods, and devices for sorting and processing various types of graph data are described herein. Partitioning graph data into master data and associated slave data allows for sorting of the graph data by sorting the master data. In another embodiment, promoting a data bucket having a first data bucket size to a data bucket having a second data bucket size greater than the first data bucket size upon reaching a memory limit allows for the reduction of temporary files output by the data bucket.
    Type: Grant
    Filed: May 31, 2010
    Date of Patent: July 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Taifeng Wang, Tie-Yan Liu
  • Publication number: 20120143844
    Abstract: Some implementations provide techniques for determining which URLs to select for crawling from a pool of URLs. For example, the selection of URLs for crawling may be made based on maintaining a high coverage of the known URLs and/or high discoverability of the World Wide Web. Some implementations provide a multi-level coverage strategy for crawling selection. Further, some implementations provide techniques for discovering unseen URLs.
    Type: Application
    Filed: December 2, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Taifeng Wang, Tie-Yan Liu, Bin Gao
  • Publication number: 20120143792
    Abstract: Some implementations provide techniques for selecting web pages for inclusion in an index. For example, some implementations apply regularization to select a subset of the crawled web pages for indexing based on link relationships between the crawled web pages, features extracted from the crawled web pages, and user behavior information determined for at least some of the crawled web pages. Further, in some implementations, the user behavior information may be used to sort a training set of crawled web pages into a plurality of labeled groups. The labeled groups may be represented in a directed graph that indicates relative priorities for being selected for indexing.
    Type: Application
    Filed: December 2, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Taifeng Wang, Bin Gao, Tie-Yan Liu
  • Publication number: 20110295845
    Abstract: Importance ranking of web pages is performed by defining a graph-based regularization term based on document features, edge features, and a web graph of a plurality of web pages, and deriving a loss term based on human feedback data. The graph-based regularization term and the loss term are combined to obtain a global objective function. The global objective function is optimized to obtain parameters for the document features and edge features and to produce static rank scores for the plurality of web pages. Further, the plurality of web pages is ordered based on the static rank scores.
    Type: Application
    Filed: May 27, 2010
    Publication date: December 1, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Bin Gao, Taifeng Wang, Tie-Yan Liu
  • Publication number: 20110295855
    Abstract: Systems, methods, and devices for sorting and processing various types of graph data are described herein. Partitioning graph data into master data and associated slave data allows for sorting of the graph data by sorting the master data. In another embodiment, promoting a data bucket having a first data bucket size to a data bucket having a second data bucket size greater than the first data bucket size upon reaching a memory limit allows for the reduction of temporary files output by the data bucket.
    Type: Application
    Filed: May 31, 2010
    Publication date: December 1, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Taifeng Wang, Tie-Yan Liu
  • Publication number: 20100281078
    Abstract: A distributed data reorganization system and method for mapping and reducing raw data containing a plurality of data records. Embodiments of the distributed data reorganization system and method operate in a general-purpose parallel execution environment that use an arbitrary communication directed acyclic graph. The vertices of the graph accept multiple data inputs and generate multiple data inputs, and may be of different types. Embodiments of the distributed data reorganization system and method include a plurality of distributed mappers that use a mapping criteria supplied by a developer to map the plurality of data records to data buckets. The mapped data record and data bucket identifications are input for a plurality of distributed reducers. Each distributed reducer groups together data records having the same data bucket identification and then uses a merge logic supplied by the developer to reduce the grouped data records to obtain reorganized data.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Taifeng Wang, Tie-Yan Liu
  • Publication number: 20090249004
    Abstract: Embodiments for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing a first subgraph of the plurality of subgraphs from the distributed storage system in the computing device. The first subgraph being processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph in a cache of the device. The second subgraph being a duplicate of the first subgraph. Moreover, the method also includes processing the second subgraph with the first output values to generate second output values if the device is to process the first subgraph in each of one or more subsequent iterations.
    Type: Application
    Filed: March 26, 2008
    Publication date: October 1, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Taifeng Wang, Tie-Yan Liu, Minghao Liu, Zhi Chen