Patents by Inventor Huican Zhu

Huican Zhu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10216847
    Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: February 26, 2019
    Assignee: Google LLC
    Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard B. Gobioff
  • Patent number: 10210256
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: February 19, 2019
    Assignee: Google LLC
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Publication number: 20180089317
    Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
    Type: Application
    Filed: June 8, 2017
    Publication date: March 29, 2018
    Inventors: Huican ZHU, Anurag ACHARYA, Max IBEL, Howard B. GOBIOFF
  • Patent number: 9679056
    Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
    Type: Grant
    Filed: April 4, 2014
    Date of Patent: June 13, 2017
    Assignee: Google Inc.
    Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard Bradley Gobioff
  • Publication number: 20170091324
    Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
    Type: Application
    Filed: April 4, 2014
    Publication date: March 30, 2017
    Applicant: Google Inc.
    Inventors: Huican ZHU, Anurag ACHARYA, Max IBEL, Howard Bradley GOBIOFF
  • Publication number: 20160321252
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Application
    Filed: April 1, 2016
    Publication date: November 3, 2016
    Inventors: Huican ZHU, Jeffrey DEAN, Sanjay GHEMAWAT, Bwolen Po-Jen YANG, Anurag ACHARYA
  • Patent number: 9411889
    Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
    Type: Grant
    Filed: March 13, 2012
    Date of Patent: August 9, 2016
    Assignee: Google Inc.
    Inventors: Huican Zhu, Anurag Acharya
  • Patent number: 9305091
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Grant
    Filed: November 18, 2011
    Date of Patent: April 5, 2016
    Assignee: Google Inc.
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Publication number: 20150169623
    Abstract: Provided are a distributed file system, a file access method, and a client device. The file access method includes: accessing a file catalog stored by a master server, and obtaining routing information of a meta server associated with a to-be-accessed file from the master server; accessing the meta server according to the obtained routing information, and obtaining meta information of the to-be-accessed file from the meta server; and accessing the to-be-accessed file from multiple node servers according to the obtained meta information.
    Type: Application
    Filed: July 23, 2013
    Publication date: June 18, 2015
    Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Haijun Wu, Huican Zhu, Dafu Deng, Rui Li, Yongqiang Zou, Shengyu Dong, Taifu Que, Lei Wang, Shaopeng Yang, Shuxin Zhang, Dayong Zhao, Chang Liu, Xiaodong Chen, Yinfeng Zhang
  • Publication number: 20140222776
    Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.
    Type: Application
    Filed: April 4, 2014
    Publication date: August 7, 2014
    Applicant: Google Inc.
    Inventors: Huican ZHU, Anurag ACHARYA, Max IBEL, Howard Bradley GOBIOFF
  • Patent number: 8707313
    Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
    Type: Grant
    Filed: February 18, 2011
    Date of Patent: April 22, 2014
    Assignee: Google Inc.
    Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
  • Patent number: 8707312
    Abstract: A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier, the reuse flag indicating whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: April 22, 2014
    Assignee: Google Inc.
    Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
  • Patent number: 8660834
    Abstract: Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words.
    Type: Grant
    Filed: November 17, 2008
    Date of Patent: February 25, 2014
    Assignee: Google Inc.
    Inventors: Jun Wu, Huican Zhu, Hongjun Zhu
  • Patent number: 8484548
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Grant
    Filed: November 7, 2007
    Date of Patent: July 9, 2013
    Assignee: Google Inc.
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Publication number: 20120173552
    Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
    Type: Application
    Filed: March 13, 2012
    Publication date: July 5, 2012
    Inventors: Huican Zhu, Anurag Acharya
  • Publication number: 20120066576
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Application
    Filed: November 18, 2011
    Publication date: March 15, 2012
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
  • Patent number: 8136025
    Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: March 13, 2012
    Assignee: Google Inc.
    Inventors: Huican Zhu, Anurag Acharya
  • Patent number: 8042112
    Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: October 18, 2011
    Assignee: Google Inc.
    Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
  • Publication number: 20090070097
    Abstract: Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words.
    Type: Application
    Filed: November 17, 2008
    Publication date: March 12, 2009
    Applicant: Google Inc.
    Inventors: Jun Wu, Huican Zhu, Hongjun Zhu
  • Patent number: 7308643
    Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: December 11, 2007
    Assignee: Google Inc.
    Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
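
The sorted anchor map described in the indexing abstracts above can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the function name `build_sorted_anchor_map`, the integer document identifiers, and the in-memory list representation are all assumptions made for this example. It reads a link log of (source, target) pairings and produces target-to-source pairings ordered by target document identifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_sorted_anchor_map(
    link_log: Iterable[Tuple[int, int]],
) -> List[Tuple[int, List[int]]]:
    """Invert (source, target) pairings and order them by target identifier.

    Hypothetical helper for illustration; the identifiers are plain integers.
    """
    sources_by_target: Dict[int, List[int]] = defaultdict(list)
    for source_id, target_id in link_log:
        sources_by_target[target_id].append(source_id)

    # Order entries by target document identifier; sources are sorted too
    # so the output is deterministic.
    return [
        (target_id, sorted(sources_by_target[target_id]))
        for target_id in sorted(sources_by_target)
    ]


# Example link log: each pair is (source document, target document).
link_log = [(1, 30), (2, 10), (3, 30), (1, 20)]
anchor_map = build_sorted_anchor_map(link_log)
for target_id, source_ids in anchor_map:
    print(target_id, source_ids)
```

Grouping by target means all documents that link to a given page sit next to each other, which is what an indexer would need in order to look up the inbound links for a target in a single pass.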