Patents by Inventor Huican Zhu

Huican Zhu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Document reuse in a search engine crawler

Patent number: 10216847

Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.

Type: Grant

Filed: June 8, 2017

Date of Patent: February 26, 2019

Assignee: Google LLC

Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard B. Gobioff
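
The reuse decision described in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `DocRecord` type, the `threshold` parameter, and the rule that low-scoring documents are the ones reused are all assumptions made for illustration, since the abstract only says the flag is based on a query-independent score and checked against a predefined criterion.

```python
from dataclasses import dataclass


@dataclass
class DocRecord:
    url: str
    score: float        # query-independent score (hypothetical 0..1 scale)
    reuse: bool = False


def set_reuse_flags(records, threshold=0.5):
    """Set each record's reuse flag from its query-independent score.

    Assumption: documents scoring below `threshold` are reused.
    """
    for rec in records:
        rec.reuse = rec.score < threshold
    return records


def crawl(records, repository, fetch):
    """Return url -> content, reusing stored copies where the flag allows.

    `repository` maps url -> previously downloaded content.
    `fetch` is a caller-supplied function that downloads a url.
    """
    result = {}
    for rec in records:
        if rec.reuse and rec.url in repository:
            result[rec.url] = repository[rec.url]   # reuse the stored version
        else:
            result[rec.url] = fetch(rec.url)        # download current version
    return result
```

Passing `fetch` as a parameter keeps the sketch free of network code and makes it easy to see which documents were actually downloaded.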
Anchor tag indexing in a web crawler system

Patent number: 10210256

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Grant

Filed: April 1, 2016

Date of Patent: February 19, 2019

Assignee: Google LLC

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
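
As a rough illustration of the abstract above, the following Python sketch inverts a link log into a map keyed and ordered by target document identifier. The tuple layout of the link log, the inclusion of anchor text, and the function name are assumptions for illustration; the abstract specifies only source/target pairings and ordering by target identifier.

```python
from collections import defaultdict


def build_sorted_anchor_map(link_log):
    """Invert a link log into target -> sources, ordered by target id.

    `link_log` is an iterable of (source_id, target_id, anchor_text)
    tuples (a hypothetical layout).
    """
    by_target = defaultdict(list)
    for source_id, target_id, anchor_text in link_log:
        by_target[target_id].append((source_id, anchor_text))
    # Order entries by target identifier; sources within an entry are
    # sorted too, only to make the output deterministic.
    return [(target, sorted(by_target[target])) for target in sorted(by_target)]
```

With the map ordered this way, all anchor information pointing at a given target sits in one contiguous entry.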
Document reuse in a search engine crawler

Publication number: 20180089317

Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.

Type: Application

Filed: June 8, 2017

Publication date: March 29, 2018

Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard B. Gobioff
Document reuse in a search engine crawler

Patent number: 9679056

Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.

Type: Grant

Filed: April 4, 2014

Date of Patent: June 13, 2017

Assignee: Google Inc.

Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard Bradley Gobioff
Document Reuse in a Search Engine Crawler

Publication number: 20170091324

Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.

Type: Application

Filed: April 4, 2014

Publication date: March 30, 2017

Applicant: Google Inc.

Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard Bradley Gobioff
Anchor tag indexing in a web crawler system

Publication number: 20160321252

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Application

Filed: April 1, 2016

Publication date: November 3, 2016

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Assigning document identification tags

Patent number: 9411889

Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.

Type: Grant

Filed: March 13, 2012

Date of Patent: August 9, 2016

Assignee: Google Inc.

Inventors: Huican Zhu, Anurag Acharya
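
One way to picture how identification tags can leave a result list approximately ordered by query-independent relevance is to split the identifier space into tiers by score. The Python sketch below is an assumption-laden illustration, not the patented method: the `DocIdAssigner` class, the tier thresholds, and the fixed tier size are all hypothetical.

```python
class DocIdAssigner:
    """Assign document ids so that lower ids roughly mean higher relevance."""

    def __init__(self, tier_bounds, tier_size):
        # tier_bounds: score thresholds in descending order, e.g. [0.8, 0.4]
        # yields three tiers: >= 0.8, >= 0.4, and everything else.
        self.tier_bounds = list(tier_bounds)
        self.tier_size = tier_size
        self.next_offset = [0] * (len(self.tier_bounds) + 1)

    def assign(self, score):
        """Return the next free id in the tier that matches `score`."""
        tier = len(self.tier_bounds)          # default: lowest tier
        for i, bound in enumerate(self.tier_bounds):
            if score >= bound:
                tier = i
                break
        offset = self.next_offset[tier]
        if offset >= self.tier_size:
            raise OverflowError(f"tier {tier} is full")
        self.next_offset[tier] = offset + 1
        return tier * self.tier_size + offset
```

Because ids are handed out as documents arrive, a new document can be indexed immediately, and sorting any list of ids groups the results by tier without a separate ranking pass. Within a tier the order reflects arrival, which is why the overall ordering is only approximate.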
Anchor tag indexing in a web crawler system

Patent number: 9305091

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Grant

Filed: November 18, 2011

Date of Patent: April 5, 2016

Assignee: Google Inc.

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Distributed File System, File Access Method and Client Device

Publication number: 20150169623

Abstract: Provided are a distributed file system, a file access method, and a client device. The file access method includes: accessing a file catalog stored by a master server, and obtaining routing information of a meta server associated with a to-be-accessed file from the master server; accessing the meta server according to the obtained routing information, and obtaining meta information of the to-be-accessed file from the meta server; and accessing the to-be-accessed file from multiple node servers according to the obtained meta information.

Type: Application

Filed: July 23, 2013

Publication date: June 18, 2015

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Haijun Wu, Huican Zhu, Dafu Deng, Rui Li, Yongqiang Zou, Shengyu Dong, Taifu Que, Lei Wang, Shaopeng Yang, Shuxin Zhang, Dayong Zhao, Chang Liu, Xiaodong Chen, Yinfeng Zhang
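
The three-step access path in the abstract above (master server, then meta server, then node servers) can be sketched with in-memory dictionaries standing in for the servers. The data layout, including the idea that meta information is a list of `(node, chunk_id)` pairs, is an assumption for illustration only.

```python
def read_file(path, master, meta_servers, node_servers):
    """Read a file by following master -> meta server -> node servers.

    master:       path -> meta server address (the file catalog)
    meta_servers: address -> {path -> [(node, chunk_id), ...]}
    node_servers: node -> {chunk_id -> bytes}
    All three layouts are hypothetical stand-ins for real servers.
    """
    meta_address = master[path]                    # step 1: routing information
    chunks = meta_servers[meta_address][path]      # step 2: meta information
    # Step 3: fetch each chunk from its node server and join them in order.
    return b"".join(node_servers[node][chunk_id] for node, chunk_id in chunks)
```

A missing path raises `KeyError` in this sketch; a real client would need explicit error handling and retries.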
Document Reuse in a Search Engine Crawler

Publication number: 20140222776

Abstract: Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.

Type: Application

Filed: April 4, 2014

Publication date: August 7, 2014

Applicant: Google Inc.

Inventors: Huican Zhu, Anurag Acharya, Max Ibel, Howard Bradley Gobioff
Scheduler for search engine crawler

Patent number: 8707313

Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.

Type: Grant

Filed: February 18, 2011

Date of Patent: April 22, 2014

Assignee: Google Inc.

Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
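
The filtering step in the abstract above, dropping identifiers that were unreachable in each of the last X crawls, can be sketched as follows. The `history` layout (a list of reachability booleans per URL, oldest first) and the function name are assumptions for illustration.

```python
def schedule_next_crawl(starting_set, history, x):
    """Return the identifiers to crawl next, in their original order.

    history: url -> list of bools, oldest first (True = reachable).
    A url is dropped only if it has at least `x` recorded crawls and
    was unreachable in every one of the last `x`.
    """
    if x <= 0:
        raise ValueError("x must be a positive number of crawls")
    scheduled = []
    for url in starting_set:
        recent = history.get(url, [])[-x:]
        unreachable_every_time = len(recent) == x and not any(recent)
        if not unreachable_every_time:
            scheduled.append(url)
    return scheduled
```

URLs with fewer than `x` recorded crawls are kept in this sketch, on the assumption that there is not yet enough evidence to drop them.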
Document reuse in a search engine crawler

Patent number: 8707312

Abstract: A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier, the reuse flag indicating whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.

Type: Grant

Filed: June 30, 2004

Date of Patent: April 22, 2014

Assignee: Google Inc.

Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
User input classification

Patent number: 8660834

Abstract: Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words.

Type: Grant

Filed: November 17, 2008

Date of Patent: February 25, 2014

Assignee: Google Inc.

Inventors: Jun Wu, Huican Zhu, Hongjun Zhu
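
The context-based classification in the abstract above can be illustrated with a toy Python sketch. The two word lists, the one-word context window, and the tie-breaking rule (ties fall to non-pinyin) are all assumptions for illustration and stand in for whatever models a real system would use.

```python
# Tiny hypothetical vocabularies; "women" and "men" are in both,
# so they are the ambiguous words in this example.
PINYIN_WORDS = {"ni", "hao", "women", "men", "zhong", "guo"}
OTHER_WORDS = {"the", "are", "women", "men", "and", "strong"}


def word_vote(word):
    """Return +1 for an unambiguous pinyin word, -1 for an unambiguous
    non-pinyin word, and 0 for ambiguous or unknown words."""
    is_pinyin = word in PINYIN_WORDS
    is_other = word in OTHER_WORDS
    if is_pinyin and not is_other:
        return 1
    if is_other and not is_pinyin:
        return -1
    return 0


def classify_words(words):
    """Label each word 'pinyin' or 'non-pinyin'.

    Ambiguous words take their label from the adjacent words.
    """
    labels = []
    for i, word in enumerate(words):
        if word in PINYIN_WORDS and word in OTHER_WORDS:
            neighbors = words[max(0, i - 1):i] + words[i + 1:i + 2]
            score = sum(word_vote(n) for n in neighbors)
        else:
            score = word_vote(word)
        labels.append("pinyin" if score > 0 else "non-pinyin")
    return labels
```

For example, "women" is labeled non-pinyin in `["the", "women", "are"]` and pinyin in `["ni", "hao", "women"]`, because only the surrounding words differ.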
Anchor tag indexing in a web crawler system

Patent number: 8484548

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Grant

Filed: November 7, 2007

Date of Patent: July 9, 2013

Assignee: Google Inc.

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Assigning Document Identification Tags

Publication number: 20120173552

Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.

Type: Application

Filed: March 13, 2012

Publication date: July 5, 2012

Inventors: Huican Zhu, Anurag Acharya
Anchor Tag Indexing in a Web Crawler System

Publication number: 20120066576

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Application

Filed: November 18, 2011

Publication date: March 15, 2012

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Assigning document identification tags

Patent number: 8136025

Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.

Type: Grant

Filed: July 3, 2003

Date of Patent: March 13, 2012

Assignee: Google Inc.

Inventors: Huican Zhu, Anurag Acharya
Scheduler for search engine crawler

Patent number: 8042112

Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.

Type: Grant

Filed: June 30, 2004

Date of Patent: October 18, 2011

Assignee: Google Inc.

Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
User input classification

Publication number: 20090070097

Abstract: Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words.

Type: Application

Filed: November 17, 2008

Publication date: March 12, 2009

Applicant: Google Inc.

Inventors: Jun Wu, Huican Zhu, Hongjun Zhu
Anchor tag indexing in a web crawler system

Patent number: 7308643

Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

Type: Grant

Filed: July 3, 2003

Date of Patent: December 11, 2007

Assignee: Google Inc.

Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
