Patents by Inventor Zheng Shao

Zheng Shao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multi-level data staging for low latency data access

Patent number: 10581957

Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.

Type: Grant

Filed: February 10, 2017

Date of Patent: March 3, 2020

Assignee: Facebook, Inc.

Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen
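
The fallback behavior described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Aggregator` and `FrontEnd`, the in-memory `local_filer` queue, and the use of `ConnectionError` to signal an unavailable aggregating cluster are all assumptions made for illustration.

```python
from collections import deque


class Aggregator:
    """Hypothetical stand-in for the aggregating cluster."""

    def __init__(self):
        self.available = True
        self.received = []

    def send(self, record):
        if not self.available:
            raise ConnectionError("aggregating cluster unavailable")
        self.received.append(record)


class FrontEnd:
    """Hypothetical front-end node that stages records locally on failure."""

    def __init__(self, aggregator):
        self.aggregator = aggregator
        self.local_filer = deque()  # assumption: stands in for a local filer

    def log(self, record):
        # Drain any backlog first so records keep their original order.
        self.flush()
        if self.local_filer:
            # The backlog could not be drained, so the aggregator is still down.
            self.local_filer.append(record)
            return
        try:
            self.aggregator.send(record)
        except ConnectionError:
            self.local_filer.append(record)

    def flush(self):
        # Resend staged records once the aggregating cluster recovers.
        while self.local_filer:
            try:
                self.aggregator.send(self.local_filer[0])
            except ConnectionError:
                return
            self.local_filer.popleft()
```

In this sketch a record is removed from the local stage only after the aggregator has accepted it, so an outage in the middle of a flush does not lose data.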

Data stream splitting for low-latency data access

Patent number: 10223431

Abstract: Techniques for facilitating and accelerating log data processing by splitting data streams are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further splits the log data into a plurality of data streams so that the data streams are sent to a receiving application in parallel. In one embodiment, the log data are randomly split to ensure the log data are evenly distributed in the split data streams. In another embodiment, the application that receives the split data streams determines how to split the log data.

Type: Grant

Filed: January 31, 2013

Date of Patent: March 5, 2019

Assignee: Facebook, Inc.

Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Eric Hwang
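
The two splitting embodiments mentioned in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the `seed` parameter, and the convention that the receiving application supplies an integer-valued `key_fn` are assumptions, not details taken from the patent.

```python
import random


def split_randomly(records, num_streams, seed=None):
    """Assign each record to a uniformly random stream (first embodiment)."""
    rng = random.Random(seed)
    streams = [[] for _ in range(num_streams)]
    for record in records:
        streams[rng.randrange(num_streams)].append(record)
    return streams


def split_by_key(records, num_streams, key_fn):
    """Let the receiving application choose the split (second embodiment).

    key_fn is a hypothetical callback supplied by the application; it must
    return an integer, and records with the same key land in the same stream.
    """
    streams = [[] for _ in range(num_streams)]
    for record in records:
        streams[key_fn(record) % num_streams].append(record)
    return streams
```

For example, `split_by_key(records, 4, lambda r: r["user_id"])` keeps all records for one user in a single stream, whereas `split_randomly` only aims for an even spread across streams.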

MULTI-LEVEL DATA STAGING FOR LOW LATENCY DATA ACCESS

Publication number: 20170155707

Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.

Type: Application

Filed: February 10, 2017

Publication date: June 1, 2017

Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen

Multi-level data staging for low latency data access

Patent number: 9609050

Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.

Type: Grant

Filed: January 31, 2013

Date of Patent: March 28, 2017

Assignee: Facebook, Inc.

Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen

Use of incremental checkpoints to restore user data stream processes

Patent number: 9471436

Abstract: A method and system for failure recovery in a storage system are disclosed. In the storage system, user data streams (e.g., log data) are collected by a scribeh system. The scribeh system may include a plurality of Calligraphus servers, HDFS, and ZooKeeper. The Calligraphus servers may shard the user data streams based on keys (e.g., category and bucket pairs) and stream the user data streams to Puma nodes. Sharded user data streams may be aggregated according to the keys in memory of a specific Puma node. Periodically, aggregated user data streams cached in memory of the specific Puma node, together with an incremental checkpoint, are persisted to HBase. When a specific process on the specific Puma node fails, Ptail retrieves the incremental checkpoint from HBase and then restores the specific process by requesting user data streams processed by the specific process from the scribeh system according to the incremental checkpoint.

Type: Grant

Filed: April 23, 2013

Date of Patent: October 18, 2016

Inventors: Samuel Rash, Dhrubajyoti Borthakur, Prakash Khemani, Zheng Shao
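
The recovery scheme in the abstract above can be approximated with a small Python sketch. It makes several assumptions that the abstract does not state: the checkpoint is modeled as a plain stream offset, the `Store` class stands in for a durable store such as HBase, its `persist` call is assumed to be atomic, and the names `Worker`, `step`, and `flush_every` are hypothetical.

```python
from collections import Counter


class Store:
    """Hypothetical durable store holding aggregates and the last checkpoint."""

    def __init__(self):
        self.counts = Counter()
        self.checkpoint = 0

    def persist(self, delta, checkpoint):
        # Assumption: the aggregates and the checkpoint are written atomically.
        self.counts.update(delta)
        self.checkpoint = checkpoint


class Worker:
    """Hypothetical stream processor that aggregates per-key counts in memory."""

    def __init__(self, log, store, flush_every=3):
        self.log = log
        self.store = store
        self.flush_every = flush_every
        self.offset = store.checkpoint  # resume from the last persisted checkpoint
        self.pending = Counter()        # in-memory aggregates not yet persisted

    def step(self):
        # Consume one record, then flush periodically.
        self.pending[self.log[self.offset]] += 1
        self.offset += 1
        if self.offset % self.flush_every == 0:
            self.flush()

    def flush(self):
        # Persist only the increment since the previous checkpoint.
        self.store.persist(self.pending, self.offset)
        self.pending = Counter()
```

If a worker dies between flushes, its unpersisted counts are lost, but a replacement `Worker` built from the same `Store` starts at the stored offset and replays exactly those records, so no record is counted twice.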

Anticoagulant polypeptide and applications thereof

Patent number: 9243044

Abstract: Disclosed is an anticoagulant polypeptide and applications thereof. The anticoagulant polypeptide comprises a polypeptide formed by an amino acid sequence as represented in Seq. ID No. 1; or comprises a derived polypeptide that selectively inhibits coagulation factor XIa and is formed by an amino acid sequence, as represented in Seq. ID No. 1, that has undergone one or more amino acid residue substitutions, deletions, or insertions. The anticoagulant polypeptide is a selective inhibitor for coagulation factor XIa, has anticoagulant activity with minor side effects, and can be used in preparing medicines for the prevention and treatment of thrombotic diseases.

Type: Grant

Filed: April 5, 2012

Date of Patent: January 26, 2016

Assignee: GUANGDONG MEDICAL COLLEGE

Inventors: Lifei Peng, Weiqiong Gan, Zheng Shao, Qingfeng He, Li Deng, Jingjing Hu, Shuli Liao, Jida Peng

ANTICOAGULANT POLYPEPTIDE AND APPLICATIONS THEREOF

Publication number: 20140323404

Abstract: Disclosed is an anticoagulant polypeptide and applications thereof. The anticoagulant polypeptide comprises a polypeptide formed by an amino acid sequence as represented in Seq. ID No. 1; or comprises a derived polypeptide that selectively inhibits coagulation factor XIa and is formed by an amino acid sequence, as represented in Seq. ID No. 1, that has undergone one or more amino acid residue substitutions, deletions, or insertions. The anticoagulant polypeptide is a selective inhibitor for coagulation factor XIa, has anticoagulant activity with minor side effects, and can be used in preparing medicines for the prevention and treatment of thrombotic diseases.

Type: Application

Filed: April 5, 2012

Publication date: October 30, 2014

Applicant: GUANGDONG MEDICAL COLLEGE

Inventors: Lifei Peng, Weiqiong Gan, Zheng Shao, Qingfeng He, Li Deng, Jingjing Hu, Shuli Liao, Jida Peng

INCREMENTAL CHECKPOINTS

Publication number: 20140317448

Abstract: A method and system for failure recovery in a storage system are disclosed. In the storage system, user data streams (e.g., log data) are collected by a scribeh system. The scribeh system may include a plurality of Calligraphus servers, HDFS, and ZooKeeper. The Calligraphus servers may shard the user data streams based on keys (e.g., category and bucket pairs) and stream the user data streams to Puma nodes. Sharded user data streams may be aggregated according to the keys in memory of a specific Puma node. Periodically, aggregated user data streams cached in memory of the specific Puma node, together with an incremental checkpoint, are persisted to HBase. When a specific process on the specific Puma node fails, Ptail retrieves the incremental checkpoint from HBase and then restores the specific process by requesting user data streams processed by the specific process from the scribeh system according to the incremental checkpoint.

Type: Application

Filed: April 23, 2013

Publication date: October 23, 2014

Applicant: Facebook, Inc.

Inventors: Samuel Rash, Dhrubajyoti Borthakur, Prakash Khemani, Zheng Shao

MULTI-LEVEL DATA STAGING FOR LOW LATENCY DATA ACCESS

Publication number: 20140215007

Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.

Type: Application

Filed: January 31, 2013

Publication date: July 31, 2014

Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen

DATA STREAM SPLITTING FOR LOW-LATENCY DATA ACCESS

Publication number: 20140214752

Abstract: Techniques for facilitating and accelerating log data processing by splitting data streams are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further splits the log data into a plurality of data streams so that the data streams are sent to a receiving application in parallel. In one embodiment, the log data are randomly split to ensure the log data are evenly distributed in the split data streams. In another embodiment, the application that receives the split data streams determines how to split the log data.

Type: Application

Filed: January 31, 2013

Publication date: July 31, 2014

Inventors: Samuel Rash, Dhrubajyoti Borthakur, Zheng Shao, Eric Hwang

Systems and methods of predicting resource usefulness using universal resource locators including counting the number of times URL features occur in training data

Patent number: 7908234

Abstract: A method, system and apparatus are provided to train a usefulness prediction model to generate a usefulness prediction in connection with a given universal resource locator (URL), the training of the usefulness prediction model being based on a training set of URLs and a count of negative URLs and a count of positive URLs identified by the training set, and for each feature extracted from the URLs in the training set, a count of the positive URLs in the training set that include the feature and a count of the negative URLs in the training set that include the feature. One or more features of the given URL are extracted, and the extracted features are used together with the usefulness prediction model to generate a usefulness prediction for the given URL.

Type: Grant

Filed: February 15, 2008

Date of Patent: March 15, 2011

Assignee: Yahoo! Inc.

Inventors: Zheng Shao, Wenjie Fu
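
The counting scheme in the abstract above can be illustrated with a short Python sketch. The abstract specifies which counts are collected but not how they are combined, so the scoring rule here is a swapped-in assumption: a naive Bayes style log-odds score with add-one smoothing. The feature extractor (lowercase alphanumeric tokens from the host and path) and the names `extract_features`, `train`, and `predict` are also hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def extract_features(url):
    """Assumed feature set: lowercase alphanumeric tokens of host and path."""
    parts = urlsplit(url)
    tokens = re.split(r"[^a-z0-9]+", (parts.netloc + parts.path).lower())
    return {t for t in tokens if t}


def train(labeled_urls):
    """Count positive and negative URLs overall and per extracted feature."""
    model = {"pos": 0, "neg": 0, "pos_feat": Counter(), "neg_feat": Counter()}
    for url, useful in labeled_urls:
        label = "pos" if useful else "neg"
        model[label] += 1
        model[label + "_feat"].update(extract_features(url))
    return model


def predict(model, url):
    """Return a log-odds score; values above zero suggest a useful URL."""
    score = math.log((model["pos"] + 1) / (model["neg"] + 1))
    for feature in extract_features(url):
        p_pos = (model["pos_feat"][feature] + 1) / (model["pos"] + 2)
        p_neg = (model["neg_feat"][feature] + 1) / (model["neg"] + 2)
        score += math.log(p_pos / p_neg)
    return score
```

A feature that occurs equally often in positive and negative training URLs contributes nothing to the score, so only tokens that discriminate between the two classes move the prediction.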

SYSTEMS AND METHODS OF PREDICTING RESOURCE USEFULNESS USING UNIVERSAL RESOURCE LOCATORS

Publication number: 20090210369

Abstract: A method, system and apparatus are provided to train a usefulness prediction model to generate a usefulness prediction in connection with a given universal resource locator (URL), the training of the usefulness prediction model being based on a training set of URLs and a count of negative URLs and a count of positive URLs identified by the training set, and for each feature extracted from the URLs in the training set, a count of the positive URLs in the training set that include the feature and a count of the negative URLs in the training set that include the feature. One or more features of the given URL are extracted, and the extracted features are used together with the usefulness prediction model to generate a usefulness prediction for the given URL.

Type: Application

Filed: February 15, 2008

Publication date: August 20, 2009

Inventors: Zheng Shao, Wenjie Fu