Patents by Inventor Samuel Rash

Samuel Rash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10645040
    Abstract: Techniques for consistent writes in a split message store are described. In one embodiment, an apparatus may comprise a client front-end component of a messaging system operative to receive a message, the message comprising message metadata and a message body; and store the message in a message queue; and the message queue operative to initiate a storing of the message metadata in a metadata store; delay a storing of the message body in a message store until a metadata storage success indication is received from the metadata store; receive the metadata storage success indication from the metadata store; and store the message body in the message store in response to receiving the metadata storage success indication from the metadata store. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: May 5, 2020
    Assignee: Facebook, Inc.
    Inventors: Rajesh Nishtala, Jason Curtis Jenks, Zardosht Kasheff, Samuel Rash
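    The metadata-first write ordering described in this abstract can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the class, the dict-backed stores, and the method names are all hypothetical stand-ins for the metadata store and message store.

    ```python
    class SplitMessageStore:
        """Toy sketch of a consistent write to a split message store:
        the body write is delayed until the metadata write succeeds."""

        def __init__(self):
            self.metadata_store = {}   # stands in for the metadata store
            self.message_store = {}    # stands in for the message body store

        def store_metadata(self, msg_id, metadata):
            self.metadata_store[msg_id] = metadata
            return True  # metadata storage success indication

        def enqueue(self, msg_id, metadata, body):
            # Initiate the metadata write first; the body is not stored
            # until a success indication is received.
            ok = self.store_metadata(msg_id, metadata)
            if not ok:
                raise RuntimeError("metadata write failed; body write skipped")
            # Success indication received: now persist the message body.
            self.message_store[msg_id] = body


    store = SplitMessageStore()
    store.enqueue("m1", {"sender": "alice"}, b"hello")
    print(store.message_store["m1"])  # → b'hello'
    ```

    Ordering the writes this way means a message body can never exist without its metadata, so readers of the message store see a consistent view.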
  • Patent number: 10581957
    Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: March 3, 2020
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen
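    The local-filer fallback in this abstract can be sketched as follows. The classes and attribute names below are hypothetical illustrations, not the patented system: a front-end cluster stages log lines locally while the aggregator is down and drains them on recovery.

    ```python
    class AggregatingCluster:
        """Toy aggregator that collects incoming log lines."""
        def __init__(self):
            self.available = True
            self.aggregated = []

        def receive(self, log_line):
            self.aggregated.append(log_line)


    class FrontEndCluster:
        """Toy front-end: sends to the aggregator when it is available,
        otherwise stages log lines on a local filer until it recovers."""
        def __init__(self, aggregator):
            self.aggregator = aggregator
            self.local_filer = []  # stands in for local filer storage

        def send(self, log_line):
            if self.aggregator.available:
                self.flush_local()                 # drain staged data first
                self.aggregator.receive(log_line)
            else:
                self.local_filer.append(log_line)  # stage locally

        def flush_local(self):
            while self.local_filer:
                self.aggregator.receive(self.local_filer.pop(0))


    agg = AggregatingCluster()
    fe = FrontEndCluster(agg)
    fe.send("req 1")
    agg.available = False
    fe.send("req 2")          # aggregator down: staged on the local filer
    agg.available = True
    fe.send("req 3")          # recovery: staged line drained, then new line
    print(agg.aggregated)     # → ['req 1', 'req 2', 'req 3']
    ```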
  • Publication number: 20190207882
    Abstract: Techniques for consistent writes in a split message store are described. In one embodiment, an apparatus may comprise a client front-end component of a messaging system operative to receive a message, the message comprising message metadata and a message body; and store the message in a message queue; and the message queue operative to initiate a storing of the message metadata in a metadata store; delay a storing of the message body in a message store until a metadata storage success indication is received from the metadata store; receive the metadata storage success indication from the metadata store; and store the message body in the message store in response to receiving the metadata storage success indication from the metadata store. Other embodiments are described and claimed.
    Type: Application
    Filed: December 29, 2017
    Publication date: July 4, 2019
    Inventors: Rajesh Nishtala, Jason Curtis Jenks, Zardosht Kasheff, Samuel Rash
  • Patent number: 10223431
    Abstract: Techniques for facilitating and accelerating log data processing by splitting data streams are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further splits the log data into a plurality of data streams so that the data streams are sent to a receiving application in parallel. In one embodiment, the log data are randomly split to ensure the log data are evenly distributed in the split data streams. In another embodiment, the application that receives the split data streams determines how to split the log data.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: March 5, 2019
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Eric Hwang
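    The random-split embodiment in this abstract can be sketched with a short function. The function name and list-based streams are hypothetical; the point is only that assigning each line to a random output stream evenly distributes the load on average.

    ```python
    import random

    def split_stream(log_lines, n_streams, seed=None):
        """Toy sketch of random stream splitting: each log line is
        assigned to one of n_streams output streams at random."""
        rng = random.Random(seed)
        streams = [[] for _ in range(n_streams)]
        for line in log_lines:
            streams[rng.randrange(n_streams)].append(line)
        return streams


    streams = split_stream([f"line {i}" for i in range(1000)], n_streams=4, seed=0)
    print([len(s) for s in streams])  # roughly 250 lines per stream
    ```

    A downstream application can then consume the four streams in parallel; the alternative embodiment simply moves the choice of partitioning key from the aggregator to the receiving application.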
  • Patent number: 9734205
    Abstract: Disclosed here are methods, systems, paradigms and structures for predicting queries, creating tables to store data for the predicted queries, and selecting a particular table to obtain the data from in response to a query. The methods include determining various combinations of a finite set of columns users may query on, based on (i) a list of columns users are interested in obtaining data for, and (ii) cardinality information of a column or combinations of columns in the list of columns. The methods further include creating various tables based on the determined combinations of the columns using a meta query language. A query is responded to by selecting the table that has the least number of rows, among the tables that satisfy the query parameters. The methods include selecting the table that has the longest sequence of columns matching a portion of the query parameters.
    Type: Grant
    Filed: April 18, 2013
    Date of Patent: August 15, 2017
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Timothy Williamson, Martin Traverso
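    The least-rows selection rule from this abstract can be sketched as follows. The table names, column tuples, and row counts are invented for illustration; among the pre-built tables whose columns cover the query, the smallest one is chosen.

    ```python
    def select_table(tables, query_columns):
        """Toy sketch of table selection: among pre-built tables whose
        column set covers the query, pick the one with the fewest rows.
        `tables` maps a table name to (columns, row_count)."""
        candidates = [
            (name, row_count)
            for name, (columns, row_count) in tables.items()
            if set(query_columns) <= set(columns)
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda t: t[1])[0]


    # Hypothetical pre-aggregated tables at different cardinalities.
    tables = {
        "by_country":        (("country",), 200),
        "by_country_gender": (("country", "gender"), 400),
        "all_columns":       (("country", "gender", "age"), 10_000),
    }
    print(select_table(tables, ["country", "gender"]))  # → by_country_gender
    ```

    The abstract's second rule (preferring the table with the longest sequence of columns matching the query parameters) would replace the `min` key with a prefix-length comparison; the coverage test stays the same.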
  • Publication number: 20170155707
    Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.
    Type: Application
    Filed: February 10, 2017
    Publication date: June 1, 2017
    Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen
  • Patent number: 9609050
    Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: March 28, 2017
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen
  • Patent number: 9507718
    Abstract: Disclosed are methods, systems, paradigms and structures for managing cache memory in computer systems. Certain caching techniques anticipate queries and cache the data that may be required by the anticipated queries. The queries are predicted based on previously executed queries. The features of the previously executed queries are extracted and correlated to identify a usage pattern of the features. The prediction model predicts queries based on the identified usage pattern of the features. The disclosed method includes purging data from the cache based on predefined eviction policies that are influenced by the predicted queries. The disclosed method supports caching time series data. The disclosed system includes a storage unit that stores previously executed queries and features of the queries.
    Type: Grant
    Filed: April 16, 2013
    Date of Patent: November 29, 2016
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Timothy Williamson
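    The idea of an eviction policy influenced by predicted queries can be sketched in miniature. Everything below is hypothetical: the "prediction model" is reduced to query frequency, standing in for the feature-extraction-and-correlation step the abstract describes.

    ```python
    from collections import Counter

    class PredictiveCache:
        """Toy sketch: queries are predicted from the frequency of
        previously executed queries, and eviction prefers entries
        that no predicted query is expected to need."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.history = Counter()   # stands in for stored query features
            self.cache = {}

        def predicted(self, k=1):
            # Trivial prediction model: the k most frequent past queries.
            return {q for q, _ in self.history.most_common(k)}

        def put(self, query, result):
            if len(self.cache) >= self.capacity and query not in self.cache:
                keep = self.predicted()
                # Evict an entry that is not expected to be queried again;
                # fall back to the oldest entry if every entry is predicted.
                victim = next((q for q in self.cache if q not in keep),
                              next(iter(self.cache)))
                del self.cache[victim]
            self.cache[query] = result
            self.history[query] += 1


    c = PredictiveCache(capacity=2)
    c.put("daily_users", [1, 2])
    c.put("daily_users", [1, 2])   # "daily_users" is now the hot query
    c.put("revenue", [3])
    c.put("one_off", [4])          # evicts "revenue", keeps "daily_users"
    print(sorted(c.cache))         # → ['daily_users', 'one_off']
    ```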
  • Patent number: 9471436
    Abstract: A method and system for failure recovery in a storage system are disclosed. In the storage system, user data streams (e.g., log data) are collected by a scribeh system. The scribeh system may include a plurality of Calligraphus servers, HDFS and Zookeeper. The Calligraphus servers may shard the user data streams based on keys (e.g., category and bucket pairs) and stream the user data streams to Puma nodes. Sharded user data streams may be aggregated according to the keys in memory of a specific Puma node. Periodically, aggregated user data streams cached in memory of the specific Puma node, together with an incremental checkpoint, are persisted to HBase. When a specific process on the specific Puma node fails, Ptail retrieves the incremental checkpoint from HBase and then restores the specific process by requesting user data streams processed by the specific process from the scribeh system according to the incremental checkpoint.
    Type: Grant
    Filed: April 23, 2013
    Date of Patent: October 18, 2016
    Inventors: Samuel Rash, Dhrubajyoti Borthakur, Prakash Khemani, Zheng Shao
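    The checkpoint-based recovery in this abstract can be sketched as follows. This is a toy under stated assumptions: a plain dict stands in for HBase, a list for the upstream stream, and the "incremental checkpoint" is reduced to a stream offset persisted together with the aggregated counts.

    ```python
    class PumaNode:
        """Toy sketch of checkpointed stream aggregation: counts are
        periodically persisted with an incremental checkpoint (the stream
        offset), so a restarted process replays only what came after it."""

        def __init__(self, hbase):
            self.hbase = hbase      # stands in for HBase
            self.counts = {}
            self.offset = 0         # position in the upstream data stream

        def consume(self, stream):
            for key in stream[self.offset:]:
                self.counts[key] = self.counts.get(key, 0) + 1
                self.offset += 1

        def persist(self):
            # Counts and the incremental checkpoint are persisted together.
            self.hbase["counts"] = dict(self.counts)
            self.hbase["checkpoint"] = self.offset

        @classmethod
        def recover(cls, hbase, stream):
            # Ptail-style recovery: restore state from the store, then
            # re-request only the stream portion after the checkpoint.
            node = cls(hbase)
            node.counts = dict(hbase.get("counts", {}))
            node.offset = hbase.get("checkpoint", 0)
            node.consume(stream)
            return node


    hbase = {}
    stream = ["a", "b", "a"]
    node = PumaNode(hbase)
    node.consume(stream[:2])       # process the first two records
    node.persist()                 # checkpoint, then the process "fails"
    restored = PumaNode.recover(hbase, stream)
    print(restored.counts)         # → {'a': 2, 'b': 1}
    ```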
  • Patent number: 9141723
    Abstract: Disclosed are methods, systems, paradigms and structures for caching data associated with a sliding window in computer systems. A sliding window can include a time window that progresses with time, and the data can include time series data. As time progresses, the sliding window changes, bringing in new data. The cache is updated with new data as and when the sliding window moves. The sliding window data is cached at various granularity levels. The method includes storing a first portion of the data at a first granularity level and a second portion at a second granularity level. The data is cached at various granularity levels in order to effectively use the cache, considering at least cache updating criteria such as (i) the number of times a storage unit is queried to retrieve the data for updating the cache, and (ii) the day/date/time at which the storage unit is queried.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: September 22, 2015
    Assignee: Facebook, Inc.
    Inventors: Samuel Rash, Timothy Williamson, Martin Traverso
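    Caching the two portions of the window at different granularities can be sketched with a small bucketing function. The per-minute/hourly split and the function name are illustrative assumptions, not the patented scheme.

    ```python
    def bucket_series(points, recent_cutoff):
        """Toy sketch of multi-granularity caching: points in the recent
        part of the sliding window keep full per-minute granularity,
        older points are rolled up to hourly sums.
        `points` is a list of (minute_timestamp, value) pairs."""
        fine = {}      # per-minute granularity for recent data
        coarse = {}    # hourly granularity for older data
        for minute, value in points:
            if minute >= recent_cutoff:
                fine[minute] = fine.get(minute, 0) + value
            else:
                hour = minute // 60
                coarse[hour] = coarse.get(hour, 0) + value
        return fine, coarse


    points = [(0, 1), (30, 2), (59, 3), (120, 4), (121, 5)]
    fine, coarse = bucket_series(points, recent_cutoff=120)
    print(fine)    # → {120: 4, 121: 5}
    print(coarse)  # → {0: 6}
    ```

    As the window slides forward, newly expired fine-grained minutes are folded into their hourly bucket, which keeps the cache small while still answering coarse queries over old data without touching the storage unit.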
  • Publication number: 20140317140
    Abstract: Disclosed here are methods, systems, paradigms and structures for predicting queries, creating tables to store data for the predicted queries, and selecting a particular table to obtain the data from in response to a query. The methods include determining various combinations of a finite set of columns users may query on, based on (i) a list of columns users are interested in obtaining data for, and (ii) cardinality information of a column or combinations of columns in the list of columns. The methods further includes creating various tables based on the determined combinations of the columns using a meta query language. A query is responded to by selecting a table that has least number of rows, among the tables that satisfy query parameters. The methods include selecting a table that has a longest sequence of columns matching with a portion of the query parameters.
    Type: Application
    Filed: April 18, 2013
    Publication date: October 23, 2014
    Inventors: Samuel Rash, Timothy Williamson, Martin Traverso
  • Publication number: 20140317448
    Abstract: A method and system for failure recovery in a storage system are disclosed. In the storage system, user data streams (e.g., log data) are collected by a scribeh system. The scribeh system may include a plurality of Calligraphus servers, HDFS and Zookeeper. The Calligraphus servers may shard the user data streams based on keys (e.g., category and bucket pairs) and stream the user data streams to Puma nodes. Sharded user data streams may be aggregated according to the keys in memory of a specific Puma node. Periodically, aggregated user data streams cached in memory of the specific Puma node, together with an incremental checkpoint, are persisted to HBase. When a specific process on the specific Puma node fails, Ptail retrieves the incremental checkpoint from HBase and then restores the specific process by requesting user data streams processed by the specific process from the scribeh system according to the incremental checkpoint.
    Type: Application
    Filed: April 23, 2013
    Publication date: October 23, 2014
    Applicant: Facebook, Inc.
    Inventors: Samuel Rash, Dhrubajyoti Borthakur, Prakash Khemani, Zheng Shao
  • Publication number: 20140310470
    Abstract: Disclosed are methods, systems, paradigms and structures for managing cache memory in computer systems. Certain caching techniques anticipate queries and cache the data that may be required by the anticipated queries. The queries are predicted based on previously executed queries. The features of the previously executed queries are extracted and correlated to identify a usage pattern of the features. The prediction model predicts queries based on the identified usage pattern of the features. The disclosed method includes purging data from the cache based on predefined eviction policies that are influenced by the predicted queries. The disclosed method supports caching time series data. The disclosed system includes a storage unit that stores previously executed queries and features of the queries.
    Type: Application
    Filed: April 16, 2013
    Publication date: October 16, 2014
    Inventors: Samuel Rash, Timothy Williamson
  • Publication number: 20140280126
    Abstract: Disclosed are methods, systems, paradigms and structures for caching data associated with a sliding window in computer systems. A sliding window can include a time window that progresses with time, and the data can include time series data. As time progresses, the sliding window changes, bringing in new data. The cache is updated with new data as and when the sliding window moves. The sliding window data is cached at various granularity levels. The method includes storing a first portion of the data at a first granularity level and a second portion at a second granularity level. The data is cached at various granularity levels in order to effectively use the cache, considering at least cache updating criteria such as (i) the number of times a storage unit is queried to retrieve the data for updating the cache, and (ii) the day/date/time at which the storage unit is queried.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: Facebook, Inc.
    Inventors: Samuel Rash, Timothy Williamson, Martin Traverso
  • Publication number: 20140215007
    Abstract: Techniques for facilitating and accelerating log data processing are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. When the aggregating cluster is not available, the front-end clusters write the log data to local filers and send the data when the aggregating cluster recovers. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further sends the aggregated log data stream to centralized NFS filers or a data warehouse cluster. The local filers and the aggregating cluster stage the log data for access by applications, so that the applications do not wait until the data reach the centralized NFS filers or data warehouse cluster.
    Type: Application
    Filed: January 31, 2013
    Publication date: July 31, 2014
    Inventors: Samuel Rash, Dhruba Borthakur, Zheng Shao, Guanghao Shen
  • Publication number: 20140214752
    Abstract: Techniques for facilitating and accelerating log data processing by splitting data streams are disclosed herein. The front-end clusters generate a large amount of log data in real time and transfer the log data to an aggregating cluster. The aggregating cluster is designed to aggregate incoming log data streams from different front-end servers and clusters. The aggregating cluster further splits the log data into a plurality of data streams so that the data streams are sent to a receiving application in parallel. In one embodiment, the log data are randomly split to ensure the log data are evenly distributed in the split data streams. In another embodiment, the application that receives the split data streams determines how to split the log data.
    Type: Application
    Filed: January 31, 2013
    Publication date: July 31, 2014
    Inventors: Samuel Rash, Dhrubajyoti Borthakur, Zheng Shao, Eric Hwang