Patents by Inventor Pavan Edara

Pavan Edara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Scalable Exactly-Once Data Processing Using Transactional Streaming Writes

Publication number: 20240143469

Abstract: A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.

Type: Application

Filed: December 20, 2023

Publication date: May 2, 2024

Applicant: Google LLC

Inventors: Pavan Edara, Reuven Lax, Yi Yang, Gurpreet Singh Nanda
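The abstract above describes a write path in which a batch carries a sequence number, is split into sub-batches, and each sub-batch goes through a buffered stream, an intent-to-commit log entry, and a commit. The following minimal Python sketch illustrates that flow under stated assumptions. It is not the patented implementation. The in-memory Table class, the write_batch function, the round-robin stream assignment, and the rule "reject any sequence number not greater than the last committed one" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory stand-in for the table on memory hardware."""
    rows: list = field(default_factory=list)
    log: list = field(default_factory=list)  # storage log of intents and commits
    last_seq: int = -1  # highest batch sequence number already committed


def write_batch(table: Table, seq: int, batch: list,
                sub_batch_size: int = 2, num_streams: int = 2) -> bool:
    """Commit `batch` exactly once; a replayed sequence number is ignored."""
    if seq <= table.last_seq:
        return False  # duplicate delivery (e.g. a client retry): already applied

    # Partition the batch into sub-batches.
    sub_batches = [batch[i:i + sub_batch_size]
                   for i in range(0, len(batch), sub_batch_size)]

    # Assign each sub-batch to a buffered stream (round-robin, an assumption),
    # write it into that stream, and log the intent to commit it.
    streams = [[] for _ in range(num_streams)]
    for idx, sub in enumerate(sub_batches):
        stream_id = idx % num_streams
        streams[stream_id].append(sub)
        table.log.append(("intent", seq, idx, stream_id))

    # Commit: drain every buffered stream into the table (row order across
    # streams is not preserved in this sketch), then record the commit and the
    # sequence number so that a retry of this batch is rejected.
    for stream in streams:
        for sub in stream:
            table.rows.extend(sub)
        stream.clear()
    table.log.append(("commit", seq))
    table.last_seq = seq
    return True
```

Calling write_batch twice with the same sequence number appends the rows only once, which is the exactly-once property the title refers to. A real system would persist the log and the sequence number durably; here both live in memory.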
Scalable exactly-once data processing using transactional streaming writes

Patent number: 11880290

Abstract: A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.

Type: Grant

Filed: February 6, 2023

Date of Patent: January 23, 2024

Assignee: Google LLC

Inventors: Pavan Edara, Reuven Lax, Yi Yang, Gurpreet Singh Nanda
Shuffle-less reclustering of clustered tables

Patent number: 11860907

Abstract: A method for shuffle-less reclustering of clustered tables includes receiving a first and second group of clustered data blocks sorted by a clustering key value. A range of clustering key values of one or more of the data blocks in the second group overlaps with the range of clustering key values of a data block in the first group. The method also includes generating split points for partitioning the first and second groups of clustered data blocks into a third group. The method also includes partitioning, using the split points, the first and second groups into the third group. Each data block in the third group includes a range of clustering key values that does not overlap with any other data block in the third group. Each split point defines an upper limit or lower limit for the range of clustering key values of a data block in the third group.

Type: Grant

Filed: August 3, 2022

Date of Patent: January 2, 2024

Assignee: Google LLC

Inventors: Hua Zhang, Pavan Edara, Nhan Nguyen
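The reclustering abstract turns on split points that carve overlapping sorted blocks into a new group of blocks with disjoint key ranges. The minimal Python sketch below shows that idea under assumptions of its own: blocks are plain sorted lists of keys, split points are taken every target_block_size keys from a heapq.merge of the input runs, and the function names generate_split_points and recluster are hypothetical. It does not model distribution, so it says nothing about how the patented method avoids a shuffle.

```python
import bisect
import heapq


def generate_split_points(blocks, target_block_size):
    """Pick split points from the merged key order so output blocks hold about
    `target_block_size` keys each. Each input block must already be sorted."""
    merged = list(heapq.merge(*blocks))  # merge of sorted runs, no re-sort
    return merged[target_block_size::target_block_size]


def recluster(first_group, second_group, target_block_size):
    """Route every key into a range bounded by the split points, producing a
    third group whose key ranges do not overlap."""
    blocks = first_group + second_group
    splits = generate_split_points(blocks, target_block_size)
    third_group = [[] for _ in range(len(splits) + 1)]
    for block in blocks:
        for key in block:
            # Output block i holds keys in [splits[i-1], splits[i]), so each
            # split point is the upper limit of one block and the lower limit
            # of the next.
            third_group[bisect.bisect_right(splits, key)].append(key)
    return [sorted(b) for b in third_group if b]
```

For example, recluster([[1, 4, 7], [10, 12]], [[2, 5, 11]], 3) chooses split points 5 and 11 and yields [[1, 2, 4], [5, 7, 10], [11, 12]], three blocks whose ranges do not overlap.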
Moving Window Data Deduplication in Distributed Storage

Publication number: 20230376470

Abstract: The present disclosure describes a service which provides primary in-line deduplication. A streaming application program interface (API) may allow for streaming records into a storage system with high throughput and low latency. As part of this process, the API allows users to add identifiers as a field used for data deduplication. The deduplication service keeps a moving window of the identifiers in memory and does in-line deduplication by quickly determining whether data is a duplicate. Keeping only deduplication keys in memory reduces the cost of running the service. Moreover, the real-time nature of the moving window approach allows for storing deduplication information alongside the data and accessing it immediately on read. In this regard, read-after-write consistency is supported, and costs are reduced.

Type: Application

Filed: July 26, 2023

Publication date: November 23, 2023

Inventors: Pavlo Padinker, Pavan Edara, Bigang Li
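The deduplication abstract keeps only a moving window of record identifiers in memory and checks each incoming record against it. The following minimal Python sketch shows a time-based version of that window. The class name, the window length in abstract time units, and the assumption that records arrive with non-decreasing timestamps are all hypothetical; the abstract does not say how the window is bounded.

```python
from collections import deque


class MovingWindowDeduplicator:
    """Keeps only deduplication IDs seen within the last `window` time units.
    Assumes records arrive with non-decreasing timestamps."""

    def __init__(self, window: float):
        self.window = window
        self._order = deque()  # (timestamp, record_id) in arrival order
        self._ids = set()      # IDs currently inside the window

    def _evict(self, now: float) -> None:
        # Drop IDs that have slid out of the window.
        while self._order and self._order[0][0] <= now - self.window:
            _, old_id = self._order.popleft()
            self._ids.discard(old_id)

    def is_duplicate(self, record_id, now: float) -> bool:
        self._evict(now)
        if record_id in self._ids:
            return True
        self._ids.add(record_id)
        self._order.append((now, record_id))
        return False
```

A stream can then be filtered in line, for instance with [rid for rid, ts in records if not d.is_duplicate(rid, ts)]. Memory grows with the number of distinct IDs inside the window rather than with the total data volume, which is the cost argument the abstract makes.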
Moving window data deduplication in distributed storage

Patent number: 11762821

Abstract: The present disclosure describes a service which provides primary in-line deduplication. A streaming application program interface (API) may allow for streaming records into a storage system with high throughput and low latency. As part of this process, the API allows users to add identifiers as a field used for data deduplication. The deduplication service keeps a moving window of the identifiers in memory and does in-line deduplication by quickly determining whether data is a duplicate. Keeping only deduplication keys in memory reduces the cost of running the service. Moreover, the real-time nature of the moving window approach allows for storing deduplication information alongside the data and accessing it immediately on read. In this regard, read-after-write consistency is supported, and costs are reduced.

Type: Grant

Filed: July 29, 2022

Date of Patent: September 19, 2023

Assignee: Google LLC

Inventors: Pavlo Padinker, Pavan Edara, Bigang Li
Metadata management for a transactional storage system

Patent number: 11762868

Abstract: A method for managing metadata for a transactional storage system includes receiving a query request at a snapshot timestamp. The query request requests return of at least one data block from a plurality of data blocks. Each data block includes a corresponding write epoch timestamp and a corresponding conversion indicator indicating whether the data block is active or has been converted at a respective conversion timestamp. The method also includes setting a read epoch timestamp equal to the earliest one of the write epoch timestamps and determining whether any of the respective conversion timestamps occurring at or before the snapshot timestamp occur after the read epoch timestamp. The method also includes determining the at least one data block requested by the query request by scanning each of the data blocks that have corresponding write epoch timestamps occurring at or after the read epoch timestamp.

Type: Grant

Filed: August 19, 2021

Date of Patent: September 19, 2023

Assignee: Google LLC

Inventors: Pavan Edara, Yi Yang
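The metadata abstract combines write epochs, conversion timestamps, and a read epoch that bounds how much of the block list a snapshot query must scan. The abstract leaves the exact rule open, so the minimal Python sketch below rests on one interpretation, labeled as an assumption. A block is visible at a snapshot if it was written at or before the snapshot and not converted at or before it. The read epoch starts at the earliest write epoch and advances past leading blocks already converted by the snapshot time. The Block class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    name: str
    write_epoch: int
    converted_at: Optional[int] = None  # None: block is still active


def visible(block: Block, snapshot: int) -> bool:
    """Assumed rule: readable at `snapshot` if written at or before it and
    not yet converted at that time."""
    return block.write_epoch <= snapshot and (
        block.converted_at is None or block.converted_at > snapshot)


def blocks_for_query(blocks: List[Block], snapshot: int) -> List[Block]:
    """`blocks` must be sorted by write_epoch."""
    # Start the read epoch at the earliest write epoch, then advance it past
    # leading blocks whose conversion happened at or before the snapshot.
    start = 0
    while (start < len(blocks) and blocks[start].converted_at is not None
           and blocks[start].converted_at <= snapshot):
        start += 1
    # blocks[start].write_epoch is the read epoch; scan only from there on.
    return [b for b in blocks[start:] if visible(b, snapshot)]
```

With blocks a and b written at epochs 1 and 2 and both converted at 5 into block c (written at 5), a query at snapshot 4 returns a and b, while a query at snapshot 6 skips them and returns only c.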
Zero Copy Optimization for SELECT * Queries

Publication number: 20230229657

Abstract: A computer-implemented method includes receiving a query specifying an operation to perform on a first table of a plurality of stored data blocks. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing the respective reference count associated with the data block in the first table and appending, by the data processing hardware, into metadata of the second table, a reference to the corresponding data block copied into the second table.

Type: Application

Filed: March 17, 2023

Publication date: July 20, 2023

Applicant: Google LLC

Inventors: Pavan Edara, Jordan Tigani
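The zero-copy abstract replaces a physical copy of data blocks with reference-count increments and metadata references. The minimal Python sketch below illustrates that bookkeeping. The DataBlock and TableMeta classes, the zero_copy function, and the drop_table helper (which shows why the reference count matters when a table is later deleted) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    block_id: str
    ref_count: int = 1  # number of tables referencing this block


@dataclass
class TableMeta:
    name: str
    block_refs: list = field(default_factory=list)  # references, not copies


def zero_copy(src: TableMeta, dst: TableMeta) -> None:
    """Copy a table by reference: no block data is duplicated."""
    for block in src.block_refs:
        block.ref_count += 1          # one more table now points at the block
        dst.block_refs.append(block)  # append a reference into dst metadata


def drop_table(table: TableMeta) -> list:
    """Hypothetical helper: release references and return the IDs of blocks
    that no table references any more (safe to garbage-collect)."""
    freed = []
    for block in table.block_refs:
        block.ref_count -= 1
        if block.ref_count == 0:
            freed.append(block.block_id)
    table.block_refs.clear()
    return freed
```

After zero_copy(t1, t2), both tables point at the same block objects; dropping t1 frees nothing, and dropping t2 afterwards frees every block.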
Synchronous Replication Of High Throughput Streaming Data

Publication number: 20230195331

Abstract: A method for synchronous replication of stream data includes receiving a stream of data blocks for storage at a first storage location associated with a first geographical region and at a second storage location associated with a second geographical region. The method also includes synchronously writing the stream of data blocks to the first storage location and to the second storage location. While synchronously writing the stream of data blocks, the method includes determining an unrecoverable failure at the second storage location. The method also includes determining a failure point in the writing of the stream of data blocks that demarcates data blocks that were successfully written and not successfully written to the second storage location. The method also includes synchronously writing, starting at the failure point, the stream of data blocks to the first storage location and to a third storage location associated with a third geographical region.

Type: Application

Filed: February 9, 2023

Publication date: June 22, 2023

Applicant: Google LLC

Inventors: Pavan Edara, Jonathan Forbes
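The replication abstract writes each block to two regions, detects an unrecoverable failure at the second region, identifies the failure point, and resumes from that point with a third region. The minimal Python sketch below models that sequence in a single process. The Location class, its fail_at parameter used to simulate a permanent failure, and the replicate function are hypothetical. "Synchronous" is modeled as writing each block to both locations before moving to the next; a real system would issue the two writes in parallel and wait for both.

```python
from typing import Optional


class UnrecoverableError(Exception):
    """Raised when a storage location can no longer accept writes."""


class Location:
    """Hypothetical storage location; `fail_at` simulates a permanent failure
    starting at that block index."""

    def __init__(self, region: str, fail_at: Optional[int] = None):
        self.region = region
        self.fail_at = fail_at
        self.blocks = []

    def write(self, index: int, block) -> None:
        if self.fail_at is not None and index >= self.fail_at:
            raise UnrecoverableError(self.region)
        self.blocks.append(block)


def replicate(stream, primary, secondary, spare) -> Optional[int]:
    """Write every block to two locations before moving on. If the secondary
    fails permanently, continue from the failing block with the spare.
    Returns the failure point, or None if no failure occurred."""
    failure_point = None
    for i, block in enumerate(stream):
        primary.write(i, block)
        try:
            secondary.write(i, block)
        except UnrecoverableError:
            # Blocks [0, i) reached both locations; block i onward did not.
            failure_point = i
            secondary = spare
            secondary.write(i, block)
    return failure_point
```

With a five-block stream and a secondary that fails at index 3, the primary holds all five blocks, the failed secondary holds the first three, and the third region receives blocks from the failure point onward.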
Scalable Exactly-Once Data Processing Using Transactional Streaming Writes

Publication number: 20230185688

Abstract: A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.

Type: Application

Filed: February 6, 2023

Publication date: June 15, 2023

Applicant: Google LLC

Inventors: Pavan Edara, Reuven Lax, Yi Yang, Gurpreet Singh Nanda
Columnar Techniques for Big Metadata Management

Publication number: 20230185816

Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each include metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.

Type: Application

Filed: February 8, 2023

Publication date: June 15, 2023

Applicant: Google LLC

Inventors: Pavan Edara, Mosha Pasumansky
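The big-metadata abstract runs a system query over per-block metadata first, then a final query over only the blocks that metadata points to, and finally checks those blocks against the query parameters. The minimal Python sketch below illustrates that two-stage pruning. It assumes the metadata is a per-block min/max of a single value column stored column by column, and that the query parameter is a closed range [lo, hi]; the names build_system_table, system_query, and final_query are hypothetical.

```python
def build_system_table(blocks):
    """One metadata row per data block, stored column by column."""
    ids = sorted(blocks)
    return {"block_id": ids,
            "min": [min(blocks[i]) for i in ids],
            "max": [max(blocks[i]) for i in ids]}


def system_query(sys_table, lo, hi):
    """Scan the min/max columns and return IDs of blocks whose value range
    overlaps [lo, hi]."""
    return [block_id
            for block_id, mn, mx in zip(sys_table["block_id"],
                                        sys_table["min"], sys_table["max"])
            if mx >= lo and mn <= hi]


def final_query(blocks, sys_table, lo, hi):
    """Read only the candidate blocks and keep those with rows in [lo, hi]."""
    matches = {}
    for block_id in system_query(sys_table, lo, hi):
        rows = [v for v in blocks[block_id] if lo <= v <= hi]
        if rows:
            matches[block_id] = rows
    return matches
```

For blocks {0: [1, 3, 5], 1: [10, 12], 2: [4, 20]} and the range [6, 11], the system query prunes block 0 from metadata alone; block 2 is read because its min/max range overlaps, but none of its rows match, so only block 1 is returned.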
Zero copy optimization for select * queries

Patent number: 11609909

Abstract: A computer-implemented method includes receiving a query specifying an operation to perform on a first table of a plurality of stored data blocks. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing the respective reference count associated with the data block in the first table and appending, by the data processing hardware, into metadata of the second table, a reference to the corresponding data block copied into the second table.

Type: Grant

Filed: May 8, 2021

Date of Patent: March 21, 2023

Assignee: Google LLC

Inventors: Pavan Edara, Jordan Tigani
Managing Real Time Data Stream Processing

Publication number: 20230070710

Abstract: A method for managing data processing includes receiving, from a user of a data query system, a data query for data stored in a data store in communication with the data query system. The method also includes receiving a staleness parameter indicating an upper time boundary for the data query. The upper time boundary limits a query response to data within the data store that is older than the upper time boundary. The method further includes determining whether the data stored within the data store satisfies the staleness parameter. When a portion of the data within the data store fails to satisfy the staleness parameter, the method includes generating the query response that excludes the portion of the data that fails to satisfy the staleness parameter.

Type: Application

Filed: November 11, 2022

Publication date: March 9, 2023

Applicant: Google LLC

Inventors: Pavan Edara, Jonathan Forbes, Yang Yi
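The staleness abstract lets a query supply an upper time boundary and excludes any data that does not satisfy it. The minimal Python sketch below shows that filter. It assumes each row carries an ingestion timestamp, treats the staleness parameter directly as the boundary timestamp, and counts how many rows were excluded; the Row class and the query_with_staleness function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Row:
    key: str
    ingest_time: float  # hypothetical: when the row reached the data store


def query_with_staleness(rows: List[Row], predicate: Callable[[Row], bool],
                         upper_time_boundary: float) -> Tuple[List[Row], int]:
    """Return matching rows older than the boundary, plus the number of rows
    excluded because they fail the staleness parameter."""
    result, excluded = [], 0
    for row in rows:
        if row.ingest_time >= upper_time_boundary:
            excluded += 1  # too recent: fails the staleness parameter
            continue
        if predicate(row):
            result.append(row)
    return result, excluded
```

A caller that accepts slightly stale results can set the boundary before the most recent writes, so the query never has to wait on data that is still being ingested.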
Synchronous replication of high throughput streaming data

Patent number: 11579778

Abstract: A method for synchronous replication of stream data includes receiving a stream of data blocks for storage at a first storage location associated with a first geographical region and at a second storage location associated with a second geographical region. The method also includes synchronously writing the stream of data blocks to the first storage location and to the second storage location. While synchronously writing the stream of data blocks, the method includes determining an unrecoverable failure at the second storage location. The method also includes determining a failure point in the writing of the stream of data blocks that demarcates data blocks that were successfully written and not successfully written to the second storage location. The method also includes synchronously writing, starting at the failure point, the stream of data blocks to the first storage location and to a third storage location associated with a third geographical region.

Type: Grant

Filed: November 13, 2020

Date of Patent: February 14, 2023

Assignee: Google LLC

Inventors: Pavan Edara, Jonathan Forbes
Columnar techniques for big metadata management

Patent number: 11580123

Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each include metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.

Type: Grant

Filed: November 13, 2020

Date of Patent: February 14, 2023

Assignee: Google LLC

Inventors: Pavan Edara, Mosha Pasumansky
Scalable exactly-once data processing using transactional streaming writes

Patent number: 11573876

Abstract: A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.

Type: Grant

Filed: October 30, 2020

Date of Patent: February 7, 2023

Assignee: Google LLC

Inventors: Pavan Edara, Reuven Lax, Yi Yang, Gurpreet Singh Nanda
Managing real time data stream processing

Patent number: 11520796

Abstract: A method for managing data processing includes receiving, from a user of a data query system, a data query for data stored in a data store in communication with the data query system. The method also includes receiving a staleness parameter indicating an upper time boundary for the data query. The upper time boundary limits a query response to data within the data store that is older than the upper time boundary. The method further includes determining whether the data stored within the data store satisfies the staleness parameter. When a portion of the data within the data store fails to satisfy the staleness parameter, the method includes generating the query response that excludes the portion of the data that fails to satisfy the staleness parameter.

Type: Grant

Filed: April 14, 2020

Date of Patent: December 6, 2022

Assignee: Google LLC

Inventors: Pavan Edara, Jonathan Forbes, Yang Yi
Shuffle-less Reclustering of Clustered Tables

Publication number: 20220374455

Abstract: A method for shuffle-less reclustering of clustered tables includes receiving a first and second group of clustered data blocks sorted by a clustering key value. A range of clustering key values of one or more of the data blocks in the second group overlaps with the range of clustering key values of a data block in the first group. The method also includes generating split points for partitioning the first and second groups of clustered data blocks into a third group. The method also includes partitioning, using the split points, the first and second groups into the third group. Each data block in the third group includes a range of clustering key values that does not overlap with any other data block in the third group. Each split point defines an upper limit or lower limit for the range of clustering key values of a data block in the third group.

Type: Application

Filed: August 3, 2022

Publication date: November 24, 2022

Applicant: Google LLC

Inventors: Hua Zhang, Pavan Edara, Nhan Nguyen
Moving Window Data Deduplication in Distributed Storage

Publication number: 20220365914

Abstract: The present disclosure describes a service which provides primary in-line deduplication. A streaming application program interface (API) may allow for streaming records into a storage system with high throughput and low latency. As part of this process, the API allows users to add identifiers as a field used for data deduplication. The deduplication service keeps a moving window of the identifiers in memory and does in-line deduplication by quickly determining whether data is a duplicate. Keeping only deduplication keys in memory reduces the cost of running the service. Moreover, the real-time nature of the moving window approach allows for storing deduplication information alongside the data and accessing it immediately on read. In this regard, read-after-write consistency is supported, and costs are reduced.

Type: Application

Filed: July 29, 2022

Publication date: November 17, 2022

Inventors: Pavlo Padinker, Pavan Edara, Bigang Li
Execution-Time Dynamic Range Partitioning Transformations

Publication number: 20220358142

Abstract: An example method includes receiving a data load request requesting loading and partitioning of an unknown quantity of user data for storage at a data storage system. The user data includes a partitioning key; a total data size of the user data; a plurality of rows, each row of the plurality of rows associated with a value defined by the partitioning key; and one or more columns. The method also includes identifying one or more storage constraints for the data storage system. The method further includes, after receiving the user data, determining a plurality of partitioning quantiles defining respective ranges of values of the partitioning key based on the user data and the one or more storage constraints for the data storage system; and range partitioning each row of the user data into files based on the value associated with the row defined by the partitioning key and the respective ranges of the values of the partitioning key defined by the plurality of partitioning quantiles.

Type: Application

Filed: July 25, 2022

Publication date: November 10, 2022

Applicant: Google LLC

Inventors: Seyed Omid Fatemieh, Mikhail Entin, Adrian Baras, Pavan Edara, Aleksandras Surna
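The dynamic range partitioning abstract derives partitioning quantiles from the data only after it has been received, then routes each row to a file by its partitioning-key value. The minimal Python sketch below illustrates this. It uses a maximum row count per file as a stand-in for the storage constraints, picks evenly spaced cut points from the sorted key values, and represents rows as dictionaries; partition_quantiles and range_partition are hypothetical names.

```python
import bisect
import math


def partition_quantiles(keys, max_rows_per_file):
    """Choose cut points so that each key range holds roughly at most
    `max_rows_per_file` rows (heavily duplicated keys can exceed it)."""
    n = len(keys)
    num_files = max(1, math.ceil(n / max_rows_per_file))
    ordered = sorted(keys)
    return [ordered[(i * n) // num_files] for i in range(1, num_files)]


def range_partition(rows, key, max_rows_per_file):
    """Route each row to the file whose key range contains its value."""
    cuts = partition_quantiles([r[key] for r in rows], max_rows_per_file)
    files = [[] for _ in range(len(cuts) + 1)]
    for row in rows:
        # File i holds keys in [cuts[i-1], cuts[i]).
        files[bisect.bisect_right(cuts, row[key])].append(row)
    return cuts, files
```

For ten rows with keys 0 through 9 and a limit of four rows per file, the sketch computes cut points 3 and 6 and produces files of three, three, and four rows.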
Moving window data deduplication in distributed storage

Patent number: 11442911

Abstract: The present disclosure describes a service which provides primary in-line deduplication. A streaming application program interface (API) may allow for streaming records into a storage system with high throughput and low latency. As part of this process, the API allows users to add identifiers as a field used for data deduplication. The deduplication service keeps a moving window of the identifiers in memory and does in-line deduplication by quickly determining whether data is a duplicate. Keeping only deduplication keys in memory reduces the cost of running the service. Moreover, the real-time nature of the moving window approach allows for storing deduplication information alongside the data and accessing it immediately on read. In this regard, read-after-write consistency is supported, and costs are reduced.

Type: Grant

Filed: August 31, 2020

Date of Patent: September 13, 2022

Assignee: Google LLC

Inventors: Pavlo Padinker, Pavan Edara, Bigang Li
