Patents by Inventor Shubham Tagra

Shubham Tagra has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for source-side metadata enrichment

Patent number: 12361023

Abstract: Methods, systems, and devices for data management are described. A data enrichment service supported by a data management system (DMS) may receive, from a first application in a destination computing environment of the DMS, a set of enrichment definitions for metadata synchronization between the first application and a second application in a source computing environment of the DMS. A change data capture (CDC) service supported by the DMS may generate a set of data records that correspond to metadata changes associated with the second application. The data enrichment service may transform the set of data records by using data enrichment components to modify the set of data records according to the set of enrichment definitions provided by the first application. The data enrichment components may be dynamically partitioned into groups that execute in parallel. The second application may push the enriched data records to the first application in real-time.

Type: Grant

Filed: June 2, 2023

Date of Patent: July 15, 2025

Assignee: Rubrik, Inc.

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
FULL SNAPSHOT SELECTION FOR REVERSE OPERATIONS

Publication number: 20250225035

Abstract: Methods, systems, and devices for data management are described. In some systems, a data management system (DMS) may obtain a full snapshot of a data block and incremental snapshots that include data associated with changes to partitions of the data block since the full snapshot. The full snapshot and the incremental snapshots may be stored as a snapshot chain. A most recently obtained incremental snapshot in the chain may be marked for deletion. Accordingly, the DMS may select, from the snapshot chain, an incremental snapshot to convert to a new full snapshot as part of a reverse operation. The incremental snapshot may be a next most recent incremental snapshot in the snapshot chain that is not marked for deletion. The DMS may perform the reverse operation to reverse an order of the snapshot chain and convert the incremental snapshot to the new full snapshot.

Type: Application

Filed: January 5, 2024

Publication date: July 10, 2025

Inventors: Harmandeep Singh, Shubham Tagra
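
The selection step described in the abstract above can be illustrated with a minimal sketch. It assumes the snapshot chain is an in-memory list ordered from oldest to newest; the `Snapshot` class and the `select_reverse_target` function are hypothetical names for illustration and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    # Hypothetical record; field names are illustrative only.
    snapshot_id: str
    is_full: bool
    marked_for_deletion: bool = False


def select_reverse_target(chain: List[Snapshot]) -> Optional[Snapshot]:
    """Return the most recent incremental snapshot not marked for deletion.

    `chain` is assumed to be ordered oldest to newest, with the full
    snapshot first and the incremental snapshots after it.
    """
    for snap in reversed(chain):
        if not snap.is_full and not snap.marked_for_deletion:
            return snap
    return None  # no eligible incremental snapshot in the chain


chain = [
    Snapshot("full-0", is_full=True),
    Snapshot("incr-1", is_full=False),
    Snapshot("incr-2", is_full=False),
    Snapshot("incr-3", is_full=False, marked_for_deletion=True),
]
target = select_reverse_target(chain)
```

Here the newest incremental snapshot is marked for deletion, so the sketch picks the next most recent one as the candidate to convert into the new full snapshot. The reverse operation itself is not modeled.
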
Batch consolidation of computing object snapshots

Patent number: 12353353

Abstract: Methods, systems, and devices for data management are described. A backup cluster may store incremental or base snapshots of computing objects. When a snapshot expires, a data management system (DMS) that manages the backup cluster may in some cases merge or consolidate the expired snapshot with a non-expired snapshot to create a new merged snapshot. In some cases, however, consolidation may be deferred until a chain of multiple expired snapshots satisfies one or more heuristic thresholds, to conserve resources. Example heuristic thresholds may be a length of the chain, an amount of space reclaimable by consolidating the snapshots, an age of the expired snapshots in the chain of incremental snapshots, or an amount of free space on the backup cluster.

Type: Grant

Filed: October 10, 2023

Date of Patent: July 8, 2025

Assignee: Rubrik, Inc.

Inventors: Sayantan Jana, Vaiapuri Ramasubramaniam, Shubham Tagra
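
The deferral decision described in the abstract above might be sketched as follows. The threshold values, the `ConsolidationThresholds` class, and the `should_consolidate` function are assumptions made for illustration; the patent names the kinds of heuristics but this listing gives no concrete values or interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsolidationThresholds:
    # All default values are hypothetical placeholders.
    max_chain_length: int = 5                     # expired snapshots in the chain
    min_reclaimable_bytes: int = 10 * 1024 ** 3   # space freed by consolidating
    max_expired_age_days: float = 7.0             # age of the oldest expired snapshot
    min_free_space_fraction: float = 0.15         # free space left on the cluster


def should_consolidate(
    chain_length: int,
    reclaimable_bytes: int,
    oldest_expired_age_days: float,
    free_space_fraction: float,
    thresholds: ConsolidationThresholds = ConsolidationThresholds(),
) -> bool:
    """Consolidate once the expired chain satisfies at least one threshold."""
    return (
        chain_length >= thresholds.max_chain_length
        or reclaimable_bytes >= thresholds.min_reclaimable_bytes
        or oldest_expired_age_days >= thresholds.max_expired_age_days
        or free_space_fraction <= thresholds.min_free_space_fraction
    )
```

With these assumed defaults, a short, young chain on a cluster with ample free space is left alone, while a chain of six expired snapshots or a cluster with only 10% free space triggers consolidation.
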
INLINE SNAPSHOT DEDUPLICATION

Publication number: 20250181260

Abstract: A data management system (DMS) may select, prior to obtaining a first snapshot of a first virtual machine (VM) and from among one or more snapshots previously obtained by the DMS, a second snapshot to use for deduplication of the first snapshot. The DMS may obtain the first snapshot after selecting the second snapshot. Obtaining the first snapshot may include writing a first subset of data blocks from the first VM to a snapshot file for the first snapshot based on the first subset of the data blocks from the first VM being different from a first corresponding subset of the second snapshot and refraining from writing a second subset of the data blocks from the first VM to the snapshot file for the first snapshot based on the second subset of the data blocks from the first VM matching a second corresponding subset of the second snapshot.

Type: Application

Filed: January 29, 2025

Publication date: June 5, 2025

Inventor: Shubham Tagra
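
The write path described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the claimed method: it assumes blocks are compared position by position using SHA-256 fingerprints, and the function names and the dictionary used as a "snapshot file" are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(block: bytes) -> str:
    # Assumption: blocks are compared by content hash.
    return hashlib.sha256(block).hexdigest()


def write_deduplicated_snapshot(
    vm_blocks: List[bytes], reference_fingerprints: List[str]
) -> Dict[int, bytes]:
    """Write only blocks that differ from the pre-selected reference snapshot.

    Returns a mapping of block index to block data, standing in for the
    snapshot file. Blocks matching the reference are skipped.
    """
    snapshot_file: Dict[int, bytes] = {}
    for index, block in enumerate(vm_blocks):
        matches = (
            index < len(reference_fingerprints)
            and reference_fingerprints[index] == fingerprint(block)
        )
        if not matches:
            snapshot_file[index] = block
    return snapshot_file


# The reference snapshot is selected before the new snapshot is taken.
reference_fps = [fingerprint(b) for b in (b"aaaa", b"bbbb", b"cccc")]
current_blocks = [b"aaaa", b"BBBB", b"cccc", b"dddd"]
written = write_deduplicated_snapshot(current_blocks, reference_fps)
```

Only the changed block at index 1 and the new block at index 3 are written; the two unchanged blocks are skipped because they match the reference snapshot.
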
BATCH CONSOLIDATION OF COMPUTING OBJECT SNAPSHOTS

Publication number: 20250117361

Abstract: Methods, systems, and devices for data management are described. A backup cluster may store incremental or base snapshots of computing objects. When a snapshot expires, a data management system (DMS) that manages the backup cluster may in some cases merge or consolidate the expired snapshot with a non-expired snapshot to create a new merged snapshot. In some cases, however, consolidation may be deferred until a chain of multiple expired snapshots satisfies one or more heuristic thresholds, to conserve resources. Example heuristic thresholds may be a length of the chain, an amount of space reclaimable by consolidating the snapshots, an age of the expired snapshots in the chain of incremental snapshots, or an amount of free space on the backup cluster.

Type: Application

Filed: October 10, 2023

Publication date: April 10, 2025

Inventors: Sayantan Jana, Vaiapuri Ramasubramaniam, Shubham Tagra
Inline snapshot deduplication

Patent number: 12271613

Abstract: A data management system (DMS) may select, prior to obtaining a first snapshot of a first virtual machine (VM) and from among one or more snapshots previously obtained by the DMS, a second snapshot to use for deduplication of the first snapshot. The DMS may obtain the first snapshot after selecting the second snapshot. Obtaining the first snapshot may include writing a first subset of data blocks from the first VM to a snapshot file for the first snapshot based on the first subset of the data blocks from the first VM being different from a first corresponding subset of the second snapshot and refraining from writing a second subset of the data blocks from the first VM to the snapshot file for the first snapshot based on the second subset of the data blocks from the first VM matching a second corresponding subset of the second snapshot.

Type: Grant

Filed: November 1, 2022

Date of Patent: April 8, 2025

Assignee: Rubrik, Inc.

Inventor: Shubham Tagra
TECHNIQUES FOR ASYNCHRONOUSLY PUSHING METADATA IN BULK

Publication number: 20250086201

Abstract: Methods, systems, and devices for data management are described. A first application in a destination computing environment of a data management system (DMS) may determine that a bulk-push criterion is satisfied for a second application in a source computing environment of the DMS. The first application may transmit, to an asynchronous metadata service, a request indicating the second application for which the bulk-push criterion is satisfied. The request may be configured to cause the asynchronous metadata service to query a database in the source computing environment, identify a latest version of one or more rows that include metadata associated with the second application, and generate data records indicating the latest version of the one or more rows that include the metadata associated with the second application. The first application may receive the data records via an asynchronous data stream between the first application and the second application.

Type: Application

Filed: November 25, 2024

Publication date: March 13, 2025

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
Techniques for asynchronously pushing metadata in bulk

Patent number: 12182165

Abstract: Methods, systems, and devices for data management are described. A first application in a destination computing environment of a data management system (DMS) may determine that a bulk-push criterion is satisfied for a second application in a source computing environment of the DMS. The first application may transmit, to an asynchronous metadata service, a request indicating the second application for which the bulk-push criterion is satisfied. The request may be configured to cause the asynchronous metadata service to query a database in the source computing environment, identify a latest version of one or more rows that include metadata associated with the second application, and generate data records indicating the latest version of the one or more rows that include the metadata associated with the second application. The first application may receive the data records via an asynchronous data stream between the first application and the second application.

Type: Grant

Filed: June 2, 2023

Date of Patent: December 31, 2024

Assignee: Rubrik, Inc.

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
TECHNIQUES FOR SOURCE-SIDE METADATA ENRICHMENT

Publication number: 20240403324

Abstract: Methods, systems, and devices for data management are described. A data enrichment service supported by a data management system (DMS) may receive, from a first application in a destination computing environment of the DMS, a set of enrichment definitions for metadata synchronization between the first application and a second application in a source computing environment of the DMS. A change data capture (CDC) service supported by the DMS may generate a set of data records that correspond to metadata changes associated with the second application. The data enrichment service may transform the set of data records by using data enrichment components to modify the set of data records according to the set of enrichment definitions provided by the first application. The data enrichment components may be dynamically partitioned into groups that execute in parallel. The second application may push the enriched data records to the first application in real-time.

Type: Application

Filed: June 2, 2023

Publication date: December 5, 2024

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
TECHNIQUES FOR ASYNCHRONOUSLY PUSHING METADATA IN BULK

Publication number: 20240403321

Abstract: Methods, systems, and devices for data management are described. A first application in a destination computing environment of a data management system (DMS) may determine that a bulk-push criterion is satisfied for a second application in a source computing environment of the DMS. The first application may transmit, to an asynchronous metadata service, a request indicating the second application for which the bulk-push criterion is satisfied. The request may be configured to cause the asynchronous metadata service to query a database in the source computing environment, identify a latest version of one or more rows that include metadata associated with the second application, and generate data records indicating the latest version of the one or more rows that include the metadata associated with the second application. The first application may receive the data records via an asynchronous data stream between the first application and the second application.

Type: Application

Filed: June 2, 2023

Publication date: December 5, 2024

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
TECHNIQUES FOR REAL-TIME SYNCHRONIZATION OF METADATA

Publication number: 20240338382

Abstract: Methods, systems, and devices for data management are described. A destination data storage environment of a data management system may transmit, to a source data storage environment configured to run one or more applications, a request to synchronize metadata for the one or more applications from the source data storage environment to the destination data storage environment. In some examples, the request may include configuration information indicating one or more filtering parameters for filtering a data stream to identify a subset of a set of data records and start and stop times for pushing data to the destination data storage environment. The destination data storage environment may receive, from the source data storage environment, the subset of the set of data records based on the configuration information, where the subset of the set of data records are determined from a filtering operation at the source data storage environment.

Type: Application

Filed: April 6, 2023

Publication date: October 10, 2024

Inventors: Dhawal Upadhyay, Shubham Tagra, Akhilesh Krishnan, Vijay Karthik, Akshay Agrawal
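
The source-side filtering described in the abstract above could look roughly like the sketch below. It rests on two assumptions that the abstract does not confirm: filtering parameters are exact field-value matches, and the start and stop times bound the period during which records are pushed. `SyncConfig` and `records_to_push` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class SyncConfig:
    filters: Dict[str, Any]  # assumed form: field name -> required value
    start: datetime          # push window opens
    stop: datetime           # push window closes


def records_to_push(
    records: List[Dict[str, Any]], config: SyncConfig, now: datetime
) -> List[Dict[str, Any]]:
    """Filter at the source; push nothing outside the configured window."""
    if not (config.start <= now < config.stop):
        return []
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in config.filters.items())
    ]


config = SyncConfig(
    filters={"app": "billing"},
    start=datetime(2024, 1, 1, 0, 0),
    stop=datetime(2024, 1, 1, 6, 0),
)
records = [
    {"app": "billing", "row": 1},
    {"app": "search", "row": 2},
    {"app": "billing", "row": 3},
]
inside = records_to_push(records, config, datetime(2024, 1, 1, 3, 0))
outside = records_to_push(records, config, datetime(2024, 1, 1, 7, 0))
```

Inside the window only the two matching records are selected; outside the window nothing is pushed.
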
INLINE SNAPSHOT DEDUPLICATION

Publication number: 20240143212

Abstract: A data management system (DMS) may select, prior to obtaining a first snapshot of a first virtual machine (VM) and from among one or more snapshots previously obtained by the DMS, a second snapshot to use for deduplication of the first snapshot. The DMS may obtain the first snapshot after selecting the second snapshot. Obtaining the first snapshot may include writing a first subset of data blocks from the first VM to a snapshot file for the first snapshot based on the first subset of the data blocks from the first VM being different from a first corresponding subset of the second snapshot and refraining from writing a second subset of the data blocks from the first VM to the snapshot file for the first snapshot based on the second subset of the data blocks from the first VM matching a second corresponding subset of the second snapshot.

Type: Application

Filed: November 1, 2022

Publication date: May 2, 2024

Inventor: Shubham Tagra
Systems and methods for determining peak memory requirements in SQL processing engines with concurrent subtasks

Patent number: 11704316

Abstract: The present invention is generally directed to systems and methods of determining and provisioning peak memory requirements in Structured Query Language Processing engines. More specifically, methods may include determining or obtaining a query execution plan; gathering statistics associated with each database table; breaking the query execution plan into one or more subtasks; calculating an estimated memory usage for each subtask using the statistics; determining or obtaining a dependency graph of the one or more subtasks; based at least in part on the dependency graph, determining which subtasks can execute concurrently on a single worker node; and totaling the amount of estimated memory for each subtask that can execute concurrently on a single worker node and setting this amount of estimated memory as the estimated peak memory requirement for the specific database query.

Type: Grant

Filed: July 24, 2019

Date of Patent: July 18, 2023

Assignee: Qubole, Inc.

Inventors: Ankit Dixit, Shubham Tagra
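
The final steps of the abstract above (using the dependency graph to find concurrent subtasks and totaling their memory) can be approximated with a short sketch. It makes a simplifying assumption that is not taken from the patent: subtasks at the same depth of the dependency graph run concurrently on one worker, and depths run one after another. The function name and the sample plan are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def estimate_peak_memory(
    memory: Dict[str, int], depends_on: Dict[str, List[str]]
) -> int:
    """Estimate peak memory as the largest per-depth memory total.

    `memory` maps each subtask to its estimated memory usage.
    `depends_on` maps a subtask to the subtasks it waits for; the graph
    is assumed to be acyclic.
    """
    depth: Dict[str, int] = {}

    def depth_of(task: str) -> int:
        if task not in depth:
            parents = depends_on.get(task, [])
            # Subtasks with no dependencies sit at depth 0.
            depth[task] = 1 + max((depth_of(p) for p in parents), default=-1)
        return depth[task]

    total_per_depth: Dict[int, int] = defaultdict(int)
    for task, estimated in memory.items():
        total_per_depth[depth_of(task)] += estimated
    return max(total_per_depth.values(), default=0)


# Hypothetical plan: two scans feed a join, which feeds an aggregation.
memory_mb = {"scan_a": 100, "scan_b": 200, "join": 250, "agg": 50}
depends_on = {"join": ["scan_a", "scan_b"], "agg": ["join"]}
peak_mb = estimate_peak_memory(memory_mb, depends_on)
```

In this example the two scans run together and need 300 MB in total, which exceeds the join (250 MB) and the aggregation (50 MB), so 300 MB is the estimated peak. A real engine may allow dependent stages to overlap, in which case a different concurrency rule would be needed.
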
System and method for scheduling and running interactive database queries with service level agreements in a multi-tenant processing system

Patent number: 11144360

Abstract: The invention is directed to systems and methods for scheduling interactive database queries from multiple tenants onto distributed query processing clusters with service level agreements (SLAs). SLAs may be provided through a combination of estimation of resources per query followed by scheduling of that query onto a cluster if enough resources are available or triggering proactive autoscaling to spawn new clusters if they are not. In some embodiments systems may include a workflow manager; a resource estimator cluster; one or more execution clusters; and one or more metastores. A workflow manager may include an active node and a passive node configured to send a query to the resource estimator cluster and receive a resource estimate. A resource estimator cluster may be in communication with the workflow manager. One or more execution clusters may be scaled by the workflow manager as part of a schedule or autoscale based on workload.

Type: Grant

Filed: July 25, 2019

Date of Patent: October 12, 2021

Assignee: QUBOLE, INC.

Inventors: Vijay Mann, Ankit Dixit, Shubham Tagra, Raunaq Morarka, Rajat Venkatesh, Ting Yao
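
The schedule-or-autoscale decision in the abstract above can be sketched as follows. The sketch assumes memory is the only estimated resource and that a new cluster has a fixed default capacity; `Cluster`, `schedule_or_scale`, and the capacity value are hypothetical and not drawn from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    name: str
    free_memory_mb: int


def schedule_or_scale(
    estimate_mb: int, clusters: List[Cluster], new_cluster_capacity_mb: int = 8192
) -> Cluster:
    """Place a query on a cluster with enough free memory, else add a cluster."""
    for cluster in clusters:
        if cluster.free_memory_mb >= estimate_mb:
            cluster.free_memory_mb -= estimate_mb
            return cluster
    # No existing cluster has room: model autoscaling by adding a cluster
    # large enough for the query.
    capacity = max(new_cluster_capacity_mb, estimate_mb)
    new_cluster = Cluster(f"cluster-{len(clusters) + 1}", capacity - estimate_mb)
    clusters.append(new_cluster)
    return new_cluster


clusters = [Cluster("cluster-1", 1024)]
first = schedule_or_scale(512, clusters)    # fits on the existing cluster
second = schedule_or_scale(2048, clusters)  # triggers a new cluster
```

The first query fits on the existing cluster; the second exceeds its remaining free memory, so a second cluster is created and the query is placed there.
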
Caching framework for big-data engines in the cloud

Patent number: 11080207

Abstract: The present invention is generally directed to a caching framework that provides a common abstraction across one or more big data engines, comprising a cache filesystem including a cache filesystem interface used by applications to access cloud storage through a cache subsystem, the cache filesystem interface in communication with a big data engine extension and a cache manager; the big data engine extension, providing cluster information to the cache filesystem and working with the cache filesystem interface to determine which nodes cache which part of a file; and a cache manager for maintaining metadata about the cache, the metadata comprising the status of blocks for each file. The invention may provide a common abstraction across big data engines that does not require changes to the setup of infrastructure or user workloads, allows sharing of cached data and caching of only the parts of files that are required, and can process columnar formats.

Type: Grant

Filed: June 7, 2017

Date of Patent: August 3, 2021

Assignee: Qubole, Inc.

Inventors: Joydeep Sen Sarma, Rajat Venkatesh, Shubham Tagra
System and Method for Scheduling and Running Interactive Database Queries with Service Level Agreements in a Multi-Tenant Processing System

Publication number: 20200379806

Abstract: The invention is directed to systems and methods for scheduling interactive database queries from multiple tenants onto distributed query processing clusters with service level agreements (SLAs). SLAs may be provided through a combination of estimation of resources per query followed by scheduling of that query onto a cluster if enough resources are available or triggering proactive autoscaling to spawn new clusters if they are not. In some embodiments systems may include a workflow manager; a resource estimator cluster; one or more execution clusters; and one or more metastores. A workflow manager may include an active node and a passive node configured to send a query to the resource estimator cluster and receive a resource estimate. A resource estimator cluster may be in communication with the workflow manager. One or more execution clusters may be scaled by the workflow manager as part of a schedule or autoscale based on workload.

Type: Application

Filed: July 25, 2019

Publication date: December 3, 2020

Inventors: Vijay Mann, Ankit Dixit, Shubham Tagra, Raunaq Morarka, Rajat Venkatesh, Ting Yao
Systems and Methods for Determining Peak Memory Requirements in SQL Processing Engines with Concurrent Subtasks

Publication number: 20200379998

Abstract: The present invention is generally directed to systems and methods of determining and provisioning peak memory requirements in Structured Query Language Processing engines. More specifically, methods may include determining or obtaining a query execution plan; gathering statistics associated with each database table; breaking the query execution plan into one or more subtasks; calculating an estimated memory usage for each subtask using the statistics; determining or obtaining a dependency graph of the one or more subtasks; based at least in part on the dependency graph, determining which subtasks can execute concurrently on a single worker node; and totaling the amount of estimated memory for each subtask that can execute concurrently on a single worker node and setting this amount of estimated memory as the estimated peak memory requirement for the specific database query.

Type: Application

Filed: July 24, 2019

Publication date: December 3, 2020

Inventors: Ankit Dixit, Shubham Tagra
Detecting high availability readiness of a distributed computing system

Patent number: 10055268

Abstract: Technology is disclosed for determining high availability readiness of a distributed computing system (“system”). A confidence measure (CM) can be computed for a particular controller in the system to determine whether a takeover by the particular controller from a first controller would be successful. The CM can be a percentage value. A CM of 0% indicates that a takeover would be a failure, which results in loss of access to data managed by the first controller. A CM of 100% indicates a successful takeover with no performance impact on the system. A CM between 0% and 100% indicates a successful takeover but with a performance impact. The CM can be computed based on events occurring in the system, e.g., veto and non-veto events. The CM is computed as a function of various weights and/or indices associated with the veto events and/or non-veto events.

Type: Grant

Filed: September 27, 2016

Date of Patent: August 21, 2018

Assignee: NetApp, Inc.

Inventors: Senthil Kumar Veluswamy, Sathiya Kumaran Mani, Shubham Tagra
Caching Framework for Big-Data Engines in the Cloud

Publication number: 20170351620

Abstract: The present invention is generally directed to a caching framework that provides a common abstraction across one or more big data engines, comprising a cache filesystem including a cache filesystem interface used by applications to access cloud storage through a cache subsystem, the cache filesystem interface in communication with a big data engine extension and a cache manager; the big data engine extension, providing cluster information to the cache filesystem and working with the cache filesystem interface to determine which nodes cache which part of a file; and a cache manager for maintaining metadata about the cache, the metadata comprising the status of blocks for each file. The invention may provide a common abstraction across big data engines that does not require changes to the setup of infrastructure or user workloads, allows sharing of cached data and caching of only the parts of files that are required, and can process columnar formats.

Type: Application

Filed: June 7, 2017

Publication date: December 7, 2017

Inventors: Joydeep Sen Sarma, Rajat Venkatesh, Shubham Tagra
DETECTING HIGH AVAILABILITY READINESS OF A DISTRIBUTED COMPUTING SYSTEM

Publication number: 20170017535

Abstract: Technology is disclosed for determining high availability readiness of a distributed computing system (“system”). A confidence measure (CM) can be computed for a particular controller in the system to determine whether a takeover by the particular controller from a first controller would be successful. The CM can be a percentage value. A CM of 0% indicates that a takeover would be a failure, which results in loss of access to data managed by the first controller. A CM of 100% indicates a successful takeover with no performance impact on the system. A CM between 0% and 100% indicates a successful takeover but with a performance impact. The CM can be computed based on events occurring in the system, e.g., veto and non-veto events. The CM is computed as a function of various weights and/or indices associated with the veto events and/or non-veto events.

Type: Application

Filed: September 27, 2016

Publication date: January 19, 2017

Inventors: Senthil Kumar Veluswamy, Sathiya Kumaran Mani, Shubham Tagra
