Abstract: Normal virtual machine operation is observed to automatically determine patterns of resource utilization. Backup activities are then scheduled, taking these utilization patterns into account. For example, if a normally scheduled backup would occur during a busy period, it may be rescheduled to a less busy period. As another example, backups may be made opportunistically during less busy periods, even if not required by the normal backup schedule, to alleviate backup demands during busier periods.
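A minimal Python sketch of the rescheduling idea, assuming utilization is sampled as (hour, fraction busy) pairs and that "busy" means a hypothetical fixed threshold. The function names and the threshold are illustrative only, not the patented implementation.

```python
from statistics import mean


def hourly_profile(samples):
    """Average observed utilization (0.0 to 1.0) per hour of the day.

    `samples` is a hypothetical input: (hour, utilization) pairs collected
    while observing normal VM operation.
    """
    by_hour = {}
    for hour, util in samples:
        by_hour.setdefault(hour, []).append(util)
    return {hour: mean(values) for hour, values in by_hour.items()}


def choose_backup_hour(scheduled_hour, profile, busy_threshold=0.7):
    """Keep the scheduled hour unless it is busy; otherwise use the quietest hour."""
    if profile.get(scheduled_hour, 0.0) < busy_threshold:
        return scheduled_hour
    return min(profile, key=profile.get)
```

With a profile in which hour 2 averages 85% and hour 3 averages 15%, a backup scheduled for hour 2 moves to hour 3, while one scheduled for a moderately loaded hour stays put.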
Abstract: A system for validating a recovery plan for machines in a compute infrastructure is provided. In some examples, a system includes processors and a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform certain operations. The operations may include collecting statistics on network connections between machines in the compute infrastructure; determining, based on the collected statistics, dependencies between the machines; and identifying inconsistencies between those dependencies and the order of recovery for the machines specified in an existing recovery plan.
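The validation step can be sketched in Python as follows. This assumes a hypothetical rule: a client that repeatedly connects to a server depends on it, so the server must be recovered first. Connection pairs seen fewer than `min_count` times are treated as noise. All names are illustrative.

```python
from collections import Counter


def infer_dependencies(connections, min_count=10):
    """Infer (client, server) dependency edges from observed connections.

    `connections` is an iterable of (client, server) pairs, one per observed
    connection; pairs seen fewer than `min_count` times are ignored.
    """
    counts = Counter(connections)
    return {pair for pair, n in counts.items() if n >= min_count}


def find_inconsistencies(dependencies, recovery_order):
    """Return dependency edges the recovery plan violates.

    A machine should be recovered after every machine it depends on, so an
    edge (client, server) is violated when the client comes first.
    """
    position = {machine: i for i, machine in enumerate(recovery_order)}
    problems = []
    for client, server in sorted(dependencies):
        if client in position and server in position:
            if position[client] < position[server]:
                problems.append((client, server))
    return problems
```

A plan that recovers `web`, then `app`, then `db` is flagged on both edges when `web` depends on `app` and `app` depends on `db`. The reverse order passes.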
Abstract: Techniques are described for determining whether a backup chain is within a recoverable range. The recoverable range defines a time interval in which data from a database system is recoverable to a point in time within the time interval. The backup chain is preserved while the backup chain is within the recoverable range. Upon determining that the backup chain is not within the recoverable range, operations are described for checking a retention policy for the backup chain, determining whether to preserve or expire the backup chain based on the retention policy, and preserving the backup chain based on that determination.
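The following is a minimal Python sketch of the preserve-or-expire decision. It assumes a chain can restore to any point between its `start` and `end`. It also assumes "within the recoverable range" means the chain reaches into the window `[now - range_length, now]`. The retention policy is modeled as a hypothetical maximum age. None of these definitions come from the patent itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupChain:
    start: datetime  # time of the first backup in the chain
    end: datetime    # latest point in time the chain can restore to


def in_recoverable_range(chain, now, range_length):
    """True if the chain reaches into the window [now - range_length, now]."""
    return chain.end >= now - range_length


def should_preserve(chain, now, range_length, retention):
    """Preserve chains in the recoverable range; otherwise consult retention.

    `retention` is a hypothetical policy: keep any chain whose end is no
    older than `retention`.
    """
    if in_recoverable_range(chain, now, range_length):
        return True
    return chain.end >= now - retention
```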
Abstract: Example embodiments relate generally to systems and methods for continuous data protection (CDP) and more specifically to an input and output (I/O) filtering framework and log management system to seek a near-zero recovery point objective (RPO).
Type:
Grant
Filed:
April 30, 2019
Date of Patent:
March 21, 2023
Assignee:
Rubrik, Inc.
Inventors:
Benjamin Travis Meadowcroft, Li Ding, Shaomin Chen, Hardik Vohra, Arijit Banerjee, Abhay Mitra, Kushaagra Goyal, Arnav Gautum Mishra, Samir Rishi Chaudhry, Suman Swaroop, Kunal Sean Munshani, Mudit Malpani
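To illustrate why logging every write enables a near-zero RPO, here is a toy in-memory journal in Python. It assumes a hypothetical I/O filter calls `on_write` for each intercepted write in non-decreasing timestamp order. Recovery then replays logged writes onto a base image up to the requested point in time. This is a sketch, not the filtering framework the abstract describes.

```python
from dataclasses import dataclass, field


@dataclass
class CdpLog:
    """Hypothetical in-memory CDP journal of intercepted writes."""
    entries: list = field(default_factory=list)  # (timestamp, offset, data)

    def on_write(self, timestamp, offset, data):
        # Called by the (hypothetical) I/O filter for each intercepted write;
        # callers are assumed to log writes in non-decreasing timestamp order.
        self.entries.append((timestamp, offset, bytes(data)))

    def recover(self, base, point_in_time):
        """Rebuild the volume as of `point_in_time` by replaying writes onto `base`."""
        volume = bytearray(base)
        for timestamp, offset, data in self.entries:
            if timestamp > point_in_time:
                break  # later entries are newer still, so stop here
            volume[offset:offset + len(data)] = data
        return bytes(volume)
```

Because every write is journaled, the recovery point can be any logged timestamp rather than the time of the last snapshot.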
Abstract: In one aspect, a computer system automatically identifies style issues in a source code base. A reference set for a known style issue includes source code examples that exhibit the style issue. The source code examples in the reference set are compared to the source code base, for example using string convolution. Based on the comparison, locations in the source code base that are likely to exhibit the style issue are identified. Various steps in the processing may be implemented using machine learning models, clustering or other automated data science techniques.
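A simple way to picture comparing reference examples to a code base "using string convolution" is to slide each example across the source. At every offset, score the fraction of aligned characters that match. The Python sketch below assumes that character-level scoring and a hypothetical similarity threshold. The patented method may weight or normalize matches differently.

```python
def convolve_match(example, code):
    """Slide `example` across `code`; score each offset by the fraction of
    aligned characters that are equal (a character-level cross-correlation)."""
    n = len(example)
    if n == 0:
        return []
    scores = []
    for offset in range(len(code) - n + 1):
        window = code[offset:offset + n]
        matches = sum(a == b for a, b in zip(example, window))
        scores.append(matches / n)
    return scores


def likely_locations(reference_set, code, threshold=0.9):
    """Offsets in `code` where any reference example scores at or above `threshold`."""
    hits = set()
    for example in reference_set:
        for offset, score in enumerate(convolve_match(example, code)):
            if score >= threshold:
                hits.add(offset)
    return sorted(hits)
```

Lowering `threshold` trades precision for recall. Near-miss variants of a style issue are then flagged too.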
Abstract: Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.
Abstract: A lightweight deduplication system can perform resource-efficient data deduplication using an extent index and a content index. The extent index can store full fingerprints of data segments to be deduplicated, and the content index can store shortened versions of the full fingerprints. The system can alternate between the extent and content indexes, and cache portions of the indexes, to perform lightweight data deduplication. Further, the system can be configured with an efficient heuristic approach for selecting content index data lookups for chains of volumes for deduplication, such as a long chain of snapshots.
Type:
Grant
Filed:
April 29, 2020
Date of Patent:
May 3, 2022
Assignee:
RUBRIK, INC.
Inventors:
Anshul Gupta, Abdullah Reza, Guilherme Vale Ferreira Menezes
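The two-index lookup described above can be sketched in Python under a few assumptions. The content index maps a shortened fingerprint (here, a hypothetical 4-byte prefix of SHA-256) to candidate extent IDs. The extent index maps each extent ID to its full fingerprint. A content-index miss proves the segment is new. A hit is only a candidate and is confirmed against the full fingerprint. Class and method names are illustrative.

```python
import hashlib


class LightweightDedup:
    def __init__(self, short_len=4):
        self.short_len = short_len
        self.content_index = {}  # short fingerprint -> list of extent ids
        self.extent_index = {}   # extent id -> full fingerprint

    def write(self, segment, extent_id):
        """Return an existing extent id if `segment` is a duplicate, else store it."""
        full = hashlib.sha256(segment).digest()
        short = full[:self.short_len]
        # Cheap check first: a miss in the small content index means "new".
        for candidate in self.content_index.get(short, []):
            # A short-fingerprint hit may be a collision, so confirm in full.
            if self.extent_index[candidate] == full:
                return candidate
        self.content_index.setdefault(short, []).append(extent_id)
        self.extent_index[extent_id] = full
        return extent_id
```

Keeping only short fingerprints in the content index is what makes it small enough to cache, and the full-fingerprint check keeps collisions from causing false deduplication.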
Abstract: A computer-implemented method at a data management system comprises: generating, with one or more processors, a containerized runtime in a memory in communication with the one or more processors; instantiating, with the one or more processors, an app in the runtime; receiving, with the one or more processors, a request from the app for data; retrieving, with the one or more processors, a copy of the requested data from a data source; and transmitting, with the one or more processors, the data to the containerized runtime for the app to operate on.
Type:
Grant
Filed:
March 6, 2020
Date of Patent:
January 16, 2024
Assignee:
Rubrik, Inc.
Inventors:
Abhay Mitra, Vijay Karthik, Vivek Sanjay Jain, Avishek Ganguli, Arohi Kumar, Kushaagra Goyal, Christopher Wong
Abstract: Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.
Abstract: Embodiments disclosed herein provide systems, methods, and computer readable media for sub-cluster recovery in a data storage environment having a plurality of storage nodes. In a particular embodiment, the method provides scanning data items in the plurality of nodes. While scanning, the method further provides indexing the data items into an index of a plurality of partition groups. Each partition group includes data items owned by a particular one of the plurality of storage nodes. The method then provides storing the index.
Type:
Grant
Filed:
March 14, 2022
Date of Patent:
November 21, 2023
Assignee:
Rubrik, Inc.
Inventors:
Rohit Shekhar, Hyo Jun Kim, Prasenjit Sarkar, Maohua Lu, Ajaykrishna Raghavan, Pin Zhou
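The scan-and-index step can be sketched in Python. This assumes each node reports (item ID, owning node) pairs for the items it stores, possibly including replicas owned elsewhere. It also assumes "storing the index" can be shown as JSON serialization. The function names are hypothetical.

```python
import json
from collections import defaultdict


def build_partition_index(nodes):
    """Scan every node's data items and group them into per-owner partition groups.

    `nodes` maps a node name to an iterable of (item_id, owner) pairs; an item
    may be stored on a node other than its owner (for example, a replica).
    """
    index = defaultdict(set)
    for node, items in nodes.items():
        for item_id, owner in items:
            index[owner].add(item_id)
    return {owner: sorted(ids) for owner, ids in index.items()}


def serialize_index(index):
    """Stand-in for storing the index: a stable JSON encoding."""
    return json.dumps(index, sort_keys=True)
```

To recover a single failed node, only its partition group needs to be consulted, rather than rescanning the whole cluster.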
Abstract: A method of restoring version data stored across two or more cloud environments is provided. An example method includes accessing, in a second cloud environment, first metadata describing a first data version, the first data version including first data items and first metadata, wherein at least the first data items are stored in a first cloud environment and the first metadata is stored in a third cloud environment. In response to an instruction received in the second cloud environment, the first data items are restored to the second cloud environment using the first metadata.
Abstract: Embodiments disclosed herein provide systems, methods, and computer readable media for sub-cluster recovery in a data storage environment having a plurality of storage nodes. In a particular embodiment, the method provides scanning data items in the plurality of nodes. While scanning, the method further provides indexing the data items into an index of a plurality of partition groups. Each partition group includes data items owned by a particular one of the plurality of storage nodes. The method then provides storing the index.
Type:
Grant
Filed:
October 22, 2020
Date of Patent:
April 5, 2022
Assignee:
Rubrik, Inc.
Inventors:
Rohit Shekhar, Hyo Jun Kim, Prasenjit Sarkar, Maohua Lu, Ajaykrishna Raghavan, Pin Zhou
Abstract: A computer-implemented method at a data management system comprises: receiving, at a storage appliance from a server hosting a virtual machine, a write made to the virtual machine; computing, at the storage appliance, a fingerprint of the transmitted write; comparing, at the storage appliance, the computed fingerprint to malware fingerprints in a malware catalog; repeating the computing and comparing; and disabling the virtual machine if a number of matches from the comparing breaches a predetermined threshold over a predetermined amount of time.
Type:
Grant
Filed:
January 28, 2020
Date of Patent:
March 14, 2023
Assignee:
Rubrik, Inc.
Inventors:
Abhay Mitra, Vijay Karthik, Vivek Sanjay Jain, Avishek Ganguli, Arohi Kumar, Kushaagra Goyal, Christopher Wong
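The detection loop reads as a sliding-window counter, sketched here in Python. It assumes the malware catalog holds hex SHA-256 fingerprints of whole writes. It also assumes "breaches the threshold" means strictly more matches than `threshold` within the last `window_seconds`. Setting a flag stands in for actually disabling the VM. All names are hypothetical.

```python
import hashlib
from collections import deque


class WriteMonitor:
    def __init__(self, malware_catalog, threshold, window_seconds):
        self.catalog = set(malware_catalog)  # hex SHA-256 fingerprints (assumed)
        self.threshold = threshold
        self.window = window_seconds
        self.matches = deque()  # timestamps of writes that matched the catalog
        self.disabled = False

    def on_write(self, timestamp, data):
        """Fingerprint one write, update the window, and return the disabled flag."""
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint in self.catalog:
            self.matches.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while self.matches and self.matches[0] <= timestamp - self.window:
            self.matches.popleft()
        if len(self.matches) > self.threshold:
            self.disabled = True  # stand-in for disabling the virtual machine
        return self.disabled
```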
Abstract: A database can be backed up and recovered by a cluster mapped to the database. Nodes of the cluster are mapped over channels to directories of the database. Scripts are generated from one or more templates that specify the order and values to be executed to perform a database job, such as database backup or recovery. To initiate a given database job, a template can be executed that generates and populates scripts, which are processed on the host of the database to perform the database job in a nearly instant manner using the mapped nodes of the cluster.
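A minimal Python sketch of generating per-channel scripts from an ordered template, using the standard library's `string.Template`. The template steps, the round-robin node-to-directory mapping, and the mount-point naming are all hypothetical placeholders, not the commands the patented system runs on the database host.

```python
from string import Template

# Hypothetical backup template: steps run in list order and each
# $placeholder is filled with per-channel values.
BACKUP_JOB = [
    "mount $node:/export/$job_id at $mount_point",
    "backup directory $directory to $mount_point",
    "unmount $mount_point",
]


def map_channels(nodes, directories):
    """Assign each database directory to a cluster node, round-robin."""
    return {d: nodes[i % len(nodes)] for i, d in enumerate(directories)}


def generate_scripts(template, nodes, directories, job_id):
    """Produce one ordered script per channel (directory mapped to a node)."""
    scripts = {}
    for i, (directory, node) in enumerate(map_channels(nodes, directories).items()):
        values = {"node": node, "directory": directory, "job_id": job_id,
                  "mount_point": f"/mnt/ch{i}"}
        scripts[directory] = [Template(step).substitute(values) for step in template]
    return scripts
```

Swapping in a recovery template reuses the same channel mapping, which is one reason to keep job steps in templates rather than hard-coding them.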
Abstract: In one aspect, a computer system automatically identifies style issues in a source code base. A reference set for a known style issue includes source code examples that exhibit the style issue. The source code examples in the reference set are compared to the source code base, for example using string convolution. Based on the comparison, locations in the source code base that are likely to exhibit the style issue are identified. Various steps in the processing may be implemented using machine learning models, clustering or other automated data science techniques.
Abstract: Embodiments disclosed herein provide systems, methods, and computer readable media for sub-cluster recovery in a data storage environment having a plurality of storage nodes. In a particular embodiment, the method provides scanning data items in the plurality of nodes. While scanning, the method further provides indexing the data items into an index of a plurality of partition groups. Each partition group includes data items owned by a particular one of the plurality of storage nodes. The method then provides storing the index.
Abstract: Example embodiments relate generally to systems and methods for continuous data protection (CDP) and more specifically to an input and output (I/O) filtering framework and log management system to seek a near-zero recovery point objective (RPO).
Type:
Grant
Filed:
April 30, 2019
Date of Patent:
November 15, 2022
Assignee:
Rubrik, Inc.
Inventors:
Benjamin Travis Meadowcroft, Li Ding, Shaomin Chen, Hardik Vohra, Arijit Banerjee, Abhay Mitra, Kushaagra Goyal, Arnav Gautum Mishra, Samir Rishi Chaudhry, Suman Swaroop, Kunal Sean Munshani, Mudit Malpani
Abstract: A system for reducing VM stunting during backup of a set of virtual machines is provided. In some examples, a system comprises processors and a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform certain operations. Example operations may include running an analytic process to learn resource utilization patterns of a hypervisor system monitoring the set of virtual machines, determining an opportunistic window of reduced resource utilization based on the resource utilization patterns, and scheduling backup for the set of virtual machines during the opportunistic window.
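One simple reading of "determining an opportunistic window" is to find the contiguous run of time slots with the lowest total load. The Python sketch below does this over a cyclic per-slot utilization profile, such as 24 hourly averages, so windows may wrap past midnight. It assumes `window_len` does not exceed the profile length. The profile format and function name are illustrative, not the patented analytic process.

```python
def opportunistic_window(utilization, window_len):
    """Start index of the contiguous window with the lowest total utilization
    in a cyclic per-slot profile (windows may wrap around the end)."""
    n = len(utilization)
    best_start, best_load = 0, float("inf")
    for start in range(n):
        load = sum(utilization[(start + k) % n] for k in range(window_len))
        if load < best_load:
            best_start, best_load = start, load
    return best_start
```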
Abstract: In some examples, a method comprises: receiving a request to read data within a specified range from a backup file storing at least one base snapshot and at least one incremental snapshot; looking up the specified range in range filters from the backup file, the range filters corresponding to snapshots stored in the backup file and each range filter comprising bits indicating whether data exists at respective ranges within the snapshot corresponding to the respective range filter; and in response to the looking up, reading the requested data from the looked-up range in the backup file.
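Range filters can be modeled as one bitmap per snapshot, with bit *i* set when that snapshot holds data for block *i*. The Python sketch below uses a deliberately tiny hypothetical block size of 4 bytes so the example stays readable. It assumes snapshots are listed base first and newest incremental last, so the newest snapshot with the bit set supplies the block. Blocks no snapshot has written read as zeros.

```python
from dataclasses import dataclass

BLOCK = 4  # hypothetical block size in bytes, kept tiny for readability


@dataclass
class Snapshot:
    bits: int     # range filter: bit i set means this snapshot holds block i
    blocks: dict  # block index -> bytes of length BLOCK


def read_block(snapshots, block):
    """Read one block from a chain ordered base first, newest incremental last."""
    for snap in reversed(snapshots):
        if (snap.bits >> block) & 1:
            return snap.blocks[block]
    return bytes(BLOCK)  # never written in any snapshot: reads as zeros


def read_range(snapshots, offset, length):
    """Read `length` bytes starting at `offset` using the range filters."""
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    data = b"".join(read_block(snapshots, b) for b in range(first, last + 1))
    start = offset - first * BLOCK
    return data[start:start + length]
```

Checking a bit is far cheaper than searching each snapshot's data, so a read touches only the snapshot that actually holds the requested range.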
Abstract: According to various embodiments, with respect to a target set of files being managed (e.g., protected by data snapshots), each file in the target set of files is classified into one of two or more filesets (discontiguous filesets), where each of these filesets comprises one or more files that are related to each other by one or more factors, such as frequency of file change or purpose of existence (e.g., used by a software application). Once classified, files within the target set of files can be uniquely processed by a data management operation (e.g., incremental data snapshot process) according to their association to a discontiguous fileset.
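A minimal Python sketch of classifying files into discontiguous filesets by one factor, change frequency. It assumes a hypothetical input mapping each path to its observed change count and a hypothetical two-way split. Files land in the same fileset regardless of which directory they live in. A real system could also group by purpose, such as the owning application.

```python
def classify(files, hot_threshold=5):
    """Split files into discontiguous filesets by observed change frequency.

    `files` maps a path to the number of changes seen over some period
    (a hypothetical input); directory layout plays no role in the grouping.
    """
    filesets = {"frequently_changed": [], "rarely_changed": []}
    for path, changes in sorted(files.items()):
        key = "frequently_changed" if changes >= hot_threshold else "rarely_changed"
        filesets[key].append(path)
    return filesets
```

Each fileset can then be handed to an incremental snapshot process on its own schedule, for example snapshotting the frequently changed set more often.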