Patents by Inventor Abhinav Duggal
Abhinav Duggal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200341892
Abstract: Systems and methods for performing data protection operations, including garbage collection and copy forward operations. For deduplicated data stored in cloud-based storage or in a cloud tier whose containers hold dead and live regions, such as compression regions, the dead segments in the dead compression regions are deleted by copying the live compression regions into new containers and then deleting the old containers. The copy forward is based on a recipe from a data protection system and is performed using a microservices-based approach.
Type: Application
Filed: April 26, 2019
Publication date: October 29, 2020
Inventors: Abhinav Duggal, Ramprasad Chinthekindi, Philip Shilane
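The copy-forward idea in this application can be illustrated with a minimal sketch. The container model below (a dict of region lists plus an `is_live` predicate) is an assumption for illustration only, not the patented implementation:

```python
# Illustrative copy-forward garbage collection: live compression regions are
# copied into newly allocated containers, then the old containers are dropped,
# reclaiming the space held by dead regions.

def copy_forward(containers, is_live):
    """containers: {container_id: [region, ...]}; is_live: region -> bool."""
    new_containers = {}
    next_id = max(containers, default=-1) + 1
    for cid in sorted(containers):
        live_regions = [r for r in containers[cid] if is_live(r)]
        if live_regions:                       # only live data moves forward
            new_containers[next_id] = live_regions
            next_id += 1
    return new_containers                      # old containers are discarded

old = {0: ["live-a", "dead-b"], 1: ["dead-c"], 2: ["live-d", "live-e"]}
new = copy_forward(old, is_live=lambda r: r.startswith("live"))
```

In the patent, which regions are live comes from a recipe supplied by the data protection system; here a string prefix stands in for that decision.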
-
Publication number: 20200341891
Abstract: Systems and methods for performing data protection operations, including garbage collection and copy forward operations. For deduplicated data stored in cloud-based storage or in a cloud tier whose containers hold dead and live segments, the dead segments are deleted by copying the live segments into new containers and then deleting the old containers. The copy forward is based on a recipe from a data protection system and is performed using microservices that can be run as needed in the cloud.
Type: Application
Filed: April 26, 2019
Publication date: October 29, 2020
Inventors: Philip Shilane, Abhinav Duggal, Ramprasad Chinthekindi
-
Patent number: 10810162
Abstract: A perfect hash vector (PHVEC) is created to track segments in a deduplication file system. Files are represented by segment trees having hierarchical segment levels. Containers store the segments and fingerprints of segments. Upper-level segments are traversed to identify a first set of fingerprints at each level; these fingerprints correspond to segments that should be present. The first set of fingerprints is hashed, and bits are set in the PHVEC at the positions produced by the hashing. The containers are read to identify a second set of fingerprints, corresponding to segments that are present. The second set of fingerprints is hashed, and bits are cleared in the PHVEC at the positions produced by the hashing. If a bit was set and not cleared, it is determined that at least one segment is missing. If all bits that were set were also cleared, it is determined that no segments are missing.
Type: Grant
Filed: July 12, 2018
Date of Patent: October 20, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Tony Wong, Abhinav Duggal, Ramprasad Chinthekindi
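The set-then-clear verification this patent describes can be sketched as follows. The position function here is a toy stand-in (sum of bytes); a real system builds a collision-free perfect hash over the fingerprint set, which is what makes the clear step exact:

```python
# Sketch of missing-segment detection with a bit vector: set a bit for every
# fingerprint that should exist (from the segment trees), clear a bit for
# every fingerprint actually found in the containers. Any bit left set
# signals at least one missing segment.

SIZE = 1 << 16

def _pos(fingerprint):
    # Toy position function; the patent uses a perfect hash so that distinct
    # fingerprints never share a position.
    return sum(fingerprint.encode()) % SIZE

def missing_segments(expected_fps, present_fps):
    bits = bytearray(SIZE)
    for fp in expected_fps:    # traverse upper-level segments: set bits
        bits[_pos(fp)] = 1
    for fp in present_fps:     # read containers: clear bits
        bits[_pos(fp)] = 0
    return any(bits)           # a surviving bit => a segment is missing

no_loss = missing_segments({"a", "b"}, {"a", "b"})
data_loss = missing_segments({"a", "b"}, {"a"})
```

Because the patented design uses a perfect hash, a cleared bit can only have been cleared by the fingerprint that set it, so the final scan is a reliable presence check.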
-
Patent number: 10795812
Abstract: A garbage collection (GC) process within a deduplication backup network comprises a GC component that identifies metadata stored in file system (FS) segments, stores the metadata in a metadata container both locally on the server and on cloud storage, and reads the locally stored metadata container to obtain metadata of the FS containers and determine their live data regions, where the metadata contains fingerprints of all segments of the FS containers. A copy forward component forwards the live data regions to new containers written both locally on the server and on cloud storage, writes live portions of the metadata container to a new metadata container written both locally and on cloud storage, deletes dead compression regions from cloud storage, and deletes the original metadata container from local storage and cloud storage.
Type: Grant
Filed: June 30, 2017
Date of Patent: October 6, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Abhinav Duggal, Ramprasad Chinthekindi, Mahesh Kamat, Bhimsen Bhanjois
-
Patent number: 10761765
Abstract: A source site includes a controller, a set of source worker nodes, and a message queue connected between the controller and the source worker nodes. A destination site includes a set of destination worker nodes. The controller identifies differences between a first snapshot created at the source site at a first time and a second snapshot created at a second, later time. Based on the differences, a set of tasks is generated; the tasks include copying an object from the source to the destination or deleting an object from the destination. The controller places the tasks onto the message queue. A first source worker node retrieves the first task and coordinates with a first destination worker node to perform it. A second source worker node retrieves the second task and coordinates with a second destination worker node to perform it.
Type: Grant
Filed: February 2, 2018
Date of Patent: September 1, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Abhinav Duggal, Atul Avinash Karmarkar, Philip Shilane, Kevin Xu
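The snapshot-diff step that feeds the message queue can be sketched as below; the snapshot shape (object name mapped to a content hash) and the task tuples are assumed for illustration, not taken from the patent:

```python
# Illustrative diff of two snapshots into replication tasks: objects that are
# new or changed at the source become copy tasks; objects that disappeared
# become delete tasks. Tasks are queued for worker nodes to pull.
from collections import deque

def diff_tasks(snap1, snap2):
    """snap1/snap2: {object_name: content_hash}, taken at t1 < t2."""
    tasks = []
    for name, digest in snap2.items():
        if snap1.get(name) != digest:
            tasks.append(("copy", name))      # new or modified at the source
    for name in snap1:
        if name not in snap2:
            tasks.append(("delete", name))    # removed at the source
    return tasks

queue = deque(diff_tasks({"a": 1, "b": 2}, {"a": 1, "b": 3, "c": 4}))
```

In the patented system the queue is a shared message queue between the controller and the source worker nodes; `deque` merely stands in for it here.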
-
Publication number: 20200249995
Abstract: Embodiments for dynamically resizing buffers for a slab allocator process are described. The slab allocator informs the consumer that a memory buffer must be shrunk to a smaller size. A buffer allocation process dynamically reclaims portions of larger memory buffers to make room for a smaller allocation by shrinking data objects in larger slabs and returning slabs to reserve or free slab lists. Initially a large limit is set, and it is dynamically reduced once all available memory is exhausted, allowing the slab allocator to adapt to the workload.
Type: Application
Filed: January 31, 2019
Publication date: August 6, 2020
Inventors: Tony Wong, Abhinav Duggal, Hemanth Satyanarayana
-
Publication number: 20200233752
Abstract: Embodiments for a mostly unique file selection (MUFS) process for a deduplication backup system are described. The process assigns tags to files; a tag is the smallest unit of migration and serves as a hint about the similarity of files in the deduplication file system, and files from the same client machine are expected to receive the same tag. The MUFS process measures uniqueness with a u-index, a function of a tag's total unique size relative to its total size. A load balancer then uses the u-index to select the most unique tags for migration, so that the tags with the highest u-index are migrated to free the maximum space on the source node.
Type: Application
Filed: January 18, 2019
Publication date: July 23, 2020
Inventors: Tony Wong, Hemanth Satyanarayana, Abhinav Duggal
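A minimal sketch of the u-index ranking, assuming the u-index is simply unique size over total size per tag (the function and data shapes are illustrative, not the patented formula):

```python
# Illustrative u-index selection: each tag carries (unique_size, total_size);
# tags with the highest ratio of unique data free the most space if migrated
# off the source node.

def u_index(unique_size, total_size):
    return unique_size / total_size if total_size else 0.0

def select_tags_for_migration(tags, count):
    """tags: {tag: (unique_size, total_size)} -> most-unique tags first."""
    ranked = sorted(tags, key=lambda t: u_index(*tags[t]), reverse=True)
    return ranked[:count]

tags = {"clientA": (90, 100), "clientB": (10, 100), "clientC": (50, 100)}
picked = select_tags_for_migration(tags, 2)
```

The intuition: a tag whose data is mostly deduplicated against other tags (low u-index) frees little space when moved, so the load balancer prefers mostly unique tags.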
-
Patent number: 10713217
Abstract: In general, embodiments of the invention relate to a method and system for managing persistent storage in a local computing device. More specifically, embodiments of the invention relate to determining the amount of space that will be freed up (or become available) in the persistent storage during a data transfer using a perfect hash function. Once the amount of data to be transferred is determined, embodiments of the invention initiate the allocation of an appropriate amount of space in the remote storage device and, subsequently, initiate the transfer of the data to the remote storage device.
Type: Grant
Filed: October 30, 2018
Date of Patent: July 14, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Srikanth Srinivasan, Ramprasad Chinthekindi, Abhinav Duggal
-
Publication number: 20200159611
Abstract: A controller at a source site generates a set of tasks associated with a replication job. Each task includes copying an object from the source to the destination site, or deleting an object from the destination site. The tasks are placed onto a message queue at the source site. Source worker nodes retrieve the tasks from the source site message queue for processing in conjunction with destination worker nodes at the destination site. A destination worker node, upon receiving a task from a source worker node, places the task onto a message queue at the destination site for retrieval by a backend worker node that handles writing to an object store at the destination site.
Type: Application
Filed: January 14, 2020
Publication date: May 21, 2020
Inventors: Philip Shilane, Kevin Xu, Abhinav Duggal, Atul Avinash Karmarkar
-
Patent number: 10649682
Abstract: Described is a deduplicated storage system that may perform a focused sanitization process by reducing the number of data storage containers that must be sanitized. The system leverages additional characteristics of the files that need to be sanitized, such as the initial storage date (e.g., the data breach date) of when a sensitive file (e.g., a file to be sanitized) was actually stored on the deduplicated storage system. By maintaining a creation date for data containers, the system may limit sanitization to those containers with a creation date on or after the initial storage date of the sensitive file. Accordingly, the system is capable of performing a more focused overwriting of data, thereby improving the overall efficiency of the sanitization process.
Type: Grant
Filed: October 6, 2017
Date of Patent: May 12, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Ramprasad Chinthekindi, Shah Veeral, Abhinav Duggal
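The focused-sanitization filter reduces to a date comparison per container; the data model below (container id mapped to creation date) is assumed for illustration:

```python
# Illustrative container filtering for focused sanitization: a container
# created before the sensitive file's initial storage date cannot hold any of
# its data, so only containers created on or after that date are overwritten.
from datetime import date

def containers_to_sanitize(containers, initial_storage_date):
    """containers: {container_id: creation_date} -> ids needing sanitization."""
    return {cid for cid, created in containers.items()
            if created >= initial_storage_date}

containers = {1: date(2020, 1, 5), 2: date(2020, 3, 1), 3: date(2020, 6, 9)}
targets = containers_to_sanitize(containers, date(2020, 3, 1))
```

Everything outside `targets` is skipped entirely, which is the source of the efficiency gain the abstract claims.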
-
Patent number: 10649807
Abstract: In an embodiment, a method for validating data integrity of a seeding process is described. The seeding process for migrating data from a source tier to a target tier persists a perfect hash vector (PHV) to a disk when the seeding process is suspended for various reasons. The PHV includes bits for fingerprints for data segments corresponding to the data, and can be reloaded into memory upon resumption of the seeding process. One or more bits corresponding to fingerprints for copied data segments are reset prior to starting the copy phase in the resumed run. A checksum of the PHV is calculated after the seeding process completes copying data segments in the containers. A non-zero checksum of the PHV indicates that one or more data segments are missing on the source tier or the data segments are not successfully copied to the target tier. The missing data segments and/or one or more related files are reported to a user via a user interface.
Type: Grant
Filed: October 24, 2018
Date of Patent: May 12, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Ramprasad Chinthekindi, Abhinav Duggal, Srikanth Srinivasan, Lan Bai
-
Publication number: 20200133720
Abstract: In an embodiment, a method for validating data integrity of a seeding process is described. The seeding process for migrating data from a source tier to a target tier persists a perfect hash vector (PHV) to a disk when the seeding process is suspended for various reasons. The PHV includes bits for fingerprints for data segments corresponding to the data, and can be reloaded into memory upon resumption of the seeding process. One or more bits corresponding to fingerprints for copied data segments are reset prior to starting the copy phase in the resumed run. A checksum of the PHV is calculated after the seeding process completes copying data segments in the containers. A non-zero checksum of the PHV indicates that one or more data segments are missing on the source tier or the data segments are not successfully copied to the target tier. The missing data segments and/or one or more related files are reported to a user via a user interface.
Type: Application
Filed: October 24, 2018
Publication date: April 30, 2020
Inventors: Ramprasad Chinthekindi, Abhinav Duggal, Srikanth Srinivasan, Lan Bai
-
Publication number: 20200133719
Abstract: In an embodiment, a system and method for supporting a seeding process with suspend and resume capabilities are described. A resumable seeding component in a data seeding module can be used to move data from a source tier to a target tier. A resumption context including a perfect hash function (PHF) and a perfect hash vector (PHV) persists the state of the seeding process at the end of each operation in the process. The PHV represents data segments of the data using the PHF. The resumption context is loaded into memory upon resumption of the seeding process after it is suspended. Information in the resumption context is used to determine the last successfully completed operation and the last copied container. The seeding process is resumed by executing the operation following the completed operation recorded in the resumption context.
Type: Application
Filed: October 24, 2018
Publication date: April 30, 2020
Inventors: Ramprasad Chinthekindi, Abhinav Duggal, Srikanth Srinivasan, Lan Bai
-
Publication number: 20200134042
Abstract: In general, embodiments of the invention relate to a method and system for managing persistent storage in a local computing device. More specifically, embodiments of the invention relate to determining the amount of space that will be freed up (or become available) in the persistent storage during a data transfer using a perfect hash function. Once the amount of data to be transferred is determined, embodiments of the invention initiate the allocation of an appropriate amount of space in the remote storage device and, subsequently, initiate the transfer of the data to the remote storage device.
Type: Application
Filed: October 30, 2018
Publication date: April 30, 2020
Inventors: Srikanth Srinivasan, Ramprasad Chinthekindi, Abhinav Duggal
-
Publication number: 20200125410
Abstract: A schedule is stored indicating a frequency of replication from source to destination sites. When a replication job is initiated, information identifying one or more objects at the source site to be replicated is copied into a snapshot without pausing user operations against the one or more objects. The snapshot is compared with a previous snapshot to generate replication tasks for the replication job. The replication tasks are placed onto a message queue at the source site, where a worker node at the source site retrieves a replication task from the message queue and processes the replication task in conjunction with a worker node at the destination site.
Type: Application
Filed: October 29, 2019
Publication date: April 23, 2020
Inventors: Atul Avinash Karmarkar, Philip Shilane, Kevin Xu, Abhinav Duggal
-
Patent number: 10628298
Abstract: Generate first data structure based on unique identifiers of objects in object storages. Set indicators in positions in first data structure corresponding to hashes of unique identifiers of active objects in storages. When garbage collection is suspended, store suspension information to persistent storage. Set indicators in second data structure positions corresponding to hashes of unique identifiers of data objects that are deduplicated to storages while garbage collection is suspended. When garbage collection is resumed, retrieve suspension information from persistent storage. Set indicators in positions in first data structure corresponding to hashes of unique identifiers of data objects corresponding to indicators set in second data structure positions. Copy active objects from first object storage to second if number of active objects in first object storage does not satisfy threshold.
Type: Grant
Filed: October 26, 2018
Date of Patent: April 21, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Ramprasad Chinthekindi, Abhinav Duggal
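The two-structure bookkeeping above can be sketched as follows; the vector size and the sum-of-bytes position function are toy assumptions standing in for the hash of an object's unique identifier:

```python
# Illustrative suspend/resume tracking: a first bit vector marks active
# objects; a second vector records objects deduplicated while garbage
# collection is suspended. On resume, the second vector is merged into the
# first so those new objects are treated as active and not collected.

SIZE = 1 << 12

def pos(obj_id):
    # Toy stand-in for hashing an object's unique identifier.
    return sum(obj_id.encode()) % SIZE

def mark(vector, obj_ids):
    for oid in obj_ids:
        vector[pos(oid)] = 1

first = bytearray(SIZE)
mark(first, ["obj-1", "obj-2"])   # active objects before suspension

second = bytearray(SIZE)
mark(second, ["obj-3"])           # deduplicated while GC was suspended

# On resume: merge indicators from the second vector into the first.
for i, bit in enumerate(second):
    if bit:
        first[i] = 1

all_live = all(first[pos(o)] for o in ["obj-1", "obj-2", "obj-3"])
```

The patent additionally persists suspension information so the merge can happen after a restart; that persistence step is omitted here.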
-
Publication number: 20200117546
Abstract: Embodiments for memory-efficient perfect hashing for large records. A container ID (CID) set is divided into multiple fixed-size ranges. These ranges are then mapped into perfect hash buckets until each bucket is filled, distributing the container IDs uniformly so that the number of CIDs in every perfect hash bucket is the same or nearly the same. Individual perfect hash functions are created for each perfect hash bucket. With container IDs as keys, the process maps n keys to n positions to reduce extra memory overhead. The perfect hash function is implemented using a compress, hash, displace (CHD) algorithm with two levels of hash functions. The level-1 hash functions divide the keys into multiple internal buckets with a defined average number of keys per bucket. The CHD algorithm iteratively tries different level-2 hash variables to achieve a collision-free mapping.
Type: Application
Filed: October 12, 2018
Publication date: April 16, 2020
Inventors: Tony Wong, Hemanth Satyanarayana, Abhinav Duggal, Ranganathan Dhathri Purohith
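The range-to-bucket packing step (the stage before the per-bucket CHD construction) can be sketched as below; `range_size` and `bucket_capacity` are illustrative parameters, not values from the application:

```python
# Illustrative bucketing of container IDs (CIDs): the sorted CID set is cut
# into fixed-size ranges, and whole ranges are packed into buckets until each
# bucket is (nearly) full, so CIDs spread roughly evenly across buckets.
# A separate perfect hash function would then be built per bucket.

def bucket_cids(cids, range_size, bucket_capacity):
    ordered = sorted(cids)
    ranges = [ordered[i:i + range_size]
              for i in range(0, len(ordered), range_size)]
    buckets, current = [], []
    for rng in ranges:
        if current and len(current) + len(rng) > bucket_capacity:
            buckets.append(current)           # bucket filled; start a new one
            current = []
        current.extend(rng)
    if current:
        buckets.append(current)
    return buckets

buckets = bucket_cids(range(10), range_size=2, bucket_capacity=4)
```

Keeping bucket sizes uniform bounds the per-bucket perfect hash construction cost and memory, which is the point of this stage in the described design.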
-
Patent number: 10592158
Abstract: A method for transferring data includes populating a perfect hash bit vector (PHV) using a perfect hash function (PHF) and a target index file to obtain a populated PHV, determining required segment references using the populated PHV and received segment references, providing the required segment references to a source storage device, and receiving segments corresponding to the required segment references from the source storage device.
Type: Grant
Filed: October 30, 2018
Date of Patent: March 17, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Ramprasad Chinthekindi, Abhinav Duggal
-
Patent number: 10585746
Abstract: A controller at a source site generates a set of tasks associated with a replication job. Each task involves a source worker node from among a set of source worker nodes at the source site, a destination worker node from among a set of destination worker nodes at the destination site, and includes one or more of copying an object from the source to destination site, or deleting an object from the destination site. Status update messages concerning the tasks are received at a message queue connected between the controller and the set of source worker nodes. The status update messages are logged into a persistent key-value store. Upon a failure to complete the replication job, the key-value store is accessed to identify tasks that were and were not completed before the failure. The tasks that were not completed are resent to the source worker nodes.
Type: Grant
Filed: February 2, 2018
Date of Patent: March 10, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Philip Shilane, Kevin Xu, Abhinav Duggal, Atul Avinash Karmarkar
-
Publication number: 20200019623
Abstract: A perfect hash vector (PHVEC) is created to track segments in a deduplication file system. Files are represented by segment trees having hierarchical segment levels. Containers store the segments and fingerprints of segments. Upper-level segments are traversed to identify a first set of fingerprints at each level; these fingerprints correspond to segments that should be present. The first set of fingerprints is hashed, and bits are set in the PHVEC at the positions produced by the hashing. The containers are read to identify a second set of fingerprints, corresponding to segments that are present. The second set of fingerprints is hashed, and bits are cleared in the PHVEC at the positions produced by the hashing. If a bit was set and not cleared, it is determined that at least one segment is missing. If all bits that were set were also cleared, it is determined that no segments are missing.
Type: Application
Filed: July 12, 2018
Publication date: January 16, 2020
Inventors: Tony Wong, Abhinav Duggal, Ramprasad Chinthekindi