Patents by Inventor Junlong Gao

Junlong Gao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Tiering Data to a Cold Storage Tier of Cloud Object Storage

Publication number: 20220066882

Abstract: Techniques for tiering data to a cold storage tier of a cloud object storage platform are provided. In one set of embodiments, a computer system can identify one or more old snapshots of a data set that reside in a first storage tier of the cloud object storage platform, where the one or more old snapshots are snapshots that are unlikely to be deleted from the cloud object storage platform within a period of N days. The computer system can further, for each snapshot in the one or more old snapshots: identify one or more data blocks in the snapshot that are superseded by a more recent snapshot in the one or more old snapshots; write the one or more data blocks to a second (i.e., cold) storage tier of the cloud object storage platform that has a lower storage cost than the first storage tier; and cause the one or more data blocks to be deleted from the first storage tier.

Type: Application

Filed: August 25, 2020

Publication date: March 3, 2022

Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Petr Vandrovec, Ilya Languev, Maxime Austruy, Ilia Sokolinski, Satish Pudi
Recovering the Metadata of Data Backed Up in Cloud Object Storage

Publication number: 20220066883

Abstract: Techniques for recovering metadata associated with data backed up in cloud object storage are provided. In one set of embodiments, a computer system can create a snapshot of a data set, where the snapshot includes a plurality of data blocks of the data set that have been modified since the creation of a prior snapshot of the data set. The computer system can further upload the snapshot to a cloud object storage platform of a cloud infrastructure, where the snapshot is uploaded as a plurality of log segments conforming to an object format of the cloud object storage platform, and where each log segment includes one or more data blocks in the plurality of data blocks, and a set of metadata comprising, for each of the one or more data blocks, an identifier of the data set, an identifier of the snapshot, and a logical block address (LBA) of the data block.

Type: Application

Filed: August 25, 2020

Publication date: March 3, 2022

Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Petr Vandrovec, Ilya Languev, Maxime Austruy, Ilia Sokolinski, Satish Pudi
SCALABLE SEGMENT CLEANING FOR A LOG-STRUCTURED FILE SYSTEM

Publication number: 20220057955

Abstract: Scalable segment cleaning for log-structured file systems (LFSs) includes determining counts of segment cleaners and virtual nodes, with each virtual node being associated with a plurality of objects. Each virtual node is assigned to a selected segment cleaner. Based at least on the assignments, performing, for each virtual node, segment cleaning of the objects by the assigned segment cleaner. A portion, less than all, of the virtual nodes are reassigned to a newly selected segment cleaner based on a change of the count of the segment cleaners and/or a change of the count of the virtual nodes. Based at least on the reassignments, segment cleaning of the objects is performed, for each reassigned virtual node, by the reassigned segment cleaner. In some examples, the objects comprise virtual machine disks (VMDKs) and the segment cleaning uses a segment usage table (SUT) to track segment usage and identify segment cleaning candidates.

Type: Application

Filed: August 21, 2020

Publication date: February 24, 2022

Inventors: Wenguang Wang, Junlong Gao, Vamsi Gunturu
LOG-STRUCTURED FORMATS FOR MANAGING ARCHIVED STORAGE OF OBJECTS

Publication number: 20220058094

Abstract: Solutions for managing archived storage include receiving, at a first node, a snapshot comprising object data (e.g., a virtual machine disk snapshot) from a second node (e.g., a software defined data center), and storing the snapshot in a tiered structure that includes a data tier and a metadata tier. Snapshots may be used for fail-over operations and/or backups, to support disaster recovery. The data tier comprises a log-structured file system (LFS), and the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS. The metadata tier also comprises a logical layer indicating content in the CAS. Segment cleaning of the data tier is performed using a segment usage table (SUT). Some examples include performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery. In some examples, the CAS comprises a log-structured merge-tree (LSM-tree).

Type: Application

Filed: August 20, 2020

Publication date: February 24, 2022

Inventors: Vamsi Gunturu, Wenguang Wang, Junlong Gao, Ilia Langouev, Petr Vandrovec, Maxime Austruy, Ilia Sokolinski, Satish Pudi
ENHANCING EFFICIENCY OF SEGMENT CLEANING FOR A LOG-STRUCTURED FILE SYSTEM

Publication number: 20220058161

Abstract: The efficiency of segment cleaning for a log-structured file system (LFS) is enhanced at least by storing additional information in a segment usage table (SUT). Live blocks (representing portions of stored objects) in an LFS are determined based at least on the SUT. Chunk identifiers associated with the live blocks are read. The live blocks are coalesced at least by writing at least a portion of the live blocks into at least one new segment. A blind update of at least a portion of the chunk identifiers in a chunk map is performed to indicate the new segment. The blind update includes writing to the chunk map without reading from the chunk map. In some examples, the objects comprise virtual machine disks (VMDKs) and the SUT changes between a list format and a bitmap format, to minimize size.

Type: Application

Filed: August 21, 2020

Publication date: February 24, 2022

Inventors: Wenguang Wang, Ilia Langouev, Vamsi Gunturu, Junlong Gao
Managing metadata for a backup data storage

Patent number: 11221944

Abstract: A method for managing metadata for data stored in a cloud storage is provided. The method receives, at a first of a plurality of metadata servers, information associated with an object stored in the cloud storage, the information comprising a plurality of LBAs for where the object is stored. Each metadata server allocates contiguous chunk IDs for a group of objects. The method generates a new chunk ID for the object, which is a combination of a unique fixed value and a monotonically incrementing local value associated with each LBA, such that a first LBA is mapped to a first chunk ID having a first local value and a next LBA is mapped to a second chunk ID having the first local value incremented as a second local value. The method stores the new chunk ID and other metadata in one or more tables stored in a metadata storage.

Type: Grant

Filed: August 25, 2020

Date of Patent: January 11, 2022

Assignee: VMware, Inc.

Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Ilya Languev, Petr Vandrovec, Maxime Austruy, Ilia Sokolinski, Satish Pudi
Using Erasure Coding in a Single Region to Reduce the Likelihood of Losing Objects Maintained in Cloud Object Storage

Publication number: 20210365318

Abstract: Techniques for using erasure coding in a single region to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload a plurality of data objects to a region of a cloud object storage platform, where the plurality of data objects including modifications to a data set. The computer system can further compute a parity object based on the plurality of data objects, where the parity object encodes parity information for the plurality of data objects. The computer system can then upload the parity object to the same region where the plurality of data objects was uploaded.

Type: Application

Filed: May 22, 2020

Publication date: November 25, 2021

Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao
Using Data Mirroring Across Multiple Regions to Reduce the Likelihood of Losing Objects Maintained in Cloud Object Storage

Publication number: 20210365365

Abstract: Techniques for using data mirroring across regions to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload first and second copies of a data object to first and second regions of the cloud object storage platform respectively, where the first and second copies are identical. The computer system can then attempt to read the first copy of the data object from the first region. If the read attempt fails, the computer system can retrieve the second copy of the data object from the second region.

Type: Application

Filed: May 22, 2020

Publication date: November 25, 2021

Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao
Using Erasure Coding Across Multiple Regions to Reduce the Likelihood of Losing Objects Maintained in Cloud Object Storage

Publication number: 20210365319

Abstract: Techniques for using erasure coding across multiple regions to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload each of a plurality of data objects to each of a plurality of regions of the cloud object storage platform. The computer system can further compute a parity object based on the plurality of data objects, where the parity object encodes parity information for the plurality of data objects. The computer system can then upload the parity object to another region of the cloud object storage platform different from the plurality of regions.

Type: Application

Filed: May 22, 2020

Publication date: November 25, 2021

Inventors: Wenguang Wang, Junlong Gao, Vamsi Gunturu
Lockless synchronization of LSM tree metadata in a distributed system

Patent number: 11176099

Abstract: The disclosure herein describes synchronizing a data cache and an LSM tree file system on an object storage platform. Instructions to send a cached data set from the data cache to the LSM tree file system are received. An updated metadata catalog is generated. If the LSM tree structure is out of shape, compaction is performed on the LSM tree file system which may be on a different system or server. When an unmerged compacted metadata catalog is identified, a merged metadata catalog is generated, based on the compacted metadata catalog and the cached data set, and associated with the cached data set. The cached data set and the associated metadata catalog are sent to the LSM tree file system, whereby the data cache and the LSM tree file system are synchronized. Synchronization is enabled without the data cache or file system being locked and/or waiting for the other entity.

Type: Grant

Filed: December 21, 2018

Date of Patent: November 16, 2021

Assignee: VMware, Inc.

Inventors: Wenguang Wang, Junlong Gao, Richard P. Spillane, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
Lockless method for writing updated versions of a configuration data file for a distributed file system using directory renaming

Patent number: 11163461

Abstract: System and method for writing updated versions of a configuration data file for a distributed file system in a storage system uses a directory renaming operation to write a new updated version of the configuration data file using the latest version of the configuration data file and a target directory. After the latest version of the configuration data file is modified by a particular host computer in the storage system, the modified configuration data file is written to a temporary file. The directory naming operation is then initiated on the temporary file to change the directory for the temporary file to the target directory. If the directory renaming operation has failed, a retry is performed by the particular host computer to write the new updated version of the configuration data file using a new latest version and a new target directory.

Type: Grant

Filed: April 20, 2020

Date of Patent: November 2, 2021

Assignee: VMware, Inc.

Inventors: Ye Zhang, Wenguang Wang, Sriram Patil, Richard P. Spillane, Junlong Gao, Wangping He, Zhaohui Guo, Yang Yang
LOCKLESS METHOD FOR WRITING UPDATED VERSIONS OF A CONFIGURATION DATA FILE FOR A DISTRIBUTED FILE SYSTEM USING DIRECTORY RENAMING

Publication number: 20210326049

Abstract: System and method for writing updated versions of a configuration data file for a distributed file system in a storage system uses a directory renaming operation to write a new updated version of the configuration data file using the latest version of the configuration data file and a target directory. After the latest version of the configuration data file is modified by a particular host computer in the storage system, the modified configuration data file is written to a temporary file. The directory naming operation is then initiated on the temporary file to change the directory for the temporary file to the target directory. If the directory renaming operation has failed, a retry is performed by the particular host computer to write the new updated version of the configuration data file using a new latest version and a new target directory.

Type: Application

Filed: April 20, 2020

Publication date: October 21, 2021

Inventors: Ye ZHANG, Wenguang WANG, Sriram PATIL, Richard P. SPILLANE, Junlong GAO, Wangping HE, Zhaohui GUO, Yang YANG
Using an LSM tree file structure for the on-disk format of an object storage platform

Patent number: 11093472

Abstract: The disclosure herein describes providing and accessing data on an object storage platform using a log-structured merge (LSM) tree file system. The LSM tree file system on the object storage platform includes sorted data tables, each sorted data table including a payload portion and an index portion. Data is written to the LSM tree file system in at least one new sorted data table. Data is ready by identifying a data location of the data based on index portions of the sorted data tables and reading the data from a sorted data table associated with the identified data location. The use of the LSM tree file system on the object storage platform provides an efficient means for interacting with the data stored thereon.

Type: Grant

Filed: December 7, 2018

Date of Patent: August 17, 2021

Assignee: VMware, Inc.

Inventors: Richard P. Spillane, Wenguang Wang, Junlong Gao, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
System and method of a highly concurrent cache replacement algorithm

Patent number: 11086779

Abstract: Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.

Type: Grant

Filed: November 11, 2019

Date of Patent: August 10, 2021

Assignee: VMware, Inc.

Inventors: Wenguang Wang, Mounesh Badiger, Abhay Kumar Jain, Junlong Gao, Zhaohui Guo, Richard P. Spillane
Synchronization of index copies in an LSM tree file system

Patent number: 11074225

Abstract: The disclosure herein describes synchronizing cached index copies at a first site with indexes of a log-structured merge (LSM) tree file system on an object storage platform at a second site. An indication that the LSM tree file system has been compacted based on a compaction process is received. A cached metadata catalog of the included parent catalog version at the first site is accessed. A set of cached index copies is identified at the first site based on the metadata of the cached metadata catalog. The compaction process is applied to the identified set of cached index copies and a compacted set of cached index copies is generated at the first site, whereby the compacted set of cached index copies is synchronized with a respective set of indexes of the plurality of sorted data tables of the LSM tree file system at the second site.

Type: Grant

Filed: December 21, 2018

Date of Patent: July 27, 2021

Assignee: VMware, Inc.

Inventors: Wenguang Wang, Richard P. Spillane, Junlong Gao, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
Scale out chunk store to multiple nodes to allow concurrent deduplication

Patent number: 11055265

Abstract: The present disclosure provides techniques for scaling out deduplication of files among a plurality of nodes. The techniques include designating a master component for the coordination of deduplication. The master component divides files to be deduplicated among several slave nodes, and provides to each slave node a set of unique identifiers that are to be assigned to chunks during the deduplication process. The techniques herein preserve integrity of the deduplication process that has been scaled out among several nodes. The scaled out deduplication process deduplicates files faster by allowing several deduplication modules to work in parallel to deduplicate files.

Type: Grant

Filed: August 27, 2019

Date of Patent: July 6, 2021

Assignee: VMware, Inc.

Inventors: Wenguang Wang, Junlong Gao, Marcos K. Aguilera, Richard P. Spillane, Christos Karamanolis, Maxime Austruy
SYSTEM AND METHOD OF A HIGHLY CONCURRENT CACHE REPLACEMENT ALGORITHM

Publication number: 20210141728

Abstract: Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.

Type: Application

Filed: November 11, 2019

Publication date: May 13, 2021

Inventors: Wenguang WANG, Mounesh BADIGER, Abhay Kumar JAIN, Junlong GAO, Zhaohui GUO, Richard P. SPILLANE
SMALL IN-MEMORY CACHE TO SPEED UP CHUNK STORE OPERATION FOR DEDUPLICATION

Publication number: 20210064581

Abstract: The present disclosure provides techniques for deduplicating files. The techniques include creating a cache or subset of a large data structure. The large data structure organizes information by random hash values. The random hash values result in a random organization of information within the data structure, with the information spanning a large number of storage blocks within a storage system. The cache, however, is within memory and is small relative to the data structure. The cache is created so as to contain information that is likely to be needed during deduplication of a file. Having needed information within memory rather than in storage results in faster read and write operations to that information, improving the performance of a computing system.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Wenguang WANG, Junlong GAO, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY
FAST ALGORITHM TO FIND FILE SYSTEM DIFFERENCE FOR DEDUPLICATION

Publication number: 20210064580

Abstract: The disclosure provides techniques for deduplicating files. The techniques include, upon creating or modifying a file, placing a logical timestamp of the current logical time, within a queue associated with the directory of the file. The techniques further include placing the logical timestamp within a queue of each parent directory of the directory of the file. To determine a set of files for deduplication, the techniques disclosed herein identify files that have been modified within a logical time range. The set of files modified within a logical time is identified by traversing directories of a storage system, the directories being organized within a tree structure. If a directory's queue does not contain a timestamp that is within the logical time range, then all child directories can be skipped over for further processing, such that no files within the child directories end up being within the set of files for deduplication.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Junlong GAO, Wenguang WANG, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY
PROBABILISTIC ALGORITHM TO CHECK WHETHER A FILE IS UNIQUE FOR DEDUPLICATION

Publication number: 20210064579

Abstract: Disclosed techniques include deduplication. Techniques include determining whether a file is unique, and depending on whether the file is unique, deduplicating only part of the file or the entire file. The techniques include processing the first chunk of a file to determine whether the hash of the chunk hash is already within a chunk hash table, and if not, then a percentage of chunks of the file is similarly processed. If any of the hashes of chunks are already in the chunk hash table, then at least some of file has been previously deduplicated, and file is not unique the storage system. If none of the processed chunks have a hash that is already in the chunk hash table, then the file is considered to be unique within chunk store and only a partial percentage of the file's chunks are deduplicated. Not all of a unique file's chunks are deduplicated.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Wenguang WANG, Junlong GAO, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY

prev 1 2 3 4 next