Patents by Inventor Junlong Gao

Junlong Gao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220066882
    Abstract: Techniques for tiering data to a cold storage tier of a cloud object storage platform are provided. In one set of embodiments, a computer system can identify one or more old snapshots of a data set that reside in a first storage tier of the cloud object storage platform, where the one or more old snapshots are snapshots that are unlikely to be deleted from the cloud object storage platform within a period of N days. The computer system can further, for each snapshot in the one or more old snapshots: identify one or more data blocks in the snapshot that are superseded by a more recent snapshot in the one or more old snapshots; write the one or more data blocks to a second (i.e., cold) storage tier of the cloud object storage platform that has a lower storage cost than the first storage tier; and cause the one or more data blocks to be deleted from the first storage tier.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 3, 2022
    Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Petr Vandrovec, Ilya Languev, Maxime Austruy, Ilia Sokolinski, Satish Pudi
  • Publication number: 20220066883
    Abstract: Techniques for recovering metadata associated with data backed up in cloud object storage are provided. In one set of embodiments, a computer system can create a snapshot of a data set, where the snapshot includes a plurality of data blocks of the data set that have been modified since the creation of a prior snapshot of the data set. The computer system can further upload the snapshot to a cloud object storage platform of a cloud infrastructure, where the snapshot is uploaded as a plurality of log segments conforming to an object format of the cloud object storage platform, and where each log segment includes one or more data blocks in the plurality of data blocks, and a set of metadata comprising, for each of the one or more data blocks, an identifier of the data set, an identifier of the snapshot, and a logical block address (LBA) of the data block.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 3, 2022
    Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Petr Vandrovec, Ilya Languev, Maxime Austruy, Ilia Sokolinski, Satish Pudi
  • Publication number: 20220057955
    Abstract: Scalable segment cleaning for log-structured file systems (LFSs) includes determining counts of segment cleaners and virtual nodes, with each virtual node being associated with a plurality of objects. Each virtual node is assigned to a selected segment cleaner. Based at least on the assignments, performing, for each virtual node, segment cleaning of the objects by the assigned segment cleaner. A portion, less than all, of the virtual nodes are reassigned to a newly selected segment cleaner based on a change of the count of the segment cleaners and/or a change of the count of the virtual nodes. Based at least on the reassignments, segment cleaning of the objects is performed, for each reassigned virtual node, by the reassigned segment cleaner. In some examples, the objects comprise virtual machine disks (VMDKs) and the segment cleaning uses a segment usage table (SUT) to track segment usage and identify segment cleaning candidates.
    Type: Application
    Filed: August 21, 2020
    Publication date: February 24, 2022
    Inventors: Wenguang Wang, Junlong Gao, Vamsi Gunturu
  • Publication number: 20220058094
    Abstract: Solutions for managing archived storage include receiving, at a first node, a snapshot comprising object data (e.g., a virtual machine disk snapshot) from a second node (e.g., a software defined data center), and storing the snapshot in a tiered structure that includes a data tier and a metadata tier. Snapshots may be used for fail-over operations and/or backups, to support disaster recovery. The data tier comprises a log-structured file system (LFS), and the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS. The metadata tier also comprises a logical layer indicating content in the CAS. Segment cleaning of the data tier is performed using a segment usage table (SUT). Some examples include performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery. In some examples, the CAS comprises a log-structured merge-tree (LSM-tree).
    Type: Application
    Filed: August 20, 2020
    Publication date: February 24, 2022
    Inventors: Vamsi Gunturu, Wenguang Wang, Junlong Gao, Ilia Langouev, Petr Vandrovec, Maxime Austruy, Ilia Sokolinski, Satish Pudi
  • Publication number: 20220058161
    Abstract: The efficiency of segment cleaning for a log-structured file system (LFS) is enhanced at least by storing additional information in a segment usage table (SUT). Live blocks (representing portions of stored objects) in an LFS are determined based at least on the SUT. Chunk identifiers associated with the live blocks are read. The live blocks are coalesced at least by writing at least a portion of the live blocks into at least one new segment. A blind update of at least a portion of the chunk identifiers in a chunk map is performed to indicate the new segment. The blind update includes writing to the chunk map without reading from the chunk map. In some examples, the objects comprise virtual machine disks (VMDKs) and the SUT changes between a list format and a bitmap format, to minimize size.
    Type: Application
    Filed: August 21, 2020
    Publication date: February 24, 2022
    Inventors: Wenguang Wang, Ilia Langouev, Vamsi Gunturu, Junlong Gao
  • Patent number: 11221944
    Abstract: A method for managing metadata for data stored in a cloud storage is provided. The method receives, at a first of a plurality of metadata servers, information associated with an object stored in the cloud storage, the information comprising a plurality of LBAs for where the object is stored. Each metadata server allocates contiguous chunk IDs for a group of objects. The method generates a new chunk ID for the object, which is a combination of a unique fixed value and a monotonically incrementing local value associated with each LBA, such that a first LBA is mapped to a first chunk ID having a first local value and a next LBA is mapped to a second chunk ID having the first local value incremented as a second local value. The method stores the new chunk ID and other metadata in one or more tables stored in a metadata storage.
    Type: Grant
    Filed: August 25, 2020
    Date of Patent: January 11, 2022
    Assignee: VMware, Inc.
    Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao, Ilya Languev, Petr Vandrovec, Maxime Austruy, Ilia Sokolinski, Satish Pudi
  • Publication number: 20210365318
    Abstract: Techniques for using erasure coding in a single region to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload a plurality of data objects to a region of a cloud object storage platform, where the plurality of data objects including modifications to a data set. The computer system can further compute a parity object based on the plurality of data objects, where the parity object encodes parity information for the plurality of data objects. The computer system can then upload the parity object to the same region where the plurality of data objects was uploaded.
    Type: Application
    Filed: May 22, 2020
    Publication date: November 25, 2021
    Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao
  • Publication number: 20210365365
    Abstract: Techniques for using data mirroring across regions to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload first and second copies of a data object to first and second regions of the cloud object storage platform respectively, where the first and second copies are identical. The computer system can then attempt to read the first copy of the data object from the first region. If the read attempt fails, the computer system can retrieve the second copy of the data object from the second region.
    Type: Application
    Filed: May 22, 2020
    Publication date: November 25, 2021
    Inventors: Wenguang Wang, Vamsi Gunturu, Junlong Gao
  • Publication number: 20210365319
    Abstract: Techniques for using erasure coding across multiple regions to reduce the likelihood of losing objects in a cloud object storage platform are provided. In one set of embodiments, a computer system can upload each of a plurality of data objects to each of a plurality of regions of the cloud object storage platform. The computer system can further compute a parity object based on the plurality of data objects, where the parity object encodes parity information for the plurality of data objects. The computer system can then upload the parity object to another region of the cloud object storage platform different from the plurality of regions.
    Type: Application
    Filed: May 22, 2020
    Publication date: November 25, 2021
    Inventors: Wenguang Wang, Junlong Gao, Vamsi Gunturu
  • Patent number: 11176099
    Abstract: The disclosure herein describes synchronizing a data cache and an LSM tree file system on an object storage platform. Instructions to send a cached data set from the data cache to the LSM tree file system are received. An updated metadata catalog is generated. If the LSM tree structure is out of shape, compaction is performed on the LSM tree file system which may be on a different system or server. When an unmerged compacted metadata catalog is identified, a merged metadata catalog is generated, based on the compacted metadata catalog and the cached data set, and associated with the cached data set. The cached data set and the associated metadata catalog are sent to the LSM tree file system, whereby the data cache and the LSM tree file system are synchronized. Synchronization is enabled without the data cache or file system being locked and/or waiting for the other entity.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: November 16, 2021
    Assignee: VMware, Inc.
    Inventors: Wenguang Wang, Junlong Gao, Richard P. Spillane, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
  • Patent number: 11163461
    Abstract: System and method for writing updated versions of a configuration data file for a distributed file system in a storage system uses a directory renaming operation to write a new updated version of the configuration data file using the latest version of the configuration data file and a target directory. After the latest version of the configuration data file is modified by a particular host computer in the storage system, the modified configuration data file is written to a temporary file. The directory naming operation is then initiated on the temporary file to change the directory for the temporary file to the target directory. If the directory renaming operation has failed, a retry is performed by the particular host computer to write the new updated version of the configuration data file using a new latest version and a new target directory.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: November 2, 2021
    Assignee: VMware, Inc.
    Inventors: Ye Zhang, Wenguang Wang, Sriram Patil, Richard P. Spillane, Junlong Gao, Wangping He, Zhaohui Guo, Yang Yang
  • Publication number: 20210326049
    Abstract: System and method for writing updated versions of a configuration data file for a distributed file system in a storage system uses a directory renaming operation to write a new updated version of the configuration data file using the latest version of the configuration data file and a target directory. After the latest version of the configuration data file is modified by a particular host computer in the storage system, the modified configuration data file is written to a temporary file. The directory naming operation is then initiated on the temporary file to change the directory for the temporary file to the target directory. If the directory renaming operation has failed, a retry is performed by the particular host computer to write the new updated version of the configuration data file using a new latest version and a new target directory.
    Type: Application
    Filed: April 20, 2020
    Publication date: October 21, 2021
    Inventors: Ye ZHANG, Wenguang WANG, Sriram PATIL, Richard P. SPILLANE, Junlong GAO, Wangping HE, Zhaohui GUO, Yang YANG
  • Patent number: 11093472
    Abstract: The disclosure herein describes providing and accessing data on an object storage platform using a log-structured merge (LSM) tree file system. The LSM tree file system on the object storage platform includes sorted data tables, each sorted data table including a payload portion and an index portion. Data is written to the LSM tree file system in at least one new sorted data table. Data is ready by identifying a data location of the data based on index portions of the sorted data tables and reading the data from a sorted data table associated with the identified data location. The use of the LSM tree file system on the object storage platform provides an efficient means for interacting with the data stored thereon.
    Type: Grant
    Filed: December 7, 2018
    Date of Patent: August 17, 2021
    Assignee: VMware, Inc.
    Inventors: Richard P. Spillane, Wenguang Wang, Junlong Gao, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
  • Patent number: 11086779
    Abstract: Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.
    Type: Grant
    Filed: November 11, 2019
    Date of Patent: August 10, 2021
    Assignee: VMware, Inc.
    Inventors: Wenguang Wang, Mounesh Badiger, Abhay Kumar Jain, Junlong Gao, Zhaohui Guo, Richard P. Spillane
  • Patent number: 11074225
    Abstract: The disclosure herein describes synchronizing cached index copies at a first site with indexes of a log-structured merge (LSM) tree file system on an object storage platform at a second site. An indication that the LSM tree file system has been compacted based on a compaction process is received. A cached metadata catalog of the included parent catalog version at the first site is accessed. A set of cached index copies is identified at the first site based on the metadata of the cached metadata catalog. The compaction process is applied to the identified set of cached index copies and a compacted set of cached index copies is generated at the first site, whereby the compacted set of cached index copies is synchronized with a respective set of indexes of the plurality of sorted data tables of the LSM tree file system at the second site.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: July 27, 2021
    Assignee: VMware, Inc.
    Inventors: Wenguang Wang, Richard P. Spillane, Junlong Gao, Robert T. Johnson, Christos Karamanolis, Maxime Austruy
  • Patent number: 11055265
    Abstract: The present disclosure provides techniques for scaling out deduplication of files among a plurality of nodes. The techniques include designating a master component for the coordination of deduplication. The master component divides files to be deduplicated among several slave nodes, and provides to each slave node a set of unique identifiers that are to be assigned to chunks during the deduplication process. The techniques herein preserve integrity of the deduplication process that has been scaled out among several nodes. The scaled out deduplication process deduplicates files faster by allowing several deduplication modules to work in parallel to deduplicate files.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: July 6, 2021
    Assignee: VMware, Inc.
    Inventors: Wenguang Wang, Junlong Gao, Marcos K. Aguilera, Richard P. Spillane, Christos Karamanolis, Maxime Austruy
  • Publication number: 20210141728
    Abstract: Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.
    Type: Application
    Filed: November 11, 2019
    Publication date: May 13, 2021
    Inventors: Wenguang WANG, Mounesh BADIGER, Abhay Kumar JAIN, Junlong GAO, Zhaohui GUO, Richard P. SPILLANE
  • Publication number: 20210064581
    Abstract: The present disclosure provides techniques for deduplicating files. The techniques include creating a cache or subset of a large data structure. The large data structure organizes information by random hash values. The random hash values result in a random organization of information within the data structure, with the information spanning a large number of storage blocks within a storage system. The cache, however, is within memory and is small relative to the data structure. The cache is created so as to contain information that is likely to be needed during deduplication of a file. Having needed information within memory rather than in storage results in faster read and write operations to that information, improving the performance of a computing system.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Wenguang WANG, Junlong GAO, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY
  • Publication number: 20210064580
    Abstract: The disclosure provides techniques for deduplicating files. The techniques include, upon creating or modifying a file, placing a logical timestamp of the current logical time, within a queue associated with the directory of the file. The techniques further include placing the logical timestamp within a queue of each parent directory of the directory of the file. To determine a set of files for deduplication, the techniques disclosed herein identify files that have been modified within a logical time range. The set of files modified within a logical time is identified by traversing directories of a storage system, the directories being organized within a tree structure. If a directory's queue does not contain a timestamp that is within the logical time range, then all child directories can be skipped over for further processing, such that no files within the child directories end up being within the set of files for deduplication.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Junlong GAO, Wenguang WANG, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY
  • Publication number: 20210064579
    Abstract: Disclosed techniques include deduplication. Techniques include determining whether a file is unique, and depending on whether the file is unique, deduplicating only part of the file or the entire file. The techniques include processing the first chunk of a file to determine whether the hash of the chunk hash is already within a chunk hash table, and if not, then a percentage of chunks of the file is similarly processed. If any of the hashes of chunks are already in the chunk hash table, then at least some of file has been previously deduplicated, and file is not unique the storage system. If none of the processed chunks have a hash that is already in the chunk hash table, then the file is considered to be unique within chunk store and only a partial percentage of the file's chunks are deduplicated. Not all of a unique file's chunks are deduplicated.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Wenguang WANG, Junlong GAO, Marcos K. AGUILERA, Richard P. SPILLANE, Christos KARAMANOLIS, Maxime AUSTRUY