OBJECT STORAGE-BASED INDEXING SYSTEMS AND METHOD

A file system and a related method are presented. The file system includes an object storage configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects. Each namespace entry of the plurality of namespace entries includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The file system further includes an indexing system configured to generate the plurality of namespace entries; store the plurality of namespace entries as one or more objects in the object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on a list of the plurality of namespace entries sorted on the version numbers.

Description
CROSS REFERENCE TO RELATED APPLICATION

This application claims a benefit of, and priority to, India Provisional Patent Application No. 202241002128, filed Jan. 13, 2022, the contents of which are incorporated by reference in their entirety.

BACKGROUND

Embodiments of the present invention generally relate to systems and methods for indexing metadata in cloud-based storage solutions, and more particularly to systems and methods for indexing metadata in object-based storage solutions.

Modern businesses often rely on computer systems and computer networks. An irretrievable loss of data in such business computer systems is undesirable. To prevent loss of data, computer systems are periodically backed up using a data backup file system configured to store the backup data on a storage server (e.g., a cloud-based storage). Backup data includes data blocks and metadata. To generate data blocks, source data is split into chunks and stored on the storage server (e.g., a cloud storage). The metadata is the additional information maintained to allow restoration of the backed-up data into its original form.

Typically, a distributed database is used to store the metadata as a searchable index. Distributed databases are typically implemented on top of solid-state devices, which have a high access cost. In a distributed database, every key search potentially requires a random read operation on the solid-state devices, and every store operation potentially requires a random write operation. Since conventional hard disks do not support random input-output operations efficiently, distributed databases often rely on solid-state drives, which increases the operational cost of running such an index.

SUMMARY

The following summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, example embodiments, and features described, further aspects, example embodiments, and features will become apparent by reference to the drawings and the following detailed description.

Briefly, according to an example embodiment, a file system is presented. The file system includes an object storage configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects. Each namespace entry of the plurality of namespace entries includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The file system further includes an indexing system configured to generate the plurality of namespace entries; store the plurality of namespace entries as one or more objects in the object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on a list of the plurality of namespace entries sorted on the version numbers.

According to another example embodiment, an indexing system is presented. The indexing system includes a memory storing one or more processor-executable routines; and a processor communicatively coupled to the memory. The processor is configured to execute the one or more processor-executable routines to generate a plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries includes an operation type conducted on a file data and/or file metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The processor is configured to execute the one or more processor-executable routines to store the plurality of namespace entries as one or more objects in an object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on the plurality of namespace entries sorted on the version numbers.

According to another example embodiment, an indexing method is presented. The indexing method includes generating a plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries includes an operation type conducted on a file data and/or file metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The method further includes storing the plurality of namespace entries as one or more objects in an object storage, and identifying, in response to a search query, one or more files for retrieval from the object storage based on the plurality of namespace entries sorted on the version numbers.

BRIEF DESCRIPTION OF THE FIGURES

These and other features, aspects, and advantages of the example embodiments will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:

FIG. 1 is a block diagram illustrating an example back-up system environment, according to some aspects of the present description,

FIG. 2 is a block diagram illustrating an example file system, according to some aspects of the present description,

FIG. 3 is a diagram illustrating an example workflow for indexing according to some aspects of the present description,

FIG. 4 is a diagram illustrating an example workflow for indexing according to some aspects of the present description,

FIG. 5 is a flow chart illustrating an example indexing method, according to some aspects of the present description, and

FIG. 6 is a block diagram illustrating an example computer system, according to some aspects of the present description.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

Various example embodiments will now be described more fully with reference to the accompanying drawings in which only some example embodiments are shown. Specific structural and functional details disclosed herein are merely representative for purposes of describing example embodiments. Example embodiments, however, may be embodied in many alternate forms and should not be construed as limited to only the example embodiments set forth herein. On the contrary, example embodiments are to cover all modifications, equivalents, and alternatives thereof.

The drawings are to be regarded as being schematic representations and elements illustrated in the drawings are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to a person skilled in the art. Any connection or coupling between functional blocks, devices, components, or other physical or functional units shown in the drawings or described herein may also be implemented by an indirect connection or coupling. A coupling between components may also be established over a wireless connection. Functional blocks may be implemented in hardware, firmware, software, or a combination thereof.

Before discussing example embodiments in more detail, it is noted that some example embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the operations as sequential processes, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of operations may be re-arranged. The processes may be terminated when their operations are completed, but may also have additional steps not included in the figures. It should also be noted that in some alternative implementations, the functions/acts/steps noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

Further, although the terms first, second, etc. may be used herein to describe various elements, components, regions, layers and/or sections, it should be understood that these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer, or section from another element, component, region, layer, or section. Thus, a first element, component, region, layer, or section discussed below could be termed a second element, component, region, layer, or section without departing from the scope of example embodiments.

Spatial and functional relationships between elements (for example, between modules) are described using various terms, including “connected,” “engaged,” “interfaced,” and “coupled.” Unless explicitly described as being “direct,” when a relationship between first and second elements is described in the description below, that relationship encompasses a direct relationship where no other intervening elements are present between the first and second elements, and also an indirect relationship where one or more intervening elements are present (either spatially or functionally) between the first and second elements. In contrast, when an element is referred to as being “directly” connected, engaged, interfaced, or coupled to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., “between,” versus “directly between,” “adjacent,” versus “directly adjacent,” etc.).

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, the singular forms “a,” “an,” and “the,” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the terms “and/or” and “at least one of” include any and all combinations of one or more of the associated listed items. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless specifically stated otherwise, or as is apparent from the description, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device/hardware, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Example embodiments of the present description provide systems and methods for indexing metadata in cloud-based storage solutions, and more particularly to systems and methods for indexing metadata in object-based storage solutions. The systems and methods for indexing metadata, according to embodiments of the present description, enable cost-effective and efficient cloud-based storage architecture solutions.

As mentioned earlier, distributed database-based indexing systems may not be cost effective. Further, in previous object storage-based indexing approaches, a creation version (cver) and a deletion version (dver) are recorded for every namespace entry as a single record. Although such an approach is cost-effective with respect to storage cost, it can be inefficient due to the need for reading a record before modifying it. This incurs an additional read request cost and also increases overall compute requirements, as the entire object needs to be read to fetch a single item. Embodiments of the present description address the noted shortcomings in the art.

FIG. 1 illustrates an example backup system environment (herein referred to as system environment 100), in accordance with embodiments of the present description. The system environment 100 includes a data backup system 110, one or more client devices 120A, 120B, . . . 120N (collectively referred to as “client devices 120”), an object storage 130, and an optional distributed database 140.

The data backup system 110 may be a software or a hardware component that enables the one or more client devices 120A, 120B, . . . 120N to back up and index one or more backup datasets. The data backup system 110 includes an indexing system 112 and an optional deduplication indexing system 114, as described in detail later. In some embodiments, the data backup system 110 is a cloud-based service. The data backup system 110 may optionally further provide a graphical user interface 111 for individual clients to access object storage 130 for cloud data management. For example, a graphical user interface 111 may be a front-end cloud storage interface. Additionally, or alternatively, the data backup system 110 may provide APIs for the access and management of data from the object storage 130.

A client device 120 may send a request to read, search, add, delete, or modify data stored on a cloud environment via a front-end graphical user interface 111 provided and operated by the data backup system 110 or via other suitable means such as application program interfaces (APIs). The one or more client devices 120A, 120B, . . . 120N (referred to herein as “devices”) may be any computing devices that have data that may need backup. Examples of such devices include, without limitation, workstations, personal computers, desktop computers, virtual machines, databases, docker containers, or other types of generally fixed computing systems such as mainframe computers, servers, and minicomputers. Other examples of such devices include mobile or portable computing devices, such as one or more laptops, tablet computers, personal data assistants, mobile phones (such as smartphones), IoT devices, wearable electronic devices such as smartwatches, and other mobile or portable computing devices such as embedded computers, set-top boxes, vehicle-mounted devices, wearable computers, etc. Servers can include mail servers, file servers, database servers, virtual machine servers, and web servers.

In some embodiments, the system environment 100 includes a plurality of devices 120. The plurality of devices 120 may be from a single client or different clients being serviced by the system environment 100. In some embodiments, the system environment 100 includes a single device 120 having a plurality of data sets or one large data set that needs backup.

The one or more datasets generally include data generated by the operating system and/or applications executing on the client device 120. In general, the data present in the one or more data set may include files, directories, file system volumes, data blocks, extents, or any other hierarchies or organizations of data objects. As used herein, the term “data object” refers to (i) any file that is currently addressable by a file system or that was previously addressable by the file system (e.g., an archive file), and/or to (ii) a subset of such a file (e.g., a data block, an extent, etc.). The data present in the one or more datasets may further include structured data (e.g., database files), unstructured data (e.g., documents), and/or semi-structured data.

The one or more datasets further include associated metadata. Metadata generally includes information about data objects and/or characteristics associated with the data objects. Metadata can include, without limitation, one or more of the following: the data owner (e.g., the client or user that generates the data), the last modified time (e.g., the time of the most recent modification of the data object), a data object name (e.g., a file name), a data object size (e.g., a number of bytes of data), information about the content (e.g., an indication as to the existence of a particular search term), user-supplied tags, to/from information for email (e.g., an email sender, recipient, etc.), creation date, file type (e.g., format or application type), last accessed time, application type (e.g., type of application that generated the data object), location/network (e.g., a current, past or future location of the data object and network pathways to/from the data object), geographic location (e.g., GPS coordinates), frequency of change (e.g., a period in which the data object is modified), business unit (e.g., a group or department that generates, manages or is otherwise associated with the data object), aging information (e.g., a schedule, such as a time period, in which the data object is migrated to secondary or long term storage), boot sectors, partition layouts, file location within a file folder directory structure, user permissions, owners, groups, access control lists (ACLs), system metadata (e.g., registry information), combinations of the same or other similar information related to the data object. In addition to metadata generated by or related to file systems and operating systems, some applications and/or other components of the client device 120 maintain indices of metadata for data objects, e.g., metadata associated with individual email messages.

The data backup system 110 is configured to split the one or more datasets into chunks and store the one or more datasets as objects on the object storage 130. The indexing system 112 of the data backup system 110 is further configured to store metadata of the one or more datasets as objects using a merge index 136 in a merge index database 134 on the object storage 130. As noted earlier, metadata is the additional information maintained to allow restoration of backed-up data into its original form. Typically, a database on the storage server, for example a NoSQL database such as AWS DYNAMODB, is used to store the metadata. However, as mentioned earlier, using a scalable NoSQL database on the cloud may not be very cost effective. Embodiments of the present invention enable cost-effective metadata storage by using a merge index database 134 on the object storage 130.

The merge index database 134 is also a key-value database, like a NoSQL database, but it differs in the way updates are applied to the database. A merge index database 134 disclosed herein allows for batching of objects and storing the batched objects with a single write operation to the object storage 130. Thus, the storage cost of the object storage, e.g., one backed by hard disks, may be several times lower than that of a distributed database. Moreover, an object storage 130 allows additional index entries to be created without overloading a specific server computer. Object storage also allows multiple computers to access the merge index database 134 simultaneously.

The merge index database 134 may be used to store index information and/or metadata regarding data structure so that data can be retrieved efficiently. The merge index database 134 according to embodiments of the present description is further configured to allow for versioning of records to associate timelines with metadata records. Because the file system stores multiple timelines, the records stored in the merge index 136 are valid within a specific lifespan. The merge index 136 may save namespace metadata in different namespace entries, which are discussed in further detail below.

Object storage 130 (also known as object-based storage) is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file storage, which manages data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Non-limiting examples of object storages 130 include AMAZON S3, RACKSPACE CLOUD FILES, AZURE BLOB STORAGE, or GOOGLE CLOUD STORAGE. Each object typically includes the data of the object itself, a variable amount of metadata of the object, and a unique identifier that identifies the object. Unlike data files or data blocks, once an object is created, it is normally difficult to change, because the unique identifier of the object is often generated based on the underlying data (e.g., based on the checksum of the object). However, unlike files or blocks, which often need an operating system of a computer to be accessed, objects may often be accessed directly from a data store and/or through API calls. This allows object storage to scale efficiently in light of various challenges in storing big data. The object storage 130 may store data blocks of file data from one or more client devices 120 (as one or more objects) and merge index S3Tables in the merge index database 134, as described in detail later.

The system environment 100 further includes an optional distributed database 140. A distributed database is a distributed, often decentralized, system that distributes data among different nodes to provide better data access and operation in case of a failure or offline status of one or more nodes. A distributed database is often a NoSQL database server having non-volatile memory. Non-limiting examples of distributed databases include AMAZON DYNAMODB and APACHE CASSANDRA. In some embodiments, the distributed database 140 may be used for data deduplication purposes by storing a plurality of deduplication indices 142 generated by the deduplication indexing system 114. According to some embodiments, checksums of backup data (e.g., snapshots) are created as the deduplication indices of the backup data. For additional details about the operation of the distributed database 140 using the deduplication indices 142, U.S. Pat. No. 8,996,467, patented on Mar. 31, 2015, entitled “Distributed Scalable Deduplicated Data Backup System,” is incorporated herein by reference in its entirety unless directly contradictory to the embodiments described herein.

The various components in the system environment 100 may communicate through the network 150 and/or locally. For example, in some embodiments, one of the system components may communicate locally with the data backup system 110, while other components communicate with the data backup system 110 through the network. In other embodiments, every component in the system environment 100 is online and communicates with every other component through the network 150. In one embodiment, the network 150 uses standard communications technologies and/or protocols. Thus, the network 150 can include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 150 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc.

While the data backup system 110, the one or more client devices 120A, 120B, . . . 120N, the object storage 130, and the distributed database 140 are each represented by a single block in FIG. 1, each of these components may include multiple distributed and/or independent computers (also referred to as workers) working cooperatively and in parallel with other computers, so that the operation of the entire system is not affected when one or more workers are down.

FIG. 2 is a block diagram of an example file system 200 in accordance with embodiments of the present description. The data backup system 110, the object storage 130, and optionally a distributed database 140, which communicate to each other through a network, collectively may form the file system 200. The data backup system 110 includes an indexing system 112 and optionally a deduplication indexing system 114. The file system 200 in other embodiments may include additional or fewer components. For example, the file system 200 may include additional backup storages which can be conventional data storages.

As shown in FIG. 2, the data backup system 110 may serve as the front end of the file system 200 and communicate with the one or more client devices 120. The data backup system 110 stores the file data as backup data 132 in the object storage 130 and the metadata 133 as a merge index 136 in the merge index database 134. Further, the one or more client devices 120 may perform a search 126 based on metadata, and in response, the file system 200 may provide an output 128.

The object storage 130 is configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects. The file data may also be captured as one or more snapshots. Snapshots include a list of different backup versions/timelines for each backup dataset. Each snapshot is assigned a unique version number, with higher version numbers indicating more recent backups. In some embodiments, the one or more snapshots may be stored in the distributed database 140.

The plurality of namespace entries may be stored using a merge index database 134 in the object storage 130. The merge index database 134 is configured to maintain one merge index 136 per backup dataset. Non-limiting examples of records maintained in a merge index 136 include directory entries, block maps, and the like. A directory entry includes one or more entries for each file and/or folder. Directory entries store the folder-file hierarchy and details of each version of the file/folder. A block map maintains the list of chunks/offsets of each file in the backup dataset. Each block map entry may further store a block identifier for the corresponding data block in the object storage.

As shown in FIG. 2, the file system 200 further includes an indexing system 112. The indexing system 112 is configured to generate the plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot.

A namespace entry of the plurality of namespace entries may include a key value pair. A key in a namespace entry may be formed by multiple parts. For example, a key may be divided into three parts, which may be a group part, a term part, and a version-identifier part. A group part may be used for entries in the merge index 136 to be ordered by group as the major key. A term part may be used to sort namespace entries for a given group. A version-identifier part may be used to sort entries for a given group-term.
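For illustration only, the three-part key described above may be sketched in Python as follows. This is a minimal sketch, not a definitive implementation; the field names (group, term, version, optype) and the tuple-based sort are assumptions chosen for readability.

from typing import NamedTuple

class NamespaceKey(NamedTuple):
    group: str     # major key, e.g., a folder identifier
    term: str      # sorts entries within a group, e.g., a file name
    version: int   # snapshot version number (higher is newer)
    optype: str    # "create" or "delete"

def sort_key(key: NamespaceKey):
    # Ascending on group and term, descending on version number, so that
    # the latest entry for a given group-term appears first.
    return (key.group, key.term, -key.version)

entries = [
    NamespaceKey("/folder", "doc1", 9, "create"),
    NamespaceKey("/folder", "doc1", 13, "delete"),
]
print(sorted(entries, key=sort_key))  # delete at ver=13 precedes create at ver=9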

There can be different types of namespace entries stored in the merge index 136. For example, in one embodiment, a folder identifier may be stored as the group part of the key and a child/file name as the term part. This type of entry can be used to list files in a folder. In another embodiment, a block map type index may use a file identifier as the group part of the key and an offset as the term part. This type of entry may be used to list all data blocks in a file. In another embodiment, an attribute type index may use a file identifier as the group part of the key and an attribute as the term part. This type of entry may be used to store additional information, e.g., user defined attributes.

The values in the key-value pair namespace entry may be the namespace metadata of the file, such as the file size, modification time, access control, etc. For smaller files, the metadata may also contain the data block identifier. If a data block includes data from multiple files, the metadata may also include offset within the block.

As noted earlier, the key further includes a version-identifier part. As mentioned earlier, the plurality of namespace entries is stored as objects in the object storage 130. Since objects are difficult to update, the indexing system 112 in accordance with embodiments of the present description creates a new namespace entry to reflect a change in operation conducted on the file data and/or metadata. Thus, a version identifier is added to each namespace entry to identify the latest entry. According to embodiments of the present description, the version-identifier part of the key includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. Non-limiting examples of operation type include “create” when a new record is created or when a record is modified, and “delete” when a record is deleted.

An example of a pair of namespace entries for a directory is given below:

Parent=/folder, File name=doc1, ver=13, optype=Delete
Parent=/folder, File name=doc1, ver=9, optype=Create

In the above example, “parent=/folder” is the group part in the key-value pair, “file name=doc1” is the term part in the key-value pair, and “ver=9, optype=Create” is the version identifier part in the key-value pair. Further, in the above example, the pair of namespace entries records that the file doc1 was created in snapshot version 9 and deleted in snapshot version 13.

An example of a pair of namespace entries for a block map index is given below:

Path=/folder/doc1, offset=0, ver=16, optype=Create, size=1 MB
Path=/folder/doc1, offset=1 MB, ver=16, optype=Create, size=1 MB

In the above example, “Path=/folder/doc1” is the group part in the key-value pair, “offset=1 MB” is the term part in the key-value pair, and “ver=16, optype=Create” is the version identifier part in the key-value pair. Moreover, “size=1 MB” is the value part of the key-value pair. Further, in the above example, the pair of namespace entries records that doc1 has two blocks created in version 16, i.e., the first block at offset 0 and the second at a 1 MB offset.

An example of a pair of namespace entries for an attribute index is given below:

Path=/folder/doc1, attribute=a1, ver=20, optype=Create, value=“v11”
Path=/folder/doc1, attribute=a1, ver=16, optype=Create, value=“v1”

In the above example, “Path=/folder/doc1” is the group part in the key-value pair, “attribute=a1” is the term part in the key-value pair, and “ver=16, optype=Create” is the version identifier part in the key-value pair. Further, in the above example, the pair of namespace entries records that doc1 has an attribute a1 that changed from v1 to v11.

Referring again to FIG. 2, the indexing system 112 is further configured to store the plurality of namespace entries as one or more objects using the merge index database 134 in the object storage 130. The plurality of namespace entries may be stored in the merge index database 134 as objects that are arranged in a plurality of hierarchical tables that may be referred to as S3Tables. Each merge index 136 includes a plurality of S3Tables, which may be created by different workers of the file system 200 and/or at different times. The namespace entries of the plurality of namespace entries may be stored in different S3Tables in some embodiments.

The indexing system 112 may operate in batches. When a new data file is received in the file system 200, a new namespace entry associated with the new data file may not be immediately saved persistently to the object storage 130. Instead, multiple updates to the merge index database 134 may be treated in batches. The new namespace entries (e.g., new key-value pairs) may be first stored temporarily in memory (e.g., a buffer memory) before being flushed to the object storage 130. As a result, the namespace entries belonging to the same batch may be stored in a single object, or in two or more related serialized objects if the batch has more entries than an object can include. A batch of updates may be processed and persisted by the indexing system 112 in an S3Table that includes one or more objects. The next batch of entries may go to a new S3Table that includes other objects, and so on. Since merge indexes are created in batches, multiple S3Tables may be formed for files that are uploaded at different times. Periodically, the indexing system 112 may perform a merge or compaction operation on the S3Tables to consolidate the entries in different S3Tables. Hence, in response to a search request, the number of S3Tables that need to be loaded can be reduced and the search speed is improved. The hierarchical structure and manner of operation of generating, merging, and compacting the S3Tables are described in U.S. Pat. No. 11,256,667, patented on Feb. 22, 2022, entitled “Deduplicated merge indexed object storage file system,” incorporated herein by reference in its entirety unless directly contradictory to the embodiments described herein.
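As a rough sketch of this batching behavior (the object-store interface, batch size, and table naming below are assumptions made for illustration, not details prescribed by the present description):

import json

def serialize(entries):
    # Assumed encoding; any serialization format could be used.
    return json.dumps(sorted(entries)).encode()

class MergeIndexWriter:
    """Buffers namespace entries in memory and flushes each batch
    to the object storage as a new S3Table with a single write."""

    def __init__(self, object_store, batch_size=1000):
        self.object_store = object_store  # assumed to expose put(name, data)
        self.batch_size = batch_size
        self.buffer = []
        self.table_seq = 0                # monotonically increasing table number

    def add(self, entry):
        # New entries accumulate in a buffer memory first ...
        self.buffer.append(entry)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # ... and the whole batch is persisted with one write operation,
        # so no existing object is read back or modified in place.
        self.table_seq += 1
        self.object_store.put(f"s3table-{self.table_seq:08d}", serialize(self.buffer))
        self.buffer.clear()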

With continued reference to FIG. 2, the indexing system 112 is further configured to identify, in response to a search query 126, one or more files 128 for retrieval from the object storage 130 based on a list of the plurality of namespace entries sorted on the version numbers. In some embodiments, the list of the plurality of namespace entries may be sorted with the latest version number listed first, followed by the earlier version numbers.

In some embodiments, the search query 126 includes a query version number, and the indexing system 112 is configured to identify the one or more files by comparing the query version number with a version number preceding the query version number and a version number succeeding the query version number in the list of the plurality of namespace entries. It should be noted that in embodiments where the query version number is the same as a version number of a create namespace entry, the indexing system 112 is configured to identify the one or more files for retrieval based on the query version number itself.

In some embodiments, the indexing system 112 is configured to generate, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry including the operation type create and a first version number corresponding to the first particular snapshot. The indexing system 112 is further configured to generate, responsive to a deletion of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry including the operation type delete and a second version number corresponding to the second snapshot. The indexing system 112 is furthermore configured to locate the query version number in a sorted list of the first version number and the second version number.

In some embodiments, the indexing system 112 is configured to generate, responsive to a creation of the file and/or file metadata captured in a first particular snapshot, a first namespace entry including the operation type create and a first version number corresponding to the first particular snapshot. The indexing system 112 is further configured to generate, responsive to a modification of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry including the operation type create and a second version number corresponding to the second particular snapshot. The indexing system 112 is furthermore configured to locate the query version number in a sorted list of the first version number and the second version number.

In some such embodiments, the indexing system 112 is further configured to generate, responsive to a deletion of the file data and/or file metadata captured in a third particular snapshot, a third namespace entry including the operation type delete and a third version number corresponding to the third particular snapshot. The indexing system 112 is furthermore configured to locate the query version number in a sorted list of the first version number, the second version number, and the third version number.

Thus, embodiments of the present description are different from other object storage-based indexing approaches where a creation version (cver) and a deletion version (dver) are recorded for every namespace entry as a single record. Although such an approach is cost-effective with respect to storage cost, it can be inefficient due to the need for reading a record before modifying it. This incurs an additional read request cost and also increases overall compute requirements, as the entire object needs to be read to fetch a single item. Embodiments of the present description address the noted shortcomings in the art by storing creation, modification, and deletion events as separate records in the merge index, and identifying the one or more files by using a sorted list of the version numbers. Hence, the techniques described herein entirely skip the read operation while performing a delete or update operation on a file.

The manner of operation of the indexing system 112 for generating and storing a plurality of namespace entries corresponding to different operation types is further illustrated with conceptual diagrams in FIGS. 3 and 4. In FIGS. 3 and 4, creation, deletion, and modification of one or more files and the corresponding generation and storage of namespace entries are illustrated. It should be noted that, for simplicity, creation, modification, and deletion of files are described in FIGS. 3 and 4; however, folders, block maps, and attributes may also be processed in a similar manner.

FIG. 3 illustrates an embodiment including creation and deletion operation types. A namespace entry “entry 10” corresponding to creation of a file is illustrated in FIG. 3. For instance, a client device 120 may create a file, and the creation operation may be captured in snapshot #4. The data backup system 110, on receiving and analyzing the snapshot #4, identifies the new file created in the snapshot #4. In response, the data backup system 110 causes the indexing system 112 to generate a new namespace entry associated with the new file. In a batch, the new namespace entry may first be stored in a memory before being flushed to an object. The data backup system 110 may also create one or more checksums for one or more data blocks corresponding to the new file and store the checksums as a deduplication index 142 in the distributed database 140.

After the new namespace entry is flushed to an object, the object storage 130 will include a new create namespace entry “entry 10”. In one embodiment, the entry 10 may be a key-value pair with a key that is formed by a file identifier, such as doc1, a version number, and an operation type. In one embodiment, the file identifier may include a group part and a term part, but for illustration purposes the file identifier in FIG. 3 is simply shown as doc1. Also, for simplicity, the value of the entry as a key-value pair is not shown in FIG. 3. The entry 10 includes a version number that has a value “4”, which indicates that creation of the file corresponding to the entry 10 is captured in the snapshot #4. The entry 10 also includes an operation type “create” indicating the operation type captured in the snapshot #4.

A namespace entry “entry 12” corresponding to deletion of a file is further illustrated in FIG. 3. For instance, a client device 120 may delete the file captured in snapshot #4, and the deletion operation may be captured in snapshot #13. The data backup system 110, on receiving and analyzing the snapshot #13, identifies the deletion of the file. In response, the data backup system 110 causes the indexing system 112 to generate a new namespace entry associated with the deletion of the file. In a batch, the new namespace entry may first be stored in a memory before being flushed to an object.

After the new namespace entry is flushed to an object, the object storage 130 will include a new delete namespace entry “entry 12”. In one embodiment, the entry 12 may be a key-value pair with a key that is formed by a file identifier, such as doc1, a version number, and an operation type. In one embodiment, the file identifier may include a group part and a term part, but for illustration purposes the file identifier in FIG. 3 is simply shown as doc1. Also, for simplicity, the value of the entry as a key-value pair is not shown in FIG. 3. The entry 12 includes a version number that has a value “13”, which indicates that deletion of the file corresponding to the entry 12 is captured in the snapshot #13. The entry 12 also includes an operation type “delete” indicating the operation type captured in the snapshot #13. In the embodiment illustrated in FIG. 3, entry 10 corresponds to the first namespace entry and entry 12 corresponds to the second namespace entry. Further, ver=4 corresponds to the first version number and ver=13 corresponds to the second version number.

As noted earlier, the plurality of namespace entries is sorted on the version numbers, and one or more files for retrieval are identified by comparing a query version number to the sorted list of namespace entries. By way of example, for the embodiment illustrated in FIG. 3, the namespace entries 10 and 12 are sorted as below.

Parent=/, File name=doc1, ver=13, optype=Delete
Parent=/, File name=doc1, ver=4, optype=Create

In this example, the indexing system 112 is configured to identify snapshot version 4 for retrieval when the query version number is at least 4 and less than 13 (e.g., a query version number of 8). Further, the indexing system 112 is configured to identify no snapshot for retrieval (i.e., the file is not available) if the query version number is less than 4 or greater than or equal to 13. Furthermore, the indexing system 112 is configured to identify snapshot version 4 for retrieval when the query version number is exactly 4.
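This lookup rule can be expressed compactly. The following sketch is illustrative only and assumes entries are (version, optype) pairs already sorted in descending version order, as in the listing above:

def resolve(sorted_entries, query_version):
    # sorted_entries: (version, optype) pairs in descending version order,
    # e.g., [(13, "delete"), (4, "create")] for the FIG. 3 example.
    for version, optype in sorted_entries:
        if version <= query_version:
            # The first entry at or below the query version decides:
            # a create entry means that generation of the file exists;
            # a delete entry means the file was already removed.
            return version if optype == "create" else None
    return None  # the query version precedes the file's creation

entries = [(13, "delete"), (4, "create")]
assert resolve(entries, 8) == 4       # between creation and deletion
assert resolve(entries, 4) == 4       # exactly the creation version
assert resolve(entries, 13) is None   # at or after the deletion
assert resolve(entries, 3) is None    # before the file existed

The same function returns version 9 for a query version number of 11 against the three-entry list of FIG. 4 discussed below.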

FIG. 4 illustrates an embodiment including creation, modification, and deletion operation types. As shown in FIG. 4, a namespace entry “entry 10” is recorded corresponding to the creation of the file in snapshot #4, similar to the example illustrated in FIG. 3. However, in this example, the file is subsequently modified by a client device 120, and the modification operation is captured in snapshot #9. The data backup system 110, on receiving and analyzing the snapshot #9, identifies the modification of the file. In response, the data backup system 110 causes the indexing system 112 to generate a new create namespace entry associated with the modification of the file. In a batch, the new namespace entry may first be stored in a memory before being flushed to an object.

After the new namespace entry is flushed to an object, the object storage 130 will include a new create namespace entry “entry 11”. The entry 11 includes a version number that has a value “9”, which indicates that creation of the file corresponding to the entry 11 is captured in the snapshot #9. The entry 11 also includes an operation type “create” indicating the operation type captured in the snapshot #9. Thus, according to embodiments of the present description, a new create namespace entry is generated when a file is modified. Referring again to FIG. 4, an entry 12 corresponding to a deletion operation captured in snapshot #13 is further generated and stored, similar to FIG. 3. In the embodiment illustrated in FIG. 4, entry 10 corresponds to the first namespace entry, entry 11 corresponds to the second namespace entry, and entry 12 corresponds to the third namespace entry. Further, ver=4 corresponds to the first version number, ver=9 corresponds to the second version number, and ver=13 corresponds to the third version number. It should be noted that although FIG. 4 illustrates an example with one modification, the techniques of the present invention are equally applicable for two or more modifications.

As noted earlier, the plurality of namespace entries is sorted on the version numbers, and one or more files for retrieval are identified by comparing a query version number to the sorted list of namespace entries. By way of example, for the embodiment illustrated in FIG. 4, the namespace entries 10, 11, and 12 are sorted as below.

Parent=/, File name=doc1, ver=13, optype=Delete
Parent=/, File name=doc1, ver=9, optype=Create
Parent=/, File name=doc1, ver=4, optype=Create

In this example, the indexing system 112 is configured to identify snapshot version 4 (i.e., a first generation of the file doc1) for retrieval when the query version number is at least 4 and less than 9 (e.g., a query version number of 7). Further, the indexing system 112 is configured to identify snapshot version 9 (i.e., a second generation of the file doc1) for retrieval when the query version number is at least 9 and less than 13 (e.g., a query version number of 11). Further, the indexing system 112 is configured to identify no snapshot for retrieval (i.e., the file is not available) if the query version number is less than 4 or greater than or equal to 13.

In some embodiments, each namespace entry of the plurality of namespace entries further includes a unique sequence number. A sequence number is a monotonically increasing counter, and each new S3Table is given a new sequence number. In such embodiments, the indexing system 112 is further configured to identify the one or more files based on a list of the plurality of namespace entries sorted on the unique sequence numbers if the version numbers for some of the namespace entries in the list of the plurality of namespace entries are the same.

An example of a plurality of namespace entries sorted by sequence numbers is given below:

Parent=/folder, File name=doc1, ver=4, seq=78, optype=Create, size=2 MB
Parent=/folder, File name=doc1, ver=4, seq=54, optype=Delete
Parent=/folder, File name=doc1, ver=4, seq=22, optype=Create, size=1 MB

In the example illustrated above, a create namespace entry may be recorded at snapshot version 4 for a file doc1 having a size of 1 MB. However, the data backup of file doc1 may be interrupted, resulting in deletion of the file from the source. In such an instance, a delete namespace entry may also be recorded at snapshot version 4. Moreover, the same file doc1 may later be created with a different file attribute (size=2 MB), and a second create namespace entry may be recorded at snapshot version 4. In such an instance, the namespace entries may be further sorted in decreasing order based on the sequence numbers. Therefore, in the above example, a snapshot version to be retrieved may be identified based on the sorted list of sequence numbers.

The sequence numbers may be used for file retrieval as shown by the example below.

Parent=/folder, File name=doc1, ver=9, seq=92, optype=Delete
Parent=/folder, File name=doc1, ver=4, seq=78, optype=Create, size=2 MB
Parent=/folder, File name=doc1, ver=4, seq=54, optype=Delete
Parent=/folder, File name=doc1, ver=4, seq=22, optype=Create, size=1 MB

In the above example, the indexing system 112 is configured to identify snapshot version 4 at sequence 78 for retrieval when the query version number is at least 4 and less than 9 (e.g., a query version number of 7). Further, the indexing system 112 is configured to identify no snapshot for retrieval (i.e., the file is not available) if the query version number is less than 4 or greater than or equal to 9.
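This tie-breaking rule can be sketched compactly as follows. The (version, seq, optype) tuple layout is an assumption made for illustration, not the actual record format:

def resolve_with_seq(entries, query_version):
    # Sort on version number and then sequence number, both descending,
    # so that among entries sharing a version the entry from the newest
    # S3Table wins.
    ordered = sorted(entries, key=lambda e: (e[0], e[1]), reverse=True)
    for version, seq, optype in ordered:
        if version <= query_version:
            return (version, seq) if optype == "create" else None
    return None

entries = [(9, 92, "delete"), (4, 78, "create"), (4, 54, "delete"), (4, 22, "create")]
# The interrupted backup's delete entry (seq=54) is superseded by the later
# create entry (seq=78), so a query at version 7 resolves to that entry.
assert resolve_with_seq(entries, 7) == (4, 78)
assert resolve_with_seq(entries, 9) is None  # deleted at version 9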

As mentioned earlier, although FIGS. 3 and 4 are described in the context of file operations, the systems and methods of the present description may be applicable to other metadata indexes as well.

By way of example, a sorted list of block map namespace entries is given below:

Path=/folder/doc1, offset=0, ver=20, seq=20, optype=Create, size=1 MB
Path=/folder/doc1, offset=0, ver=16, seq=10, optype=Create, size=1 MB
Path=/folder/doc1, offset=1 MB, ver=16, seq=10, optype=Create, size=1 MB

In the example above, file doc1 has two blocks that were created in snapshot version #16, i.e., the first block at offset 0 and the second block at a 1 MB offset. In a subsequent snapshot version #20, only the initial block of the file was modified. Hence, a new create namespace entry was added for offset=0.
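Block-map and attribute lookups apply the same rule per term (per offset here, or per attribute name below): for each term, the create entry with the highest version number not exceeding the query version number wins. A minimal sketch under that assumption follows; delete entries are omitted for brevity, and the block identifiers are hypothetical:

def latest_per_term(entries, query_version):
    # entries: (term, version, value) tuples, e.g., term = block offset.
    # Returns {term: value} as of query_version by keeping, for each
    # term, the entry with the highest version <= query_version.
    result = {}
    for term, version, value in sorted(entries, key=lambda e: e[1]):
        if version <= query_version:
            result[term] = value  # later (higher) versions overwrite earlier ones
    return result

block_map = [("0", 16, "blk-a"), ("1MB", 16, "blk-b"), ("0", 20, "blk-c")]
assert latest_per_term(block_map, 17) == {"0": "blk-a", "1MB": "blk-b"}
assert latest_per_term(block_map, 20) == {"0": "blk-c", "1MB": "blk-b"}

The attribute example that follows resolves the same way, with the attribute name as the term.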

Similarly, an example of a plurality of attribute-based namespace entries is given below. The attribute-based namespace entries may be used to store additional information about a path, e.g., ACLs, user-defined attributes, etc.

Path=/folder/doc1, attribute=a1, ver=20, seq=20, optype=Create, value=“v11”
Path=/folder/doc1, attribute=a1, ver=16, seq=10, optype=Create, value=“v1”
Path=/folder/doc1, attribute=a2, ver=16, seq=10, optype=Create, value=“v2”

In the example above, the file doc1 has two attributes named a1 and a2. Attribute a1 was changed from “v1” to “v11” in snapshot version #20, while attribute a2 remained unchanged. In the above example, the indexing system 112 is configured to identify attributes a1=v1 and a2=v2 if the query version number is at least 16 and less than 20 (e.g., a query version number of 17). Further, the indexing system 112 is configured to identify attributes a1=v11 and a2=v2 for query version numbers greater than or equal to 20.

Referring again to FIG. 2, in some embodiments, the indexing system 112 is further configured to perform a merge operation or a compaction operation on the plurality of namespace entries to remove one or more obsolete namespace entries. In some embodiments, the indexing system 112 is further configured to perform a merge operation or a compaction operation on the plurality of namespace entries stored as S3Tables. The manner of merging the S3Tables, in some embodiments, is described in U.S. Pat. No. 11,256,667, patented on Feb. 22, 2022, entitled “Deduplicated merge indexed object storage file system,” incorporated herein by reference in its entirety unless directly contradictory to the embodiments described herein.
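For illustration only, a compaction pass over a set of S3Tables might look like the following sketch. The entry layout (dicts with group, term, version, optype, and seq fields) and the rule of keeping only the highest-sequence entry per key are assumptions made for this example; the referenced patent describes the actual merge and compaction process.

def compact(tables):
    # Merge several S3Tables (each a list of entry dicts) into one,
    # keeping for each (group, term, version, optype) key only the
    # entry with the highest sequence number; superseded duplicates
    # are treated as obsolete and dropped.
    merged = {}
    for table in tables:
        for entry in table:
            key = (entry["group"], entry["term"], entry["version"], entry["optype"])
            if key not in merged or entry["seq"] > merged[key]["seq"]:
                merged[key] = entry
    # Emit one consolidated table in the merge index sort order.
    return sorted(merged.values(),
                  key=lambda e: (e["group"], e["term"], -e["version"], -e["seq"]))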

Referring again to FIG. 2, the indexing system 112 further includes a memory 116 storing one or more processor-executable routines, and a processor 118. The processor 118 is further configured to execute the processor-executable routines to perform the steps illustrated in the flow-chart of FIG. 5.

FIG. 5 is a flowchart illustrating a method 300 for indexing metadata using an object-based storage. The method 300 may be implemented using the file system 200 of FIG. 2, according to some aspects of the present description. Each step of the method 300 is described in detail below.

The method 300 includes, at block 302, generating a plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries includes an operation type conducted on a file data and/or file metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. Examples of namespace entries are described herein earlier.

The method 300 further includes, at block 304, storing the plurality of namespace entries as one or more objects in an object storage. As noted earlier, the plurality of namespace entries may be stored in a merge index database 134 as objects that are arranged in a plurality of hierarchical tables that may be referred to as S3Tables.

The method 300 further includes, at block 306, identifying, in response to a search query, one or more files for retrieval from the object storage based on the plurality of namespace entries sorted on the version numbers. In some embodiments, the method 300 includes, at block 306, identifying the one or more files by comparing a version number in the search query with a version number preceding the query version number and a version number succeeding the query version number in the list of the plurality of namespace entries.

In some embodiments, the method 300 includes generating, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry including the operation type create and a first version number corresponding to the first particular snapshot. The method 300 further includes generating, responsive to a deletion of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry including the operation type delete and a second version number corresponding to the second snapshot; and locating the query version number in a sorted list of the first version number and the second version number.

In some embodiments, the method 300 includes generating, responsive to a creation of the file and/or file metadata captured in a first particular snapshot, a first namespace entry including the operation type create and a first version number corresponding to the first particular snapshot. The method 300 further includes generating, responsive to a modification of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry including the operation type create and a second version number corresponding to the second particular snapshot; and locating the query version number in a sorted list of the first version number and the second version number. In some embodiments, the method 300 further includes generating, responsive to a deletion of the file data and/or file metadata captured in a third particular snapshot, a third namespace entry including the operation type delete and a third version number corresponding to the third particular snapshot; and locating the query version number in a sorted list of the first version number, the second version number, and the third version number.

In some embodiments, each namespace entry of the plurality of namespace entries further includes a unique sequence number, and the indexing method includes identifying the one or more files based on a list of the plurality of namespace entries sorted on the unique sequence numbers if the version numbers for some of the namespace entries in the list of the plurality of namespace entries are the same.

The systems and methods described herein may be partially or fully implemented by a special purpose computer system created by configuring a general-purpose computer to execute one or more particular functions embodied in computer programs. The functional blocks and flowchart elements described above serve as software specifications, which may be translated into the computer programs by the routine work of a skilled technician or programmer.

The computer programs include processor-executable instructions that are stored on at least one non-transitory computer-readable medium such that, when run on a computing device, they cause the computing device to perform any one of the aforementioned methods. The medium also includes, alone or in combination with the program instructions, data files, data structures, and the like. Non-limiting examples of the non-transitory computer-readable medium include rewriteable non-volatile memory devices (including, for example, flash memory devices, erasable programmable read-only memory devices, or mask read-only memory devices), volatile memory devices (including, for example, static random access memory devices or dynamic random access memory devices), magnetic storage media (including, for example, an analog or digital magnetic tape or a hard disk drive), and optical storage media (including, for example, a CD, a DVD, or a Blu-ray Disc). Examples of media with built-in rewriteable non-volatile memory include, but are not limited to, memory cards; examples of media with built-in ROM include, but are not limited to, ROM cassettes, etc. Program instructions include both machine codes, such as produced by a compiler, and higher-level codes that may be executed by the computer using an interpreter. The described hardware devices may be configured to execute one or more software modules to perform the operations of the above-described example embodiments of the description, or vice versa.

Non-limiting examples of computing devices include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field-programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device that can execute instructions and respond. A central processing unit may implement an operating system (OS) and one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process, and generate data in response to the execution of software. Although a single processing unit may be illustrated for convenience of understanding, those skilled in the art will understand that the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the central processing unit may include a plurality of processors, or one processor and one controller. The processing unit may also have a different processing configuration, such as a parallel processor.

The computer programs may also include or rely on stored data. The computer programs may encompass a basic input/output system (BIOS) that interacts with hardware of the special purpose computer, device drivers that interact with particular devices of the special purpose computer, one or more operating systems, user applications, background services, background applications, etc.

The computer programs may include: (i) descriptive text to be parsed, such as HTML (hypertext markup language) or XML (extensible markup language), (ii) assembly code, (iii) object code generated from source code by a compiler, (iv) source code for execution by an interpreter, (v) source code for compilation and execution by a just-in-time compiler, etc. As examples only, source code may be written using syntax from languages including C, C++, C#, Objective-C, Haskell, Go, SQL, R, Lisp, Java®, Fortran, Perl, Pascal, Curl, OCaml, JavaScript®, HTML5, Ada, ASP (active server pages), PHP, Scala, Eiffel, Smalltalk, Erlang, Ruby, Flash®, Visual Basic®, Lua, and Python®.

One example of a computing system 400 is described below with reference to FIG. 6. The computing system 400 includes one or more processors 402, one or more computer-readable RAMs 404, and one or more computer-readable ROMs 406 on one or more buses 408. Further, the computing system 400 includes a tangible storage device 410 that may be used to store the operating system 420 and the file system 200. Both the operating system 420 and the file system 200 are executed by the processor 402 via one or more respective RAMs 404 (which typically include cache memory). The execution of the operating system 420 and/or the file system 200 by the processor 402 configures the processor 402 as a special-purpose processor configured to carry out the functionalities of the operating system 420 and/or the file system 200, as described above.

Examples of the storage device 410 include semiconductor storage devices such as the ROM 406, EPROM, flash memory, or any other computer-readable tangible storage device that may store a computer program and digital information.

Computing system 400 also includes an R/W drive or interface 412 to read from and write to one or more portable computer-readable tangible storage devices 426, such as a CD-ROM, DVD, memory stick, or semiconductor storage device. Further, network adapters or interfaces 414, such as TCP/IP adapter cards, wireless Wi-Fi interface cards, 3G or 4G wireless interface cards, or other wired or wireless communication links, are also included in the computing system 400.

In one example embodiment, the file system 200 may be stored in tangible storage device 410 and may be downloaded from an external computer via a network (for example, the Internet, a local area network or another wide area network) and network adapter or interface 414.

Computing system 400 further includes device drivers 416 to interface with input and output devices. The input and output devices may include a computer display monitor 418, a keyboard 422, a keypad, a touch screen, a computer mouse 424, and/or some other suitable input device.

In this description, including the definitions mentioned earlier, the term ‘module’ may be replaced with the term ‘circuit.’ The term ‘module’ may refer to, be part of, or include processor hardware (shared, dedicated, or group) that executes code and memory hardware (shared, dedicated, or group) that stores code executed by the processor hardware. The term code, as used above, may include software, firmware, and/or microcode, and may refer to programs, routines, functions, classes, data structures, and/or objects.

Shared processor hardware encompasses a single microprocessor that executes some or all code from multiple modules. Group processor hardware encompasses a microprocessor that, in combination with additional microprocessors, executes some or all code from one or more modules. References to multiple microprocessors encompass multiple microprocessors on discrete dies, multiple microprocessors on a single die, multiple cores of a single microprocessor, multiple threads of a single microprocessor, or a combination of the above. Shared memory hardware encompasses a single memory device that stores some or all code from multiple modules. Group memory hardware encompasses a memory device that, in combination with other memory devices, stores some or all code from one or more modules.

In some embodiments, the module may include one or more interface circuits. In some examples, the interface circuits may include wired or wireless interfaces that are connected to a local area network (LAN), the Internet, a wide area network (WAN), or combinations thereof. The functionality of any given module of the present description may be distributed among multiple modules that are connected via interface circuits. For example, multiple modules may allow load balancing. In a further example, a server module (also known as a remote or cloud module) may accomplish some functionality on behalf of a client module.

While only certain features of several embodiments have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the invention and the appended claims.

Claims

1. A file system, comprising:

an object storage configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects;
an indexing system configured to: generate the plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries comprises an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot; store the plurality of namespace entries as one or more objects in the object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on a list of the plurality of namespace entries sorted on the version numbers.

2. The file system of claim 1, wherein the indexing system is configured to identify the one or more files by comparing a query version number with a version number preceding the query version number and a version number succeeding the query version number in the list of the plurality of namespace entries.

3. The file system of claim 2, wherein the indexing system is configured to:

generate, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generate, responsive to a deletion of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type delete and a second version number corresponding to the second particular snapshot; and
locate the query version number in a sorted list of the first version number and the second version number.

4. The file system of claim 2, wherein the indexing system is configured to:

generate, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generate, responsive to a modification of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type create and a second version number corresponding to the second particular snapshot; and
locate the query version number in a sorted list of the first version number and the second version number.

5. The file system of claim 4, wherein the indexing system is further configured to:

generate, responsive to a deletion of the file data and/or file metadata captured in a third particular snapshot, a third namespace entry comprising the operation type delete and a third version number corresponding to the third particular snapshot; and
locate the query version number in a sorted list of the first version number, the second version number, and the third version number.

6. The file system of claim 1, wherein each namespace entry of the plurality of namespace entries further comprises a unique sequence number, and the indexing system is further configured to identify the one or more files based on a list of a plurality of namespace entries sorted on the unique sequence numbers if the version numbers for some of the namespace entries in the list of the plurality of namespace entries are the same.

7. The file system of claim 1, wherein the indexing system is further configured to perform a merge operation or a compaction operation on the plurality of namespace entries.

8. The file system of claim 1, further comprising a distributed database configured to store a plurality of deduplication indices corresponding to the file data, and a deduplication indexing module configured to generate the plurality of deduplication indices.

9. An indexing system, comprising:

a memory storing one or more processor-executable routines; and
a processor communicatively coupled to the memory, the processor configured to execute the one or more processor-executable routines to: generate a plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries comprises an operation type conducted on a file data and/or file metadata captured in a particular snapshot and a version number corresponding to the particular snapshot; store the plurality of namespace entries as one or more objects in an object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on the plurality of namespace entries sorted on the version numbers.

10. The indexing system of claim 9, wherein the processor is configured to execute the one or more processor-executable routines to identify the one or more files by comparing a query version number in the search query with a version number preceding the query version number and a version number succeeding the query version number in the list of the plurality of namespace entries.

11. The indexing system of claim 10, wherein the processor is configured to execute the one or more processor-executable routines to:

generate, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generate, responsive to a deletion of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type delete and a second version number corresponding to the second particular snapshot; and
locate the query version number in a sorted list of the first version number and the second version number.

12. The indexing system of claim 10, wherein the processor is configured to execute the one or more processor-executable routines to:

generate, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generate, responsive to a modification of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type create and a second version number corresponding to the second particular snapshot; and
identify the snapshot version by locating the query version number in a sorted list of the first version number and the second version number.

13. The indexing system of claim 12, wherein the processor is further configured to execute the one or more processor-executable routines to:

generate, responsive to a deletion of the file data and/or file metadata captured in a third particular snapshot, a third namespace entry comprising the operation type delete and a third version number corresponding to the third particular snapshot; and
identify the snapshot version by locating the query version number in a sorted list of the first version number, the second version number, and the third version number.

14. The indexing system of claim 9, wherein each namespace entry of the plurality of namespace entries further comprises a unique sequence number, and the processor is configured to execute the one or more processor-executable routines to identify the one or more files based on a list of a plurality of namespace entries sorted on the unique sequence numbers if the version numbers for some of the namespace entries in the list of the plurality of namespace entries are the same.

15. An indexing method, comprising:

generating a plurality of namespace entries, wherein each namespace entry of the plurality of namespace entries comprises an operation type conducted on a file data and/or file metadata captured in a particular snapshot and a version number corresponding to the particular snapshot;
storing the plurality of namespace entries as one or more objects in an object storage; and
identifying, in response to a search query, one or more files for retrieval from the object storage based on the plurality of namespace entries sorted on the version numbers.

16. The indexing method of claim 15, comprising identifying the one or more files by comparing a query version number in the search query with a version number preceding the query version number and a version number succeeding the query version number in the list of the plurality of namespace entries.

17. The indexing method of claim 15, comprising:

generating, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generating, responsive to a deletion of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type delete and a second version number corresponding to the second particular snapshot; and
locating the query version number in a sorted list of the first version number and the second version number.

18. The indexing method of claim 15, comprising:

generating, responsive to a creation of the file data and/or file metadata captured in a first particular snapshot, a first namespace entry comprising the operation type create and a first version number corresponding to the first particular snapshot;
generating, responsive to a modification of the file data and/or file metadata captured in a second particular snapshot, a second namespace entry comprising the operation type create and a second version number corresponding to the second particular snapshot; and
locating the query version number in a sorted list of the first version number and the second version number.

19. The indexing method of claim 18, further comprising:

generating, responsive to a deletion of the file data and/or file metadata captured in a third particular snapshot, a third namespace entry comprising the operation type delete and a third version number corresponding to the third particular snapshot; and
locating the query version number in a sorted list of the first version number, the second version number, and the third version number.

20. The indexing method of claim 15, wherein each namespace entry of the plurality of namespace entries further comprises a unique sequence number, and the indexing method comprises identifying the one or more files based on a list of a plurality of namespace entries sorted on the unique sequence numbers if the version numbers for some of the namespace entries in the list of the plurality of namespace entries are the same.

Patent History
Publication number: 20230222165
Type: Application
Filed: Jan 11, 2023
Publication Date: Jul 13, 2023
Inventors: Milind Vithal BORATE (Pune), Somesh JAIN (Pune), Rohit SINGH (Ghaziabad), Shubham AGARWAL (Kota), Sanjay BHOSALE (Pune), Pallavi THAKUR (Pune), Srikiran GOTTIPATI (Vijayawada)
Application Number: 18/095,818
Classifications
International Classification: G06F 16/903 (20060101);