Abstract: A device that generates suggested command completions for a distributed storage system is described. In an exemplary embodiment, the device receives a command token from a management client, wherein the command token is a partial command for the distributed storage system. In addition, the device retrieves a plurality of complete commands for the distributed storage system, wherein one of the plurality of complete commands includes a parameter based on a current configuration of the distributed storage system. The device further determines a subset of the plurality of complete commands that match the command token. The device sends the subset of the plurality of complete commands to the management client.
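The completion flow in this abstract can be illustrated with a minimal sketch. It assumes that "match" means a simple prefix match and that the current configuration is a plain dictionary of node and volume names. The command strings, the `build_complete_commands` and `suggest` helpers, and the configuration layout are all hypothetical and do not come from the abstract.

```python
from typing import Dict, List


def build_complete_commands(config: Dict[str, List[str]]) -> List[str]:
    """Build the list of complete commands (hypothetical command names)."""
    commands = ["cluster status", "cluster health"]
    # Parameterized commands are filled in from the current configuration,
    # so suggestions only name nodes and volumes that exist right now.
    commands += [f"node show {name}" for name in config.get("nodes", [])]
    commands += [f"volume show {name}" for name in config.get("volumes", [])]
    return commands


def suggest(token: str, commands: List[str]) -> List[str]:
    """Return the complete commands that start with the partial command token.

    Prefix matching is an assumption; the abstract only says "match".
    """
    return sorted(c for c in commands if c.startswith(token))


if __name__ == "__main__":
    config = {"nodes": ["node-a", "node-b"], "volumes": ["vol1"]}
    for completion in suggest("node show n", build_complete_commands(config)):
        print(completion)
```

Because the parameterized commands are regenerated from the configuration on each request, a node added to or removed from the configuration changes the suggestions without any change to the matching step.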
Abstract: A distributed multi-layer cache in a distributed storage system is described, where the storage controller functions of the distributed storage system are separated from those of the storage media of the distributed storage system. In an exemplary embodiment, a storage controller server determines if an object is in a cache that stores a plurality of objects. In addition, the distributed storage system includes the cache and a distributed object layer for persistently storing the plurality of objects. The cache further includes a de-duplicated cache layer. The storage controller server accesses the object from the cache if the object is in the cache and accesses the object from the distributed object layer if the object is not in the cache.
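The read path in this abstract can be sketched as a cache lookup with a fallback to the persistent layer. The sketch below makes several assumptions that the abstract does not state: the de-duplicated cache layer is modeled by indexing payloads by their SHA-256 digest, the distributed object layer is stood in for by an in-memory dictionary, and the cache is populated on a miss. The class and method names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class DedupCacheLayer:
    """Hypothetical de-duplicated cache layer: identical payloads are stored once."""

    def __init__(self) -> None:
        self._key_to_digest: Dict[str, str] = {}
        self._digest_to_data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        digest = self._key_to_digest.get(key)
        return None if digest is None else self._digest_to_data[digest]

    def put(self, key: str, data: bytes) -> None:
        # Content fingerprint: two keys with equal payloads share one entry.
        digest = hashlib.sha256(data).hexdigest()
        self._key_to_digest[key] = digest
        self._digest_to_data.setdefault(digest, data)

    def unique_payloads(self) -> int:
        return len(self._digest_to_data)


class StorageControllerServer:
    """Reads from the cache first, then from the persistent object layer."""

    def __init__(self, cache: DedupCacheLayer, object_layer: Dict[str, bytes]) -> None:
        self.cache = cache
        # Stand-in for the distributed object layer (an assumption).
        self.object_layer = object_layer

    def read(self, key: str) -> bytes:
        data = self.cache.get(key)
        if data is not None:
            return data  # object is in the cache
        data = self.object_layer[key]  # object is not in the cache
        self.cache.put(key, data)  # populating on a miss is an assumption
        return data
```

With this layout, reading two keys whose payloads are byte-identical leaves a single payload in the cache, which is the space saving a de-duplicated layer is meant to provide.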
Abstract: A distributed storage system that dispatches an input/output request is described. In an exemplary embodiment, a storage controller client receives the input/output request, wherein the distributed storage system includes the storage controller client, a plurality of storage controller servers, and a plurality of virtual nodes distributed among a plurality of physical nodes, and wherein each of the plurality of physical nodes is hosted on one of the plurality of storage controller servers. The storage controller client further computes a target virtual node for the input/output request, where the target virtual node is one of the plurality of virtual nodes. Using the computed target virtual node, the storage controller client determines a target physical node that corresponds to the target virtual node, where the target physical node is one of the plurality of physical nodes.
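The two-step lookup in this abstract (request to virtual node, virtual node to physical node) can be sketched as follows. The abstract does not say how the target virtual node is computed, so hashing a request key modulo the number of virtual nodes is an assumption, as are the round-robin placement, the key format, and every function name below.

```python
import hashlib
from typing import Dict, List, Tuple


def compute_target_vnode(object_key: str, num_vnodes: int) -> int:
    """Map a request key to a virtual node (hash-modulo is an assumption)."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_vnodes


def build_vnode_map(num_vnodes: int, pnodes: List[str]) -> Dict[int, str]:
    """Distribute virtual nodes among physical nodes (round-robin, hypothetical)."""
    return {v: pnodes[v % len(pnodes)] for v in range(num_vnodes)}


def dispatch(
    object_key: str,
    num_vnodes: int,
    vnode_to_pnode: Dict[int, str],
    pnode_to_server: Dict[str, str],
) -> Tuple[int, str, str]:
    """Resolve the target virtual node, physical node, and hosting server."""
    vnode = compute_target_vnode(object_key, num_vnodes)
    pnode = vnode_to_pnode[vnode]    # physical node that corresponds to the vnode
    server = pnode_to_server[pnode]  # storage controller server hosting that pnode
    return vnode, pnode, server
```

The indirection through virtual nodes means the key-to-vnode computation never changes when data is relocated; only the vnode-to-pnode table needs to be updated.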
Abstract: A distributed garbage collection in a distributed storage system is described, where the storage controller functions of the distributed storage system are separated from those of the storage media of the distributed storage system. In an exemplary embodiment, a storage controller server generates a live object map of live objects stored on the distributed storage system in a plurality of block segments distributed across a plurality of storage controller servers. The storage controller server further scans the plurality of block segments to generate segment summary statistics, where the segment summary statistics indicate the number of live objects stored in the plurality of block segments. In addition, the storage controller server compacts each of the plurality of block segments that has a low utilization based on the segment summary statistics. Furthermore, the live object map is a probabilistic data structure storing a list of valid objects.
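The three phases in this abstract (build a live object map, scan segments for summary statistics, compact low-utilization segments) can be sketched on a single machine. The abstract says only that the live object map is a probabilistic data structure; using a Bloom filter for it is an assumption, as are the 0.5 utilization threshold, the in-memory segment layout, and all names below. A Bloom filter can report false positives, so some dead objects may survive a pass, but it never reports a live object as dead.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Small Bloom filter used here as the live object map (an assumption)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def segment_utilization(segments: Dict[str, List[str]], live: BloomFilter) -> Dict[str, float]:
    """Summary statistic per segment: fraction of objects reported live."""
    return {
        sid: (sum(1 for o in objs if o in live) / len(objs)) if objs else 0.0
        for sid, objs in segments.items()
    }


def compact(segments: Dict[str, List[str]], live: BloomFilter,
            threshold: float = 0.5) -> Dict[str, List[str]]:
    """Rewrite low-utilization segments, keeping only objects reported live."""
    stats = segment_utilization(segments, live)
    return {
        sid: [o for o in objs if o in live] if stats[sid] < threshold else list(objs)
        for sid, objs in segments.items()
    }
```

Segments at or above the threshold are left untouched in this sketch, which avoids rewriting data when little space would be reclaimed.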
Abstract: A distributed storage system that performs automated load balancing is described. In an exemplary embodiment, a storage controller server determines if there is duplicative data in a distributed storage system. In this embodiment, the storage controller server detects a load balancing event in the distributed storage system, where the distributed storage system includes a plurality of virtual nodes distributed across a plurality of physical nodes. In response to detecting the load balancing event, the storage controller server determines that a current virtual node is to move from a source physical node to a destination physical node. In addition, the current virtual node is one of the plurality of virtual nodes and the source and destination physical nodes are in the plurality of physical nodes. The storage controller server further moves the current virtual node from the source physical node to the destination physical node.
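The rebalancing step in this abstract can be sketched as a plan-then-move operation on the virtual-node placement table. The abstract does not define what constitutes a load balancing event, so treating it as an imbalance in virtual-node counts between physical nodes is an assumption, as are the `max_skew` parameter, the choice of which virtual node to move, and all function names.

```python
from typing import Dict, List, Optional, Tuple


def count_vnodes(vnode_to_pnode: Dict[int, str], pnodes: List[str]) -> Dict[str, int]:
    """Count virtual nodes per physical node, including empty physical nodes."""
    counts = {p: 0 for p in pnodes}
    for p in vnode_to_pnode.values():
        counts[p] += 1
    return counts


def plan_move(vnode_to_pnode: Dict[int, str], pnodes: List[str],
              max_skew: int = 1) -> Optional[Tuple[int, str, str]]:
    """Return (vnode, source, destination), or None if no event is detected.

    The event here is a vnode-count gap larger than max_skew (an assumption).
    """
    counts = count_vnodes(vnode_to_pnode, pnodes)
    source = max(pnodes, key=lambda p: counts[p])
    destination = min(pnodes, key=lambda p: counts[p])
    if counts[source] - counts[destination] <= max_skew:
        return None
    # Pick the lowest-numbered vnode on the source (arbitrary, hypothetical rule).
    vnode = min(v for v, p in vnode_to_pnode.items() if p == source)
    return vnode, source, destination


def move_vnode(vnode_to_pnode: Dict[int, str], vnode: int,
               source: str, destination: str) -> None:
    """Reassign the vnode; a real system would also transfer its data."""
    if vnode_to_pnode[vnode] != source:
        raise ValueError("vnode is not on the expected source physical node")
    vnode_to_pnode[vnode] = destination
```

Calling `plan_move` and `move_vnode` in a loop until `plan_move` returns `None` moves one virtual node at a time, which keeps each step small and easy to verify.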