Abstract: An access request that includes a client address for data is received. A metadata server determines a mapping between the client address and one or more storage unit identifiers for the data. Each storage unit identifier uniquely identifies the content of a storage unit, and the metadata server stores mappings of which storage unit identifiers are referenced by which client addresses. The storage unit identifiers are sent to one or more block servers, which service the request using those identifiers. Each block server stores information on where, on that block server, the storage unit for a given storage unit identifier is stored. Multiple client addresses associated with storage units that have the same storage unit identifier are mapped to a single storage unit stored in a storage medium of a block server.
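The two-level lookup described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `MetadataServer` and `BlockServer`, the use of SHA-256 over the content as the storage unit identifier, and the in-memory dictionaries and list that stand in for the servers and the storage medium. The abstract itself does not specify how identifiers are computed.

```python
import hashlib


class BlockServer:
    """Hypothetical block server: stores each distinct storage unit once."""

    def __init__(self):
        self.location = {}  # storage unit identifier -> index in self.medium
        self.medium = []    # stand-in for the storage medium

    def write(self, unit_id, data):
        # A unit whose identifier is already known is not stored again.
        if unit_id not in self.location:
            self.location[unit_id] = len(self.medium)
            self.medium.append(data)

    def read(self, unit_id):
        return self.medium[self.location[unit_id]]


class MetadataServer:
    """Hypothetical metadata server: maps client addresses to identifiers."""

    def __init__(self, block_server):
        self.mapping = {}  # client address -> storage unit identifier
        self.block_server = block_server

    def write(self, client_address, data):
        # Assumption: the identifier is a hash of the unit's content.
        unit_id = hashlib.sha256(data).hexdigest()
        self.mapping[client_address] = unit_id
        self.block_server.write(unit_id, data)

    def read(self, client_address):
        # Resolve the address to an identifier, then ask the block server.
        return self.block_server.read(self.mapping[client_address])


blocks = BlockServer()
meta = MetadataServer(blocks)
meta.write(0, b"hello")
meta.write(7, b"hello")  # same content written at a different client address
meta.write(9, b"world")
```

In this sketch, client addresses 0 and 7 resolve to the same identifier, so the block server holds one copy of that content rather than two.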
Abstract: In one embodiment, a method for removing unused storage units is provided. One or more storage units are referenced by multiple client addresses. The method includes constructing, on a metadata server, a filter on at least a portion of block identifiers that identify storage units currently being referenced by client addresses. The metadata server stores information on which storage unit identifiers are referenced by which client addresses. The filter is transmitted from the metadata server to a block server. The block server stores information on where, on the block server, the storage unit for a given storage unit identifier is stored. The block server uses the filter to test whether each storage unit identifier that exists on the block server is present in the filter. Storage unit identifiers that are not present in the filter, together with their associated storage units, are deleted from the block server.
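The removal method in this abstract can be sketched as follows. The abstract says only "a filter"; this sketch assumes a Bloom filter, which is one possible choice and not confirmed by the text. The names `BloomFilter`, `build_filter`, and `collect_garbage`, the filter size, and the dictionaries standing in for the two servers are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Assumed filter type: a small Bloom filter over identifier strings."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(address_to_unit_id):
    """Metadata server side: add every identifier a client address references."""
    live = BloomFilter()
    for unit_id in address_to_unit_id.values():
        live.add(unit_id)
    return live


def collect_garbage(stored_units, live_filter):
    """Block server side: delete identifiers (and units) absent from the filter."""
    for unit_id in list(stored_units):
        if unit_id not in live_filter:
            del stored_units[unit_id]


# Hypothetical state: two client addresses share unit "a"; "c" is unreferenced.
metadata = {0: "a", 7: "a", 9: "b"}
stored = {"a": b"hello", "b": b"world", "c": b"stale"}
collect_garbage(stored, build_filter(metadata))
```

With a Bloom filter, a referenced identifier always tests as present, so a referenced storage unit is never deleted. An unreferenced identifier can occasionally test as present (a false positive), in which case its unit survives this pass.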