File deduplication using storage tiers

- F5 Networks, Inc.

A method and apparatus for removing duplicated data in a file system utilizing the concept of storage tiers. A synthetic namespace is created via file virtualization and comprises one or more file systems. Deduplication is applied at the namespace level and on all of the file systems comprising the synthetic namespace. All files in a file system in a higher storage tier whose contents are identical to at least one other file in the synthetic namespace are moved to a destination file system in a lower storage tier. For each set of duplicated files that is moved from the original servers, a single-instance copy of the file is left behind as a mirror copy. Read access to a duplicated file is redirected to its mirror copy. When the first write to a duplicated file is received, the association from the duplicated file stored in the destination server to its mirror copy stored in the origin server is discarded. Access to the “modified” duplicated file then resumes normally from the destination server.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims priority from U.S. Provisional Patent Application No. 60/987,181 entitled FILE DEDUPLICATION USING STORAGE TIERS filed on Nov. 12, 2007.

This patent application also may be related to one or more of the following patent applications:

U.S. Provisional Patent Application No. 60/923,765 entitled NETWORK FILE MANAGEMENT SYSTEMS, APPARATUS, AND METHODS filed on Apr. 16, 2007.

U.S. Provisional Patent Application No. 60/940,104 entitled REMOTE FILE VIRTUALIZATION filed on May 25, 2007.

U.S. Provisional Patent Application No. 60/987,161 entitled REMOTE FILE VIRTUALIZATION METADATA MIRRORING filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,165 entitled REMOTE FILE VIRTUALIZATION DATA MIRRORING filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,170 entitled REMOTE FILE VIRTUALIZATION WITH NO EDGE SERVERS filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,174 entitled LOAD SHARING CLUSTER FILE SYSTEM filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,206 entitled NON-DISRUPTIVE FILE MIGRATION filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,197 entitled HOTSPOT MITIGATION IN LOAD SHARING CLUSTER FILE SYSTEMS filed Nov. 12, 2007.

U.S. Provisional Patent Application No. 60/987,194 entitled ON DEMAND FILE VIRTUALIZATION FOR SERVER CONFIGURATION MANAGEMENT WITH LIMITED INTERRUPTION filed Nov. 12, 2007.

U.S. patent application Ser. No. 12/104,197 entitled FILE AGGREGATION IN A SWITCHED FILE SYSTEM filed Apr. 16, 2008.

U.S. patent application Ser. No. 12/103,989 entitled FILE AGGREGATION IN A SWITCHED FILE SYSTEM filed Apr. 16, 2008.

U.S. patent application Ser. No. 12/126,129 entitled REMOTE FILE VIRTUALIZATION IN A SWITCHED FILE SYSTEM filed May 23, 2008.

All of the above-referenced patent applications are hereby incorporated herein by reference in their entireties.

FIELD OF THE INVENTION

This invention relates generally to storage networks with two or more tiers of storage servers and, more specifically, to a more efficient way of storing files that have identical contents in a storage network.

BACKGROUND

In enterprises today, employees tend to keep copies of all of the necessary documents and data that they access often. They do this so that they can find the documents and data easily (central locations tend to change every so often). Furthermore, employees also tend to forget where certain things were found in the central location, or never knew where a document originated in the first place (for example, because they were sent a copy of the document via email). Finally, multiple employees may each keep a copy of the latest mp3 or video file, even if doing so is against company policy.

This can lead to duplicate copies of the same document or data residing in individually owned locations, so that the individuals themselves can easily find the document. However, it also means a great deal of wasted space to store all of these copies of the document or data. And these copies are often stored on more expensive (and higher-performance) tiers of storage, since employees tend to focus not on cost but on performance: they will store data in the location that they can most easily remember and that gives them the best performance in retrieving the data.

Deduplication is a technique in which files with identical contents are first identified and then only one copy of the identical contents, the single-instance copy, is kept in physical storage, while the storage space for the remaining identical contents is reclaimed and reused. Files whose contents have been deduped in this way are hereafter referred to as deduplicated files. Thus, deduplication achieves what is called “Single-Instance Storage,” where only the single-instance copy is stored in physical storage, resulting in more efficient use of the physical storage space. File deduplication thus creates a domino effect of efficiency, reducing capital, administrative, and facility costs, and is considered one of the most important and valuable technologies in storage.

U.S. Pat. Nos. 6,389,433 and 6,477,544 are examples of how a file system provides single-instance storage.

While single-instance storage is conceptually simple, implementing it without sacrificing read/write performance is difficult. Files are deduped without their owners being aware of it, so the owners of deduplicated files have the same performance expectations as for files that have no duplicate copies. Since many deduplicated files share one single-instance copy of the contents, it is important to prevent the single-instance copy from being modified. Typically, a file system uses the copy-on-write technique to protect the single-instance copy: when an update is pending on a deduplicated file, the file system creates a partial or full copy of the single-instance copy, and the update is allowed to proceed only after the (partial) copy has been created, and only on the copied data. The delay while waiting for the creation of a (partial) copy of the single-instance data before an update can proceed introduces significant performance degradation. In addition, the process of identifying and deduping replicated files puts a strain on file system resources. Because of this performance degradation, deduplication, or single-instance storage, is often deemed unacceptable for normal use. In reality, deduplication is of no obvious benefit to the end user. Thus, while deduplication or single-instance storage has been available in a few file systems, it is not commonly used, and many file systems do not even offer the feature due to its adverse performance impact.

File system level deduplication offers many advantages for IT administrators. However, it generally offers the users of the file system no direct benefit, only performance degradation for those files that have been deduped. Therefore, the success of deduplication in the marketplace depends on reducing this performance degradation to an acceptable level.

Another aspect of file system level deduplication is that it is usually done on a per-file-system basis. It is more desirable for deduplication to be done jointly across two or more file systems: the more file systems that are deduped together, the greater the chance that files with identical contents will be found, and the more storage space that will be reclaimed. For example, if there is only one copy of file A in a file system, file A will not be deduped. On the other hand, if there is a copy of file A in another file system, then, taken together, the copies of file A in the two file systems can be deduped. Furthermore, since there is only one single-instance copy for all of the deduplicated files from the participating file systems, the more file systems that are deduped together, the more efficient the deduplication process becomes.

SUMMARY

Thus, it is desirable to achieve deduplication with acceptable performance. It is even more desirable to be able to dedupe across more file systems to achieve greater deduplication efficiency.

In accordance with one aspect of the invention there are provided a method and an apparatus for deduplicating files in a file storage system having a primary storage tier and a secondary storage tier. In such embodiments, file deduplication involves identifying a plurality of files stored in the primary storage tier having identical file contents; copying the plurality of files to the secondary storage tier; storing in the primary storage tier a single copy of the file contents; and storing metadata for each of the plurality of files, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.

In various alternative embodiments, identifying the plurality of files stored in the primary storage tier having identical file contents may involve computing, for each of the plurality of files, a hash value based on the contents of the file; and identifying the files having identical file contents based on the hash values. Storing the single copy of the file contents in the primary storage tier may involve copying the file contents to a designated mirror server; and deleting the remaining file contents from each of the plurality of files in the primary storage tier. Upon a read access to one of the plurality of files, the read access may be directed to the single copy of the file contents maintained in the primary storage tier. Upon a write access to one of the plurality of files, the association between the file copy in the secondary storage tier and the single copy of the file contents stored in the primary storage tier may be broken, and the file copy stored in the secondary storage tier may be modified. The modified file copy may subsequently be migrated from the secondary storage tier to the primary storage tier based on a migration policy.
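
As a rough illustration of the hash-based identification step, the following minimal Python sketch groups the files under a directory tree by a SHA-1 digest of their contents and keeps only the groups with two or more members. A local directory tree stands in for the tier 1 file servers here; the function name and approach are illustrative assumptions, not the patent's actual mechanism.

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def find_duplicate_sets(tier1_root):
        """Group files under tier1_root by a SHA-1 digest of their contents
        and return only the groups with two or more members, i.e. the
        candidates for deduplication."""
        by_digest = defaultdict(list)
        for path in Path(tier1_root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha1(path.read_bytes()).hexdigest()
                by_digest[digest].append(path)
        return {d: paths for d, paths in by_digest.items() if len(paths) > 1}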

In other embodiments, deduplicating a selected file in the primary storage tier may involve determining whether the file contents of the selected file match the file contents of a previously deduplicated file having a single copy of file contents stored in the primary storage tier; when the file contents of the selected file match the file contents of a previously deduplicated file, deduplicating the selected file; otherwise determining whether the file contents of the selected file match the file contents of a non-duplicate file in the primary storage tier; and when the file contents of the selected file match the file contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file. Determining whether the file contents of the selected file match the file contents of a previously deduplicated file may involve comparing a hash value associated with the selected file to a distinct hash value associated with each single copy of file contents stored in the primary storage tier. Deduplicating the selected file may involve copying the selected file to the secondary storage tier; deleting the file contents from the selected file; and storing metadata for the selected file, the metadata associating the file copy in the secondary storage tier with the single copy of the file contents for the previously deduplicated file stored in the primary storage tier. Deduplicating both the selected file and the non-duplicate file may involve copying the selected file and the non-duplicate file to the secondary storage tier; storing in the primary storage tier a single copy of the file contents; and storing metadata for each of the selected file and the non-duplicate file, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier. Storing the single copy of the file contents for deduplicating both the selected file and the non-duplicate file may involve copying the file contents to the designated mirror server; and deleting the remaining file contents from the selected file and the non-duplicate file. Determining whether the file contents of the selected file match the file contents of a non-duplicate file in the primary storage tier may involve maintaining a list of non-duplicate files in the primary storage tier, the list including a distinct hash value for each non-duplicate file; and comparing a hash value associated with the selected file to the hash values associated with the non-duplicate files in the list, and when the file contents of the selected file do not match the file contents of any non-duplicate file, may involve adding the selected file to the list of non-duplicate files (e.g., by storing a pathname and a hash value associated with the selected file). Deduplicating both the selected file and the non-duplicate file may further involve removing the non-duplicate file from the list of non-duplicate files.

Deduplication may be implemented in a file switch or other device that manages file storage.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features of the invention will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram showing an exemplary switched file system including a file switch (MFM) as known in the art;

FIG. 2 is a logic flow diagram for file deduplication using storage tiers in accordance with an exemplary embodiment of the present invention; and

FIG. 3 is a logic flow diagram for deduplicating a selected file in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

This patent application relates generally to a method for performing deduplication on a global namespace using file virtualization, where the global namespace is constructed from one or more storage servers, and to enabling deduplication as a storage placement policy in a tiered storage environment.

A traditional file system manages storage space by providing a hierarchical namespace. The hierarchical namespace starts from the root directory, which contains files and subdirectories; each subdirectory may in turn contain further files and subdirectories. Data is stored in files. Every file and directory is identified by a name. The full name of a file or directory is constructed by concatenating the name of the root directory with the names of each subdirectory along the path leading to the file or directory, followed by the name of the file or directory itself.

The full name of a file thus carries with it two pieces of information: (1) the identification of the file and (2) the physical storage location where the file is stored. If the physical storage location of a file is changed (for example, moved from one partition mounted on a system to another), the identification of the file changes as well.

For ease of management, as well as for a variety of other reasons, the administrator would like to control the physical storage location of a file. For example, important files might be stored on expensive, high-performance file servers, while less important files could be stored on less expensive and less capable file servers.

Unfortunately, moving files from one server to another usually changes the full name of the files and thus, their identification, as well. This is usually a very disruptive process, since after the move users may not be able to remember the new location of their files. Thus, it is desirable to separate the physical storage location of a file from its identification. With this separation, IT and system administrators will be able to control the physical storage location of a file while preserving what the user perceives as the location of the file (and thus its identity).

File virtualization is a technology that separates the full name of a file from its physical storage location. File virtualization is usually implemented as a hardware appliance that is located in the data path between users and the file servers. For users, a file virtualization appliance appears as a file server that exports the namespace of a file system. From the file servers' perspective, the file virtualization appliance appears as just a normal user. Attune Systems' Maestro File Manager (MFM) is an example of a file virtualization appliance. FIG. 1 is a schematic diagram showing an exemplary switched file system including a file switch (MFM).

As a result of separating the full name of a file from the file's physical storage location, file virtualization provides the following capabilities:

1) Creation of a Synthetic Namespace

    • Once a file is virtualized, the full filename does not provide any information about where the file is actually stored. This leads to the creation of synthetic directories, where the files in a single synthetic directory may be stored on different file servers. A synthetic namespace can also be created, where the directories in the synthetic namespace may contain files or directories from a number of different file servers. Thus, file virtualization allows the creation of a single global namespace from a number of cooperating file servers. The synthetic namespace is not restricted to one file server or one file system.

2) Allows Having Many Full Filenames to Refer to a Single File

    • As a consequence of separating a file's name from the file's storage location, file virtualization also allows multiple full filenames to refer to a single file. This is important as it allows existing users to use the old filename while allowing new users to use a new name to access the same file.

3) Allows Having One Full Name to Refer to Many Files

    • Another consequence of separating a file's name from the file's storage location is that one filename may refer to many files. Files that are identified by a single filename need not contain identical contents. If the files do contain identical contents, then one file is usually designated as the authoritative copy, while the other copies are called the mirror copies. Mirror copies increase the availability of the authoritative copy, since even if the file server containing the authoritative copy of a file is down, one of the mirror copies may be designated as a new authoritative copy and normal file access can then be resumed. On the other hand, the contents of a file identified by a single name may change according to the identity of the user who wants to access the file.
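
To make these capabilities concrete, here is a hypothetical in-memory virtualization table in Python. The structure, field names, and paths are invented for illustration and are not the patent's data model: two virtual names share one authoritative physical file (capability 2), and a single virtual name is backed by an authoritative copy plus a mirror copy (capability 3).

    # Hypothetical mapping from full (virtual) filename to physical location.
    virtualization_table = {
        # Two full names referring to a single physical file.
        "/global/reports/q3.doc":     {"authoritative": "server1:/vol0/f0001", "mirrors": []},
        "/global/archive/q3-old.doc": {"authoritative": "server1:/vol0/f0001", "mirrors": []},
        # One full name backed by an authoritative copy and a mirror copy.
        "/global/tools/setup.bin": {
            "authoritative": "server2:/vol1/f0042",
            "mirrors": ["server3:/vol2/f0042"],
        },
    }

    def resolve(full_name, prefer_mirror=False):
        """Map a virtual full name to a physical location, optionally
        preferring a mirror copy when one is available."""
        entry = virtualization_table[full_name]
        if prefer_mirror and entry["mirrors"]:
            return entry["mirrors"][0]
        return entry["authoritative"]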

Deduplication is of no obvious benefit to the end users of a file system. Instead of using deduplication as a management policy to reduce storage space and subsequently cause inconvenience to the end users of the deduplicated files, this invention uses deduplication as a storage placement policy to intelligently manage the storage assets of an enterprise, with relatively little inconvenience to the end users.

In embodiments of the present invention, a set of file servers is designated as tier 1 where data stored in these file servers is considered more important to the enterprise. Another (typically non-overlapping) set of file servers is designated as tier 2 storage where data stored in these file servers is considered less important to the business. By using these two storage tiers to identify data important to the business, the system administrators can spend more time and resources to provide faster access and more frequent backup on the data stored on the tier 1 file servers.

Deduplication typically is treated as one of the storage placement policies that decides where data should be stored, e.g., on a tier 1 or tier 2 file server.

In embodiments of the present invention, duplicated data is automatically moved from tier 1 to tier 2. The total storage space used by the deduplicated data on tier 1 and tier 2 remains the same (or perhaps even increases slightly). However, there is more storage space available on tier 1 file servers as a result of deduplication, since all the duplicated data is now stored on tier 2.

There may be performance differences between tier 1 and tier 2 file servers. However, these differences tend to be small, since even relatively inexpensive file servers are still very capable. To maintain the same level of performance when accessing the deduplicated files, as each set of duplicated files is moved from the tier 1 file servers, a single-instance copy of the file is left behind as a mirror copy. One of the tier 1 file servers is designated as a mirror server where all of the mirror copies are stored. Read access to a deduplicated file is redirected to the deduplicated file's mirror copy. When the first write to a deduplicated file is received, the association from the deduplicated file stored in a tier 2 server to its mirror copy stored in a tier 1 server is discarded. Access to the “modified” deduplicated file will then resume normally from the tier 2 file server. At a certain time, the “modified” deduplicated file is then migrated back to tier 1 storage.
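
A minimal sketch of the per-file bookkeeping this mechanism implies is shown below as a Python dataclass; the field and method names are assumptions made for illustration, not the patent's metadata format.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DedupMetadata:
        """Hypothetical per-file metadata kept by the file virtualization
        appliance for a deduplicated file."""
        sha1: Optional[str] = None             # content digest; None once cleared
        deduplicated: bool = False             # True while a mirror association exists
        tier2_location: Optional[str] = None   # where the file's copy now lives
        mirror_location: Optional[str] = None  # single-instance copy on tier 1

        def break_mirror_association(self):
            """First write to the deduplicated file: discard the association
            so that access resumes normally from the tier 2 copy."""
            self.deduplicated = False
            self.mirror_location = None
            self.sha1 = None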

Extending file virtualization to support deduplication is relatively straightforward. First, a set of tier 1 servers is identified as a target for deduplication, and a set of tier 2 servers is identified for receiving deduplicated data. One of the tier 1 file servers is chosen as the mirror server. The mirror server is used to store the mirror copy of each set of deduplicated files with identical contents.

A background deduplication process typically is run periodically within the file virtualization appliance to perform the deduplication. Exemplary embodiments use a sha1 digest computed from the contents of a file to identify files that have identical contents. A sha1 digest value is a 160-bit value that is, for all practical purposes, unique to any given set of data (contents) of a file. Therefore, if two files are identical in contents (but not necessarily in name or location), they should always have the same sha1 digest values; conversely, if two files differ in contents, they should always have different sha1 digest values.
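
For illustration, such a digest might be computed as in the following Python sketch, which reads the file in fixed-size chunks so that large files need not be loaded into memory at once; the helper name and chunk size are choices made here, not specified by the patent.

    import hashlib

    def sha1_digest(path, chunk_size=1 << 20):
        """Compute the 160-bit SHA-1 digest of a file's contents,
        reading the file in 1 MiB chunks."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()  # identical contents yield identical digests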

An exemplary deduplication process for the namespace is as follows (a code sketch follows the list):

    • 1) Each file stored in the tier 1 file servers that is idle is inspected. If the file has already been deduped, it is skipped.
    • 2) If the file does not have a sha1 digest value, one is computed and saved in the metadata for the file.
    • 3) A check is made to see if there is a mirror copy stored in the mirror server. If there is, the file is deduped, and this algorithm loops around again with the next file on the tier 1 file servers.
    • 4) The internal list is then checked for an entry with an identical sha1 digest value. If there is no such entry, the sha1 digest value and the path name of the file are added to the list, and this algorithm loops around again with the next file on the tier 1 file servers.
    • 5) If there is already an entry in the list with an identical sha1 digest value, the current file and the file recorded in the internal list are both individually deduped, and the entry is removed from the internal list. This algorithm then loops around with the next file on the tier 1 file servers.
    • 6) The deduplication process continues until all the files in the tier 1 storage have been processed.
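
The following Python sketch restates the list above using the data structures assumed in the earlier sketches (the DedupMetadata record and the sha1_digest helper). Here mirror_server stands for an assumed mirror-server interface and dedupe_file for the single-file process sketched after the next list; all of these names are illustrative assumptions rather than the patent's API.

    def dedupe_namespace(idle_tier1_files, mirror_server, dedupe_file):
        """One pass of the namespace-level deduplication process.
        `idle_tier1_files` yields (path, meta) pairs, where meta is a
        DedupMetadata record; `dedupe_file` is a callable taking
        (path, meta) that performs the single-file process."""
        pending = {}  # the internal list: sha1 digest -> (path, meta)
        for path, meta in idle_tier1_files:
            if meta.deduplicated:                    # step 1: skip deduped files
                continue
            if meta.sha1 is None:                    # step 2: compute and save digest
                meta.sha1 = sha1_digest(path)
            if mirror_server.has_mirror(meta.sha1):  # step 3: mirror already exists
                dedupe_file(path, meta)
                continue
            match = pending.pop(meta.sha1, None)     # steps 4-5: consult the list
            if match is None:
                pending[meta.sha1] = (path, meta)    # no match yet; record an entry
            else:
                dedupe_file(*match)                  # dedupe both files; the pop
                dedupe_file(path, meta)              # above removed the list entry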

It is possible that the sha1 digest value for a file marked for deduplication may have changed before the file is actually deduped. This case should occur relatively infrequently. If it does occur, essentially the worst that can happen is that a file that really has no duplicates in tier 1 gets deduplicated and migrated to tier 2. However, the deduplicated file eventually should be migrated back to tier 1 storage.

An exemplary process to dedupe a single file (called from the deduplication process for the namespace) is as follows (a code sketch follows the list):

    • 1) A check is made to see if there is a mirror copy with an identical sha1 digest.
    • 2) If there is no mirror copy in the mirror server, a new mirror is made with the sha1 digest and the associated file's contents.
    • 3) If there already is a mirror copy, the file is migrated to a tier 2 file server according to the storage placement policy. The migrated file is marked as deduplicated, and a mirror association is created between the migrated file and its mirror copy.
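
A sketch of this single-file process, continuing with the assumed names from the earlier sketches, might look as follows. The helpers has_mirror, create_mirror, location_of, and migrate are hypothetical stand-ins for the mirror-server interface and the storage placement policy; in the namespace-level sketch above, the last two arguments would be bound in advance, e.g. with functools.partial.

    def dedupe_file(path, meta, mirror_server, migrate):
        """Deduplicate a single file: ensure a mirror copy exists on the
        mirror server, migrate the file to a tier 2 server per the storage
        placement policy, and record the mirror association."""
        if not mirror_server.has_mirror(meta.sha1):       # steps 1-2: create the
            mirror_server.create_mirror(meta.sha1, path)  # mirror copy if absent
        meta.tier2_location = migrate(path)               # step 3: move to tier 2
        meta.deduplicated = True                          # mark as deduplicated
        meta.mirror_location = mirror_server.location_of(meta.sha1)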

When a non-deduplicated file that has a sha1 digest is opened for update, its sha1 digest is immediately cleared.

When a deduplicated file is opened for update, its sha1 digest is immediately cleared. The mirror association between the deduplicated copy and the mirror copy is immediately broken. The file is no longer a deduplicated file (its deduplicated flag is cleared), and an entry is added to a to-do list to migrate this file back to tier 1 storage in the future.

When a deduplicated file is opened for read, a check is made to see if there is a mirror copy stored in the mirror server. If there is, subsequent read requests on the deduplicated file are switched to the mirror server for processing. Otherwise, the read request is switched to the tier 2 file server containing the actual data of the deduplicated file.
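
These open-time rules could be folded into one routine, sketched below in Python with the same assumed DedupMetadata record; migration_todo stands for the to-do list of files to migrate back to tier 1, and returning a location stands in for switching the request to the appropriate server.

    def on_open(path, meta, for_update, migration_todo):
        """Apply the open-time rules above and return the location from
        which the request should be served."""
        if for_update:
            if meta.deduplicated:
                meta.break_mirror_association()      # also clears the sha1 digest
                migration_todo.append(path)          # migrate back to tier 1 later
                return meta.tier2_location           # access resumes from tier 2
            meta.sha1 = None                         # an update invalidates the digest
            return path
        if meta.deduplicated and meta.mirror_location is not None:
            return meta.mirror_location              # reads switch to the mirror copy
        return meta.tier2_location or path           # otherwise read the actual data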

FIG. 2 is a logic flow diagram for file deduplication using storage tiers in accordance with an exemplary embodiment of the present invention. In block 202, a deduplication device (e.g., a file switch) identifies a plurality of files stored in the primary storage tier having identical file contents. In block 204, the deduplication device copies the plurality of files to the secondary storage tier. In block 206, the deduplication device stores in the primary storage tier a single copy of the file contents. In block 208, the deduplication device stores metadata for each of the plurality of files, the metadata associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier.

FIG. 3 is a logic flow diagram for deduplicating a selected file in the primary storage tier in accordance with an exemplary embodiment of the present invention. In block 302, the deduplication device determines whether the file contents of the selected file match the file contents of a previously deduplicated file having a single copy of file contents stored in the primary storage tier. When the file contents of the selected file match the file contents of a previously deduplicated file (YES in block 304), then the deduplication device deduplicates the selected file in block 306, for example, by copying the selected file to the secondary storage tier, deleting the file contents from the selected file, and storing metadata for the selected file associating the file copy in the secondary storage tier with the single copy of the file contents for the previously deduplicated file stored in the primary storage tier. When the file contents of the selected file do not match the file contents of any previously deduplicated file (NO in block 304), then the deduplication device determines whether the file contents of the selected file match the file contents of a non-duplicate file in the primary storage tier in block 308. When the file contents of the selected file match the file contents of a non-duplicate file (YES in block 310), then the deduplication device deduplicates both the selected file and the non-duplicate file, for example, by copying the selected file and the non-duplicate file to the secondary storage tier, storing in the primary storage tier a single copy of the file contents, and storing metadata for each of the two files associating each of the file copies in the secondary storage tier with the single copy of the file contents stored in the primary storage tier. When the file contents of the selected file do not match the file contents of any non-duplicate file (NO in block 310), then the deduplication device may add the selected file to a list of non-duplicate files.

It should be noted that file deduplication as discussed herein may be implemented using file switches of the types described above and in provisional patent application Ser. No. 60/923,765. It should also be noted that embodiments of the present invention may incorporate, utilize, supplement, or be combined with various features described in one or more of the other referenced patent applications.

It should be noted that terms such as “client,” “server,” “switch,” and “node” may be used herein to describe devices that may be used in certain embodiments of the present invention and should not be construed to limit the present invention to any particular device type unless the context otherwise requires. Thus, a device may include, without limitation, a bridge, router, bridge-router (brouter), switch, node, server, computer, appliance, or other type of device. Such devices typically include one or more network interfaces for communicating over a communication network and a processor (e.g., a microprocessor with memory and other peripherals and/or application-specific hardware) configured accordingly to perform device functions. Communication networks generally may include public and/or private networks; may include local-area, wide-area, metropolitan-area, storage, and/or other types of networks; and may employ communication technologies including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies.

It should also be noted that devices may use communication protocols and messages (e.g., messages created, transmitted, received, stored, and/or processed by the device), and such messages may be conveyed by a communication network or medium. Unless the context otherwise requires, the present invention should not be construed as being limited to any particular communication message type, communication message format, or communication protocol. Thus, a communication message generally may include, without limitation, a frame, packet, datagram, user datagram, cell, or other type of communication message.

It should also be noted that logic flows may be described herein to demonstrate various aspects of the invention, and should not be construed to limit the present invention to any particular logic flow or logic implementation. The described logic may be partitioned into different logic blocks (e.g., programs, modules, functions, or subroutines) without changing the overall results or otherwise departing from the true scope of the invention. Oftentimes, logic elements may be added, modified, omitted, performed in a different order, or implemented using different logic constructs (e.g., logic gates, looping primitives, conditional logic, and other logic constructs) without changing the overall results or otherwise departing from the true scope of the invention.

The present invention may be embodied in many different forms, including, but in no way limited to, computer program logic for use with a processor (e.g., a microprocessor, microcontroller, digital signal processor, or general purpose computer), programmable logic for use with a programmable logic device (e.g., a Field Programmable Gate Array (FPGA) or other PLD), discrete components, integrated circuitry (e.g., an Application Specific Integrated Circuit (ASIC)), or any other means including any combination thereof. In a typical embodiment of the present invention, predominantly all of the described logic is implemented as a set of computer program instructions that is converted into a computer executable form, stored as such in a computer readable medium, and executed by a microprocessor under the control of an operating system.

Computer program logic implementing all or part of the functionality previously described herein may be embodied in various forms, including, but in no way limited to, a source code form, a computer executable form, and various intermediate forms (e.g., forms generated by an assembler, compiler, linker, or locator). Source code may include a series of computer program instructions implemented in any of various programming languages (e.g., object code, assembly language, or a high-level language such as Fortran, C, C++, JAVA, or HTML) for use with various operating systems or operating environments. The source code may define and use various data structures and communication messages. The source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form.

The computer program may be fixed in any form (e.g., source code form, computer executable form, or an intermediate form) either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), a PC card (e.g., PCMCIA card), or other memory device. The computer program may be fixed in any form in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies. The computer program may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).

Hardware logic (including programmable logic for use with a programmable logic device) implementing all or part of the functionality previously described herein may be designed using traditional manual methods, or may be designed, captured, simulated, or documented electronically using various tools, such as Computer Aided Design (CAD), a hardware description language (e.g., VHDL or AHDL), or a PLD programming language (e.g., PALASM, ABEL, or CUPL).

Programmable logic may be fixed either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), or other memory device. The programmable logic may be fixed in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies (e.g., Bluetooth), networking technologies, and internetworking technologies. The programmable logic may be distributed as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web).

The present invention may be embodied in other specific forms without departing from the true scope of the invention. Any references to the “invention” are intended to refer to exemplary embodiments of the invention and should not be construed to refer to all embodiments of the invention unless the context otherwise requires. The described embodiments are to be considered in all respects only as illustrative and not restrictive.

Claims

1. A method of deduplicating files, the method comprising:

accessing, with a file virtualization device, a virtualized environment including one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
identifying, with the file virtualization device, a subset of the first plurality of files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
storing, with the file virtualization device, a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
storing, with the file virtualization device, metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.

2. A method according to claim 1, wherein identifying the subset of files that are stored in the primary storage tier and have identical file contents comprises:

computing, for each of the plurality of files, a hash value based on contents of the file; and
identifying files having identical file contents based on a comparison of the hash values.

3. A method according to claim 1, wherein storing the single copy of the contents comprises copying the file contents to a designated mirror server of the primary storage tier.

4. A method according to claim 1, further comprising:

upon a read access to one of the plurality of files, directing, with the file virtualization device, the read access to the single copy of the contents stored in the primary storage tier.

5. A method according to claim 1, further comprising, upon a write access to one of the plurality of files:

breaking, with the file virtualization device, the association between the copy in the secondary storage tier and the corresponding single copy of the contents stored in the primary storage tier;
modifying, with the file virtualization device, the copy stored in the secondary storage tier; and
migrating, with the file virtualization device, the modified copy from the secondary storage tier to the primary storage tier based on a migration policy.

6. A method according to claim 1, further comprising deduplicating, with the file virtualization device, a selected file in the primary storage tier comprising:

determining whether contents of the selected file match contents of a previously deduplicated file having a corresponding single copy stored in the primary storage tier;
when the contents of the selected file match the contents of a previously deduplicated file, deduplicating the selected file;
otherwise determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier; and
when the contents of the selected file match the contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file.

7. A method according to claim 6, wherein determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier comprises:

maintaining a list of non-duplicate files in the primary storage tier, the list including a distinct hash value for each non-duplicate file;
comparing a hash value associated with the selected file to the hash values associated with the non-duplicate files in the list; and
when the contents of the selected file do not match the contents of any non-duplicate file, adding the selected file to the list of non-duplicate files.

8. A virtualization apparatus for deduplicating files, the apparatus comprising:

at least one communication interface for communicating with one or more primary and secondary storage servers; and
at least one of configurable hardware logic configured to be capable of implementing or a processor configured to execute program instructions stored in a memory comprising: accessing a virtualized environment including the one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and the one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers; identifying a subset of the accessed files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier; storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.

9. An apparatus according to claim 8, wherein identifying the subset of files that are stored in the primary storage tier and have identical file contents further comprises:

computing, for each of the plurality of files, a hash value based on contents of the file; and
identifying files having identical contents based on a comparison of the hash values.

10. An apparatus according to claim 8, wherein storing the single copy of the contents further comprises copying the file contents to a designated mirror server of the primary storage tier.

11. An apparatus according to claim 8, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising upon a read access to one of the plurality of files, directing the read access to the single copy of the contents stored in the primary storage tier.

12. An apparatus according to claim 8, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising upon a write access to one of the plurality of files:

breaking the association between the copy in the secondary storage tier and the corresponding single copy of the contents stored in the primary storage tier;
modifying the copy stored in the secondary storage tier; and
migrating the modified copy from the secondary storage tier to the primary storage tier based on a migration policy.

13. An apparatus according to claim 8, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising deduplicating a selected file in the primary storage tier comprising:

determining whether contents of the selected file match contents of a previously deduplicated file having a corresponding single copy stored in the primary storage tier;
when the contents of the selected file match the contents of a previously deduplicated file, deduplicating the selected file;
otherwise determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier; and
when the contents of the selected file match the contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file.

14. An apparatus according to claim 13, wherein determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier further comprises:

maintaining a list of non-duplicate files in the primary storage tier, the list including a distinct hash value for each non-duplicate file;
comparing a hash value associated with the selected file to the hash values associated with the non-duplicate files in the list; and
when the contents of the selected file do not match the contents of any non-duplicate file, adding the selected file to the list of non-duplicate files.

15. A system that deduplicates files, the system comprising:

one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, the storage servers storing the first and second pluralities of files in a virtualized environment, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
a file virtualization device including at least one of configurable hardware logic configured to be capable of implementing or a processor configured to execute program instructions stored in a memory comprising: identifying a subset of the plurality of files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier; storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.

16. A system according to claim 15, wherein identifying the subset of files that are stored in the primary storage tier and have identical file contents further comprises:

computing, for each of the plurality of files, a hash value based on contents of the file; and
identifying files having identical contents based on a comparison of the hash values.

17. A system according to claim 15, wherein storing the single copy of the contents further comprises copying the file contents to a designated mirror server of the primary storage tier.

18. A system according to claim 15, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising upon a read access to one of the plurality of files, directing the read access to the single copy of the contents stored in the primary storage tier.

19. A system according to claim 15, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising upon a write access to one of the plurality of files:

breaking the association between the copy in the secondary storage tier and the corresponding single copy of the contents stored in the primary storage tier;
modifying the copy stored in the secondary storage tier; and
migrating the modified copy from the secondary storage tier to the primary storage tier based on a migration policy.

20. A system according to claim 15, wherein at least one of configurable hardware logic further configured to be capable of implementing or the processor is further configured to execute program instructions stored in a memory further comprising deduplicating a selected file in the primary storage tier comprising:

determining whether contents of the selected file match contents of a previously deduplicated file having a corresponding single copy stored in the primary storage tier;
when the contents of the selected file match the contents of a previously deduplicated file, deduplicating the selected file;
otherwise determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier; and
when the contents of the selected file match the contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file.

21. A system according to claim 20, wherein determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier further comprises:

maintaining a list of non-duplicate files in the primary storage tier, the list including a distinct hash value for each non-duplicate file;
comparing a hash value associated with the selected file to the hash values associated with the non-duplicate files in the list; and
when the contents of the selected file do not match the contents of any non-duplicate file, adding the selected file to the list of non-duplicate files.

22. A non-transitory computer readable medium having stored thereon instructions for deduplicating files comprising machine executable code which when executed by at least one processor, causes the processor to perform steps comprising:

accessing a plurality of files stored in a virtualized environment including one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
identifying a subset of the accessed files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.

23. A non-transitory computer readable medium according to claim 22, wherein identifying the subset of files that are stored in the primary storage tier and have identical file contents further comprises:

computing, for each of the plurality of files, a hash value based on contents of the file; and
identifying files having identical contents based on a comparison of the hash values.

24. A non-transitory computer readable medium according to claim 22, wherein storing the single copy of the contents further comprises copying the file contents to a designated mirror server of the primary storage tier.

25. A non-transitory computer readable medium according to claim 22, further having stored thereon instructions that when executed by the at least one processor cause the processor to perform steps further comprising upon a read access to one of the plurality of files, directing the read access to the single copy of the contents stored in the primary storage tier.

26. A non-transitory computer readable medium according to claim 22, further having stored thereon instructions that when executed by the at least one processor cause the processor to perform steps further comprising:

breaking the association between the copy in the secondary storage tier and the corresponding single copy of the contents stored in the primary storage tier;
modifying the copy stored in the secondary storage tier; and
migrating the modified copy from the secondary storage tier to the primary storage tier based on a migration policy.

27. A non-transitory computer readable medium according to claim 22, further having stored thereon instructions that when executed by the at least one processor cause the processor to perform steps further comprising deduplicating a selected file in the primary storage tier comprising:

determining whether contents of the selected file match contents of a previously deduplicated file having a corresponding single copy stored in the primary storage tier;
when the contents of the selected file match the contents of a previously deduplicated file, deduplicating the selected file;
otherwise determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier; and
when the contents of the selected file match the contents of a non-duplicate file, deduplicating both the selected file and the non-duplicate file.

28. A non-transitory computer readable medium according to claim 27, wherein determining whether the contents of the selected file match the contents of a non-duplicate file in the primary storage tier further comprises:

maintaining a list of non-duplicate files in the primary storage tier, the list including a distinct hash value for each non-duplicate file;
comparing a hash value associated with the selected file to the hash values associated with the non-duplicate files in the list; and
when the contents of the selected file do not match the contents of any non-duplicate file, adding the selected file to the list of non-duplicate files.
References Cited
U.S. Patent Documents
4993030 February 12, 1991 Krakauer et al.
5218695 June 8, 1993 Noveck et al.
5303368 April 12, 1994 Kotaki
5473362 December 5, 1995 Fitzgerald et al.
5511177 April 23, 1996 Kagimasa et al.
5537585 July 16, 1996 Blickenstaff et al.
5548724 August 20, 1996 Akizawa et al.
5550965 August 27, 1996 Gabbe et al.
5583995 December 10, 1996 Gardner et al.
5586260 December 17, 1996 Hu
5590320 December 31, 1996 Maxey
5623490 April 22, 1997 Richter et al.
5649194 July 15, 1997 Miller et al.
5649200 July 15, 1997 Leblang et al.
5668943 September 16, 1997 Attanasio et al.
5692180 November 25, 1997 Lee
5721779 February 24, 1998 Funk
5724512 March 3, 1998 Winterbottom
5806061 September 8, 1998 Chaudhuri et al.
5832496 November 3, 1998 Anand et al.
5832522 November 3, 1998 Blickenstaff et al.
5838970 November 17, 1998 Thomas
5862325 January 19, 1999 Reed et al.
5884303 March 16, 1999 Brown
5893086 April 6, 1999 Schmuck et al.
5897638 April 27, 1999 Lasser et al.
5905990 May 18, 1999 Inglett
5917998 June 29, 1999 Cabrera et al.
5920873 July 6, 1999 Van Huben et al.
5926816 July 20, 1999 Bauer et al.
5937406 August 10, 1999 Balabine et al.
5991302 November 23, 1999 Berl et al.
5995491 November 30, 1999 Richter et al.
5999664 December 7, 1999 Mahoney et al.
6012083 January 4, 2000 Savitzky et al.
6029168 February 22, 2000 Frey
6044367 March 28, 2000 Wolff
6047129 April 4, 2000 Frye
6072942 June 6, 2000 Stockwell et al.
6078929 June 20, 2000 Rao
6085234 July 4, 2000 Pitts et al.
6088694 July 11, 2000 Burns et al.
6104706 August 15, 2000 Richter et al.
6128627 October 3, 2000 Mattis et al.
6128717 October 3, 2000 Harrison et al.
6161145 December 12, 2000 Bainbridge et al.
6161185 December 12, 2000 Guthrie et al.
6181336 January 30, 2001 Chiu et al.
6202156 March 13, 2001 Kalajan
6223206 April 24, 2001 Dan et al.
6233648 May 15, 2001 Tomita
6237008 May 22, 2001 Beal et al.
6256031 July 3, 2001 Meijer et al.
6282610 August 28, 2001 Bergsten
6289345 September 11, 2001 Yasue
6308162 October 23, 2001 Ouimet et al.
6324581 November 27, 2001 Xu et al.
6329985 December 11, 2001 Tamer et al.
6339785 January 15, 2002 Feigenbaum
6349343 February 19, 2002 Foody et al.
6374263 April 16, 2002 Bunger et al.
6389433 May 14, 2002 Bolosky et al.
6393581 May 21, 2002 Friedman et al.
6397246 May 28, 2002 Wolfe
6412004 June 25, 2002 Chen et al.
6438595 August 20, 2002 Blumenau et al.
6466580 October 15, 2002 Leung
6469983 October 22, 2002 Narayana et al.
6477544 November 5, 2002 Bolosky et al.
6487561 November 26, 2002 Ofek et al.
6493804 December 10, 2002 Soltis et al.
6516350 February 4, 2003 Lumelsky et al.
6516351 February 4, 2003 Borr
6542909 April 1, 2003 Tamer et al.
6549916 April 15, 2003 Sedlar
6553352 April 22, 2003 Delurgio et al.
6556997 April 29, 2003 Levy
6556998 April 29, 2003 Mukherjee et al.
6560230 May 6, 2003 Li et al.
6601101 July 29, 2003 Lee et al.
6606663 August 12, 2003 Liao et al.
6612490 September 2, 2003 Herrendoerfer et al.
6654346 November 25, 2003 Mahalingaiah et al.
6721794 April 13, 2004 Taylor et al.
6728265 April 27, 2004 Yavatkar et al.
6738357 May 18, 2004 Richter et al.
6738790 May 18, 2004 Klein et al.
6742035 May 25, 2004 Zayas et al.
6744776 June 1, 2004 Kalkunte et al.
6748420 June 8, 2004 Quatrano et al.
6754215 June 22, 2004 Arikawa et al.
6757706 June 29, 2004 Dong et al.
6775672 August 10, 2004 Mahalingam et al.
6775673 August 10, 2004 Mahalingam et al.
6775679 August 10, 2004 Gupta
6782450 August 24, 2004 Arnott et al.
6801960 October 5, 2004 Ericson et al.
6826613 November 30, 2004 Wang et al.
6839761 January 4, 2005 Kadyk et al.
6847959 January 25, 2005 Arrouye et al.
6847970 January 25, 2005 Kar et al.
6850997 February 1, 2005 Rooney et al.
6871245 March 22, 2005 Bradley
6880017 April 12, 2005 Marce et al.
6889249 May 3, 2005 Miloushev et al.
6914881 July 5, 2005 Mansfield et al.
6922688 July 26, 2005 Frey, Jr.
6934706 August 23, 2005 Mancuso et al.
6938039 August 30, 2005 Bober et al.
6938059 August 30, 2005 Tamer et al.
6959373 October 25, 2005 Testardi
6961815 November 1, 2005 Kistler et al.
6973455 December 6, 2005 Vahalia et al.
6973549 December 6, 2005 Testardi
6975592 December 13, 2005 Seddigh et al.
6985936 January 10, 2006 Agarwalla et al.
6985956 January 10, 2006 Luke et al.
6986015 January 10, 2006 Testardi
6990114 January 24, 2006 Erimli et al.
6990547 January 24, 2006 Ulrich et al.
6990667 January 24, 2006 Ulrich et al.
6996841 February 7, 2006 Kadyk et al.
7010553 March 7, 2006 Chen et al.
7013379 March 14, 2006 Testardi
7039061 May 2, 2006 Connor et al.
7051112 May 23, 2006 Dawson
7072917 July 4, 2006 Wong et al.
7075924 July 11, 2006 Richter et al.
7089286 August 8, 2006 Malik
7111115 September 19, 2006 Peters et al.
7113962 September 26, 2006 Kee et al.
7120746 October 10, 2006 Campbell et al.
7127556 October 24, 2006 Blumenau et al.
7133967 November 7, 2006 Fujie et al.
7146524 December 5, 2006 Patel et al.
7152184 December 19, 2006 Maeda et al.
7155466 December 26, 2006 Rodriguez et al.
7165095 January 16, 2007 Sim
7167821 January 23, 2007 Hardwick et al.
7173929 February 6, 2007 Testardi
7194579 March 20, 2007 Robinson et al.
7234074 June 19, 2007 Cohn et al.
7236491 June 26, 2007 Tsao et al.
7280536 October 9, 2007 Testardi
7284150 October 16, 2007 Ma et al.
7293097 November 6, 2007 Borr
7293099 November 6, 2007 Kalajan
7293133 November 6, 2007 Colgrove et al.
7343398 March 11, 2008 Lownsbrough
7346664 March 18, 2008 Wong et al.
7383288 June 3, 2008 Miloushev et al.
7401220 July 15, 2008 Bolosky et al.
7406484 July 29, 2008 Srinivasan et al.
7415488 August 19, 2008 Muth et al.
7415608 August 19, 2008 Bolosky et al.
7440982 October 21, 2008 Lu et al.
7475241 January 6, 2009 Patel et al.
7477796 January 13, 2009 Sasaki et al.
7509322 March 24, 2009 Miloushev et al.
7512673 March 31, 2009 Miloushev et al.
7519813 April 14, 2009 Cox et al.
7562110 July 14, 2009 Miloushev et al.
7571168 August 4, 2009 Bahar et al.
7574433 August 11, 2009 Engel
7599941 October 6, 2009 Bahar et al.
7610307 October 27, 2009 Havewala et al.
7624109 November 24, 2009 Testardi
7639883 December 29, 2009 Gill
7653699 January 26, 2010 Colgrove et al.
7685177 March 23, 2010 Hagerstrom et al.
7734603 June 8, 2010 McManis
7788335 August 31, 2010 Miloushev et al.
7809691 October 5, 2010 Karmarkar et al.
7818299 October 19, 2010 Federwisch et al.
7822939 October 26, 2010 Veprinsky et al.
7831639 November 9, 2010 Panchbudhe et al.
7870154 January 11, 2011 Shitomi et al.
7877511 January 25, 2011 Berger et al.
7885970 February 8, 2011 Lacapra
7904466 March 8, 2011 Valencia et al.
7913053 March 22, 2011 Newland
7953701 May 31, 2011 Okitsu et al.
7958347 June 7, 2011 Ferguson
8005953 August 23, 2011 Miloushev et al.
8046547 October 25, 2011 Chatterjee et al.
8103622 January 24, 2012 Karinta
8112392 February 7, 2012 Bunnell et al.
8271751 September 18, 2012 Hinrichs, Jr.
8326798 December 4, 2012 Driscoll et al.
8351600 January 8, 2013 Resch
20010007560 July 12, 2001 Masuda et al.
20010014891 August 16, 2001 Hoffert et al.
20010047293 November 29, 2001 Waller et al.
20010051955 December 13, 2001 Wong
20020035537 March 21, 2002 Waller et al.
20020059263 May 16, 2002 Shima et al.
20020065810 May 30, 2002 Bradley
20020073105 June 13, 2002 Noguchi et al.
20020083118 June 27, 2002 Sim
20020087887 July 4, 2002 Busam et al.
20020106263 August 8, 2002 Winker
20020120763 August 29, 2002 Miloushev et al.
20020133330 September 19, 2002 Loisey et al.
20020133491 September 19, 2002 Sim et al.
20020138502 September 26, 2002 Gupta
20020143909 October 3, 2002 Botz et al.
20020147630 October 10, 2002 Rose et al.
20020150253 October 17, 2002 Brezak et al.
20020156905 October 24, 2002 Weissman
20020161911 October 31, 2002 Pinckney, III et al.
20020188667 December 12, 2002 Kirnos
20020194342 December 19, 2002 Lu et al.
20030009429 January 9, 2003 Jameson
20030012382 January 16, 2003 Ferchichi et al.
20030028514 February 6, 2003 Lord et al.
20030033308 February 13, 2003 Patel et al.
20030033535 February 13, 2003 Fisher et al.
20030061240 March 27, 2003 McCann et al.
20030065956 April 3, 2003 Belapurkar et al.
20030115218 June 19, 2003 Bobbitt et al.
20030115439 June 19, 2003 Mahalingam et al.
20030128708 July 10, 2003 Inoue et al.
20030135514 July 17, 2003 Patel et al.
20030149781 August 7, 2003 Yared et al.
20030156586 August 21, 2003 Lee et al.
20030159072 August 21, 2003 Bellinger et al.
20030171978 September 11, 2003 Jenkins et al.
20030177364 September 18, 2003 Walsh et al.
20030177388 September 18, 2003 Botz et al.
20030179755 September 25, 2003 Fraser
20030200207 October 23, 2003 Dickinson
20030204635 October 30, 2003 Ko et al.
20040003266 January 1, 2004 Moshir et al.
20040006575 January 8, 2004 Visharam et al.
20040010654 January 15, 2004 Yasuda et al.
20040017825 January 29, 2004 Stanwood et al.
20040025013 February 5, 2004 Parker et al.
20040028043 February 12, 2004 Maveli et al.
20040028063 February 12, 2004 Roy et al.
20040030857 February 12, 2004 Krakirian et al.
20040044705 March 4, 2004 Stager et al.
20040054748 March 18, 2004 Ackaouy et al.
20040054777 March 18, 2004 Ackaouy et al.
20040093474 May 13, 2004 Lin et al.
20040098383 May 20, 2004 Tabellion et al.
20040098595 May 20, 2004 Aupperle et al.
20040133573 July 8, 2004 Miloushev et al.
20040133577 July 8, 2004 Miloushev et al.
20040133606 July 8, 2004 Miloushev et al.
20040133607 July 8, 2004 Miloushev et al.
20040133650 July 8, 2004 Miloushev et al.
20040139355 July 15, 2004 Axel et al.
20040148380 July 29, 2004 Meyer et al.
20040153479 August 5, 2004 Mikesell et al.
20040181605 September 16, 2004 Nakatani et al.
20040199547 October 7, 2004 Winter et al.
20040213156 October 28, 2004 Smallwood et al.
20040236798 November 25, 2004 Srinivasan et al.
20040267830 December 30, 2004 Wong et al.
20050021615 January 27, 2005 Arnott et al.
20050050107 March 3, 2005 Mane et al.
20050091214 April 28, 2005 Probert et al.
20050108575 May 19, 2005 Yung
20050114291 May 26, 2005 Becker-Szendy et al.
20050114701 May 26, 2005 Atkins et al.
20050117589 June 2, 2005 Douady et al.
20050160161 July 21, 2005 Barrett et al.
20050175013 August 11, 2005 Le Pennec et al.
20050187866 August 25, 2005 Lee
20050198501 September 8, 2005 Andreev et al.
20050213587 September 29, 2005 Cho et al.
20050246393 November 3, 2005 Coates et al.
20050289109 December 29, 2005 Arrouye et al.
20050289111 December 29, 2005 Tribble et al.
20060010502 January 12, 2006 Mimatsu et al.
20060045096 March 2, 2006 Farmer et al.
20060074922 April 6, 2006 Nishimura
20060075475 April 6, 2006 Boulos et al.
20060080353 April 13, 2006 Miloushev et al.
20060106882 May 18, 2006 Douceur et al.
20060112151 May 25, 2006 Manley et al.
20060123062 June 8, 2006 Bobbitt et al.
20060140193 June 29, 2006 Kakani et al.
20060153201 July 13, 2006 Hepper et al.
20060161518 July 20, 2006 Lacapra
20060167838 July 27, 2006 Lacapra
20060179261 August 10, 2006 Rajan
20060184589 August 17, 2006 Lees et al.
20060190496 August 24, 2006 Tsunoda
20060200470 September 7, 2006 Lacapra et al.
20060206547 September 14, 2006 Kulkarni et al.
20060212746 September 21, 2006 Amegadzie et al.
20060218135 September 28, 2006 Bisson et al.
20060224636 October 5, 2006 Kathuria et al.
20060224687 October 5, 2006 Popkin et al.
20060230265 October 12, 2006 Krishna
20060242179 October 26, 2006 Chen et al.
20060259949 November 16, 2006 Schaefer et al.
20060268692 November 30, 2006 Wright et al.
20060271598 November 30, 2006 Wong et al.
20060277225 December 7, 2006 Mark et al.
20060282461 December 14, 2006 Marinescu
20060282471 December 14, 2006 Mark et al.
20070022121 January 25, 2007 Bahar et al.
20070024919 February 1, 2007 Wong et al.
20070027929 February 1, 2007 Whelan
20070027935 February 1, 2007 Haselton et al.
20070028068 February 1, 2007 Golding et al.
20070088702 April 19, 2007 Fridella et al.
20070098284 May 3, 2007 Sasaki et al.
20070136308 June 14, 2007 Tsirigotis et al.
20070139227 June 21, 2007 Speirs, II et al.
20070180314 August 2, 2007 Kawashima et al.
20070208748 September 6, 2007 Li
20070209075 September 6, 2007 Coffman
20070226331 September 27, 2007 Srinivasan et al.
20080046432 February 21, 2008 Anderson et al.
20080070575 March 20, 2008 Claussen et al.
20080104443 May 1, 2008 Akutsu et al.
20080114718 May 15, 2008 Anderson et al.
20080189468 August 7, 2008 Schmidt et al.
20080200207 August 21, 2008 Donahue et al.
20080209073 August 28, 2008 Tang
20080215836 September 4, 2008 Sutoh et al.
20080222223 September 11, 2008 Srinivasan et al.
20080243769 October 2, 2008 Arbour et al.
20080282047 November 13, 2008 Arakawa et al.
20080294446 November 27, 2008 Guo et al.
20090007162 January 1, 2009 Sheehan
20090013138 January 8, 2009 Sudhakar
20090037975 February 5, 2009 Ishikawa et al.
20090041230 February 12, 2009 Williams
20090055507 February 26, 2009 Oeda
20090055607 February 26, 2009 Schack et al.
20090077097 March 19, 2009 Lacapra et al.
20090089344 April 2, 2009 Brown et al.
20090094252 April 9, 2009 Wong et al.
20090106255 April 23, 2009 Lacapra et al.
20090106263 April 23, 2009 Khalid et al.
20090132616 May 21, 2009 Winter et al.
20090204649 August 13, 2009 Wong et al.
20090204650 August 13, 2009 Wong et al.
20090204705 August 13, 2009 Marinov et al.
20090210431 August 20, 2009 Marinkovic et al.
20090210875 August 20, 2009 Bolles et al.
20090240705 September 24, 2009 Miloushev et al.
20090240899 September 24, 2009 Akagawa et al.
20090254592 October 8, 2009 Marinov et al.
20090265396 October 22, 2009 Ram et al.
20100017643 January 21, 2010 Baba et al.
20100077294 March 25, 2010 Watson
20100082542 April 1, 2010 Feng et al.
20100205206 August 12, 2010 Rabines et al.
20100211547 August 19, 2010 Kamei et al.
20100325634 December 23, 2010 Ichikawa et al.
20110083185 April 7, 2011 Sheleheda et al.
20110087696 April 14, 2011 Lacapra
20110093471 April 21, 2011 Brockway et al.
20110107112 May 5, 2011 Resch
20110119234 May 19, 2011 Schack et al.
20110320882 December 29, 2011 Beaty et al.
20120144229 June 7, 2012 Nadolski
20120150699 June 14, 2012 Trapp et al.
Foreign Patent Documents
2003300350 July 2004 AU
2080530 April 1994 CA
2512312 July 2004 CA
0605088 February 1996 EP
0738970 October 1996 EP
63010250 January 1988 JP
6205006 July 1994 JP
06-332782 December 1994 JP
8021924 March 1996 JP
08-328760 December 1996 JP
08-339355 December 1996 JP
9016510 January 1997 JP
11282741 October 1999 JP
2000-183935 June 2000 JP
566291 December 2008 NZ
WO 02/39696 May 2002 WO
WO 02/056181 July 2002 WO
WO 2004/061605 July 2004 WO
WO 2006/091040 August 2006 WO
WO 2008/130983 October 2008 WO
WO 2008/147973 December 2008 WO
Other references
  • “The AFS File System in Distributed Computing Environment”, www.transarc.ibm.com/Library/whitepapers/AFS/afsoverview.html, last accessed on Dec. 20, 2002.
  • Aguilera, Marcos K. et al., “Improving recoverability in multi-tier storage systems”, International Conference on Dependable Systems and Networks (DSN-2007), Edinburgh, Scotland, Jun. 2007, 10 pages.
  • Anderson, Darrell C. et al., “Interposed Request Routing for Scalable Network Storage”, ACM Transactions on Computer Systems 20(1): (Feb. 2002), pp. 1-24.
  • Anderson et al., “Serverless Network File System”, in the 15th Symposium on Operating Systems Principles, Dec. 1995, Association for Computing Machinery, Inc.
  • Anonymous, “How DFS Works: Remote File Systems”, Distributed File System (DFS) Technical Reference, retrieved from the Internet on Feb. 13, 2009: URL: <http://technet.microsoft.com/en-us/library/cc782417(WS.10,printer).aspx> (Mar. 2003).
  • Apple, Inc., “Mac OS X Tiger Keynote Intro. Part 2”, Jun. 2004, www.youtube.com <http://www.youtube.com/watch?v=zSBJwEmR.JbY>, p. 1.
  • Apple, Inc., “Tiger Developer Overview Series: Working with Spotlight”, Nov. 23, 2004, www.apple.com using www.archive.org <http://web.archive.org/web/20041123005335/developer.apple.com/macosx/tiger/spotlight.html>, pp. 1-6.
  • “Auspex Storage Architecture Guide”, Second Edition, 2001, Auspex Systems, Inc., www.auspex.com, last accessed on Dec. 30, 2002.
  • Cabrera et al., “Swift: Storage Architecture for Large Objects”, in Proceedings of the Eleventh IEEE Symposium on Mass Storage Systems, pp. 123-128, Oct. 1991.
  • Cabrera et al., “Swift: Using Distributed Disk Striping to Provide High I/O Data Rates”, Computing Systems 4, 4 (Fall 1991), pp. 405-436.
  • Cabrera et al., “Using Data Striping in a Local Area Network”, 1992, technical report No. UCSC-CRL-92-09 of the Computer & Information Sciences Department of University of California at Santa Cruz.
  • Callaghan et al., “NFS Version 3 Protocol Specifications” (RFC 1813), Jun. 1995, The Internet Engineering Task Force (IETF), www.ietf.org, last accessed on Dec. 30, 2002.
  • Carns et al., “PVFS: A Parallel File System for Linux Clusters”, in Proceedings of the Extreme Linux Track: 4th Annual Linux Showcase and Conference, pp. 317-327, Atlanta, Georgia, Oct. 2000, USENIX Association.
  • Cavale, M. R., “Introducing Microsoft Cluster Service (MSCS) in the Windows Server 2003”, Microsoft Corporation, Nov. 2002.
  • “CSA Persistent File System Technology”, Colorado Software Architecture, Inc.: A White Paper, Jan. 1, 1999, p. 1-3, <http://www.cosoa.com/whitepapers/pfs.php>.
  • “Distributed File System: Logical View of Physical Storage: White Paper”, 1999, Microsoft Corp., www.microsoft.com, <http://www.eu.microsoft.com/TechNet/prodtechnol/windows2000serv/maintain/DFSnt95>, pp. 1-26, last accessed on Dec. 20, 2002.
  • English Language Abstract of JP 08-328760 from Patent Abstracts of Japan.
  • English Language Abstract of JP 08-339355 from Patent Abstracts of Japan.
  • English Translation of paragraphs 17, 32, and 40-52 of JP 08-328760.
  • English Translation of Notification of Reason(s) for Refusal for JP 2002-556371 (Dispatch Date: Jan. 22, 2007).
  • Fan et al., “Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol”, Computer Communications Review, Association for Computing Machinery, New York, USA, Oct. 1998, vol. 28, No. 4, pp. 254-265.
  • Farley, M., “Building Storage Networks”, Jan. 2000, McGraw Hill, ISBN 0072120509.
  • Gibson et al., “File Server Scaling with Network-Attached Secure Disks”, in Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (Sigmetrics '97), Jun. 15-18, 1997, Association for Computing Machinery, Inc.
  • Gibson et al., “NASD Scalable Storage Systems”, Jun. 1999, USENIX99, Extreme Linux Workshop, Monterey, California.
  • Harrison, C., May 19, 2008 response to Communication pursuant to Article 96(2) EPC dated Nov. 9, 2007 in corresponding European patent application No. 02718824.2.
  • Hartman, J., “The Zebra Striped Network File System”, 1994, Ph.D. dissertation submitted in the Graduate Division of the University of California at Berkeley.
  • Haskin et al., “The Tiger Shark File System”, 1996, in proceedings of IEEE, Spring COMPCON, Santa Clara, CA, www.research.ibm.com, last accessed on Dec. 30, 2002.
  • Hu, J., Final Office action dated Sep. 21, 2007 for related U.S. Appl. No. 10/336,784.
  • Hu, J., Office action dated Feb. 6, 2007 for related U.S. Appl. No. 10/336,784.
  • Hwang et al., “Designing SSI Clusters with Hierarchical Checkpointing and Single I/O Space”, IEEE Concurrency, pp. 60-69, Jan.-Mar. 1999.
  • International Search Report for International Patent Application No. PCT/US2008/083117 (Jun. 23, 2009).
  • International Search Report for International Patent Application No. PCT/US2008/060449 (Apr. 9, 2008).
  • International Search Report for International Patent Application No. PCT/US2008/064677 (Sep. 6, 2009).
  • International Search Report for International Patent Application No. PCT/US02/00720, Jul. 8, 2004.
  • International Search Report from International Application No. PCT/US03/41202, mailed Sep. 15, 2005.
  • Karamanolis, C. et al., “An Architecture for Scalable and Manageable File Services”, HPL-2001-173, Jul. 26, 2001. p. 1-114.
  • Katsurashima, W. et al., “NAS Switch: A Novel CIFS Server Virtualization”, Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), Apr. 2003.
  • Kimball, C.E. et al., “Automated Client-Side Integration of Distributed Application Servers”, 13th LISA Conf., 1999, pp. 275-282 of the Proceedings.
  • Klayman, J., Nov. 13, 2008 e-mail to Japanese associate including instructions for response to office action dated May 26, 2008 in corresponding Japanese patent application No. 2002-556371.
  • Klayman, J., Response filed by Japanese associate to office action dated Jan. 22, 2007 in corresponding Japanese patent application No. 2002-556371.
  • Klayman, J., Jul. 18, 2007 e-mail to Japanese associate including instructions for response to office action dated Jan. 22, 2007 in corresponding Japanese patent application No. 2002-556371.
  • Kohl et al., “The Kerberos Network Authentication Service (V5)”, RFC 1510, Sep. 1993. (http://www.ietf.org/ rfc/rfc1510.txt?number=1510).
  • Korkuzas, V., Communication pursuant to Article 96(2) EPC dated Sep. 11, 2007 in corresponding European patent application No. 02718824.2-2201.
  • Lelil, S., “Storage Technology News: AutoVirt adds tool to help data migration projects”, Feb. 25, 2011, last accessed Mar. 17, 2011, <http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1527986,00.html>.
  • Long et al., “Swift/RAID: A distributed RAID System”, Computing Systems, vol. 7, pp. 333-359, Summer 1994.
  • “NERSC Tutorials: I/O on the Cray T3E, ‘Chapter 8, Disk Striping’”, National Energy Research Scientific Computing Center (NERSC), http://hpcf.nersc.gov, last accessed on Dec. 27, 2002.
  • Noghani et al., “A Novel Approach to Reduce Latency on the Internet: ‘Component-Based Download’”, Proceedings of the Int'l Conf. on Internet Computing, Las Vegas, NV, Jun. 2000, pp. 1-6.
  • Norton et al., “CIFS Protocol Version CIFS-Spec 0.9”, 2001, Storage Networking Industry Association (SNIA), www.snia.org, last accessed on Mar. 26, 2001.
  • Patterson et al., “A case for redundant arrays of inexpensive disks (RAID)”, Chicago, Illinois, Jun. 1-3, 1988, in Proceedings of ACM SIGMOD Conference on the Management of Data, pp. 109-116, Association for Computing Machinery, Inc., www.acm.org, last accessed on Dec. 20, 2002.
  • Pearson, P.K., “Fast Hashing of Variable-Length Text Strings”, Comm. of the ACM, vol. 33, No. 6, Jun. 1990.
  • Peterson, M., “Introducing Storage Area Networks”, Feb. 1998, InfoStor, www.infostor.com, last accessed on Dec. 20, 2002.
  • Preslan et al., “Scalability and Failure Recovery in a Linux Cluster File System”, in Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta, Georgia, Oct. 10-14, 2000, pp. 169-180 of the Proceedings, www.usenix.org, last accessed on Dec. 20, 2002.
  • Response filed Jul. 6, 2007 to Office action dated Feb. 6, 2007 for related U.S. Appl. No. 10/336,784.
  • Response filed Mar. 20, 2008 to Final Office action dated Sep. 21, 2007 for related U.S. Appl. No. 10/336,784.
  • Rodriguez et al., “Parallel-access for mirror sites in the Internet”, InfoCom 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE Tel Aviv, Israel Mar. 26-30, 2000, Piscataway, NJ, USA, IEEE, US, pp. 864-873, XP010376176, ISBN: 0-7803-5880-5, p. 867, col. 2, last paragraph to p. 868, col. 1, paragraph 1.
  • Rsync, “Welcome to the RSYNC Web Pages”, Retrieved from the Internet URL: http://samba.anu.edu.au/rsync/. (Retrieved on Dec. 18, 2009).
  • Savage, et al., “AFRAID—A Frequently Redundant Array of Independent Disks”, 1996 USENIX Technical Conf., San Diego, California, Jan. 22-26, 1996.
  • “Scaling Next Generation Web Infrastructure with Content-Intelligent Switching: White Paper”, Apr. 2000, pp. 1-9, Alteon Web Systems, Inc.
  • Soltis et al., “The Design and Performance of a Shared Disk File System for IRIX”, in Sixth NASA Goddard Space Flight Center Conference on Mass Storage and Technologies in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems, Mar. 23-26, 1998.
  • Soltis, et al., “The Global File System”, in Proceedings of the Fifth NASA Goddard Space Flight Center Conference on Mass Storage Systems and Technologies, Sep. 17-19, 1996, College Park, Maryland.
  • Sorenson, K.M., “Installation and Administration: Kimberlite Cluster Version 1.1.0, Rev. Dec. 2000”, Mission Critical Linux, http://oss.missioncriticallinux.com/kimberlite/kimberlite.pdf.
  • Stakutis, C., “Benefits of SAN-based file system sharing”, Jul. 2000, InfoStor, www.infostor.com, last accessed on Dec. 30, 2002.
  • Thekkath et al., “Frangipani: A Scalable Distributed File System”, in Proceedings of the 16th ACM Symposium on Operating Systems Principles, Oct. 1997, Association for Computing Machinery, Inc.
  • Uesugi, H., Nov. 26, 2008 amendment filed by Japanese associate in response to office action dated May 26, 2008 in corresponding Japanese patent application No. 2002556371.
  • Uesugi, H., English translation of office action dated May 26, 2008 in corresponding Japanese patent application No. 2002-556371.
  • Uesugi, H., Jul. 15, 2008 letter from Japanese associate reporting office action dated May 26, 2008 in corresponding Japanese patent application No. 2002-556371.
  • “VERITAS SANPoint Foundation Suite(tm) and SANPoint Foundation Suite(tm) HA: New VERITAS Volume Management and File System Technology for Cluster Environments”, Sep. 2001, VERITAS Software Corp.
  • Wilkes, J., et al., “The HP AutoRAID Hierarchical Storage System”, ACM Transactions on Computer Systems, vol. 14, No. 1, Feb. 1996.
  • “Windows Clustering Technologies—An Overview”, Nov. 2001, Microsoft Corp., www.microsoft.com, last accessed on Dec. 30, 2002.
  • Zayas, E., “AFS-3 Programmer's Reference: Architectural Overview”, Transarc Corp., version 1.0 of Sep. 2, 1991, doc. No. FS-00-D160.
  • Basney, Jim et al., “Credential Wallets: A Classification of Credential Repositories Highlighting MyProxy,” TPRC 2003, Sep. 19-21, 2003.
  • Botzum, Keys, “Single Sign On—A Contrarian View,” Open Group Website, <http://www.opengroup.org/security/topics.htm>, Aug. 6, 2001, pp. 1-8.
  • Novotny, Jason et al., “An Online Credential Repository for the Grid: MyProxy,” 2001, pp. 1-8.
  • Pashalidis, Andreas et al., “A Taxonomy of Single Sign-On Systems,” 2003, pp. 1-16, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, United Kingdom.
  • Pashalidis, Andreas et al., “Impostor: a single sign-on system for use from untrusted devices,” Global Telecommunications Conference, 2004, GLOBECOM '04, IEEE, Issue Date: Nov. 29-Dec. 3, 2004, Royal Holloway, University of London.
  • Tulloch, Mitch, “Microsoft Encyclopedia of Security,” pp. 218, 300-301, Microsoft Press, 2003, Redmond, Washington.
  • Gupta et al., “Algorithms for Packet Classification”, Computer Systems Laboratory, Stanford University, CA, Mar./Apr. 2001, pp. 1-29.
  • Heinz II, G., “Priorities in Stream Transmission Control Protocol (SCTP) Multistreaming”, Thesis submitted to the Faculty of the University of Delaware, Spring 2003, pp. 1-35.
  • Internet Protocol, “DARPA Internet Program Protocol Specification” (RFC 791), Information Sciences Institute, University of Southern California, Sep. 1981, pp. 1-49.
  • Ilvesmaki M., et al., “On the capabilities of application level traffic measurements to differentiate and classify Internet traffic”, Presented in SPIE's International Symposium ITcom, Aug. 19-21, 2001, pp. 1-11, Denver, Colorado.
  • Modiano E., “Scheduling Algorithms for Message Transmission Over a Satellite Broadcast System,” MIT Lincoln Laboratory Advanced Network Group, Nov. 1997, pp. 1-7.
  • Ott D., et al., “A Mechanism for TCP-Friendly Transport-level Protocol Coordination”, USENIX Annual Technical Conference, 2002, University of North Carolina at Chapel Hill, pp. 1-12.
  • Padmanabhan V., et al., “Using Predictive Prefetching to Improve World Wide Web Latency”, SIGCOM, 1996, pp. 1-15.
  • Rosen E., et al., “MPLS Label Stack Encoding” (RFC 3032), Network Working Group, Jan. 2001, pp. 1-22, (http://www.ietf.org/rfc/rfc3032.txt).
  • Wang B., “Priority and Realtime Data Transfer Over the Best-Effort Internet”, Dissertation Abstract, Sep. 2005, ScholarWorks©UMASS.
  • Woo T.Y.C., “A Modular Approach to Packet Classification: Algorithms and Results”, Nineteenth Annual Conference of the IEEE Computer and Communications Societies 3(3):1213-22, Mar. 26-30, 2000, abstract only, (http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=832499).
Patent History
Patent number: 8548953
Type: Grant
Filed: Nov 11, 2008
Date of Patent: Oct 1, 2013
Patent Publication Number: 20090204649
Assignee: F5 Networks, Inc. (Seattle, WA)
Inventors: Thomas K. Wong (Pleasanton, CA), Ron S. Vogel (San Jose, CA)
Primary Examiner: Vincent F Boccio
Application Number: 12/268,573
Classifications
Current U.S. Class: Deletion Due To Duplication (707/664); Data Cleansing, Data Scrubbing, And Deleting Duplicates (707/692); Using Hash Function (707/698)
International Classification: G06F 7/00 (20060101); G06F 17/00 (20060101);