REGULATING DATA STORAGE BASED ON COPY QUANTITY
In some examples, a server may receive a data file from one or more computing devices, and may store the data file at a storage system provided by a data storage service. The server may determine a number of copies of the data file to be stored at the storage system based on a number of a set of computing devices that store the data file. For example, the set of computing devices may be outside of the storage system, and the determined number of copies of the data file to be stored at the storage system may decrease when the number of the set of computing devices that store the data file increases. Additionally, the server may adjust the number of copies of the data file stored at the storage system based on the determined number of copies of the data file.
This application is a continuation of U.S. application Ser. No. 14/044,498, filed Oct. 2, 2013, which claims the benefit of U.S. Provisional Patent Application No. 61/708,794, filed on Oct. 2, 2012; both applications are incorporated by reference herein in their entirety.
TECHNICAL FIELD
Several of the disclosed embodiments relate to data storage, and more particularly, to regulating a number of copies of a data file to be stored in a storage system based on a popularity of the data file.
BACKGROUND
Current storage services, such as cloud storage services, allow users to store various multimedia content, such as music files, video files, images, documents, etc., in the cloud. To provide for recovery from data loss, cloud storage services typically replicate the content and store multiple copies of it at different storage systems, and possibly at different locations. This requires huge amounts of storage resources, as well as the associated infrastructure and maintenance resources to maintain the data centers, which can result in increased costs. Further, some of the data files stored for various users can be identical. For example, a music file such as “Optimistic” by “Radiohead” is the same for any user storing that music file with a cloud storage service. Current cloud storage services store multiple copies of the same file, e.g., one for each user who uploaded the music file, which results in a significant amount of space being used for storing identical files. Accordingly, current storage services are inefficient, at least in terms of managing the available storage space.
SUMMARY
Technology is disclosed for regulating data storage based on a popularity of data files (“the technology”). Various embodiments of the technology provide for maintaining a fixed durability level of data files stored in a storage system by regulating a number of copies of the data files stored in the storage system. One such embodiment includes regulating the number of copies of a particular data file stored in the storage system based on a popularity of the particular data file among various users who use the storage system. The number of copies stored in the storage system is increased or decreased, including from/to zero copies, based on the popularity of the particular data file. Further, the storage system can store either a complete data file or a portion of the data file. Accordingly, the technology is applicable to either the complete data file or a portion of the data file.
In some embodiments, the popularity of the particular data file is determined by computing a popularity value for the particular data file. The popularity value of the particular data file can be determined based on a number of factors, including one or more of: (a) a number of computing devices associated with one or more of the users that contain the particular data file, (b) a latency associated with reading the particular data file from one or more of the computing devices that contain the particular data file, (c) a network bandwidth available for reading the particular data file from one or more of the computing devices that contain the particular data file, (d) availability of a network connection with one or more of the computing devices that contain the particular data file for reading the particular data file, (e) a number of the users requiring storage for the same data file at the storage system, or (f) an access pattern of the particular data file for a specific user or a subset of the users. In some embodiments, one or more of the above factors can be weighted relative to each other.
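The description leaves the function that combines the factors open. The following is a minimal sketch, not the disclosed method. It assumes that each factor has already been normalized to a score between 0 and 1, and it combines the scores as a weighted average expressed as a percentage. The class `FactorScores`, the function `popularity_percent` and the weight values are hypothetical. The direction of the access-pattern score (frequent access lowers it) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FactorScores:
    """Hypothetical per-file factor scores, each assumed normalized to [0, 1]."""
    device_count: float    # (a) share of devices holding the file
    latency: float         # (b) 1.0 = fastest read, 0.0 = at/above max acceptable latency
    bandwidth: float       # (c) normalized available bandwidth
    connectivity: float    # (d) fraction of time a connection is available
    user_demand: float     # (e) share of users storing the same file
    access_pattern: float  # (f) assumed: 1.0 = rarely accessed, 0.0 = accessed constantly


# Illustrative weights only; the text says only that factors "can be weighted".
DEFAULT_WEIGHTS = {
    "device_count": 0.3,
    "latency": 0.15,
    "bandwidth": 0.15,
    "connectivity": 0.2,
    "user_demand": 0.1,
    "access_pattern": 0.1,
}


def popularity_percent(scores: FactorScores,
                       weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the normalized factor scores, as a percentage (0-100)."""
    total_weight = sum(weights.values())
    weighted = sum(getattr(scores, name) * w for name, w in weights.items())
    return 100.0 * weighted / total_weight
```

A file whose every factor scores 1.0 comes out at 100%, and one whose every factor scores 0.0 comes out at 0%. Those two endpoints match the ones described next. Other combining functions (for example, a minimum over factors) would be equally consistent with the text.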
The popularity value can be determined in various units and using various mathematical equations. One example expression of a popularity value is a percentage value. A popularity value of 100% can indicate that all the users serviced by the storage system have a copy of the particular data file on all their computing devices, that the particular data file can be fetched from any of the computing devices with minimal latency, that the particular data file is accessed frequently, etc. On the other hand, a popularity value of 0% can indicate that none of the users have a copy of the particular data file or that it is not possible to retrieve a copy within the maximum accepted latency, etc.
The number of copies stored in the storage system is increased or decreased, including from/to zero copies, based on the popularity value. For example, if the popularity value of a particular data file is 100%, the storage system may not store any copies of the particular data file, since the particular data file is available at all the computing devices of the users and can be retrieved from any of the computing devices at any time. On the other hand, if the popularity value of a particular data file is 0%, the storage system may store one or more copies of the particular data file, since the particular data file is not available at any of the computing devices or cannot be retrieved within a maximum accepted latency, etc. Generally, the higher the popularity of the data file, the fewer copies of the data file need to be stored at the storage system. Further, various popularity value ranges, and the number of copies to be stored for each range, can be configured, e.g., by an entity such as an administrator of the storage server.
When a user requests a particular data file, a server determines whether the particular data file is available at the storage system. If the particular data file is available at the storage system, the server serves the request by fetching the file from the storage system. On the other hand, if the particular data file is not available at the storage system, the server serves the request by fetching the file from any of the other computing devices of the user and/or any of the computing devices of other users that contain the particular data file.
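The following is a minimal sketch of the request path just described. It assumes that three operations are supplied as callables:

- the storage-system lookup;
- the lookup of devices that hold the file;
- the fetch from a single device.

The function name `serve_request` and all parameter names are hypothetical.

```python
from typing import Callable, Iterable, Optional


def serve_request(
    file_id: str,
    storage_lookup: Callable[[str], Optional[bytes]],
    devices_with_file: Callable[[str], Iterable[str]],
    fetch_from_device: Callable[[str, str], Optional[bytes]],
) -> Optional[bytes]:
    """Serve a file from the storage system if it holds a copy, else from a device."""
    data = storage_lookup(file_id)
    if data is not None:
        return data  # copy found in the storage system
    for device_id in devices_with_file(file_id):
        data = fetch_from_device(device_id, file_id)
        if data is not None:  # device was reachable and returned the file
            return data
    return None  # no reachable copy anywhere
```

For example, the server would receive `None` from an offline device and then move on to the next device that holds the file.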
Environment
A cloud data interface 120 can also be included to receive data from and send data to computing devices 130-140. The cloud data interface 120 can include network communication hardware and network connection logic to receive the information from computing devices. The network can be a local area network (LAN), a wide area network (WAN) or the Internet. The cloud data interface 120 may include a queuing mechanism to organize data updates received from or sent to the computing devices 130-140.
The computing devices 130-140 include an operating system 132-142 to manage the hardware resources of the computing devices 130-140 and provide services for running computer applications 134-144 (e.g., mobile applications running on mobile devices). The operating system 132-142 facilitates execution of the computer applications 134-144 on the computing devices 130-140. The computing devices 130-140 include at least one local storage device 138-148 to store the computer applications 134-144 and user data. The computing device 130 or 140 can be a desktop computer, a laptop computer, a tablet computer, an automobile computer, a game console, a smartphone, a personal digital assistant, or other computing devices capable of running computer applications, as contemplated by a person having ordinary skill in the art.
The computer applications 134-144 stored in the computing devices 130-140 can include applications for general productivity and information retrieval, including email, calendar, contacts, and stock market and weather information. The computer applications 134-144 can also include applications in other categories, such as mobile games, factory automation, GPS and location-based services, banking, order-tracking, ticket purchases or any other categories as contemplated by a person having ordinary skill in the art.
The operating system 132-142 of the computing devices 130-140 includes socket redirection modules 136-146 to redirect network messages. The computer applications 134-144 generate and maintain network connections directed to various remote servers (not illustrated). The remote servers can include applications, products or services, such as social networking applications, that the users may interact with via the computer applications 134-144. Instead of directly opening and maintaining the network connections with these remote servers, the socket redirection modules 136-146 route all of the network messages for these connections of the computer applications 134-144 to the cloud server 110. The cloud server 110 is responsible for opening and maintaining network connections with the remote servers.
All or some of the network connections of the computing devices 130-140 are through the cloud server 110. The network connections can include Transmission Control Protocol (TCP) connections, User Datagram Protocol (UDP) connections, or other types of network connections based on other protocols. When multiple computer applications 134-144 need network connections to multiple remote servers, each of the computing devices 130-140 needs to maintain only one network connection with the cloud server 110. The cloud server 110 in turn maintains multiple connections with the remote servers on behalf of the computer applications 134-144.
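One way to picture a single device-to-server connection carrying traffic for many application connections is to tag each message with a logical connection ID. The cloud server then maps that ID to a remote endpoint. The following is a minimal sketch under that assumption. The framing, the ID scheme, and the names `Frame` and `ConnectionMultiplexer` are all hypothetical and are not described in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """Hypothetical message sent over the single device-to-cloud-server link."""
    conn_id: int    # identifies the application's logical connection
    payload: bytes  # bytes destined for the remote server


@dataclass
class ConnectionMultiplexer:
    """Cloud-server-side table mapping logical connection IDs to remote endpoints."""
    routes: dict[int, tuple[str, int]] = field(default_factory=dict)
    next_id: int = 1

    def open(self, host: str, port: int) -> int:
        """Register a new logical connection on behalf of an application."""
        conn_id = self.next_id
        self.routes[conn_id] = (host, port)
        self.next_id += 1
        return conn_id

    def route(self, frame: Frame) -> tuple[tuple[str, int], bytes]:
        """Return the remote endpoint and payload that the server should forward."""
        return self.routes[frame.conn_id], frame.payload
```

In this sketch, the cloud server, not the device, would hold the real sockets to each remote endpoint.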
In various embodiments, the cloud server 110 maintains a certain level of durability of the data files stored at the storage system 105 by regulating the number of copies of the data files stored at the storage system 105. In some embodiments, the cloud server 110 regulates the number of copies of the data files based on the popularity values of the data files. For example, the more popular the data files are among the users, the fewer copies of the data files are stored at the storage system. Additional details with respect to regulating the number of copies of the data files based on the popularity values are described at least with reference to
The server 230 provides data storage services to a number of users, including a first user, a second user and a third user, to store various data files. The data files can include files such as images, videos, logs, application configuration files, computing device configuration files, etc. A user can upload data files from one or more computing devices associated with the user to the server 230 via a communication network 225. For example, a first user can upload a data file, “File A,” from a first computing device 205 and a second computing device 210. Similarly, the third computing device 215 uploads data files “File A” and “File B,” and the fourth computing device 220 uploads “File A,” “File B” and “File C.” Accordingly, the server 230 stores four copies of “File A,” two copies of “File B” and one copy of “File C” in the storage system 235a. The storage system 235a can have a number of storage units across which the data files can be stored. Further, in some embodiments, the storage units can be spread across various geographical locations.
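The copy counts in this example follow directly from counting uploads when no regulation is applied. The following short sketch uses the device reference numerals as hypothetical device IDs.

```python
from collections import Counter

# Uploads from the example above: (computing device, data file).
uploads = [
    (205, "File A"),
    (210, "File A"),
    (215, "File A"), (215, "File B"),
    (220, "File A"), (220, "File B"), (220, "File C"),
]

# Without regulation, the storage system keeps one copy per upload.
unregulated_copies = Counter(name for _, name in uploads)
print(unregulated_copies)  # Counter({'File A': 4, 'File B': 2, 'File C': 1})
```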
Typically, a storage system keeps a number of copies of the data files to improve the durability of the data files, e.g., to minimize the impact of data loss at either the user end or the storage system end. In some embodiments, the server 230 maintains a certain level of durability of the data files stored at the storage system 235a by regulating the number of copies of the data files stored at the storage system 235a based on the popularity of the data files. The more popular the data files are among the users, the fewer copies of the data files are stored at the storage system.
The popularity of a particular data file is measured using a popularity value. In some embodiments, the popularity value of the particular data file is determined based on a number of factors, including one or more of: (a) a number of computing devices associated with one or more of the users that contain the particular data file, (b) a latency associated with reading the particular data file from one or more of the computing devices that contain the particular data file, (c) a network bandwidth available for reading the particular data file from one or more of the computing devices that contain the particular data file, (d) availability of a network connection with one or more of the computing devices that contain the particular data file for reading the particular data file, (e) a number of the users requiring storage for the same data file at the storage system, or (f) an access pattern of the particular data file for a specific user or a subset of the users. In some embodiments, one or more of the above factors can be weighted relative to each other, and an overall popularity value of the particular data file can be determined as a function of the popularity values for one or more of the above factors.
In some embodiments, the higher the number of computing devices that contain the particular data file, the higher the popularity value of the particular data file. Because the particular data file is available from many computing devices, fewer copies, including zero, may be stored at the storage system 235. When a user requests to retrieve the particular data file, the server 230 obtains the particular data file from one of the computing devices and serves it to the user.
In some embodiments, the higher the latency associated with reading the particular data file from one or more of the computing devices that contain it, the lower the popularity value of the particular data file. In some embodiments, if the latency is above a maximum acceptable value, the server may determine to store a higher number of copies at the storage system 235. In some embodiments, an overall latency-based popularity value may be determined as an average, or as any other function, of the latency-based popularity values of the particular data file for each of the computing devices that contain the particular data file.
In some embodiments, the higher the network bandwidth available for reading the particular data file from one or more of the computing devices that contain it, the higher the popularity value of the particular data file. In some embodiments, an overall network-bandwidth-based popularity value may be determined as an average, or as any other function, of the network-bandwidth-based popularity values of the particular data file for each of the computing devices that contain the particular data file.
In some embodiments, the higher the availability of a network connection with one or more of the computing devices that contain the particular data file, the higher the popularity value of the particular data file. In some embodiments, an overall network-connection-availability-based popularity value may be determined as an average, or as any other function, of the network-connection-availability-based popularity values of the particular data file for each of the computing devices that contain the particular data file.
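The latency, bandwidth and connectivity factors can each be turned into a per-device score and then averaged over the devices that hold the file. Averaging is one of the options the text names. The following sketch assumes:

- linear scoring;
- a hypothetical 500 ms maximum acceptable latency;
- a hypothetical 100 Mbit/s reference bandwidth.

The dictionary keys and function names are illustrative only.

```python
from statistics import mean

MAX_ACCEPTABLE_LATENCY_MS = 500.0  # hypothetical threshold
REFERENCE_BANDWIDTH_MBPS = 100.0   # hypothetical "full score" bandwidth


def latency_score(latency_ms: float) -> float:
    """Higher latency gives a lower score; at/above the maximum it scores 0."""
    if latency_ms >= MAX_ACCEPTABLE_LATENCY_MS:
        return 0.0
    return 1.0 - latency_ms / MAX_ACCEPTABLE_LATENCY_MS


def bandwidth_score(mbps: float) -> float:
    """Higher available bandwidth gives a higher score, capped at 1.0."""
    return min(mbps / REFERENCE_BANDWIDTH_MBPS, 1.0)


def overall_factor_scores(devices: list[dict]) -> dict[str, float]:
    """Average each per-device factor score across the devices holding the file."""
    if not devices:  # no device holds the file: every device-based factor is 0
        return {"latency": 0.0, "bandwidth": 0.0, "connectivity": 0.0}
    return {
        "latency": mean(latency_score(d["latency_ms"]) for d in devices),
        "bandwidth": mean(bandwidth_score(d["bandwidth_mbps"]) for d in devices),
        "connectivity": mean(d["uptime_fraction"] for d in devices),
    }
```

Replacing `mean` with `min` or `max` would give pessimistic or optimistic variants of the same idea.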
In some embodiments, the higher the number of the users requiring storage for the same data file at the storage system, the higher the popularity value of the particular data file.
In some embodiments, the access pattern of the particular data file is considered in determining the popularity value. The access pattern can be based on how frequently the particular data file stored at the storage system 235a is accessed or requested by the user who uploaded it. The higher the frequency of access, the higher the number of copies stored at the storage system. If the frequency of access is high, the server 230 may determine to store one or more copies on the storage system, since it may be faster and more efficient to retrieve the data file from the storage system than from the computing devices of the users that contain a copy of the particular data file. Accordingly, the higher the frequency of access, the lower the popularity value. Further, in some embodiments, the access pattern of the particular data file may be considered not only for a particular user but also for a subset of the users.
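The inverse relationship between access frequency and popularity can be sketched as a score that falls as accesses rise. For a subset of users, the sketch uses their combined access rate. The linear shape, the daily-rate unit and the 10-accesses-per-day threshold are assumptions. The function names are hypothetical.

```python
BUSY_THRESHOLD_PER_DAY = 10.0  # hypothetical rate at which the score reaches 0


def access_pattern_score(accesses_per_day: float) -> float:
    """Frequent access lowers the score, so more copies stay in the storage system."""
    return max(0.0, 1.0 - accesses_per_day / BUSY_THRESHOLD_PER_DAY)


def subset_access_score(access_counts: dict[str, float], users: set[str]) -> float:
    """Access-pattern score for a subset of users, using their combined daily rate."""
    combined = sum(access_counts.get(user, 0.0) for user in users)
    return access_pattern_score(combined)
```

A file nobody opens scores 1.0. A file opened 10 or more times a day scores 0.0 and therefore pulls the overall popularity value down.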
The popularity value can be determined in various units and using various mathematical equations. One example expression of a popularity value is a percentage value. A popularity value of 100% can indicate that all the users serviced by the storage system have a copy of the particular data file on all their computing devices, that the particular data file can be fetched from any of the computing devices with minimal latency, that the particular data file is accessed frequently, etc. On the other hand, a popularity value of 0% can indicate that none of the users have a copy of the particular data file or that it is not possible to retrieve a copy within the maximum accepted latency, etc.
The number of copies stored in the storage system is increased or decreased, including from/to zero, based on the popularity value. For example, if the popularity value of a particular data file is 100%, the storage system may not store any copies of the particular data file, since the particular data file is available at all the computing devices of the users and can be retrieved from any of the computing devices at any time. On the other hand, if the popularity value of a particular data file is 0%, the storage system may store one or more copies of the particular data file, since the particular data file is not available at any of the computing devices or cannot be retrieved within a maximum accepted latency, etc. Generally, the higher the popularity, the fewer copies of the data file are stored at the storage system. Further, various popularity value ranges, and the number of copies to be stored for each range, can be configured, e.g., by an entity such as an administrator of the storage server.
Referring back to the non-regulated storage system 235a, the storage system 235a includes four copies of “File A,” two copies of “File B” and one copy of “File C.” The server 230 may adjust the number of copies of the above-mentioned data files in one or more of the following ways:
Regarding “File A,” the server 230 may determine that “File A” has a high popularity value, e.g., because each of the four computing devices has a copy of “File A,” the availability of a network connection with one or more of the computing devices is high, etc. Accordingly, the server 230 may decrease the number of copies of “File A” by half, as shown in regulated storage systems 235b-c. In some embodiments, the server 230 may even determine not to store any copy of “File A” in the storage system, as shown by example storage system 235d.
Regarding “File B,” the server 230 may determine to retain the same number of copies based on the popularity value of “File B.” Regarding “File C,” in some embodiments, the popularity value may indicate that one copy of “File C” is sufficient, e.g., because only one computing device needs the file, the file is not accessed frequently, etc. Accordingly, the server 230 stores only one copy of “File C,” as shown in the example storage system 235b.

However, in some embodiments, the popularity value of “File C” may change even with just one user. For example, if the user travels, the network connectivity between the fourth computing device 220 and the storage unit in the storage system 235a that contains the copy of “File C” may change while the user is at another geographical location. A change in the popularity value of “File C” can in turn change the number of copies stored at the storage system. For example, the popularity value may indicate that two copies of the file should be maintained at the storage system. Accordingly, the server 230 may add another copy of “File C” at the storage system, as shown in regulated storage systems 235c-d. In some embodiments, the server 230 may add the other copy of “File C” to the storage unit of storage systems 235c-d that is closest to the location to which the user has travelled.
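Once target copy counts are known, the adjustment itself reduces to a per-file difference between the current and target counts. The following minimal sketch uses the hypothetical function name `plan_adjustments`. It applies the counts from this example, with targets matching storage system 235c: half of the “File A” copies, an unchanged number of “File B” copies and a second “File C” copy.

```python
def plan_adjustments(current: dict[str, int], target: dict[str, int]) -> dict[str, int]:
    """Per-file change in copy count: positive = add copies, negative = delete copies."""
    files = set(current) | set(target)
    return {
        name: target.get(name, 0) - current.get(name, 0)
        for name in sorted(files)
        if target.get(name, 0) != current.get(name, 0)  # skip unchanged files
    }


unregulated = {"File A": 4, "File B": 2, "File C": 1}  # storage system 235a
print(plan_adjustments(unregulated, {"File A": 2, "File B": 2, "File C": 2}))
# {'File A': -2, 'File C': 1}
```

Choosing which storage unit receives an added copy (for example, the unit nearest the travelling user) is a separate placement decision and is not modelled here.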
In some embodiments, the server 230 determines whether data files uploaded by different users are identical by using file comparison techniques such as checksums, hash sums, etc. The server 230 generates a checksum for each of the files uploaded to the server 230 for storage at the storage system 235. It stores the checksum of each data file in the storage system 235 or in another storage system separate from the storage system 235. A checksum may be calculated for a portion of the data file, e.g., a block of a file or a segment of a file that has a plurality of blocks, or for a complete data file. Further, the server 230 also stores the identification of at least one of the user and the computing device that uploaded a particular data file. In some embodiments, the checksums and the identifications of the users and/or computing devices are stored in a data file availability table (not illustrated). The server 230 may use the data file availability table in determining the popularity value, and also in determining which of the computing devices has a particular data file.
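The following sketch shows a checksum-keyed availability table. It assumes SHA-256 from Python's standard `hashlib` as the hash sum and a hypothetical 4 MiB block size for per-block checksums. `AvailabilityTable` and its methods are illustrative names, not taken from the text.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size for per-block checksums


def file_checksum(data: bytes) -> str:
    """Checksum of a complete data file."""
    return hashlib.sha256(data).hexdigest()


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Checksums of each fixed-size block of a data file."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


class AvailabilityTable:
    """Hypothetical availability table: checksum -> (user, device) pairs holding it."""

    def __init__(self) -> None:
        self._holders: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def record_upload(self, user_id: str, device_id: str, data: bytes) -> str:
        """Record who uploaded a file; identical content maps to the same checksum."""
        checksum = file_checksum(data)
        self._holders[checksum].add((user_id, device_id))
        return checksum

    def devices_with(self, checksum: str) -> set[str]:
        """Devices known to hold the file (input to factor (a) and to serving)."""
        return {device for _, device in self._holders.get(checksum, set())}

    def user_count(self, checksum: str) -> int:
        """Number of distinct users storing the same file (factor (e))."""
        return len({user for user, _ in self._holders.get(checksum, set())})
```

Using `.get` for lookups keeps queries for unknown checksums from inserting empty entries into the `defaultdict`.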
In some embodiments, the server 230 can use various storage techniques to store data efficiently. One example storage technique is compressing the data files to minimize the space they consume. The computing devices can include devices such as a smartphone, a digital media player, a laptop, a desktop, a tablet PC, etc.
The server 330 has adjusted the number of copies of the data files stored at the storage system 350. For example, while the server 330 has stored two copies of “File B” and “File C,” no copies of “File A” are stored at the storage system 350, e.g., because “File A” has a high popularity value due to being available from a number of computing devices.
A computing device, such as the third computing device 315, requests the server 330 to retrieve “File A,” which it uploaded earlier. The server 330 determines whether the storage system 350 has a copy of “File A.” If the storage system 350 has a copy of “File A,” the server 330 obtains the data file from the storage system 350 and serves it to the third computing device 315. On the other hand, if the storage system 350 does not have a copy of “File A,” the server 330 determines which of the computing devices has a copy of “File A.” In some embodiments, the availability table 325 includes data specifying which of the computing devices has which of the data files. It also includes data specifying other attributes, such as network bandwidth for the computing devices, their network connection availability, associated latency to obtain the data file, etc.
The server 330 checks the availability table 325 to determine which of the computing devices has a copy of “File A,” and identifies a particular computing device from which it can retrieve a copy. In some embodiments, the server 330 may select the computing device, e.g., first computing device 305, from which the copy of “File A” can be retrieved with the least latency. The server 330 retrieves the copy of “File A” from the first computing device 305 and serves “File A” to the third computing device 315. The third computing device 315 is not aware of where the data file is retrieved from; from its perspective, “File A” is retrieved from the storage system 350.
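The following minimal sketch shows the device-selection step. It assumes each availability-table row records a device ID, a measured latency and whether the device is currently reachable. The row layout and the names `DeviceEntry` and `pick_source_device` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceEntry:
    """One hypothetical availability-table row for a device holding the file."""
    device_id: str
    latency_ms: float
    online: bool


def pick_source_device(entries: list[DeviceEntry]) -> Optional[str]:
    """Return the reachable device with the least read latency, or None."""
    reachable = [entry for entry in entries if entry.online]
    if not reachable:
        return None
    return min(reachable, key=lambda entry: entry.latency_ms).device_id
```

Skipping offline devices first matters: a device with the lowest recorded latency is useless if the server cannot currently connect to it.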
As explained above, the server 400 facilitates storing of data files of the users at a storage system such as storage system 105. The data files can be received from one or more users and also from one or more computing devices of each of the users. For example, a user can be associated with multiple computing devices such as smartphones, digital media players, laptops, desktops, tablet PCs etc. The data files can include files such as images, videos, logs, application configuration files, computing device configuration files etc. The server 400 maintains a certain level of durability of the data files stored at the storage system 105 by regulating a number of copies of the data files stored at the storage system 105. In some embodiments, the more popular the data files are among the users, the lower the number of copies of the data files stored at the storage system 105.
The popularity of a particular data file is measured using a popularity value. The popularity value determination module 450 determines the popularity of the data file based on a number of factors, including one or more of: (a) a number of computing devices associated with one or more of the users that contain the particular data file, (b) a latency associated with reading the particular data file from one or more of the computing devices that contain the particular data file, (c) a network bandwidth available for reading the particular data file from one or more of the computing devices that contain the particular data file, (d) availability of a network connection with one or more of the computing devices that contain the particular data file for reading the particular data file, (e) a number of the users requiring storage for the same data file at the storage system, or (f) an access pattern of the particular data file for a specific user or a subset of the users. In some embodiments, one or more of the above factors can be weighted relative to each other. The popularity value can be determined in various units and using various mathematical equations. One example expression of a popularity value is a percentage value.
The data file replication management module 460 determines the number of copies to be maintained at the storage system 105 for a particular data file. Generally, the higher the popularity of the data file, the fewer copies of the data file are stored at the storage system 105. Further, various popularity value ranges, and the number of copies to be stored for each range, can be configured, e.g., by an entity such as an administrator of the storage server. Further, in some embodiments, the data file replication management module 460 also maintains an availability table. The table includes data specifying which of the computing devices has copies of which of the data files. It also includes data specifying other attributes, such as network bandwidth for the computing devices, their network connection availability, associated latency to obtain the data file, etc.
Request receiving module 440 receives requests from the users for storing or retrieving data files at/from the storage system 105. In some embodiments, the request receiving module 440 receives the request via the network component that facilitates communication with the computing devices of the users.
Data file serving module 470 responds to requests from a user for retrieving data files from the storage system 105 by retrieving the data file and serving it to the user. The data file serving module 470 serves the data file by retrieving it either from the storage system 105 or, if the storage system 105 does not have the requested data file, from one of the computing devices. In some embodiments, the data file serving module 470 consults the availability table to determine which of the computing devices has a copy of the requested data file, and retrieves the copy of the data file from one of the identified computing devices.
At step 515, the server 110 determines a popularity of the data file. In some embodiments, the popularity of a data file is measured using a popularity value. The popularity value of the data file is determined based on a number of factors, including one or more of: (a) a number of computing devices associated with one or more of the users that contain the particular data file, (b) a latency associated with reading the particular data file from one or more of the computing devices that contain the particular data file, (c) a network bandwidth available for reading the particular data file from one or more of the computing devices that contain the particular data file, (d) availability of a network connection with one or more of the computing devices that contain the particular data file for reading the particular data file, (e) a number of the users requiring storage for the same data file at the storage system, or (f) an access pattern of the particular data file for a specific user or a subset of the users. In some embodiments, one or more of the above factors can be weighted relative to each other. The popularity value can be determined in various units and using various mathematical equations. One example expression of a popularity value is a percentage value.
At step 520, the server 110 determines a number of copies of the data file to be stored at the storage system 105 based on the popularity value. In some embodiments, the higher the popularity of the data file, the lower the number of copies of the data file stored at the storage system. For example, if the popularity value of a particular data file is 100%, the storage system may not store any copies of the particular data file, since the particular data file is available at all the computing devices of the users and can be retrieved from any of them at any time. On the other hand, if the popularity value of a particular data file is 0%, the storage system may store one or more copies of the particular data file, since the particular data file is not available at any of the computing devices or cannot be retrieved within a maximum accepted latency, etc. Further, various popularity value ranges and the number of copies to be stored for each range can be configured, e.g., by an entity such as an administrator of the storage server. For example, a popularity range of 0-9% may have 5 copies, 10-40% may have 4 copies, 41-70% may have 3 copies, 71-95% may have 2 copies, and 96-100% may have 0 (zero) copies.
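The example ranges can be encoded as a lookup table. In this sketch each range is represented by its lower bound and `bisect` locates the matching range, so a fractional value such as 9.5% falls into the 0-9% range (an assumption, since the example ranges only cover whole percentages). The names `LOWER_BOUNDS`, `COPIES`, and `copies_for_popularity` are hypothetical.

```python
import bisect

# Example policy from the text: 0-9% -> 5, 10-40% -> 4, 41-70% -> 3,
# 71-95% -> 2, 96-100% -> 0. Each range is keyed by its lower bound.
LOWER_BOUNDS = [0, 10, 41, 71, 96]
COPIES = [5, 4, 3, 2, 0]


def copies_for_popularity(popularity: float) -> int:
    """Return the configured number of stored copies for a popularity percentage."""
    if not 0.0 <= popularity <= 100.0:
        raise ValueError(f"popularity must be within [0, 100], got {popularity}")
    # Index of the last lower bound that is <= popularity.
    idx = bisect.bisect_right(LOWER_BOUNDS, popularity) - 1
    return COPIES[idx]
```

Keeping the bounds and copy counts in two parallel lists makes the policy easy for an administrator to reconfigure without touching the lookup logic.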
At step 525, the server 110 regulates or adjusts the number of copies of the data file at the storage system by at least one of: (a) not storing any copy of the data file at the storage system if the popularity value exceeds a first threshold, (b) increasing the number of copies stored at the storage system if the popularity value is below a second threshold, or (c) decreasing the number of copies stored at the storage system if the popularity value exceeds a third threshold. In some embodiments, the number of copies of a data file can be regulated either for a complete data file or for a portion of the data file.
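The three threshold rules can be sketched as a single adjustment function. The threshold values, the one-copy step size, and the `min_copies`/`max_copies` bounds are hypothetical; the text does not specify them.

```python
def regulate_copies(
    current: int,
    popularity: float,
    no_copy_threshold: float = 95.0,  # hypothetical "first threshold"
    low_threshold: float = 10.0,      # hypothetical "second threshold"
    high_threshold: float = 70.0,     # hypothetical "third threshold"
    min_copies: int = 1,
    max_copies: int = 5,
) -> int:
    """Return the adjusted number of stored copies for one data file."""
    if popularity > no_copy_threshold:
        return 0  # rule (a): store no copy at the storage system
    if popularity < low_threshold:
        # rule (b): add one copy, capped at max_copies
        return current + 1 if current < max_copies else current
    if popularity > high_threshold:
        # rule (c): remove one copy, never dropping below min_copies
        return current - 1 if current > min_copies else current
    return current  # between thresholds: leave the count unchanged
```

The same function could be applied per block or segment rather than per file, matching the note that regulation may target a portion of a data file.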
On the other hand, responsive to a determination that the storage system 105 does not have a copy of the entire data file, at step 615, the server 110 determines whether the storage system 105 has a portion of the requested data file. Responsive to a determination that the storage system has a portion of the requested file, e.g., a first block or segment, at step 620, the server 110 determines which of the computing devices of other users have a copy of the remaining portions of the requested data file. In some embodiments, the server consults the availability table to determine which of the computing devices has a copy of the requested data file.
In some embodiments, the server 110 generates a checksum for each of the data files uploaded by the users to the server 110 for storage. The checksums may be calculated for a portion of a data file, e.g., a block of a file or a segment of a file that has a plurality of blocks, or for a complete data file. The server 110 stores the checksums of the data files in the availability table. In some embodiments, the server 110 also stores other attributes, such as the names of the data files, identifications of the computing devices from which the data files are uploaded, the network bandwidth available for reading copies of the files from the corresponding computing devices, the network availability for connecting with the corresponding computing devices, the associated latency, etc. Some of the foregoing attributes may be updated periodically.
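A minimal sketch of the checksum and availability-table bookkeeping follows. SHA-256, the 4 MiB default block size, and the names `HolderRecord` and `AvailabilityTable` are assumptions made for illustration; the text does not name a checksum algorithm or a table layout.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass

DEFAULT_BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB block size


def file_checksum(data: bytes) -> str:
    """Checksum of a complete data file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def block_checksums(data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> list[str]:
    """One checksum per fixed-size block; the last block may be shorter."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


@dataclass
class HolderRecord:
    """Attributes the text says may be stored alongside each checksum."""
    file_name: str
    device_id: str
    bandwidth_mbps: float  # bandwidth available for reading from this device
    reachable: bool        # current network availability of the device
    latency_ms: float      # latency to obtain the file from the device


class AvailabilityTable:
    """Maps a file checksum to the devices known to hold that file."""

    def __init__(self) -> None:
        self._holders: dict[str, list[HolderRecord]] = defaultdict(list)

    def register(self, checksum: str, record: HolderRecord) -> None:
        self._holders[checksum].append(record)

    def holders(self, checksum: str) -> list[HolderRecord]:
        # .get avoids inserting empty entries for unknown checksums
        return list(self._holders.get(checksum, []))
```

Keying the table by checksum rather than by file name means two users who upload the same song under different names still map to the same entry.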
In some embodiments, the server 110 compares a checksum of the requested data file with the stored checksums of the data files to determine whether any of the computing devices has a copy of the requested data file. The server 110 then chooses, based on a predefined criterion, one of the computing devices from which to retrieve a copy of the requested data file. For example, the server 110 can choose the computing device from which the copy of the data file can be read with the least latency.
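The selection criterion in the example, least read latency among devices holding a matching checksum, could look like the following. Skipping unreachable devices is an added assumption, and `Holder` and `choose_source` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Holder:
    device_id: str
    checksum: str      # checksum of the copy held by this device
    available: bool    # whether the device is currently reachable
    latency_ms: float  # expected latency to read the copy


def choose_source(requested_checksum: str, holders: Sequence[Holder]) -> Optional[Holder]:
    """Pick the reachable device with a matching checksum and the least latency."""
    candidates = [
        h for h in holders if h.checksum == requested_checksum and h.available
    ]
    if not candidates:
        return None  # no device can currently supply this file
    return min(candidates, key=lambda h: h.latency_ms)
```

Swapping the `key` function (for example, to prefer the highest bandwidth) is enough to implement a different predefined criterion.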
At step 625, the server 110 retrieves the copy of the remaining portions of the data file from one of the computing devices. At step 630, the server 110 generates an entire copy of the requested data file using the portions retrieved from the identified computing device and from the storage system 105. The server 110 can use various file-joining techniques to generate a file from its portions. At step 645, the server 110 serves the copy of the data file to the user.
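One simple file-joining technique is to concatenate portions in offset order and verify the result against a stored checksum. This sketch assumes portions are keyed by their starting byte offset and uses SHA-256 for verification; neither detail comes from the text, and `join_portions` is a hypothetical name.

```python
import hashlib
from typing import Dict, Optional


def join_portions(portions: Dict[int, bytes], expected_sha256: Optional[str] = None) -> bytes:
    """Reassemble a file from portions keyed by their starting byte offset."""
    data = bytearray()
    for offset in sorted(portions):
        # Each portion must start exactly where the previous one ended.
        if offset != len(data):
            raise ValueError(f"gap or overlap at offset {offset}")
        data.extend(portions[offset])
    result = bytes(data)
    # Optionally confirm the joined file matches the checksum on record.
    if expected_sha256 is not None and hashlib.sha256(result).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch after joining portions")
    return result
```

For instance, a first segment held by the storage system at offset 0 and the remainder fetched from a user's device at offset 7 would be passed as `{0: b"Hello, ", 7: b"world"}`.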
Referring back to step 615, responsive to a determination that the storage system does not have a portion of the requested file, at step 635, the server 110 determines which of the computing devices of other users have a copy of the entire requested data file. At step 640, the server 110 retrieves the copy of the entire data file from one of the computing devices and, at step 645, the server 110 serves the copy of the data file to the user.
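The retrieval flow, from checking the storage system through serving the file at step 645, is sketched below. It assumes, hypothetically, that a partially stored file is kept as its leading bytes and that the source device is the one with the least latency; the in-memory `Device` and `StorageSystem` classes stand in for real devices and storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    device_id: str
    latency_ms: float
    files: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class StorageSystem:
    complete: Dict[str, bytes] = field(default_factory=dict)  # entire copies
    leading: Dict[str, bytes] = field(default_factory=dict)   # leading portion only (assumed layout)


def fastest_holder(file_id: str, devices: List[Device]) -> Device:
    """Device holding the file with the least latency (steps 620/635)."""
    holders = [d for d in devices if file_id in d.files]
    if not holders:
        raise LookupError(f"no device holds {file_id!r}")
    return min(holders, key=lambda d: d.latency_ms)


def serve(file_id: str, storage: StorageSystem, devices: List[Device]) -> bytes:
    # The storage system holds the entire file: serve it directly.
    if file_id in storage.complete:
        return storage.complete[file_id]
    portion = storage.leading.get(file_id)
    if portion is not None:
        # Steps 615-630: take the remaining bytes from another user's device
        # (a real transfer would request only this byte range) and join them.
        source = fastest_holder(file_id, devices)
        remainder = source.files[file_id][len(portion):]
        return portion + remainder
    # Steps 635-640: the storage system holds nothing, so fetch the whole file.
    return fastest_holder(file_id, devices).files[file_id]
```

The caller receives the same bytes regardless of which branch ran, which mirrors the point that the user sees the file as served from the storage system 105.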
Regardless of whether the data file is retrieved from the storage system 105 or from the computing devices of the users, from the perspective of the user who requested the data file, the user sees the data file as being served from the storage system 105. The user may be unaware of the fact that the data file is retrieved from a computing device of another user.
The memory 710 and storage devices 720 are computer-readable storage media that may store instructions that implement at least portions of the described technology. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. Thus, computer-readable media can include computer-readable storage media (e.g., “non-transitory” media) and computer-readable transmission media.
The instructions stored in memory 710 can be implemented as software and/or firmware to program the processor(s) 705 to carry out the actions described above. In some embodiments, such software or firmware may initially be provided to the processing system 700 by downloading it from a remote system (e.g., via network adapter 730).
The technology introduced herein can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired (non-programmable) circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more ASICs, PLDs, FPGAs, etc.
Remarks
The above description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in certain instances, well-known details are not described in order to avoid obscuring the description.
Further, various modifications may be made without deviating from the scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Certain terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. One will recognize that “memory” is one form of a “storage” and that the terms may on occasion be used interchangeably.
Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance is to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any term discussed herein, is illustrative only and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to the various embodiments given in this specification.
Those skilled in the art will appreciate that the logic illustrated in each of the flow diagrams discussed above may be altered in various ways. For example, the order of the logic may be rearranged, substeps may be performed in parallel, illustrated logic may be omitted, other logic may be included, etc.
Without intent to further limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for the convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.
Claims
1. A method of regulating a data storage service, the method comprising:
- receiving, at a server, a data file from one or more computing devices;
- storing, by the server, the data file at a storage system provided by the data storage service;
- determining, by the server, a number of copies of the data file to be stored at the storage system based on a number of a set of computing devices that store the data file, the set of computing devices being outside of the storage system, wherein the determined number of copies of the data file to be stored at the storage system decreases when the number of the set of computing devices that store the data file increases; and
- adjusting, by the server, the number of copies of the data file stored at the storage system based on the determined number of copies of the data file.
2. The method of claim 1, wherein the one or more computing devices are associated with one or more users.
3. The method of claim 2, wherein the number of copies of the data file to be stored at the storage system is determined further based on a number of the one or more users associated with the one or more computing devices that send the data file to the server to be stored at the storage system.
4. The method of claim 1, wherein the number of copies of the data file include copies of a portion of the data file.
5. The method of claim 1, further comprising determining a value of the data file as a function of the number of the set of computing devices that store the data file, wherein the number of copies of the data file to be stored at the storage system is determined based on the value of the data file.
6. The method of claim 5, wherein adjusting the number of copies of the data file stored at the storage system includes increasing the number of copies of the data file stored at the storage system according to a value range to which the value of the data file corresponds.
7. The method of claim 5, wherein adjusting the number of copies of the data file stored at the storage system includes decreasing the number of copies of the data file stored at the storage system according to a value range to which the value of the data file corresponds.
8. The method of claim 5, wherein the determining of the value of the data file comprises determining the value as a function of a latency associated with reading the data file from one or more of the set of computing devices that store the data file.
9. The method of claim 5, wherein the determining of the value of the data file comprises determining the value as a function of a network bandwidth available for reading the data file from one or more of the set of computing devices that store the data file.
10. The method of claim 5, wherein the determining of the value of the data file comprises determining the value as a function of availability of a network connection with one or more of the set of computing devices that store the data file for reading the data file.
11. The method of claim 5, wherein the determining of the value of the data file comprises determining the value as a function of an access pattern of the data file for a specific user.
12. The method of claim 5, wherein the determining of the value of the data file comprises determining the value as a function of an access pattern of the data file for a subset of one or more users associated with the one or more computing devices.
13. An apparatus for regulating a data storage service, the apparatus comprising:
- a memory; and
- at least one processor coupled to the memory and configured to: receive a data file from one or more computing devices; store the data file at a storage system provided by the data storage service; determine a number of copies of the data file to be stored at the storage system based on a number of a set of computing devices that store the data file, the set of computing devices being outside of the storage system, wherein the determined number of copies of the data file to be stored at the storage system decreases when the number of the set of computing devices that store the data file increases; and adjust the number of copies of the data file stored at the storage system based on the determined number of copies of the data file.
14. The apparatus of claim 13, wherein the one or more computing devices are associated with one or more users, wherein the number of copies of the data file to be stored at the storage system is determined further based on a number of the one or more users associated with the one or more computing devices that send the data file to the apparatus to be stored at the storage system.
15. The apparatus of claim 13, wherein the at least one processor is further configured to determine a value of the data file as a function of the number of the set of computing devices that store the data file, wherein the number of copies of the data file to be stored at the storage system is determined based on the value of the data file.
16. The apparatus of claim 15, wherein, to determine the value of the data file, the at least one processor is configured to determine the value as a function of a latency associated with reading the data file from one or more of the set of computing devices that store the data file.
17. The apparatus of claim 15, wherein, to determine the value of the data file, the at least one processor is configured to determine the value as a function of a network bandwidth available for reading the data file from one or more of the set of computing devices that store the data file.
18. The apparatus of claim 15, wherein, to determine the value of the data file, the at least one processor is configured to determine the value as a function of availability of a network connection with one or more of the set of computing devices that store the data file for reading the data file.
19. The apparatus of claim 15, wherein, to determine the value of the data file, the at least one processor is configured to determine the value as a function of an access pattern of the data file for a specific user.
20. The apparatus of claim 15, wherein, to determine the value of the data file, the at least one processor is configured to determine the value as a function of an access pattern of the data file for a subset of one or more users associated with the one or more computing devices.
Type: Application
Filed: Mar 12, 2019
Publication Date: Jul 11, 2019
Inventor: Justin QUAN (San Francisco, CA)
Application Number: 16/299,597