SYSTEMS AND METHODS FOR DISTRIBUTING AND SECURING DATA
A robust computational secret sharing scheme that provides for the efficient distribution and subsequent recovery of private data is disclosed. A cryptographic key may be randomly generated and then shared using a secret sharing algorithm to generate a collection of key shares. The private data may be encrypted using the key, resulting in a ciphertext. The ciphertext may then be broken into ciphertext fragments using an Information Dispersal Algorithm. Each key share and a corresponding ciphertext fragment are provided as input to a committal method of a probabilistic commitment scheme, resulting in a committal value and a decommittal value. The share for the robust computational secret sharing scheme may be obtained by combining the key share, the ciphertext fragment, the decommittal value, and the vector of committal values.
This application is a continuation of Ser. No. 13/412,111, filed on Mar. 5, 2012, which is a continuation of U.S. patent application Ser. No. 11/983,355, filed on Nov. 7, 2007 (now U.S. Pat. No. 8,155,322), which claims the benefit of U.S. provisional application No. 60/857,345, filed on Nov. 7, 2006. Each of the above-referenced applications is hereby incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
The present invention relates in general to a system for securing data from unauthorized access or use. The present invention also relates generally to cryptographic techniques for the construction of secret sharing schemes, and more particularly to systems and methods for supporting a secret sharing scheme that can tolerate damage to one or more shares.
BACKGROUND OF THE INVENTION
In today's society, individuals and businesses conduct an ever-increasing amount of activities on and over computer systems. These computer systems, including proprietary and non-proprietary computer networks, are often storing, archiving, and transmitting all types of sensitive information. Thus, an ever-increasing need exists for ensuring data stored and transmitted over these systems cannot be read or otherwise compromised.
One common solution for securing computer systems is to provide login and password functionality. However, password management has proven to be quite costly with a large percentage of help desk calls relating to password issues. Moreover, passwords provide little security in that they are generally stored in a file susceptible to inappropriate access, through, for example, brute-force attacks.
Another solution for securing computer systems is to provide cryptographic infrastructures. Cryptography, in general, refers to protecting data by transforming, or encrypting, it into an unreadable format. Only those who possess the key(s) to the encryption can decrypt the data into a useable format. Cryptography is used to identify users, e.g., authentication, to allow access privileges, e.g., authorization, to create digital certificates and signatures, and the like. One popular cryptography system is a public key system that uses two keys, a public key known to everyone and a private key known only to the individual or business owner thereof. Generally, the data encrypted with one key is decrypted with the other and neither key is recreatable from the other.
Unfortunately, even the foregoing typical public-key cryptographic systems are still highly reliant on the user for security. For example, cryptographic systems issue the private key to the user, for example, through the user's browser. Unsophisticated users then generally store the private key on a hard drive accessible to others through an open computer system, such as, for example, the Internet. On the other hand, users may choose poor names for files containing their private key, such as, for example, “key.” The result of the foregoing and other acts is to allow the key or keys to be susceptible to compromise.
In addition to the foregoing compromises, a user may save his or her private key on a computer system configured with an archiving or backup system, potentially resulting in copies of the private key traveling through multiple computer storage devices or other systems. This security breach is often referred to as “key migration.” Similar to key migration, many applications provide access to a user's private key through, at most, simple login and password access. As mentioned in the foregoing, login and password access often does not provide adequate security.
One solution for increasing the security of the foregoing cryptographic systems is to include biometrics as part of the authentication or authorization. Biometrics generally include measurable physical characteristics, such as, for example, fingerprints or speech, that can be checked by an automated system, such as, for example, pattern matching or recognition of fingerprint patterns or speech patterns. In such systems, a user's biometric and/or keys may be stored on mobile computing devices, such as, for example, a smartcard, laptop, personal digital assistant, or mobile phone, thereby allowing the biometric or keys to be usable in a mobile environment.
The foregoing mobile biometric cryptographic system still suffers from a variety of drawbacks. For example, the mobile user may lose or break the smartcard or portable computing device, thereby having his or her access to potentially important data entirely cut-off. Alternatively, a malicious person may steal the mobile user's smartcard or portable computing device and use it to effectively steal the mobile user's digital credentials. On the other hand, the portable-computing device may be connected to an open system, such as the Internet, and, like passwords, the file where the biometric is stored may be susceptible to compromise through user inattentiveness to security or malicious intruders.
One way to secure data from unauthorized access or unauthorized use is to use a secret sharing scheme. A secret sharing scheme is a method to split a sensitive piece of data (e.g., confidential files, an encryption key, or any type of communication), sometimes called the secret, into a collection of pieces, called shares, such that possession of a sufficient number of shares enables recovery of the secret, but possession of an insufficient number of shares provides little or no information about the secret that was shared. Such schemes are important tools in cryptography and information security.
Formally, a secret sharing scheme consists of a pair of algorithms, the sharing algorithm Share and the recovery algorithm Recover. The sharing algorithm is typically probabilistic (meaning that it makes randomized choices), and the recovery algorithm is typically deterministic. The sharing algorithm may be used to disassemble, or split, the secret into a collection of shares, and the recovery algorithm may be used to reassemble those shares. At reassembly time, each share may be present, in which case a string may be provided to the recovery algorithm, or a share may be missing, in which case a designated value (referred to as “⋄” herein) may be provided to the recovery algorithm. A set of players that is authorized to recover the secret is called an authorized set, and the collection of all such authorized sets is sometimes called an access structure.
Secret sharing schemes have been designed to work on various access structures, but the most common access structure is a threshold access structure, where any subset of m or more players, out of a total of n players in all, are said to be authorized. A secret sharing scheme for a threshold access structure is sometimes called a threshold scheme. There are two security properties for any secret sharing scheme: a privacy property and a recoverability property. The privacy property ensures that unauthorized coalitions of players do not learn anything useful about the secret. The recoverability property ensures that authorized coalitions of players can ultimately recover the underlying secret.
Shamir's secret sharing scheme is said to be a perfect secret sharing (PSS) scheme. The term “perfect” refers to the privacy guarantee being information theoretic and without any error; thus, unauthorized coalitions of players may learn nothing useful about the underlying secret in PSS schemes.
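The Share and Recover algorithms of a threshold scheme can be illustrated with a minimal sketch of Shamir's construction. The prime modulus, the function names, and the representation of a share as an (x, y) pair are assumptions made for this illustration only; a missing share is modeled by simply omitting it from the input to recover. The modular inverse via pow requires Python 3.8 or later.

```python
import secrets

# Assumption: arithmetic is done modulo this Mersenne prime (2**127 - 1);
# the secret must be a non-negative integer smaller than PRIME.
PRIME = 2**127 - 1

def share(secret, m, n):
    """Split `secret` into n shares so that any m of them recover it."""
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover(shares):
    """Lagrange-interpolate the polynomial at x = 0 from (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In this sketch each share is as large as the field element holding the secret, which is consistent with the share-size behavior of PSS schemes discussed next.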
One limitation with PSS schemes is that the size of each share must be at least as long as the size of the secret that is being shared. When the secret includes a large file or long string of characters, however, this limitation can become unwieldy, increasing overall complexity of the system. In response to this limitation, schemes for computational secret sharing (CSS) have been developed.
Krawczyk's CSS scheme, for example, permits the shares to be shorter than the secret. For example, in a 2-out-of-3 threshold scheme (meaning that any two of three shares are adequate for recovering the secret), the secret S can be divided into shares of size about |S|/2 bits, where |S| denotes the length of S. Shares this short are not possible in the PSS setting. In CSS schemes, however, the privacy property may no longer be absolute and information theoretic; rather, an unauthorized coalition of players may obtain a small amount of information about the shared secret from their shares. But, under a computational complexity assumption, the amount of information will be negligible and therefore, in practice, not much of a concern.
A second limitation of PSS schemes concerns the lack of mandated robustness. Robustness means that a faulty or adversarial participant is unable to force the recovery of an incorrect secret. The model for PSS assumes that each share is either “correct” or “missing”, but it may never be wrong (e.g., corrupt or intentionally altered). In practice, this is a highly unreasonable assumption because shares may be wrong due to any number of factors, including, for example, errors in storage, noise in a communications channel, or genuinely adversarial activities. In addition, the lack of robustness is not just a theoretical possibility, but a genuine problem for typical PSS schemes, including Shamir's secret sharing scheme. With Shamir's scheme, an adversary can in fact force the recovery of any desired secret by appropriately changing just one share. Practical applications of secret sharing schemes typically require robustness.
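The lack of robustness can be seen directly from the linearity of Shamir recovery. The following sketch assumes a prime-field Shamir scheme and assumes that the adversary knows which share indices will be combined; the modulus and the helper names are hypothetical. Changing a single share shifts the recovered secret by any chosen offset, so an adversary who knows or guesses the original secret can force any target value.

```python
PRIME = 2**127 - 1  # assumed field modulus for this illustration

def lagrange_at_zero(xs, i):
    """Coefficient multiplying share i when interpolating at x = 0."""
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = (num * -xj) % PRIME
            den = (den * (xs[i] - xj)) % PRIME
    return num * pow(den, -1, PRIME) % PRIME

def recover(shares):
    """Recovered secret = sum of share values weighted by their coefficients."""
    xs = [x for x, _ in shares]
    return sum(y * lagrange_at_zero(xs, i) for i, (_, y) in enumerate(shares)) % PRIME

def tamper(shares, index, delta):
    """Alter one share so that the recovered secret shifts by `delta`."""
    xs = [x for x, _ in shares]
    lam = lagrange_at_zero(xs, index)
    x, y = shares[index]
    forged = list(shares)
    forged[index] = (x, (y + delta * pow(lam, -1, PRIME)) % PRIME)
    return forged

# Two shares of the secret 42 taken from the line f(x) = 42 + 7x.
shares = [(1, 49), (2, 56)]
# The holder of the first share shifts the result from 42 to 1000.
forged = tamper(shares, 0, 1000 - 42)
```

Nothing in the recovery step distinguishes the forged share from an honest one, which is the gap that a robust scheme is meant to close.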
SUMMARY OF THE INVENTION
Based on the foregoing, robust computational secret sharing schemes that are simultaneously efficient and have strong provable-security properties under weak cryptographic assumptions are needed.
Accordingly, one aspect of the present invention is to provide a method for securing virtually any type of data from unauthorized access or use. The method comprises one or more steps of parsing, splitting and/or separating the data to be secured into two or more parts or portions. The method also comprises encrypting the data to be secured. Encryption of the data may be performed prior to or after the first parsing, splitting and/or separating of the data. In addition, the encrypting step may be repeated for one or more portions of the data. Similarly, the parsing, splitting and/or separating steps may be repeated for one or more portions of the data. The method also optionally comprises storing the parsed, split and/or separated data that has been encrypted in one location or in multiple locations. This method also optionally comprises reconstituting or re-assembling the secured data into its original form for authorized access or use. This method may be incorporated into the operations of any computer, server, engine or the like, that is capable of executing the desired steps of the method.
Another aspect of the present invention provides a system for securing virtually any type of data from unauthorized access or use. This system comprises a data splitting module, a cryptographic handling module, and, optionally, a data assembly module. The system may, in one embodiment, further comprise one or more data storage facilities where secure data may be stored.
Another aspect of the invention includes using any suitable parsing and splitting algorithm to generate shares of data. Random, pseudo-random, or deterministic techniques, or any combination thereof, may be employed for parsing and splitting data.
In yet other embodiments, an n-party secret sharing scheme with message space S is provided. A family of adversaries, A, may be defined. The n-party secret sharing scheme may include one or more of the following five primitives: (1) a symmetric encryption algorithm with k-bit keys and message space S; (2) an n-party PSS algorithm over adversaries A with a message space {0,1}^k; (3) an n-party information dispersal algorithm (IDA); (4) an n-party error correction code (ECC) over adversaries A with a message space {0,1}^h; and (5) a randomized (or probabilistic) commitment scheme. Data may be secured by first applying a computational secret sharing algorithm to the data to be secured. A random or pseudo-random value may then be generated. From the output of the secret sharing algorithm and the random or pseudo-random value, a set of committal values and decommittal values may be computed. A plurality of shares may then be formed by combining a share output from the secret sharing algorithm, a decommittal value, and one or more committal values. The shares may then be stored at one or more physical locations (e.g., on a magnetic hard disk drive), or one or more geographic locations (e.g., different data repositories or servers).
In some embodiments, a probabilistic commitment scheme may be used to compute the set of committal values and a set of decommittal values. Each share may be defined by a share output from a computational secret sharing algorithm, a decommittal value, and one or more committal values from the set of committal values.
In some embodiments, a cryptographic key may be generated and used to encrypt user data to create a ciphertext portion. A set of n key shares may be created by applying a secret sharing algorithm to the cryptographic key. A set of n ciphertext chunks may then be created by applying an information dispersal algorithm (IDA) to the ciphertext. A set of n committal values and n decommittal values may be computed by applying a probabilistic commitment scheme to each of the n key shares and ciphertext chunks. N data fragments may be formed, where each data fragment may be a function of a key share, a ciphertext, a decommittal value, and one or more committal values. Finally, the data fragments may be stored on one or more logical storage devices (e.g., n logical storage devices). One or more of these logical storage devices may be situated at different geographic or physical locations. The user data may then be reconstituted by combining at least a predefined number of data fragments. In some embodiments, various error-correcting codes may be used to provide an adequate collection of committal values for each player.
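The flow described above (generate a key, encrypt, share the key, disperse the ciphertext, commit to each pair, and assemble the fragments) can be sketched end to end. The following is a toy illustration under stated assumptions, not the claimed implementation: a SHA-256 counter-mode keystream stands in for the symmetric cipher, an n-out-of-n XOR sharing stands in for the threshold PSS algorithm, plain chunking stands in for an IDA with redundancy, a salted hash stands in for the probabilistic commitment scheme, and a majority vote stands in for the error-correcting code that delivers the committal values. All function names are hypothetical.

```python
import hashlib
import secrets
from collections import Counter

def keystream_xor(key, data):
    """Toy stream cipher (SHA-256 in counter mode); encrypts and decrypts."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def digest(r, key_share, chunk):
    """Hash binding a key share and a ciphertext chunk under randomness r."""
    prefix = len(key_share).to_bytes(4, "big")
    return hashlib.sha256(r + prefix + key_share + chunk).digest()

def commit(key_share, chunk):
    """Hash-based commitment: returns (committal, decommittal)."""
    r = secrets.token_bytes(32)  # fresh randomness per commitment
    return digest(r, key_share, chunk), r

def share(data, n):
    key = secrets.token_bytes(32)
    ciphertext = keystream_xor(key, data)
    # n-of-n XOR sharing of the key (stand-in for a threshold PSS scheme).
    key_shares = [secrets.token_bytes(32) for _ in range(n - 1)]
    last = key
    for ks in key_shares:
        last = xor_bytes(last, ks)
    key_shares.append(last)
    # Plain chunking (stand-in for an IDA with redundancy).
    size = -(-len(ciphertext) // n)  # ceiling division
    chunks = [ciphertext[i * size:(i + 1) * size] for i in range(n)]
    pairs = [commit(ks, c) for ks, c in zip(key_shares, chunks)]
    committals = tuple(h for h, _ in pairs)
    # Fragment i = key share, ciphertext chunk, decommittal, committal vector.
    return [(ks, c, r, committals)
            for ks, c, (_, r) in zip(key_shares, chunks, pairs)]

def recover(fragments):
    # Majority vote on the committal vector, then check every fragment.
    committals = Counter(f[3] for f in fragments).most_common(1)[0][0]
    key, ciphertext = bytes(32), b""
    for i, (ks, chunk, r, _) in enumerate(fragments):
        if digest(r, ks, chunk) != committals[i]:
            raise ValueError(f"fragment {i} failed its commitment check")
        key = xor_bytes(key, ks)
        ciphertext += chunk
    return keystream_xor(key, ciphertext)
```

Because the stand-ins used here are n-out-of-n, this sketch only detects a damaged fragment and then stops; with a threshold PSS algorithm and a redundant IDA, recovery could instead discard the rejected fragment and proceed with the remaining valid ones.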
The present invention is described in more detail below in connection with the attached drawings, which are meant to illustrate and not to limit the invention, and in which:
One aspect of the present invention is to provide a cryptographic system where one or more secure servers, or a trust engine, stores cryptographic keys and user authentication data. Users access the functionality of conventional cryptographic systems through network access to the trust engine; however, the trust engine does not release actual keys and other authentication data, and therefore the keys and data remain secure. This server-centric storage of keys and authentication data provides for user-independent security, portability, availability, and straightforwardness.
Because users can be confident in, or trust, the cryptographic system to perform user and document authentication and other cryptographic functions, a wide variety of functionality may be incorporated into the system. For example, the trust engine provider can ensure against agreement repudiation by, for example, authenticating the agreement participants, digitally signing the agreement on behalf of or for the participants, and storing a record of the agreement digitally signed by each participant. In addition, the cryptographic system may monitor agreements and determine to apply varying degrees of authentication, based on, for example, price, user, vendor, geographic location, place of use, or the like.
To facilitate a complete understanding of the invention, the remainder of the detailed description describes the invention with reference to the figures, wherein like elements are referenced with like numerals throughout.
According to one embodiment of the invention, the user system 105 comprises a conventional general-purpose computer having one or more microprocessors, such as, for example, an Intel-based processor. Moreover, the user system 105 includes an appropriate operating system, such as, for example, an operating system capable of including graphics or windows, such as Windows, Unix, Linux, or the like. As shown in
In addition, the user system 105 may connect to the communication link 125 through a conventional service provider, such as, for example, a dial up, digital subscriber line (DSL), cable modem, fiber connection, or the like. According to another embodiment, the user system 105 connects the communication link 125 through network connectivity such as, for example, a local or wide area network. According to one embodiment, the operating system includes a TCP/IP stack that handles all incoming and outgoing message traffic passed over the communication link 125.
Although the user system 105 is disclosed with reference to the foregoing embodiments, the invention is not intended to be limited thereby. Rather, a skilled artisan will recognize from the disclosure herein a wide number of alternative embodiments of the user system 105, including almost any computing device capable of sending or receiving information from another computer system. For example, the user system 105 may include, but is not limited to, a computer workstation, an interactive television, an interactive kiosk, a personal mobile computing device, such as a digital assistant, mobile phone, laptop, or the like, a wireless communications device, a smartcard, an embedded computing device, or the like, which can interact with the communication link 125. In such alternative systems, the operating systems will likely differ and be adapted for the particular device. However, according to one embodiment, the operating systems advantageously continue to provide the appropriate communications protocols needed to establish communication with the communication link 125.
According to the embodiment where the user produces biometric data, the user provides a physical characteristic, such as, but not limited to, facial scan, hand scan, ear scan, iris scan, retinal scan, vascular pattern, DNA, a fingerprint, writing or speech, to the biometric device 107. The biometric device advantageously produces an electronic pattern, or biometric, of the physical characteristic. The electronic pattern is transferred through the user system 105 to the trust engine 110 for either enrollment or authentication purposes.
Once the user produces the appropriate authentication data and the trust engine 110 determines a positive match between that authentication data (current authentication data) and the authentication data provided at the time of enrollment (enrollment authentication data), the trust engine 110 provides the user with complete cryptographic functionality. For example, the properly authenticated user may advantageously employ the trust engine 110 to perform hashing, digitally signing, encrypting and decrypting (often together referred to only as encrypting), creating or distributing digital certificates, and the like. However, the private cryptographic keys used in the cryptographic functions will not be available outside the trust engine 110, thereby ensuring the integrity of the cryptographic keys.
According to one embodiment, the trust engine 110 generates and stores cryptographic keys. According to another embodiment, at least one cryptographic key is associated with each user. Moreover, when the cryptographic keys include public-key technology, each private key associated with a user is generated within, and not released from, the trust engine 110. Thus, so long as the user has access to the trust engine 110, the user may perform cryptographic functions using his or her private or public key. Such remote access advantageously allows users to remain completely mobile and access cryptographic functionality through practically any Internet connection, such as cellular and satellite phones, kiosks, laptops, hotel rooms and the like.
According to another embodiment, the trust engine 110 performs the cryptographic functionality using a key pair generated for the trust engine 110. According to this embodiment, the trust engine 110 first authenticates the user, and after the user has properly produced authentication data matching the enrollment authentication data, the trust engine 110 uses its own cryptographic key pair to perform cryptographic functions on behalf of the authenticated user.
A skilled artisan will recognize from the disclosure herein that the cryptographic keys may advantageously include some or all of symmetric keys, public keys, and private keys. In addition, a skilled artisan will recognize from the disclosure herein that the foregoing keys may be implemented with a wide number of algorithms available from commercial technologies, such as, for example, RSA, ELGAMAL, or the like.
According to another embodiment, the trust engine 110 internally performs certificate issuances. In this embodiment, the trust engine 110 may access a certificate system for generating certificates and/or may internally generate certificates when they are requested, such as, for example, at the time of key generation or in the certificate standard requested at the time of the request. The trust engine 110 will be disclosed in greater detail below.
Although the vendor system 120 is disclosed with reference to the foregoing embodiments, the invention is not intended to be limited thereby. Rather, a skilled artisan will recognize from the disclosure herein that the vendor system 120 may advantageously comprise any of the devices described with reference to the user system 105 or combination thereof.
One popular part of the Internet is the World Wide Web. The World Wide Web contains different computers, which store documents capable of displaying graphical and textual information. The computers that provide information on the World Wide Web are typically called “websites.” A website is defined by an Internet address that has an associated electronic page. The electronic page can be identified by a Uniform Resource Locator (URL). Generally, an electronic page is a document that organizes the presentation of text, graphical images, audio, video, and so forth.
Although the communication link 125 is disclosed in terms of its preferred embodiment, one of ordinary skill in the art will recognize from the disclosure herein that the communication link 125 may include a wide range of interactive communications links. For example, the communication link 125 may include interactive television networks, telephone networks, wireless data transmission systems, two-way cable systems, customized private or public computer networks, interactive kiosk networks, automatic teller machine networks, direct links, satellite or cellular networks, and the like.
According to one embodiment, the transaction engine 205 comprises a data routing device, such as a conventional Web server available from Netscape, Microsoft, Apache, or the like. For example, the Web server may advantageously receive incoming data from the communication link 125. According to one embodiment of the invention, the incoming data is addressed to a front-end security system for the trust engine 110. For example, the front-end security system may advantageously include a firewall, an intrusion detection system searching for known attack profiles, and/or a virus scanner. After clearing the front-end security system, the data is received by the transaction engine 205 and routed to one of the depository 210, the authentication engine 215, the cryptographic engine 220, and the mass storage 225. In addition, the transaction engine 205 monitors incoming data from the authentication engine 215 and cryptographic engine 220, and routes the data to particular systems through the communication link 125. For example, the transaction engine 205 may advantageously route data to the user system 105, the certificate authority 115, or the vendor system 120.
According to one embodiment, the data is routed using conventional HTTP routing techniques, such as, for example, employing URLs or Uniform Resource Identifiers (URIs). URIs are similar to URLs; however, URIs typically indicate the source of files or actions, such as, for example, executables, scripts, and the like. Therefore, according to one embodiment, the user system 105, the certificate authority 115, the vendor system 120, and the components of the trust engine 110 advantageously include sufficient data within communication URLs or URIs for the transaction engine 205 to properly route data throughout the cryptographic system.
Although the data routing is disclosed with reference to its preferred embodiment, a skilled artisan will recognize a wide number of possible data routing solutions or strategies. For example, XML or other data packets may advantageously be unpacked and recognized by their format, content, or the like, such that the transaction engine 205 may properly route data throughout the trust engine 110. Moreover, a skilled artisan will recognize that the data routing may advantageously be adapted to the data transfer protocols conforming to particular network systems, such as, for example, when the communication link 125 comprises a local network.
According to yet another embodiment of the invention, the transaction engine 205 includes conventional SSL encryption technologies, such that the foregoing systems may authenticate themselves, and vice versa, with the transaction engine 205 during particular communications. As will be used throughout this disclosure, the term “½ SSL” refers to communications where a server, but not necessarily the client, is SSL authenticated, and the term “FULL SSL” refers to communications where the client and the server are SSL authenticated. When the instant disclosure uses the term “SSL”, the communication may comprise ½ or FULL SSL.
As the transaction engine 205 routes data to the various components of the cryptographic system 100, the transaction engine 205 may advantageously create an audit trail. According to one embodiment, the audit trail includes a record of at least the type and format of data routed by the transaction engine 205 throughout the cryptographic system 100. Such audit data may advantageously be stored in the mass storage 225.
According to one embodiment, the communication from the transaction engine 205 to and from the authentication engine 215 and the cryptographic engine 220 comprises secure communication, such as, for example, conventional SSL technology. In addition, as mentioned in the foregoing, the data of the communications to and from the depository 210 may be transferred using URLs, URIs, HTTP or XML documents, with any of the foregoing advantageously having data requests and formats embedded therein.
As mentioned above, the depository 210 may advantageously comprise a plurality of secure data storage facilities. In such an embodiment, the secure data storage facilities may be configured such that a compromise of the security in one individual data storage facility will not compromise the cryptographic keys or the authentication data stored therein. For example, according to this embodiment, the cryptographic keys and the authentication data are mathematically operated on so as to statistically and substantially randomize the data stored in each data storage facility. According to one embodiment, the randomization of the data of an individual data storage facility renders that data undecipherable. Thus, compromise of an individual data storage facility produces only a randomized undecipherable number and does not compromise the security of any cryptographic keys or the authentication data as a whole.
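One way such randomization could work is sketched below. The text does not specify the mathematical operation, so the use of XOR with random pads, the function names, and the default of two facilities are all assumptions made for illustration. Each facility receives a portion that, taken alone, is indistinguishable from random bytes; only the combination of every portion yields the original data.

```python
import secrets

def split(data, facilities=2):
    """Return one randomized portion per storage facility."""
    # All but the last portion are fresh random pads of the same length.
    portions = [secrets.token_bytes(len(data)) for _ in range(facilities - 1)]
    last = data
    for p in portions:
        last = bytes(a ^ b for a, b in zip(last, p))
    # The last portion is the data XORed with every pad.
    return portions + [last]

def combine(portions):
    """XOR all portions together to reassemble the original data."""
    out = bytes(len(portions[0]))
    for p in portions:
        out = bytes(a ^ b for a, b in zip(out, p))
    return out
```

For example, `combine(split(b"enrollment-data", 3))` returns the original bytes, while any one or two of the three portions reveal nothing about them beyond their length.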
According to one embodiment, the communications to the authentication engine comprise secure communications, such as, for example, SSL technology. Additionally, security can be provided within the trust engine 110 components, such as, for example, super-encryption using public key technologies. For example, according to one embodiment, the user encrypts the current authentication data with the public key of the authentication engine 215. In addition, the depository 210 also encrypts the enrollment authentication data with the public key of the authentication engine 215. In this way, only the authentication engine's private key can be used to decrypt the transmissions.
As shown in
According to one embodiment, communications to and from the cryptographic engine include secure communications, such as SSL technology. In addition, XML documents may advantageously be employed to transfer data and/or make cryptographic function requests.
Although the trust engine 110 is disclosed with reference to its preferred and alternative embodiments, the invention is not intended to be limited thereby. Rather, a skilled artisan will recognize in the disclosure herein, a wide number of alternatives for the trust engine 110. For example, the trust engine 110, may advantageously perform only authentication, or alternatively, only some or all of the cryptographic functions, such as data encryption and decryption. According to such embodiments, one of the authentication engine 215 and the cryptographic engine 220 may advantageously be removed, thereby creating a more straightforward design for the trust engine 110. In addition, the cryptographic engine 220 may also communicate with a certificate authority such that the certificate authority is embodied within the trust engine 110. According to yet another embodiment, the trust engine 110 may advantageously perform authentication and one or more cryptographic functions, such as, for example, digital signing.
According to another embodiment, the depository 210 may comprise distinct and physically separated data storage facilities, as disclosed further with reference to
Moreover, the nature of biometric data comparisons may result in varying degrees of confidence being produced from the matching of current biometric authentication data to enrollment data. For example, unlike a traditional password which may only return a positive or negative match, a fingerprint may be determined to be a partial match, e.g. a 90% match, a 75% match, or a 10% match, rather than simply being correct or incorrect. Other biometric identifiers such as voice print analysis or face recognition may share this property of probabilistic authentication, rather than absolute authentication.
When working with such probabilistic authentication or in other cases where an authentication is considered less than absolutely reliable, it is desirable to apply the heuristics 530 to determine whether the level of confidence in the authentication provided is sufficiently high to authenticate the transaction which is being made.
It will sometimes be the case that the transaction at issue is a relatively low value transaction where it is acceptable to be authenticated to a lower level of confidence. This could include a transaction which has a low dollar value associated with it (e.g., a $10 purchase) or a transaction with low risk (e.g., admission to a members-only web site).
Conversely, for authenticating other transactions, it may be desirable to require a high degree of confidence in the authentication before allowing the transaction to proceed. Such transactions may include transactions of large dollar value (e.g., signing a multi-million dollar supply contract) or transactions with a high risk if an improper authentication occurs (e.g., remotely logging onto a government computer).
The heuristics 530 may be used in combination with confidence levels and transaction values, as will be described below, to allow the comparator to provide a dynamic context-sensitive authentication system.
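By way of illustration only, the relationship between transaction value, risk, and required confidence may be sketched as follows. The function names, the dollar cut-offs, and the confidence thresholds are hypothetical values chosen for the example; they are not taken from the disclosure and do not represent the heuristics 530 themselves.

```python
def required_confidence(transaction_value: float, high_risk: bool = False) -> float:
    """Return a minimum match confidence (0.0 to 1.0) for a transaction.

    All cut-offs below are illustrative assumptions.
    """
    if high_risk or transaction_value >= 1_000_000:
        return 0.99   # e.g., large contracts or sensitive remote logins
    if transaction_value >= 1_000:
        return 0.90
    return 0.75       # e.g., a small purchase or a members-only web site


def authenticate(match_confidence: float, transaction_value: float,
                 high_risk: bool = False) -> bool:
    """Accept only if the measured confidence meets the context-dependent bar."""
    return match_confidence >= required_confidence(transaction_value, high_risk)
```

Under these assumed thresholds, an 80% fingerprint match would suffice for a $10 purchase but not for a multi-million dollar contract.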
According to another embodiment of the invention, the comparator 515 may advantageously track authentication attempts for a particular transaction. For example, when a transaction fails, the trust engine 110 may request the user to re-enter his or her current authentication data. The comparator 515 of the authentication engine 215 may advantageously employ an attempt limiter 535 to limit the number of authentication attempts, thereby prohibiting brute-force attempts to impersonate a user's authentication data. According to one embodiment, the attempt limiter 535 comprises a software module monitoring transactions for repeating authentication attempts and, for example, limiting the authentication attempts for a given transaction to three. Thus, the attempt limiter 535 will limit an automated attempt to impersonate an individual's authentication data to, for example, simply three “guesses.” Upon three failures, the attempt limiter 535 may advantageously deny additional authentication attempts. Such denial may advantageously be implemented through, for example, the comparator 515 returning a negative result regardless of the current authentication data being transmitted. On the other hand, the transaction engine 205 may advantageously block any additional authentication attempts pertaining to a transaction in which three attempts have previously failed.
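The attempt-limiting behavior described above may be sketched as follows. This is a minimal illustration only; the class name, the method name, and the in-memory counter are assumptions made for the example and are not the disclosed attempt limiter 535.

```python
from collections import defaultdict


class AttemptLimiter:
    """Counts failed authentication attempts per transaction (illustrative)."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        self._failures = defaultdict(int)

    def check(self, transaction_id: str, matches: bool) -> bool:
        """Return the result the comparator would report for this attempt."""
        if self._failures[transaction_id] >= self.max_attempts:
            # Limit reached: report a negative result regardless of the data.
            return False
        if matches:
            return True
        self._failures[transaction_id] += 1
        return False
```

After three failed attempts for a given transaction, even correct authentication data yields a negative result for that transaction, while other transactions are unaffected.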
The authentication engine 215 also includes the data splitting module 520 and the data assembling module 525. The data splitting module 520 advantageously comprises a software, hardware, or combination module having the ability to mathematically operate on various data so as to substantially randomize and split the data into portions. According to one embodiment, original data is not recreatable from an individual portion. The data assembling module 525 advantageously comprises a software, hardware, or combination module configured to mathematically operate on the foregoing substantially randomized portions, such that the combination thereof provides the original deciphered data. According to one embodiment, the authentication engine 215 employs the data splitting module 520 to randomize and split enrollment authentication data into portions, and employs the data assembling module 525 to reassemble the portions into usable enrollment authentication data.
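One simple way to randomize and split data so that no individual portion reveals the original is an exclusive-or (XOR) split, sketched below. This is offered as an assumed example technique, not as the specific operation performed by the data splitting module 520; the function names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_data(data: bytes, n: int) -> list:
    """Split data into n random-looking portions; all n are needed to rebuild."""
    if n < 2:
        raise ValueError("need at least two portions")
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = reduce(xor_bytes, pads, data)  # data XOR pad_1 XOR ... XOR pad_(n-1)
    return pads + [last]


def assemble_data(portions: list) -> bytes:
    """Recombine all portions by XOR to obtain the original data."""
    return reduce(xor_bytes, portions)
```

Each portion taken alone is uniformly random, and XOR-ing all portions together returns the original data.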
The cryptographic engine 220 also comprises a cryptographic handling module 625 configured to perform one, some or all of a wide number of cryptographic functions. According to one embodiment, the cryptographic handling module 625 may comprise software modules or programs, hardware, or both. According to another embodiment, the cryptographic handling module 625 may perform data comparisons, data parsing, data splitting, data separating, data hashing, data encryption or decryption, digital signature verification or creation, digital certificate generation, storage, or requests, cryptographic key generation, or the like. Moreover, a skilled artisan will recognize from the disclosure herein that the cryptographic handling module 625 may advantageously comprise a public-key infrastructure, such as Pretty Good Privacy (PGP), an RSA-based public-key system, or a wide number of alternative key management systems. In addition, the cryptographic handling module 625 may perform public-key encryption, symmetric-key encryption, or both. In addition to the foregoing, the cryptographic handling module 625 may include one or more computer programs or modules, hardware, or both, for implementing seamless, transparent, interoperability functions.
A skilled artisan will also recognize from the disclosure herein that the cryptographic functionality may include a wide number or variety of functions generally relating to cryptographic key management systems.
A robust computational secret sharing (RCSS) scheme is illustrated in
When a party wishes to recover the secret that was distributed on logical data repository 720, entity 740 may attempt to collect the shares. First collected share S*[1] 744 may be the same as share 704, but it also could differ due to unintentional modification in transmission or storage (e.g., data corruption), or intentional modification due to the activities of an adversarial agent. Similarly, second collected share S*[2] 745 may be the same as share 705, and last share S*[n] 746 may be the same as share 706, but these shares could also differ for similar reasons. In addition to the possibility of being a “wrong” share, one or more shares in collection 743 could also be the distinguished value “missing”, represented by the symbol “⋄”. This symbol may indicate that the system (e.g., entity 740) is unable to find or collect that particular share. The vector of purported shares S* may then be provided to recovery algorithm 742 of the RCSS scheme, which may return either recovered secret S* 741 or the value designated as invalid 747. The shared secret 701 should equal the recovered secret 741 unless the degree of adversarial activity in corrupting shares exceeds that which the scheme was designed to withstand.
The RCSS goal is useful across two major domains: securing data at rest and securing data in motion. In the former scenario, a file server, for example, maintains its data on a variety of remote servers. Even if some subset of those servers is corrupted (for example, by dishonest administrators) or unavailable (for example, due to a network outage), data may still be both available and private. In the data-in-motion scenario, the sender of a secret message and the receiver of the message may be connected by a multiplicity of paths, only some of which may be observed by the adversary. By sending the shares over these different paths, the sender may securely transmit the secret S despite the possibility of some paths being temporarily unavailable or adversarially controlled. For example, in some embodiments, each share may be transmitted over a different logical communication channel. Systems and methods for securing data, and in particular systems and methods for securing data in motion, are described in more detail in U.S. patent application Ser. No. 10/458,928, filed Jun. 11, 2003, U.S. patent application Ser. No. 11/258,839, filed Oct. 25, 2005, and U.S. patent application Ser. No. 11/602,667, filed Nov. 20, 2006. The disclosure of each of the aforementioned earlier-filed patent applications is hereby incorporated by reference herein in its entirety.
Although at least one RCSS scheme with short share sizes has been proposed by Krawczyk, the scientific study of that scheme reveals that it is not a valid RCSS scheme under weak assumptions on the encryption scheme, and it is not known to be a valid scheme for all access structures (e.g., access structures other than the threshold schemes). For at least these reasons,
The mechanism of the ESX or HK2 approach may include a robust computational secret sharing scheme that may be constructed from the following five primitives: (1) a random or pseudo-random number generator, (2) an encryption scheme; (3) a perfect secret sharing (PSS) scheme; (4) an information dispersal algorithm (IDA); and (5) a probabilistic commitment scheme. These five primitives are described in more detail below.
(1) A random or pseudo-random number generator, Rand. Such a number generator may take a number k as input and return k random or pseudorandom bits. In
(2) An encryption scheme, which may include a pair of algorithms, one called Encrypt and the other called Decrypt. The encryption algorithm Encrypt may take a key K of a given length k and an input message M that is referred to as the plaintext. The Encrypt algorithm may return a string C that is referred to as the ciphertext. The Encrypt algorithm may optionally employ random bits, but such random bits are not expressly shown in the drawings. The decryption algorithm Decrypt may take a key K of a given length k and an input message C that is referred to as the ciphertext. The Decrypt algorithm may return a string M that is referred to as the plaintext. In some cases, the decryption algorithm may return a designated failure value, which may indicate that the ciphertext C does not correspond to the encryption of any possible plaintext.
(3) A perfect secret sharing (PSS) scheme, which may include a pair of algorithms SharePSS and RecoverPSS. The first of these algorithms, known as the sharing algorithm of the PSS, may be a probabilistic map that takes as input a string K, called the secret, and returns a sequence of n strings, K[1], . . . , K[n], referred to as shares. Each K[i] may include one share of the n shares that have been dealt, or distributed, by the dealer (the entity carrying out the sharing process). The number n may be a user-programmable parameter of the secret sharing scheme, and it may include any suitable positive number. In some embodiments, the sharing algorithm is probabilistic in that it employs random or pseudo-random bits. Such a dependency can be realized by providing the sharing algorithm random or pseudo-random bits, as provided by the Rand algorithm. The second algorithm, known as the recovery algorithm of the PSS, may take as input a vector of n strings referred to as the purported shares. Each purported share is either a string or a distinguished symbol “⋄” which is read as missing. This symbol may be used to indicate that some particular share is unavailable. The recovery algorithm for the perfect secret sharing scheme may return a string S, or the recovered secret. Two properties of the PSS scheme may be assumed. The first property, the privacy property, ensures that no unauthorized set of users obtains any useful information about the secret that was shared from their shares. The second property, the recoverability property, ensures that an authorized set of parties can always recover the secret, assuming that the authorized parties contribute correct shares to the recovery algorithm and that any additional party contributes either a correct share or the distinguished missing (“⋄”) value. This PSS scheme may include the Shamir scheme commonly referred to as “Shamir Secret Sharing” or the Blakley secret sharing scheme.
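A sharing and recovery pair of this kind may be sketched, for illustration only, as a threshold Shamir scheme over a prime field. The field modulus, the threshold parameter t, the representation of the secret as an integer, and the use of None for the missing symbol “⋄” are assumptions made for the example; the disclosure does not fix these details.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime used as the field modulus (assumed choice)


def share_pss(secret: int, n: int, t: int) -> list:
    """Deal n shares (x, y) of secret; any t of them suffice to recover it."""
    if not (1 <= t <= n) or not (0 <= secret < PRIME):
        raise ValueError("invalid parameters")
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_pss(purported: list, t: int) -> int:
    """Recover the secret from purported shares; None marks a missing share."""
    points = [s for s in purported if s is not None][:t]
    if len(points) < t:
        raise ValueError("not enough shares")
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Lagrange interpolation evaluated at x = 0.
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Note that this sketch assumes every supplied share is correct; it tolerates missing shares but does not by itself detect corrupted ones.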
(4) An information dispersal algorithm (IDA), which may include a pair of algorithms ShareIDA and RecoverIDA. The first of these algorithms, known as the sharing algorithm of the IDA, may include a mechanism that takes as input a string C, the message to be dispersed, and returns a sequence of n strings, C[1], . . . , C[n], which are referred to as the chunks of the data that have resulted from the dispersal. The value of n may be a user-programmable parameter of the IDA, and it may be any suitable positive number. The sharing algorithm of the IDA may be probabilistic or deterministic. In
The second algorithm, known as the recovery algorithm of the IDA, may take as input a vector of n strings, the supplied chunks. Each supplied chunk may be a string or the distinguished symbol “⋄”, which is read as missing and is used to indicate that some particular data chunk is unavailable. The recovery algorithm for the IDA may return a string S, the recovered secret. The IDA may be assumed to have a recoverability property; thus, an authorized set of parties can always recover the data from the supplied chunks, assuming that the authorized parties contribute correct chunks to the recovery algorithm of the IDA and that any additional party participating in reconstruction contributes either a correct chunk or else the distinguished missing (“⋄”) value. Unlike the case for a PSS scheme, there may be no privacy property associated with the IDA and, in fact, one simple and practical IDA is to replicate the input C n times, and to have the recovery algorithm use the value that occurs most often as the recovered data. More efficient IDAs are known (for example, Rabin's IDA).
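The simple replication IDA mentioned above may be sketched as follows. The function names and the use of None to represent the missing symbol “⋄” are assumptions made for the example.

```python
from collections import Counter

MISSING = None  # stands in for the distinguished "missing" symbol


def share_ida(c: bytes, n: int) -> list:
    """Replication IDA: every chunk is a full copy of the input."""
    return [c] * n


def recover_ida(chunks: list) -> bytes:
    """Return the value that occurs most often among the supplied chunks."""
    present = [ch for ch in chunks if ch is not MISSING]
    if not present:
        raise ValueError("no chunks supplied")
    return Counter(present).most_common(1)[0][0]
```

Replication costs n times the storage of the input, which is why a more space-efficient IDA, such as Rabin's, may be preferred in practice.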
(5) A probabilistic commitment scheme, which may include a pair of algorithms, Ct and Vf, called the committal algorithm and the verification algorithm. The committal algorithm Ct may be a probabilistic algorithm that takes a string M to commit to and returns a committal value, H (the string that a player can use to commit to M) and also a decommittal value, R (the string that a player can use to decommit to the committal H for M). The committal algorithm may be probabilistic and, as such, can take a final argument, R*, which is referred to as the algorithm's coins. These coins may be earlier generated by a call to a random or pseudo-random number generator, Rand. The notation “Ct(M; R*)” is sometimes used herein to explicitly indicate the return value of the committal algorithm Ct on input M with random coins R*. The verification algorithm, Vf, may be a deterministic algorithm that takes three input strings: a committal value H, a string M, and a decommittal value R. This algorithm may return a bit 0 or 1, with 0 indicating that the decommittal is invalid (unconvincing) and 1 indicating that the decommittal is valid (convincing).
In general, a commitment scheme may satisfy two properties: a hiding property and a binding property. The hiding property entails that, given a randomly determined committal H for an adversarially chosen message M0 or M1, the adversary is unable to determine which message the committal H corresponds to. The binding property entails that an adversary, having committed to a message M0 by way of a committal H0 and corresponding decommittal R0, is unable to find any message M1 distinct from M0 and any decommittal R1 such that Vf(H0,M1,R1)=1. In most cases, the decommittal value R produced by a commitment scheme Ct(M; R*) is precisely the random coins R* provided to the algorithm (i.e., R=R*). However, this property is not required in all cases. The most natural probabilistic commitment schemes may be obtained by way of suitable cryptographic hash functions, such as SHA-1. There are a variety of natural techniques to process the value being committed to, M, and the coins, R*, before applying the cryptographic hash functions. Any commitment scheme containing a commitment mechanism Ct and verification algorithm Vf may yield a commitment mechanism Commit and verification mechanism Verify that applies to vectors of strings instead of individual strings. The commitment algorithm Commit may apply the Ct algorithm component-wise, and the verification algorithm Verify may apply the Vf algorithm component-wise. For Ct, separate random coins may be used for each component string in some embodiments.
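A hash-based commitment of the kind described above may be sketched as follows. The sketch assumes SHA-256 in place of the SHA-1 function named in the text, fixed-length 32-byte coins prepended to the message, and a decommittal value equal to the coins (R=R*); these choices, and the lower-case function names, are illustrative assumptions rather than the disclosed construction.

```python
import hashlib
import hmac
import secrets

COIN_BYTES = 32  # fixed coin length keeps the encoding coins || M unambiguous


def ct(m: bytes, coins: bytes = None):
    """Commit to m; returns (committal H, decommittal R), with R equal to the coins."""
    r = secrets.token_bytes(COIN_BYTES) if coins is None else coins
    if len(r) != COIN_BYTES:
        raise ValueError("coins must be exactly COIN_BYTES long")
    return hashlib.sha256(r + m).digest(), r


def vf(h: bytes, m: bytes, r: bytes) -> int:
    """Return 1 if (m, r) is a valid decommittal for h, otherwise 0."""
    expected = hashlib.sha256(r + m).digest()
    return 1 if hmac.compare_digest(h, expected) else 0


def commit(messages: list):
    """Apply ct component-wise, using separate coins for each component."""
    pairs = [ct(m) for m in messages]
    return [h for h, _ in pairs], [r for _, r in pairs]


def verify(hs: list, ms: list, rs: list) -> list:
    """Apply vf component-wise."""
    return [vf(h, m, r) for h, m, r in zip(hs, ms, rs)]
```

Whether such a construction satisfies the hiding and binding properties depends on the assumptions made about the underlying hash function.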
Key 802 may be used for only one sharing, and can therefore be referred to as a one-time key. In addition to being used to encrypt secret 800, key 802 may also be shared or distributed using perfect secret sharing (PSS) scheme 806. PSS scheme 806 may include any perfect secret sharing scheme, including the Shamir or Blakley secret sharing schemes. Perfect secret sharing scheme 806 may be randomized, requiring its own source of random (or pseudo-random) bits. The random or pseudo-random bits may be provided by a separate random or pseudo-random number generator, such as number generator 805. PSS scheme 806 may output a vector of key shares K=K[1], . . . , K[n] 808 which, conceptually, may be sent out to the different “players,” one share per player. First, though, the key shares may be combined with additional information in some embodiments. Ciphertext C 804 may be split up into chunks 809 using information dispersal algorithm (IDA) 807, such as Rabin's IDA mechanism. IDA 807 may output a vector of ciphertext chunks C[1], . . . , C[n] 809. Then, commit mechanism 812 of a probabilistic commitment scheme may be employed. A sufficient number of random bits are generated for the commitment process using random or pseudo-random number generator 810, and the resulting random string 811 is used for all committals at commit mechanism 812. Commit mechanism 812 may determine a committal value H[i] and a decommittal value R[i], collectively shown in vector 813, for each message M[i]=K[i]C[i] (spread across 808 and 809). The ith share (which is not explicitly represented in
The algorithm labeled “Share” in Table 1, below, further explains the sharing scheme depicted in
The recovery algorithm of the RCSS scheme is also shown in Table 1, below. This time, the caller provides an entire vector of purported shares, S=S[1] . . . S[n]. Each purported share S[i] may be a string or the distinguished symbol “⋄”, which again stands for a missing share. It may also be assumed, in some embodiments, that the caller provides the identity of a share j, where j is between 1 and n inclusive, which is known to be valid. At lines 20-21, each S[i] may be parsed into its component strings R[i], K[i], C[i], and H[1] . . . H[n]. It is understood that the missing symbol, “⋄”, may parse into components all of which are themselves the missing symbol ⋄. At line 23, the verification algorithm of the commitment scheme may be executed to determine if message KC[i]=K[i]C[i] appears to be valid. The “known valid” share j may then be used as the “reference value” for each commitment Hj[i]. Whenever a K[i] C[i] value appears to be invalid, it may be replaced by the missing symbol. The vector of K[i] values that have been so revised may now be supplied to the recovery algorithm of the secret sharing scheme at line 25, while the vector of revised C[i] values may be supplied to the recovery algorithm of the IDA at line 26. At this point, one needs only to decrypt the ciphertext C recovered from the IDA under the key K recovered from the PSS scheme to get the value S that is recovered by the RCSS scheme itself.
As indicated above, the Recover algorithm of Table 1 assumes that the user supplies the location of a known-valid share. In the absence of this, other means may be employed to determine a consensus value for H[i]. The most natural possibility used in some embodiments is the majority vote. For example, in lieu of Hj[i] at line 23 a value of H[i] may be used that occurs most frequently among the recovered Hj[i] values, for j ranging from 1 to n.
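The sharing and recovery flow described above, with the majority-vote variant for the committals, may be sketched end to end as follows. Everything concrete in the sketch is an assumption made for illustration: the toy keystream cipher standing in for Encrypt and Decrypt, threshold Shamir sharing over a 521-bit Mersenne prime standing in for the PSS scheme, replication standing in for the IDA, SHA-256 standing in for the commitment scheme, the threshold parameter t, and the tuple layout (R[i], K[i], C[i], H[1..n]) of each share. It is not the procedure of Table 1.

```python
import hashlib
import secrets
from collections import Counter

P = 2**521 - 1   # Mersenne prime; exceeds any 32-byte key (assumed field)
MISSING = None   # stands in for the distinguished "missing" symbol


def keystream_xor(key, data):
    """Toy one-time-key cipher: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def share_pss(key, n, t):
    """Shamir sharing of the key; share i is the polynomial value at x = i + 1."""
    coeffs = [int.from_bytes(key, "big")] + [secrets.randbelow(P) for _ in range(t - 1)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)]


def recover_pss(ys, t):
    """Interpolate at x = 0 from the first t non-missing key shares."""
    pts = [(i + 1, y) for i, y in enumerate(ys) if y is not MISSING][:t]
    if len(pts) < t:
        return None
    total = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total.to_bytes(32, "big") if total < 2**256 else None


def message(k, c):
    """Encode M[i] = K[i] || C[i], with K[i] at a fixed 66-byte width."""
    return k.to_bytes(66, "big") + c


def share(secret, n, t):
    key = secrets.token_bytes(32)                  # one-time key
    c = keystream_xor(key, secret)                 # ciphertext C
    ks = share_pss(key, n, t)                      # key shares K[1..n]
    cs = [c] * n                                   # replication IDA chunks C[1..n]
    rs = [secrets.token_bytes(32) for _ in range(n)]
    hs = tuple(hashlib.sha256(rs[i] + message(ks[i], cs[i])).digest()
               for i in range(n))
    return [(rs[i], ks[i], cs[i], hs) for i in range(n)]


def recover(shares, t):
    """Return the recovered secret, or None to signal the invalid outcome."""
    n = len(shares)
    present = [s for s in shares if s is not MISSING]
    if not present:
        return None
    ks, cs = [MISSING] * n, [MISSING] * n
    for i, s in enumerate(shares):
        if s is MISSING:
            continue
        r, k, c, _ = s
        # Majority vote over every present share's copy of H[i].
        consensus = Counter(p[3][i] for p in present).most_common(1)[0][0]
        if hashlib.sha256(r + message(k, c)).digest() == consensus:
            ks[i], cs[i] = k, c                    # otherwise left as MISSING
    key = recover_pss(ks, t)
    if key is None:
        return None
    c = Counter(x for x in cs if x is not MISSING).most_common(1)[0][0]
    return keystream_xor(key, c)
```

In this sketch a share whose K[i] or C[i] fails verification against the consensus committal is simply treated as missing, so recovery succeeds as long as at least t shares remain valid.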
Returning briefly to
The random or pseudo-random number generator, Rand, may be defined as before. The computational secret sharing scheme may include a pair of algorithms ShareCSS and RecoverCSS. The first of these algorithms, known as the sharing algorithm of the CSS, may be a probabilistic map that takes as input a string K, called the secret, and returns a sequence of n strings, K[1], . . . , K[n], referred to as shares. Each K[i] may include one share of the n shares that have been dealt, or distributed, by the dealer (the entity carrying out the sharing process). The number n may be a parameter of the secret sharing scheme, and it may be an arbitrary positive number. The sharing algorithm may be probabilistic in that it may employ random or pseudorandom bits. Such a dependency may be realized by providing the sharing algorithm random or pseudorandom bits, as provided by the random or pseudo-random number generator, Rand.
The second algorithm, known as the recovery algorithm of the CSS, takes as input a vector of n strings, referred to as the purported shares. Each purported share is either a string or a distinguished symbol “⋄”, which is read as missing and is used to indicate that some particular share is unavailable or unknown. The recovery algorithm for the computational secret sharing scheme may return a string S, the recovered secret. Since the pair of algorithms make up a computational secret sharing scheme, two properties may be assumed. The first property, the privacy property, may ensure that no unauthorized set of users obtains any significant (computationally extractable) information about the secret that was shared from their shares. The second property, the recoverability property, ensures that an authorized set of parties can always recover the secret, assuming that the authorized parties contribute correct shares to the recovery algorithm and that any additional party contributes either a correct share or else the distinguished missing (“⋄”) value.
The third primitive in this embodiment is a probabilistic commitment scheme, which may be implemented as described above in connection with
Referring to
Those skilled in the art will realize that a great number of variants are possible. For example, an error correcting code may be used in some embodiments to provide an adequate collection of committals H[1] . . . H[n] for each player, effectively replacing the simple but somewhat inefficient replication code of the prior embodiment.
Although some common applications are described above, it should be clearly understood that the present invention may be integrated with any network application in order to increase security, fault-tolerance, anonymity, or any suitable combination of the foregoing.
Additionally, other combinations, additions, substitutions and modifications will be apparent to the skilled artisan in view of the disclosure herein. Accordingly, the present invention is not intended to be limited by the recitation of the preferred embodiments but is to be defined by a reference to the appended claims.
Claims
1. (canceled)
2. A method for securing data by generating a set of data fragments, the method comprising:
- generating, by a hardware processor, a set of shares by applying a computational secret sharing scheme to the data;
- for each particular share of the set of shares computing a respective first value that has a binding property and a hiding property with respect to the particular share;
- for each particular share of the set of shares computing a respective second value usable to verify the first value;
- generating a plurality of fragments, each fragment comprising: at least one share, at least one second value, and at least two first values computed for at least two shares; and
- storing each fragment on at least one data repository.
3. The method of claim 2 wherein each fragment comprises first values computed for each share of the set of shares.
4. The method of claim 2 wherein storing each fragment on at least one data repository comprises storing each fragment at a different geographic location.
5. The method of claim 2 wherein storing each fragment on at least one data repository comprises storing each fragment at different physical locations on the at least one data repository.
6. The method of claim 2 wherein the at least one data repository comprises a distributed file system.
7. The method of claim 2 wherein the computational secret sharing scheme is selected from the group consisting of the Shamir, Blakley, and Krawczyk secret sharing schemes.
8. The method of claim 2 wherein computing each first value comprises employing a probabilistic scheme.
9. The method of claim 2 further comprising transmitting the generated fragments over a plurality of communication channels.
10. The method of claim 9 wherein transmitting the generated fragments over a plurality of communication channels comprises transmitting each generated fragment over a different communication channel.
11. The method of claim 9 wherein transmitting the generated fragments over a plurality of communication channels comprises transmitting the data fragments over communication channels before storing the data fragments.
12. A system for securing data by generating a set of data fragments, the system comprising:
- a hardware processor configured to: generate a set of shares by applying a computational secret sharing scheme to the data; for each particular share of the set of shares compute a respective first value that has a binding property and a hiding property with respect to the particular share; for each particular share of the set of shares compute a respective second value usable to verify the first value; generate a plurality of fragments, each fragment comprising: at least one share, at least one second value, and at least two first values computed for at least two shares; and store each fragment on at least one data repository.
13. The system of claim 12 wherein each fragment comprises first values computed for each share of the set of shares.
14. The system of claim 12 wherein the hardware processor is configured to store each fragment on at least one data repository by storing each fragment at a different geographic location.
15. The system of claim 12 wherein the hardware processor is configured to store each fragment on at least one data repository by storing each fragment at different physical locations on the at least one data repository.
16. The system of claim 12 wherein the at least one data repository comprises a distributed file system.
17. The system of claim 12 wherein the computational secret sharing scheme is selected from the group consisting of the Shamir, Blakley, and Krawczyk secret sharing schemes.
18. The system of claim 12 wherein the hardware processor is configured to compute each first value by employing a probabilistic scheme.
19. The system of claim 12 wherein the hardware processor is configured to transmit the generated fragments over a plurality of communication channels.
20. The system of claim 19 wherein the hardware processor is configured to transmit the generated fragments over a plurality of communication channels by transmitting each generated fragment over a different communication channel.
21. The system of claim 19 wherein the hardware processor is configured to transmit the generated fragments over a plurality of communication channels by transmitting the data fragments over communication channels before storing the data fragments.
Type: Application
Filed: Sep 25, 2017
Publication Date: Apr 5, 2018
Inventors: Mihir Bellare (San Diego, CA), Phillip Rogaway (Davis, CA)
Application Number: 15/714,877