UNIVERSALLY USABLE HUMAN-INTERACTION PROOF
Disclosed is a system and method for generating a universally usable, completely automated public turing test to tell a computer and a human apart (CAPTCHA). The universally usable CAPTCHA uses contextually related pictures and sounds to present concepts to a user. The pictures allow the CAPTCHA to be used by someone who could see, and the audio would allow the CAPTCHA to be used by someone who could not see. This combination of sound and images should make the CAPTCHA more universally usable for a larger population of users than previously known CAPTCHAs.
This application is based upon and claims benefit of copending U.S. Provisional Patent Application Ser. No. 61/196,135 entitled “Universally Usable Human-Interaction Proof”, filed with the U.S. Patent and Trademark Office on Oct. 15, 2008 by the inventors herein, the specification of which is incorporated herein by reference.
FIELD OF THE INVENTION
This invention relates generally to completely automated public turing tests to tell computers and humans apart (CAPTCHAs), and more particularly to CAPTCHAs configured for use by persons with perceptual limitations.
BACKGROUND
Despite growing interest in designing usable systems for managing privacy and security, recent efforts have generally failed to address the needs of users with disabilities. As security and privacy tools often rely upon subtle visual cues or other potentially inaccessible indicators, users with perceptual limitations may find such tools particularly challenging. Human-Interaction Proof (HIP) tools, commonly known as CAPTCHAs, may be used for instance to authenticate users to allow access to web pages, registration with various online services, inputting of an online vote, and the like. The CAPTCHA typically presents a user with a test, which test is designed so that it may somewhat easily be completed by a human, but is quite difficult to be completed by a computer, such that for any successfully completed CAPTCHA test, an assumption may be made that it was a human user that entered the solution.
Typical CAPTCHAs have required a user to type some number of characters that are presented in a distorted image. Distortion of the image can make automated recognition via optical character recognition software difficult, thus making the text interpretable by humans but not by automated tools. Unfortunately, however, for the approximately 161 million people worldwide having some type of visual impairment, the task of identifying what characters are presented in the distorted image can be difficult, if not impossible, to accomplish.
Other CAPTCHAs have comprised images or pictures presented to a user, typically in the form of a real world object or a commonly recognized shape. For instance, a user may be shown a picture of a cow, and tasked with identifying the subject of the picture as a cow. Likewise, the user may be shown a picture of three circles and a square, and tasked with clicking on the square.
Still other CAPTCHAs have comprised audio recordings in which a user listens to an audio file, such as spoken words or numbers or sounds related to a particular image, often with audio distortion overlaying the primary audio file, and is tasked with identifying the particular sound.
Efforts have also been made to combine visual distorted text and audio in a CAPTCHA, such as in the ReCAPTCHA product developed by Carnegie Mellon University. For the audio portion, the user is presented with an audio clip in which eight numbers are spoken by various individuals. In more recent versions, such ReCAPTCHA product has used short audio clips from old radio shows. In either case, background noise is applied to make it harder for hacker bots and the like to break the CAPTCHA. The user is then asked to fill in a form with those eight numbers and hit a submit button, at which point they are presented with either a “correct” or “incorrect” reply. Unfortunately, testing has suggested that even such combined CAPTCHAs fail to sufficiently improve the security screening process for persons having perceptual disabilities.
It would therefore be advantageous to provide a CAPTCHA that is capable of distinguishing between humans and computers, while being easier to use for a broader range of users than previously known CAPTCHAs, and particularly being capable of use by a broad range of users of differing backgrounds and abilities.
SUMMARY OF THE INVENTION
Disclosed herein is a universally usable CAPTCHA that joins visual and audio presentations to produce a single system in which the audio is directly contextually related to the visual elements that are presented to the user. As used herein, the term contextually related means that a contextual relationship exists between the subject matter of the visual elements of an image presented to a user and the sound that is embodied in the audio file presented to the user. Such a combined visual and audio CAPTCHA is more accessible for users with visual impairments than previously known CAPTCHAs, and may provide an added benefit of easier adaptation for different languages and cultures.
The universally usable CAPTCHA uses contextually related pictures and sounds to present concepts to a user. The pictures allow the CAPTCHA to be used by someone who could see, and the audio would allow the CAPTCHA to be used by someone who could not see. This combination of sound and images should make the CAPTCHA more universally usable for a larger population of users than previously known CAPTCHAs. Moreover, using a CAPTCHA to relay a concept instead of a particular textual string, and requiring a user to identify and understand that concept in order to solve the CAPTCHA, is expected to make such CAPTCHA more secure than previously known CAPTCHAs. As generalizable image processing and sound recognition tools are not readily available, images and sounds used in the universally usable CAPTCHA should be relatively resistant to automated attacks. Another benefit of such universally usable CAPTCHA is that it is anticipated that it would be relatively easy to internationalize. Because the universally usable CAPTCHA would use pictures and sound effects, many of these concepts (although not culturally-specific ones) could be used all over the world. The only thing that would need to be changed for developing the system for another language is changing the labels for the sound/image combinations. As described in Sauer, G., Lazar, J., Hochheiser, H., and Feng, J. (2009), Towards A Universally Usable Human Interaction Proof: Evaluation of alternative designs (currently under review at ACM Transactions on Accessible Computing), which is incorporated herein by reference in its entirety, such a universally usable CAPTCHA provides significant benefits to visually impaired users by improving accessibility to various electronic services accessible through the Internet.
The above and other features, aspects, and advantages of the present invention are considered in more detail, in relation to the following description of embodiments thereof shown in the accompanying drawings, in which:
The invention summarized above may be better understood by referring to the following description, which should be read in conjunction with the accompanying drawings in which like reference numerals are used for like parts. This description of an embodiment, set out below to enable one to practice an implementation of the invention, is not intended to limit the preferred embodiment, but to serve as a particular example thereof. Those skilled in the art should appreciate that they may readily use the conception and specific embodiments disclosed as a basis for modifying or designing other methods and systems for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent assemblies do not depart from the spirit and scope of the invention in its broadest form.
Referring again to
In the exemplary embodiment of
User interface 110 preferably provides a connection to user client device 200, receiving a web page request from user client device 200 and providing a web page with a universally usable CAPTCHA back to the user client device 200. The user interface 110 also receives a response back from user client device 200 in the form of a selection of a label that is provided as one of several solution options for the universally usable CAPTCHA, as described in greater detail below.
When CAPTCHA generating system 100 receives a request through user interface 110 from a user client device 200 for a web page that includes a universally usable CAPTCHA, universally usable CAPTCHA generation module 120 randomly selects an image and audio file combination from image/audio database 140, and transmits to user client device 200 a web page displaying the randomly selected picture. An exemplary representation of such a web page 400 is shown in
In a particularly preferred embodiment, the image and audio file combinations stored in database 140 are preferably categorized into the categories of transportation, animals, weather, and musical instruments. It was found that these four categories were easy to recognize for a majority of potential users, without any special training or experience. For instance, such contextually related image and audio file combinations could include images and audio recordings relating to a bird, a cat, a drum, or a piano. Any items that have multiple easily identifiable labels are preferably not used.
Moreover, it is noted that the universally usable CAPTCHA generating system described herein quite intentionally uses commonly recognized sounds instead of alphanumerically spoken characters or words, as such commonly recognized sounds are, given today's technology, more difficult to automatically identify using speech recognition or similar technologies.
After a user views image 410 and/or listens to the audio file activated by button 420, they may select a label 430 that describes the common context of image 410 and the contextually related audio recording. In the exemplary embodiment of
After the user selects a label from the pull down list of labels 430, determination module 130 receives the user's label selection through user interface 110, and determines whether the selected label is the label that accurately describes the context of image 410 and the related audio recording, and thus whether access by user device 200 is authorized. Those of ordinary skill in the art will recognize that in addition to basing such authorization decision on whether the selected label matches the label that accurately describes the context of image 410 and the related audio recording, determination module 130 may also receive additional data, such as session start time, number of attempted selections, and the like, and may additionally base such authorization decision on the time delay between the first presentation of the universally usable CAPTCHA to user device 200 and the receipt of a label selection through user interface 110, the number of previous incorrect attempts to select a label, and other factors without departing from the spirit and scope of the invention.
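By way of non-limiting illustration, the additional factors mentioned above (time delay and number of previous incorrect attempts) could be folded into the authorization decision roughly as follows. This is a minimal sketch only; the `Session` class, the `authorize` function, and all three threshold values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session record kept by the determination module."""
    presented_at: float          # time the CAPTCHA was first presented (seconds)
    incorrect_attempts: int = 0  # number of wrong label selections so far


# Assumed thresholds, chosen only for illustration.
MIN_SECONDS = 1.0     # answers faster than this look automated
MAX_SECONDS = 300.0   # answers slower than this are treated as stale
MAX_INCORRECT = 3     # wrong selections tolerated before refusing


def authorize(session: Session, label_matches: bool, now: float) -> bool:
    """Combine the label match with timing and attempt-count checks."""
    if session.incorrect_attempts >= MAX_INCORRECT:
        return False
    elapsed = now - session.presented_at
    if not (MIN_SECONDS <= elapsed <= MAX_SECONDS):
        return False
    if not label_matches:
        session.incorrect_attempts += 1
        return False
    return True
```

In such a sketch the caller would supply `label_matches` from the label comparison and `now` from a server-side clock, so that neither value is under the control of the user device.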
Determination module 130 determines whether or not the user selected the correct label, and thus whether or not user device 200 may access another, subsequent page that the universally usable CAPTCHA is intended to guard access to, before such access is granted. Those of ordinary skill in the art will recognize that such determination may be made by way of a simple lookup function in which determination module 130 consults image/audio database 140, determines the particular label or labels associated with the image and sound files randomly selected and transmitted to user device 200, and directly compares the stored, associated label or labels with the selected label to determine if a match exists. If the user selected label does not match the label that accurately describes the context of image 410 and the related audio recording, then access by user device 200 to such subsequent page is not allowed. If, however, such user selected label does match the label that accurately describes the context of image 410 and the related audio recording, then access by user device 200 is allowed to such subsequent page.
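The simple lookup function described above may be sketched as follows, under the assumption that each transmitted image and audio file combination is tracked by an identifier. The `Challenge` class, the `DATABASE` dictionary (standing in for image/audio database 140), the file names, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    image_file: str
    audio_file: str
    labels: frozenset  # stored label or labels accepted as correct


# Hypothetical stand-in for image/audio database 140.
DATABASE = {
    "c1": Challenge("cat.jpg", "cat.wav", frozenset({"cat"})),
    "c2": Challenge("storm.jpg", "thunder.wav",
                    frozenset({"thunder", "lightning"})),
}


def is_access_allowed(challenge_id: str, selected_label: str) -> bool:
    """Directly compare the selected label with the stored label or labels."""
    challenge = DATABASE.get(challenge_id)
    if challenge is None:
        return False  # unknown challenge: never grant access
    return selected_label.strip().lower() in challenge.labels
```

Keeping the stored labels on the server side, and sending only the identifier to the user device, is one way to avoid revealing the correct answer in the web page itself.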
With regard to another embodiment of the invention, and with particular reference to
Those of ordinary skill in the art will recognize that more or fewer image and audio file combinations may be provided than the specific examples set forth here without departing from the spirit and scope of the invention, although it is noted that adding more image and audio file combinations generally increases security at the cost of efficiency. Also, while the simple addition of such image and audio file combinations to the corpus of data that an automated bot would have to search expands the search space that such a bot would have to deal with, it would be advisable to update and refresh such objects over time in order to prevent man-powered attacks.
Optionally, the particular images stored in image/audio file database 140 may be categorized under particular contexts, such that any randomly selected image and audio file combination that is to be used on a web page 400 may not only randomly select a particular context for the image and audio file combination, but likewise may randomly select particular image and/or audio files associated with such context. Internet based image search engines may be used to retrieve a large set of images associated with any given context or search term. The combination of periodic prefetching of images and extraction of arbitrary subsets of each image may be used to discourage attackers who attempt to perform similar queries on image search engines in order to identify the stimuli used.
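The two-level random selection described above (first a context, then particular files within that context) might be sketched as follows. The `CATALOG` structure and file names are hypothetical; `secrets.SystemRandom` is used here on the assumption that an operating-system source of randomness is preferable to a seeded generator for a security feature.

```python
import secrets

# Hypothetical catalog: each context maps to interchangeable images and sounds.
CATALOG = {
    "cat": {"images": ["cat_01.jpg", "cat_02.jpg"], "sounds": ["cat_meow.wav"]},
    "train": {"images": ["train_01.jpg"],
              "sounds": ["train_horn.wav", "train_pass.wav"]},
    "drum": {"images": ["drum_01.jpg"], "sounds": ["drum_roll.wav"]},
}


def pick_challenge(catalog, rng=None):
    """Randomly choose a context, then an image and a sound for that context."""
    rng = rng or secrets.SystemRandom()
    context = rng.choice(sorted(catalog))
    entry = catalog[context]
    return context, rng.choice(entry["images"]), rng.choice(entry["sounds"])
```

A call such as `pick_challenge(CATALOG)` returns a `(context, image, sound)` tuple; the context doubles as the key for looking up the accepted label or labels.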
The selection of appropriate matching sound clips and images is an important factor for user performance. The match between the term, the image, and the sound effect should be clear, unique, and obvious. Sometimes multiple concepts or terms may be connected to the same image/sound pair. For instance, both thunder and lightning may be valid solutions for the sound of thunder and the image of lightning. Similarly, both alarm and siren may be valid answers on the sound of a siren. However, some sounds may prove to be problematic (e.g., it was found in one test that a pig sound effect was not clear and obvious enough to easily identify for a user who is unable to view the associated image of a pig, and in another test that a wolf howl sound effect caused at least one test user to look for the word “fox” as a solution to the CAPTCHA). Sound effects that were well received by the participants included glass breaking, truck, train, siren, and bell sound effects.
It was also found that the sound effects need to have a minimum duration in order to be clear to the users. A large part of this is caused by the screen reader software that blind computer users will often use as a computing aid. With screen reader software, every key that is pressed is spoken through the computer speakers. Thus, if users use the enter or spacebar key to press the “Play Sound” button 420, the computer will be saying “Enter” or “Space” while the sound is playing. If the sound is not long enough to keep playing after the screen reader feedback, the user will not be able to hear it clearly. Thus, it may be desirable in certain circumstances to repeat the sound effect a few times. For instance, in prior testing a cat sound clip was very brief and had a cat meow only once. A few users missed the sound the first time they heard it. In contrast, a dog sound had a dog barking three times, which was easier for the users to capture. If the user was unsure what the particular sound was the first time, the repetition helped them understand the sound. Another possibility for compensating for the screen reader software reading the key presses is inserting a delay before the sound plays. The cost is that this will slow down the time it takes for users to complete the universally usable CAPTCHA and that a delay on a web site can give the impression of a poor server, slow connection, or a web site that is currently down.
Also, one major security concern with such a universally usable CAPTCHA is the problem of sound identification via checksum or file signature. This may be addressed by inserting, by way of non-limiting example, random “non-audible” noise (outside the range of human hearing) to the sound files as they are being processed. By inserting this noise randomly, the checksums and file signatures of the files would change every single time they are played. Moreover, the introduction of non-audible white noise to the background of the audio files may be used to keep the sound wave frequencies at a constant level throughout the entire file, thus increasing the difficulty of automated analysis. Such addition of high frequency white noise to the current clean audio files adds a layer of obfuscation, and thus security, making it more difficult for a bot or other automated device to evaluate, while ensuring that humans can still identify the sound. In a particularly preferred embodiment, a file of white noise at a frequency of 18k is used, which frequency was found to be high enough to be outside of the range of normal human hearing. Such white noise may be mixed with the current, clean, non-altered audio files to create an audio file that resembles a large chunk of 18k frequency noise to a sound wave analyzer, while a human remains capable of discerning and identifying the underlying, original audio recording.
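The effect of random noise insertion on a checksum can be illustrated with the following minimal sketch. It departs from the text in one respect that should be stated plainly: instead of mixing in a prepared white noise file, it mixes in a single 18 kHz tone with a random phase and amplitude. The function names, the 44.1 kHz sample rate, the 16-bit sample format, and the amplitude limit are all assumptions made for illustration.

```python
import hashlib
import math
import random
import struct

SAMPLE_RATE = 44100  # assumed; must exceed 2 * NOISE_HZ to represent the tone
NOISE_HZ = 18000     # the "18k" frequency named in the text


def add_inaudible_noise(samples, rng=None, max_amplitude=200):
    """Mix a low-level 18 kHz tone with random phase into 16-bit samples."""
    rng = rng or random.SystemRandom()
    phase = rng.uniform(0.0, 2.0 * math.pi)
    amplitude = rng.randint(max_amplitude // 2, max_amplitude)
    mixed = []
    for n, sample in enumerate(samples):
        angle = 2.0 * math.pi * NOISE_HZ * n / SAMPLE_RATE + phase
        value = sample + int(round(amplitude * math.sin(angle)))
        mixed.append(max(-32768, min(32767, value)))  # clip to 16-bit range
    return mixed


def pcm_digest(samples):
    """SHA-256 over the little-endian 16-bit PCM encoding of the samples."""
    return hashlib.sha256(struct.pack(f"<{len(samples)}h", *samples)).hexdigest()
```

Because the phase and amplitude are drawn afresh on each call, two outputs produced from the same clean clip will in practice have different digests, so a catalog of checksums gathered from earlier downloads does not identify later ones.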
Additionally, broad spectrum white noise may be added that may minimally be heard by the end user but would be more difficult to filter out.
Those of ordinary skill in the art will recognize that other elements may be included in universally usable CAPTCHA generating system 100, such as elements configured to manage the generation and termination of a particular session with one or more user client devices 200, which processing elements are well known to those of ordinary skill in the art, and thus will not be discussed further herein. Those of ordinary skill in the art will also recognize that while the above description provides for the selection of a particular label among a number of labels presented to a user device 200, it is also envisaged that an operator of user device may instead input their own textual description of the image and/or audio file, and that determination module 130 may read such textual description, compare the words of such textual description with approved terms associated with the image and audio file presented to user device 200 (which approved terms are likewise stored in image/audio database 140), and allow access by user device 200 upon a determination that a sufficient number of words in the textual description match the approved terms associated with the image and audio file presented to user device 200. With regard to a particularly preferred embodiment, determination module 130 may utilize standard strategies from information retrieval technology to address usability concerns relating to suffixes, misspellings, and synonyms. The suffix problem arises from the variants of words that may be provided as correct answers. An image of a drum set, accompanied by the sounds of drums playing, may elicit responses including “drum”, “drums”, and “drumming.” Stemming algorithms known to those of ordinary skill in the art may be used to strip such suffixes off of words, allowing all three responses to be interpreted as matching “drum.”
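The suffix handling described above may be illustrated with a deliberately crude suffix stripper. This is not one of the established stemming algorithms (a production system would more likely use one of those); the `naive_stem` and `matches_approved` names and the suffix list are hypothetical.

```python
def naive_stem(word: str) -> str:
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    w = word.strip().lower()
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three letters would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            if suffix == "ing" and len(w) >= 2 and w[-1] == w[-2]:
                w = w[:-1]  # undo consonant doubling: "drumm" -> "drum"
            break
    return w


def matches_approved(response: str, approved_terms) -> bool:
    """Compare the stemmed response against the stemmed approved terms."""
    return naive_stem(response) in {naive_stem(t) for t in approved_terms}
```

Since both the response and the approved terms pass through the same function, the stems need not be dictionary words; they only need to agree with one another.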
Misspelling, particularly due to keystroke errors, is another concern. Systems that do not allow for any spelling errors may be overly restrictive, causing difficulties in both task performance time and correctness rates. A Levenshtein distance of two, for example, may be used to allow responses with up to two misspellings to be counted as correct.
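A Levenshtein distance check of the kind described above can be sketched with the standard dynamic programming recurrence. The function names and the default tolerance parameter are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_close_enough(response: str, accepted_answers, max_distance: int = 2) -> bool:
    """Accept a response within max_distance edits of any accepted answer."""
    r = response.strip().lower()
    return any(levenshtein(r, answer) <= max_distance for answer in accepted_answers)
```

One design consideration follows directly from the arithmetic: with a tolerance of two, short labels that differ by one or two letters (for instance "cat" and "car") become interchangeable, so it may be prudent to scale the tolerance with label length or to avoid such near-neighbor labels in the corpus.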
Synonymy is the problem of multiple ambiguous answers. In the above-described configuration in which each image and audio file combination has an initial “correct” answer that was sought, some pairings could be potentially ambiguous. For instance, for a storm cloud image/thunder sound combination, is the desired answer “storm”, “thunder”, or “lightning”? A separate usability study involving seven sighted users identified the three most commonly chosen labels for each sound/image pair. Participants having no prior knowledge of the sound files or their labels were asked to listen to each sound file and give the top three labels that came to mind. This was repeated for each sound in the corpus. A resulting vocabulary consisting of up to three synonyms for each image and audio file combination was used as the set of answers that would be considered correct. Although this tactic of using synonyms gathered through a research study was used in this scenario, it is very realistic to expect the database of acceptable or correct answers to change over time. Through logging of user responses, a more defined set of what users are submitting as answers may be used to more thoroughly define the database of answers. For example, if looking through the logs it is found that more users are answering “swallows” instead of “birds,” the answer “swallows” can be added to the database of acceptable answers. Conversely, if the answer “seagull” is in the database, but the logs show that no users are using that answer, then “seagull” could be removed. This process could ultimately be automated. Because the database answers will be frequently changing to account for these log findings, it is believed that the blending of answers will not be a problem. The use of synonyms could potentially also increase the security offered by the universally usable CAPTCHA described herein.
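The log-driven maintenance of the answer database described above could be automated along the following lines. This is a sketch under stated assumptions: the `update_accepted_answers` function, the `add_threshold` parameter, and the rule that an answer is dropped when it never appears in the logs are illustrative choices, not details from the specification.

```python
from collections import Counter


def update_accepted_answers(accepted, logged_responses, add_threshold,
                            remove_unused=True):
    """Return a revised answer set for one image and audio file combination.

    Answers logged at least add_threshold times are added; accepted answers
    that never appear in the logs are dropped when remove_unused is set.
    """
    counts = Counter(r.strip().lower() for r in logged_responses)
    if not counts:
        return set(accepted)  # no evidence: leave the set unchanged
    updated = set(accepted)
    updated.update(a for a, n in counts.items() if n >= add_threshold)
    if remove_unused:
        updated = {a for a in updated if counts[a] > 0}
    return updated
```

Because automated clients could flood the logs with a chosen wrong answer, a deployment would likely restrict the counted responses to sessions considered trustworthy, or place a human review step before any addition takes effect.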
Although adding additional sounds may cause these later sounds to be difficult to categorize, and thus make them less useful and harder to solve, they may also make them more secure.
Web searches may also be used to generate synonym lists. To find a synonym for a given term, this technique starts with a search for pages containing the term. Candidate pages are then analyzed for any correlations between the target term and other words in the document. The resulting correlations may then be used to generate a subset of synonyms. This approach has several advantages over thesaurus/dictionary-based synonym generation. The use of web structure, as opposed to a static thesaurus, removes a potential target for attack. Furthermore, repeated application of this strategy, using different parameters for selection of candidate articles, might be used to generate a more unpredictable set of synonyms. For example, random selection of candidate articles from the most highly-ranked search result might lead to differing correlations, and therefore differing synonym sets, with each execution. This approach may provide increased resiliency against some forms of attack, at the possible cost of decreased usability due to confusion regarding synonyms that were once, but are no longer, accepted as valid.
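The correlation step described above may be illustrated, in highly simplified form, by counting which words appear on the same candidate pages as the target term. The sketch assumes the page texts have already been retrieved; the `candidate_synonyms` function, the stopword list, and the use of simple document co-occurrence as the correlation measure are all hypothetical simplifications.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "or", "in", "is", "to"})


def candidate_synonyms(term, pages, top_k=3):
    """Rank words by the number of candidate pages they share with term."""
    term = term.lower()
    counts = Counter()
    for page in pages:
        words = set(re.findall(r"[a-z]+", page.lower()))
        if term not in words:
            continue  # not a candidate page for this term
        counts.update(words - STOPWORDS - {term})
    return [word for word, _ in counts.most_common(top_k)]
```

Co-occurrence alone also surfaces words that are merely related rather than synonymous, so the output of such a step is better treated as a list of candidates for review than as a finished synonym set.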
The above-described universally usable CAPTCHA may also be configured to assist dyslexic users by using a “prediction feature,” combining the concept of a drop down feature with a free text input box, by allowing the user to type their answer, and as they are typing the answer the determination module 130 may suggest correct spellings for the term that such user is trying to input.
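Such a prediction feature might be sketched as a prefix lookup over a vocabulary of terms. The `suggest` function and its `limit` parameter are hypothetical. The sketch assumes that suggestions are drawn from the full vocabulary of labels for all image and audio file combinations, since suggesting only the correct answer for the current combination would reveal the solution.

```python
def suggest(prefix, vocabulary, limit=5):
    """Return up to limit vocabulary terms beginning with the typed prefix."""
    p = prefix.strip().lower()
    if not p:
        return []  # nothing typed yet: offer no suggestions
    return sorted(word for word in vocabulary if word.startswith(p))[:limit]
```

For example, with a vocabulary of "thunder", "train", "truck", and "siren", typing "tr" would yield "train" and "truck" in alphabetical order.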
The foregoing universally usable CAPTCHA generating system (and the associated methods described above) may be used to help protect web sites against unauthorized access, while at the same time allowing effective use by individuals with visual impairment. To best implement such system and methods, it is advisable to provide as large a collection of contextually associated image and sound files as is practical and as a particular situation will allow. If the search space is too small, universally usable CAPTCHA will be subject to brute force attacks. Also, it would be desirable to randomize the audio file names every time generation module 120 is engaged to select and transmit an image and audio file combination to a user device 200, or to have all file names randomly chosen to be renamed to temp before being transmitted to a user device 200. Either of these file renaming options will make it difficult for a bot to catalog the filenames for purposes of knowing how to correctly respond in future access attempts. Such feature also has the complication of serving many users simultaneously, so it is important to make sure that each user's session files do not interfere with other concurrent sessions.
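One way to realize the file renaming described above, while keeping concurrent sessions from interfering with one another, is to give each session its own table of random aliases. The `issue_session_aliases` function and the alias format are hypothetical; only the standard library `secrets.token_hex` call is relied upon.

```python
import secrets


def issue_session_aliases(real_names):
    """Map freshly generated random file names to the real stored names.

    The returned table is meant to be kept server-side with the session, so
    the user device only ever sees the aliases.
    """
    aliases = {}
    for real in real_names:
        extension = "." + real.rsplit(".", 1)[1] if "." in real else ""
        aliases[secrets.token_hex(16) + extension] = real
    return aliases
```

Since a new table is generated for every session, a file name observed in one session says nothing about which sound or image the same server will deliver under any name in a later session.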
Obscuring the file size along with the file name would also deter brute force attacks. This can be done by making all of the file sizes the same size, thus defeating any attempts at cataloging file sizes. Finally, if there are more than two or three incorrect responses received from user device 200, such user device 200 should be locked out from further access attempts.
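Equalizing file sizes could be sketched as padding every file up to the length of the largest one. The `pad_to_uniform_size` function is hypothetical, and the sketch assumes that the file formats in use tolerate trailing bytes after the end of the media data, which would need to be verified for the particular image and audio formats chosen.

```python
import secrets


def pad_to_uniform_size(files):
    """Pad each file's bytes with random data so all files have equal length.

    files maps a file name to its raw bytes. Random padding is used rather
    than zeros so the padding cannot be trivially recognized and stripped.
    """
    if not files:
        return {}
    target = max(len(data) for data in files.values())
    return {name: data + secrets.token_bytes(target - len(data))
            for name, data in files.items()}
```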
Still other benefits may result from such universally usable CAPTCHA as described herein. More particularly, the relative lack of culturally-specific content may make such universally usable CAPTCHA relatively easy to translate for users who are not English speakers or otherwise comfortable with Roman alphabets. An easy translation of the database 140 into another language would allow it to be used in areas of the world other than the United States. While it is possible that some images in database 140 may be culturally specific, it should be fairly easy to eliminate those objects and replace them with objects that would more closely fit a particular culture. Likewise, use of audio files as described herein may be useful for sighted users in mobile contexts, where small screens might make distorted text CAPTCHAs impractical.
The components, process steps, and/or data structures used in the system and methods described above may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines. Those of ordinary skill in the art will also recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays, application specific integrated circuits, or the like, may be used without departing from the spirit and scope of the invention. Likewise, those of ordinary skill in the art will recognize that the above described methods may be implemented in the form of a program that can be performed by various types of computers, and the program for performing such methods can be stored in any recording medium readable by a computer, such as a hard disk drive, CD-ROM, DVD, ROM, RAM, or flash memory. Still further, while
Having now fully set forth the preferred embodiments and certain modifications of the concept underlying the present invention, various other embodiments as well as certain variations and modifications of the embodiments herein shown and described will obviously occur to those skilled in the art upon becoming familiar with said underlying concept. It should be understood, therefore, that the invention may be practiced otherwise than as specifically set forth herein.
Claims
1. A computer implemented method in a security access computing system for providing secure access to an electronic service, comprising:
- receiving a request for a web page including a completely automated public turing test to tell computer and humans apart (CAPTCHA);
- in response to said request, transmitting a web page including an image file and an audio file, wherein said image file and said audio file are contextually related to one another;
- receiving a user selection of a label;
- determining whether said user selected label matches one or more stored labels that are contextually related to said image file and said audio file; and
- allowing access to said electronic service upon a determination that said user selected label matches said one or more stored labels.
2. The method of claim 1, further comprising the step of:
- prior to said transmitting step, randomly selecting said image file and said audio file from a collection of multiple image file and audio file pairs, wherein each of said image file and audio file pairs are contextually related to one another.
3. The method of claim 1, wherein said image file and said audio file are maintained in a database accessible by said computing system, and wherein said database further comprising a collection of multiple image file and audio file pairs, wherein each of said image file and audio file pairs are contextually related to one another.
4. The method of claim 3, wherein said database further comprises at least one label contextually associated with each said image file and audio file pair.
5. The method of claim 1, further comprising the step of denying access to said electronic service upon a determination that said user selected label does not match said one or more stored labels.
6. The method of claim 1, wherein said web page further comprises a user engageable function to initiate playback of said audio file.
7. The method of claim 6, wherein said user engageable function is displayed to a user simultaneously with an image contained in said image file.
8. The method of claim 1, further comprising the step of prompting a user to input a label believed to be contextually associated with said image file and said audio file.
9. The method of claim 8, wherein said prompting step further comprises presenting a plurality of labels to a user, wherein said plurality of labels includes at least one label that is contextually associated with said image file and said audio file, and at least one label that is not contextually associated with said image file and said audio file.
10. A computer readable medium whose contents cause a security access computing system to:
- receive a request for a web page including a completely automated public turing test to tell computer and humans apart (CAPTCHA);
- in response to said request, transmit a web page including an image file and an audio file, wherein said image file and said audio file are contextually related to one another;
- receive a user selection of a label;
- determine whether said user selected label matches one or more stored labels that are contextually related to said image file and said audio file; and
- allow access to said electronic service upon a determination that said user selected label matches said one or more stored labels.
11. The computer readable medium of claim 10, wherein said contents further cause said security access computing system to:
- prior to said transmitting a web page, randomly select said image file and said audio file from a collection of multiple image file and audio file pairs, wherein each of said image file and audio file pairs are contextually related to one another.
12. The computer readable medium of claim 10, wherein said image file and said audio file are maintained in a database accessible by said computing system, and wherein said database further comprising a collection of multiple image file and audio file pairs, wherein each of said image file and audio file pairs are contextually related to one another.
13. The computer readable medium of claim 12, wherein said database further comprises at least one label contextually associated with each said image file and audio file pair.
14. The computer readable medium of claim 10, wherein said contents further cause said security access computing system to:
- deny access to said electronic service upon a determination that said user selected label does not match said one or more stored labels.
15. The computer readable medium of claim 10, wherein said web page further comprises a user engageable function to initiate playback of said audio file.
16. The computer readable medium of claim 15, wherein said user engageable function is displayed to a user simultaneously with an image contained in said image file.
17. The computer readable medium of claim 10, wherein said contents further cause said security access computing system to:
- prompt a user to input a label believed to be contextually associated with said image file and said audio file.
18. The computer readable medium of claim 10, wherein said contents further cause said security access computing system to:
- present a plurality of labels to a user, wherein said plurality of labels includes at least one label that is contextually associated with said image file and said audio file, and at least one label that is not contextually associated with said image file and said audio file.
Type: Application
Filed: Oct 15, 2009
Publication Date: Apr 15, 2010
Patent Grant number: 8245277
Applicant: Towson University (Towson, MD)
Inventors: Jonathan K. Lazar (Columbia, MD), Harry Hochheiser (Baltimore, MD), Jinjuan Feng (Ellicott City, MD), Graig Sauer (Baltimore, MD), Jonathan D. Holman (Ashburn, VA)
Application Number: 12/579,680