MULTIMODAL TEXT INPUT SYSTEM, SUCH AS FOR USE WITH TOUCH SCREENS ON MOBILE PHONES
A system and method for entering text from a user includes a programmed processor that receives inputs from the user and disambiguates the inputs to present word choices corresponding to the text. In one embodiment, inputs are received in two or more modalities and are analyzed to present the word choices. In another embodiment, a keyboard is divided into zones, each of which represents two or more input characters. A sequence of zones selected by the user is analyzed to present word choices corresponding to the zones selected.
This application is a divisional of U.S. application Ser. No. 13/583,840, filed Oct. 16, 2012, which is a U.S. National Phase of International Patent Application No. PCT/US2011/028387, filed Mar. 14, 2011, entitled Multimodal Text Input System, Such as for Use With Touch Screens on Mobile Phones, which claims priority to the following commonly assigned U.S. Patent Application Nos. 61/313,454, filed Mar. 12, 2010 and 61/313,864, filed Mar. 15, 2010, both entitled Method and Apparatus for Text Input; 61/313,423, filed Mar. 12, 2010, entitled Method and Apparatus for Secure Data Transmission; and, 61/316,799, filed Mar. 23, 2010, entitled Multimodal Text Input System, Such as for Use with Touch screens on Mobile Phones, each of which is incorporated by reference in its entirety.
This application is also related to commonly assigned U.S. Pat. No. 6,307,549, entitled “REDUCED KEYBOARD DISAMBIGUATION SYSTEM;” U.S. patent application Ser. No. 11/379,006/ U.S. Patent Publication No. 2006/0274051 entitled “VIRTUAL KEYBOARD SYSTEMS WITH AUTOMATIC CORRECTION,” filed Apr. 17, 2006; and U.S. patent application Ser. No. 11/379,006, entitled “DISAMBIGUATION OF ICONS AND OTHER MEDIA IN TEXT-BASED APPLICATIONS,” filed Mar. 29, 2007, and 61/313,564, filed Mar. 12, 2010, all of which are incorporated by reference.
BACKGROUND
Users enter text into computers and other data processing systems using a variety of input devices, including many types of keyboards. Most standard keyboards are large so as to accommodate both hands of a typist, but this leads to a less portable device. Laptop computers have attempted to shrink the size of a keyboard, sometimes at the expense of typing efficiency for some users.
Certain mobile devices such as cell phones, smart phones and the like, require an even smaller keyboard to maintain the form factor common with such devices. Some phones include a small key pad with physical buttons to distinguish each key on the keyboard, with which a user may use her thumbs to actuate the buttons. Other devices use a touch screen that displays a similar sized keyboard. Both of these smaller keyboards are cramped and can provide for slower and/or error prone text entry.
Although touch screen devices have become increasingly popular, consumers continue to demand simpler ways to input text, such as for e-mail, text messaging, instant messaging, and other activities. These touch screen keyboards, however, may suffer from more problems than their small key pad counterparts. The soft keys on the touch screen of such keyboards provide less tactile feedback to a user, which can lead to input errors, as well as other problems.
Various examples of the disclosed technology will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the disclosed technology may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the disclosed technology may include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the disclosed technology. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.
I. Enterprise/Cloud Environment
As will be explained in further detail below, the technology described herein relates to systems and methods for increasing the speed and accuracy with which a user can enter text into devices and in particular into mobile devices. In accordance with one embodiment, upon entry of text by the user, data representative of such text entry is transmitted via the Internet, a wireless communication system, or other appropriate data transmission method to a remote centralized server.
Referring first to
In accordance with one embodiment of the disclosed technology, certain functions previously limited to implementation on one or more of the remotely located devices are transferred to the enterprise server, thus providing a more similar user experience for all users of the enterprise system. In particular, when entering text into one or more of the remotely located devices, as noted above, typically all functionality regarding text correction and auto suggestion and auto fill of text entry words is performed in accordance with information retained on the remotely located device. There exists, therefore, a problem related to updating of data and the like used in such selection, and in particular with potentially frequently updated information on an enterprise wide basis such as email addresses. Thus, if an email address list is updated for a particular enterprise across an entire group of enterprise users, such update may not be transmitted to one or more of the remotely located devices. If the enterprise is sufficiently large, it may not be practicable to download an entire email list to such a remote device. Thus, the remote device may maintain a local cache with a limited number of text entries, typically such entries determined in accordance with a locally-maintained address list or the like. If an email address is not found for a particular indicated individual, the user may be provided with the option of searching for the email address on a remote server, but this process requires additional steps to be performed by a user. There is further no automatic integration of such a system into the user experience of the individual enterprise user. Additionally, such a system may not employ any error tolerant text entry as the user must complete the text entry before asking for access to information stored on the enterprise server. 
Thus, there has traditionally been no way for such a system to aid in text entry to determine user intent when a complete dictionary of words is not available to the remotely located device.
Therefore, in accordance with the disclosed technology, a method and apparatus are provided that employ error tolerant text entry techniques at the remote device level, while using such error tolerant text entry results to select suggested text entry information from a centralized database location.
For example, in one embodiment of the disclosed technology error tolerant text entry is provided to a user by providing the user with preferably nine, or other appropriate number of zones each comprising one or more keys. When a zone is selected, a predetermined number of keys comprising the zone are also selected. Innovative software is able to determine a desired selected word based upon the zone selection sequence, rather than needing each individual key to be selected. A more detailed description of one example of such a zone text entry method is described at U.S. patent application Ser. No. 12/500,675 filed Jul. 10, 2009 titled METHOD AND APPARATUS FOR TEXT ENTRY, the contents thereof being incorporated herein by reference.
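The zone selection idea described above can be sketched as follows. This is a minimal illustration only: the nine-zone partition of a QWERTY layout, the names `ZONES`, `zone_sequence` and `candidates`, and the tiny word list are all assumptions made for this example and are not taken from the referenced application.

```python
# Hypothetical nine-zone partition of a QWERTY layout (assumed for illustration).
ZONES = ["qwe", "rty", "uiop", "asd", "fgh", "jkl", "zxc", "vb", "nm"]

# Map every key to the index of the zone that contains it.
KEY_TO_ZONE = {ch: i for i, keys in enumerate(ZONES) for ch in keys}


def zone_sequence(text):
    """Convert a string of letters into the sequence of zones it touches."""
    return tuple(KEY_TO_ZONE[ch] for ch in text.lower())


def candidates(typed, dictionary):
    """Return dictionary words whose zone sequence equals that of the input."""
    target = zone_sequence(typed)
    return [word for word in dictionary if zone_sequence(word) == target]


if __name__ == "__main__":
    words = ["the", "toe", "tie", "input"]
    # "yhw" and "rgw" touch the same zones as "the", so both resolve to it.
    print(candidates("yhw", words))
    print(candidates("rgw", words))
```

Because only the zone sequence is compared, an imprecise press anywhere inside the intended zone yields the same candidates as a precise one.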
In other embodiments of the disclosed technology, such zone key entry may be performed in the background without providing indications of such zones to a user. Thus, while a user may be encouraged to be as accurate as possible when typing, great accuracy is not necessary. Other error tolerant text entry systems, such as that described in co-assigned U.S. patent application Ser. No. 12/186,425 to Matthew Cecil, et al., filed Jun. 14, 2007 entitled “A PROBABILITY-BASED APPROACH TO RECOGNITION OF USER-ENTERED DATA”, the entire contents thereof being incorporated herein by reference, may also be employed. In still further embodiments of the disclosed technology, more traditional text entry systems, and other known predictive text entry systems may be employed.
Further in accordance with aspects of the disclosed technology, in a particular embodiment of the disclosed technology, an error tolerant text entry system may be provided as described in U.S. patent application Ser. No. 12/500,675 referenced above. Thus, a described zone entry text system may be employed on the remotely located device, even without having the complete word dictionary located on such remotely located device. Such error tolerant text entry system may also maintain zone definitions in a manner not displayed to the user. Finally, other error tolerant text entry systems may also be employed in accordance with the present disclosed technology.
Referring next to
Thereafter, at step 230, a dictionary search or other database search is performed by transmitting the predictive text entry information to the enterprise server. Importantly, it is not merely the text keys entered that are transmitted; rather, the data determined in accordance with the initial error tolerant text entry processing is transmitted. In accordance with the preferred embodiment of the disclosed technology, for each key that is input by a user, all other keys defined to be in the zone with the selected key, whether these zones are fixed or dynamic, available to the user or not, are transmitted to the enterprise server for further processing in accordance with the disclosed technology. In such a manner, the received predictive text entry information entered in any of the one or more remote devices can be applied to any number of consistently maintained enterprise wide databases.
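A minimal sketch of this exchange is given below, assuming a JSON message format. The field name `key_sets`, the functions `build_request` and `server_lookup`, and the zone layout are hypothetical; the source does not specify a wire format or a server interface.

```python
import json

# Hypothetical zone layout (assumed for illustration).
ZONES = ["qwe", "rty", "uiop", "asd", "fgh", "jkl", "zxc", "vb", "nm"]
KEY_TO_ZONE_KEYS = {ch: keys for keys in ZONES for ch in keys}


def build_request(typed):
    """Device side: for each pressed key, send every key in its zone."""
    key_sets = [KEY_TO_ZONE_KEYS[ch] for ch in typed.lower()]
    return json.dumps({"key_sets": key_sets})


def server_lookup(request, dictionary):
    """Server side: match words letter by letter against the received key sets."""
    key_sets = json.loads(request)["key_sets"]
    return [
        word
        for word in dictionary
        if len(word) == len(key_sets)
        and all(ch in keys for ch, keys in zip(word, key_sets))
    ]


if __name__ == "__main__":
    request = build_request("yhw")
    print(request)
    print(server_lookup(request, ["the", "tie", "toe", "thee"]))
```

Sending the key sets rather than the raw key presses lets the same server-side dictionary serve any device, regardless of whether the zones are shown to the user.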
Selecting a suggested word replaces the entered word with the selected suggestion at step 250. If none of the words are selected, processing passes back to step 210 in
In an alternative embodiment of the disclosed technology, in order to speed processing at the remote device, based upon a determined size of a particular dictionary, it may prove beneficial to download a dictionary to the remote device prior to the beginning of text entry by the user. Thus, in a case where entries into a particular field in a form or the like are limited to a small number of possible entries, downloading such a dictionary upon entry of the field by the user may allow for only a single data transfer to be employed. Similarly, if a user is visiting a form or the like with a number of fields, dictionaries for each of the fields in a particular form may be downloaded in advance, or upon entry of a first field of the form. Of course, if any particular field were found to have a very large dictionary, access to this dictionary on the enterprise server may be provided as noted above.
The user can train the system to add words to its dictionary or the system can be set to add words automatically after a preset number of times that the word has been entered. Thus, in accordance with the disclosed technology, not only can a dictionary be modified on an enterprise wide platform, entry of text by a particular user employing a remotely located computing device may result in the addition of such a text entry to a database or dictionary for subsequent use by any enterprise user. Of course, local databases may also be maintained for text entries that need not be made available to the enterprise as a whole.
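The automatic addition of words after a preset number of entries might be sketched as follows. The class name `LearningDictionary`, its `threshold` parameter and the default value of three are assumptions for illustration; the source does not fix a count or a data structure.

```python
from collections import Counter


class LearningDictionary:
    """Word set that admits an unknown word once it has been entered often enough."""

    def __init__(self, words, threshold=3):
        self.words = set(words)
        self.threshold = threshold  # assumed preset number of entries
        self.pending = Counter()    # counts for words not yet in the dictionary

    def record_entry(self, word):
        """Record one entry of `word`; return True if it was just added."""
        if word in self.words:
            return False
        self.pending[word] += 1
        if self.pending[word] >= self.threshold:
            self.words.add(word)
            del self.pending[word]
            return True
        return False


if __name__ == "__main__":
    shared = LearningDictionary(["the", "input"], threshold=2)
    print(shared.record_entry("microgesture"))  # first entry, not yet added
    print(shared.record_entry("microgesture"))  # second entry reaches the threshold
    print("microgesture" in shared.words)
```

One instance could be kept per user for private entries and another on the enterprise server for entries shared with all users.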
An exemplary predictive text algorithm is depicted in U.S. patent application Ser. No. 12/186,425 referenced above. Such an algorithm utilized in accordance with the present disclosed technology addresses the predictive text requirements of the disclosed technology. A user interface provided to a user on one or more of the remotely located computing devices will provide a character (or string of characters associated with one or more selected zones) to an algorithm at an enterprise location, and the algorithm will interpret that set of characters and return a result from the enterprise server to a user interface at the remotely located computing device in the form of one or more suggested word selections. Alternatively, the enterprise server may return an indication of an inability to properly parse out a word selection, thereby indicating that the user has not yet entered enough letters.
In one embodiment, user context, history or the like, in accordance with the particular individual user, or in accordance with the collective experience of the full set of enterprise users may be employed to aid in such suggested word selection. Additionally, while the described predictive text algorithm may be employed, the present disclosed technology may be used with any appropriate text parsing or predictive text algorithm.
Using a matrix of probabilities, the predictive text algorithm knows that for each character pressed, the user could have meant characters to the left, right, above or below, or the other characters in the zone indicated as described above. Once the algorithm knows which characters are probable (based upon the keys associated with a selected key), the algorithm then runs a search against a dictionary preferably provided at an enterprise server location as accessible to a plurality of enterprise users, as is more completely described in the '425 application referenced above. Any words that match the string of probable characters are designated and then preferably scored based on distance between characters, frequency, probability, and any number of other characteristics that may be indicative of a user intention.
The system may also adapt over time to give a higher probability of selection to frequently used patterns of a particular user, or to particular groups of users in an enterprise setting. Thus, the prediction mechanism may be personalized for a particular user, even if that user is selecting words from a common centralized enterprise dictionary or database. Such personalization may comprise providing selections based upon a frequency of selection of words by a particular user during a predetermined preceding time period. Thus, the more a user uses a particular word, the more likely it is to be selected again, and the more likely it will be suggested to the user. This personalization may be particularly useful in such a situation as when a user is entering an email address from a centralized, enterprise wide email list. As a particular user sends email to one or more particular recipients, the names of these recipients are more likely to be entered by the user in the future. Furthermore, such personalization need not be limited to a single person. Rather, entry of particular email addresses may be accumulated for a group of people or the like, thus allowing for a department or the like to generate a preferred list of email addresses. This selection may also be applied to standard words or other sequences that may be selected in accordance with a text entry system. Such personalization may provide a weighted selection system in which the words or email addresses that are selected most recently or frequently are provided first to a user followed by words selected at an earlier time, or less frequently.
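One way to picture such a weighted selection is the sketch below, which orders candidates by how often they were selected within a preceding time window and breaks ties by recency. The class `UsageHistory`, the plain numeric timestamps and the example addresses are hypothetical, and the source does not prescribe this particular weighting.

```python
from collections import Counter


class UsageHistory:
    """Selection log for one user or one group of users."""

    def __init__(self, window):
        self.window = window  # length of the preceding time period
        self.events = []      # (timestamp, selected item) pairs

    def record(self, item, now):
        self.events.append((now, item))

    def order(self, candidates, now):
        """Most frequently selected within the window first, then most recent."""
        counts = Counter(
            item for stamp, item in self.events if now - stamp <= self.window
        )
        last_used = {}
        for stamp, item in self.events:
            last_used[item] = max(stamp, last_used.get(item, stamp))
        return sorted(
            candidates,
            key=lambda item: (-counts[item], -last_used.get(item, float("-inf"))),
        )


if __name__ == "__main__":
    history = UsageHistory(window=100)
    history.record("bob@example.com", now=10)
    history.record("alice@example.com", now=20)
    history.record("alice@example.com", now=30)
    matches = ["bob@example.com", "carol@example.com", "alice@example.com"]
    print(history.order(matches, now=50))
```

Keeping one `UsageHistory` per department rather than per person would give the group-level preferred list mentioned above.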
In another embodiment, the personalization of a user dictionary is not limited to words commonly or recently selected by a user. The dictionary may also be customized based on recent behavior of the user. If a user is searching the internet for web pages devoted to a particular topic, words from those web pages can be classified and added to the dictionary as more likely choices. For example, if a user is viewing web pages on their remote device associated with food and cooking, then when the user enters the keys “d i s e r t” the word search algorithm uses the recent history to suggest the word “dessert” before the word “desert.” Similarly, temporal and geographical considerations can be used by the text selection algorithm. For example, the GPS location of the remote device can be detected and sent to the enterprise server. If it is known that a user is located in the mountains, then the text prediction algorithm is trained to suggest that a text entry of “how is the snor?” should be “How is the snow?” as opposed to “How is the show?”
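The effect of recent behavior on the order of suggestions can be sketched with a simple boost. The function `rank_with_context`, the multiplicative `boost` factor and the frequency values are assumptions for illustration; how context words are classified from web pages or location is outside the scope of this sketch.

```python
def rank_with_context(candidates, base_frequency, context_words, boost=2.0):
    """Order candidates by frequency, boosting words seen in recent context."""

    def score(word):
        weight = boost if word in context_words else 1.0
        return base_frequency.get(word, 0) * weight

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    frequency = {"desert": 10, "dessert": 8}   # assumed base frequencies
    cooking_pages = {"dessert", "recipe", "oven"}
    print(rank_with_context(["desert", "dessert"], frequency, set()))
    print(rank_with_context(["desert", "dessert"], frequency, cooking_pages))
```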
In this embodiment, information about the recent browsing history, e-mails read, GPS location, etc. is sent to the enterprise server and maintained in a file associated with the user in order to customize a dictionary or to aid in selecting likely words for the user based on their recent behavior.
As can be seen in
The disclosed technology may also be applied to a mobile or other device using a scrolling device, or otherwise comprising only one or two directional indication keys. With a scrolling device or one or two direction keys, the predefined zones are preferably provided with a predefined sequence, such as perhaps left to right, then top to bottom, so that upon scrolling or utilizing a directional key, a user is forwarded through each of the predefined zones, highlighting or otherwise indicating the presently selected zone. Pressing in the scrolling device, or other method of selection allows a user to choose the zone, and then scrolling is employed to move to a next zone for selection.
The disclosed technology may further be applied to a projected keyboard where various sensing mechanisms determine where a user touches a projected keyboard. As the accuracy of such sensing is likely difficult to determine, allowing a user to select from a smaller number of larger zones would be quite beneficial.
II. Microgesture Input
In accordance with another aspect of the disclosed technology, a rapid text entry system and method are provided that leverage error tolerant predictive text algorithms. A user can employ his familiarity with a QWERTY keyboard (or other common keyboard layouts) and enter shorter microgestures of two or three letters that comprise less than an entire word. Shorter words can be entered with one such microgesture, while longer words may use two or more microgestures. Thus, the text entry system described below overcomes at least one major drawback of prior systems, namely the requirement that the user enter the entire intended word without lifting his finger and thus provide a single trace for an entire intended word.
In accordance with one embodiment, a text entry system notes a starting and ending location for each microgesture. Furthermore, if a bend is detected in the microgesture entered by the user, it is determined that a third, middle letter was intended. A number of different methods may be used to determine the intended middle letter. First, a location of the bend in the microgesture may be determined. Second, a direction moved from the starting position may be provided to a text parsing algorithm to help select among a number of possible next letter text entries, given the start and endpoint inputs. Alternatively, a recognition of the existence of such a bend may be determined, thus indicating that a middle letter was intended. The text parsing algorithm may then assist in determining such letter without any additional help, as if it had encountered a missing letter in a more traditional text entry system.
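The first of these methods, locating the bend, might look like the sketch below. The key coordinates (unit key spacing with an assumed row stagger), the `bend_threshold` value and the function names are hypothetical. The bend is taken to be the sampled point that deviates most from the straight line between the start and end points, which is one plausible detection rule rather than the one the source mandates.

```python
import math

# Assumed key centers: unit spacing, with lower rows shifted right.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]
CENTERS = {
    ch: (c + OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}


def nearest_key(point):
    """Return the key whose center lies closest to the point."""
    return min(CENTERS, key=lambda ch: math.dist(CENTERS[ch], point))


def decode_microgesture(points, bend_threshold=0.4):
    """Return the two or three letters indicated by a traced microgesture."""
    start, end = points[0], points[-1]
    if len(points) == 1:
        return nearest_key(start)  # a plain tap
    letters = [nearest_key(start)]
    chord = math.dist(start, end)
    if chord > 0 and len(points) > 2:

        def deviation(p):
            # Perpendicular distance from p to the line through start and end.
            cross = (end[0] - start[0]) * (start[1] - p[1]) - (
                start[0] - p[0]
            ) * (end[1] - start[1])
            return abs(cross) / chord

        bend = max(points[1:-1], key=deviation)
        if deviation(bend) > bend_threshold:
            letters.append(nearest_key(bend))  # a middle letter was intended
    letters.append(nearest_key(end))
    return "".join(letters)


if __name__ == "__main__":
    print(decode_microgesture([(4.0, 0.0), (5.25, 1.0), (2.0, 0.0)]))  # bend at "h"
    print(decode_microgesture([(6.0, 0.0), (5.0, 0.05), (4.0, 0.0)]))  # straight glide
```

The letters returned here would then be passed to the text parsing algorithm exactly as if they had been tapped.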
Once a first microgesture is determined, the selected letters are input to a text selection algorithm, and word suggestions and the like are provided to a user in a more traditional format. If no word is selected, a next tap of a key, or a next microgesture is determined to be part of text entry for a same, longer word.
In one embodiment, users slide their fingers between a number of characters on the keyboard instead of lifting and pressing each intended letter to be entered. It has been determined that some users prefer such sliding to neighboring characters instead of tapping each character individually, because it is more convenient and reduces the potential for text entry error as compared to a traditional tap entry system where the user may tap the wrong key if required to lift their finger and press each key in succession. Further, the user is able to avoid any type of repetitive stress issues, and may be able to enter text more quickly.
When using smaller, typically mobile, touch screen devices users have been found to generally type using one or two fingers instead of all fingers as they would on a more traditional regular laptop/PC keyboard. The use of only one/two fingers can result in finger fatigue and discourage users from typing on a touch screen device. The required act of lifting and placing the finger for each letter to be entered is often associated with repetitive stress injuries such as carpal tunnel syndrome and the like. In accordance with an embodiment of the disclosed technology, a user has the option to either slide their fingers to enter a few letters, or tap the entry of a particular letter as desired. Thus, rather than requiring tapping as in traditional text entry systems, or tracing the letters for an entire word as noted in the prior art patents listed above, the user has the option and flexibility of entering text as he or she prefers. Such preferences may be based upon particular input device, particular word being entered, etc. Ultimately, users are provided with a most flexible option and can choose to enter with microgestures, tapping, or a combination of both based on convenience.
Microgestures allow users to select 2 or 3 characters at a time with a single microgesture. One further benefit of such microgestures is that users do not have to mentally or physically trace a complete path for a word before the predictive text algorithm begins suggesting corresponding words. Tracing a path for longer words can be confusing and difficult. Users tend to forget which characters of a word they have already entered and tend to misspell. Tracing paths for long words may also result in mental and/or physical fatigue. Further, because an apparatus receiving a path tracing an entire word must wait for completion of that word before processing the input data, it is not possible to provide automatic completion of such text entry, or suggestion of possible intended word entries. Thus, the user must complete tracing of the entire word, and only then can the system provide word suggestions.
The microgesture input system of the disclosed technology is also useful to input text or words in languages whose alphabets do not fit onto a single keyboard and therefore require getting letters/characters from alternative/function/shift keyboards.
At the completion of each microgesture, however, the system in accordance with one embodiment of the disclosed technology is able to auto-complete an input. The ability to use such an auto-complete enables the user to enter only a partial word and choose from the suggestions provided. This enhances the user experience, helps in faster text entry, and even further improves over prior art systems requiring tracing of an entire word because a user will often be able to select a desired input word after simply entering a first microgesture. In this manner, the predictive text algorithm in accordance with the disclosed technology supports error tolerant input for gliding or tapping. The user can be sloppy in entering text and need not provide the exact character input to get the desired word. In effect, letters that are selected in a microgesture entry are considered by a text parsing algorithm as if they had been tapped in a traditional text entry system. Thus, existing text parsing algorithms can be leveraged, and a far simpler and flexible system is provided to a user. Thus, missing and/or incorrect letters can be received and the system is still able to determine correct and intended text entry by a user.
Selecting letters for text entry through the use of gliding allows users to trace the path for fewer than all the characters of a word as shown by the example in
However, in accordance with an embodiment of the disclosed technology, a user need not tap each of the five letters, and indeed may employ one or more microgesture glides to enter the desired letters. There are a number of different sequences for entering the word INPUT in accordance with the disclosed technology.
As shown in
Glide “i-n-p” and glide “u-t”
Tap “i”, “n” and glide “p-u-t”
Tap “i”, “n”, “p” and glide “u-t”
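The sequences listed above are a few of the ways a word can be split into taps (one letter) and glides (two or three letters). A short sketch that enumerates all such splits follows; the function name `entry_sequences` and the `max_glide` parameter are assumptions for illustration.

```python
def entry_sequences(word, max_glide=3):
    """List every way to enter `word` as taps and glides of up to `max_glide` letters."""
    if not word:
        return [[]]
    sequences = []
    for size in range(1, min(max_glide, len(word)) + 1):
        head = word[:size]  # one letter is a tap, two or three letters a glide
        for rest in entry_sequences(word[size:], max_glide):
            sequences.append([head] + rest)
    return sequences


if __name__ == "__main__":
    for sequence in entry_sequences("input"):
        print(" + ".join(sequence))
```

Under these assumptions the five-letter word "input" can be entered in thirteen different ways, including the three listed above.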
In order to further ease the user experience, any glide or tap entered by the user may be maintained on the display for a period of time to provide sufficient feedback for the user to determine what was entered. As noted above, however, such maintenance of glide entry is useful only for the user and preferably may not play a role in determining user input. (Of course, it would be possible to use such glide location to determine text input in an alternative embodiment of the disclosed technology.) Thus, while the trace shown in
The following example further depicts the use of such glide and vector algorithms. If the user traces a path “t-h-e”, “h” is lower and to the right of “t”. An initial pass through the algorithm will return words like “tie”, “toe” and “the”. Upon viewing of a vector of movement after entering the first letter “t”, it can be determined that the user moved down towards the letter “h”. Because both the letters “o” and “i” are to the right of the letter “t”, they will be discarded (or provided as lower valued selections to the user) as there is a closer option that the algorithm will find. The letter “h” from the potential suggestion choice “the” more closely meets the vector proximity of the input and hence will preferably be chosen as the middle character. Furthermore, error tolerant text entry may be provided, and thus, the user could trace “r-j-w” and still get “the” as a suggested word. The letter “j” is located downward and to the right of the letter “r” and the letter “h” is located downward and to the right of the letter “t”. Since both vectors are in close proximity, the letter “h” will be chosen as the middle character.
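The “t-h-e” case can be illustrated with a small sketch that orders candidate words by how closely the direction from their first letter to their second letter matches the observed direction of movement. The key coordinates, the use of a continuous angle comparison and the function names are assumptions; with a different keyboard geometry or a coarser direction coding the ordering of borderline traces may differ.

```python
import math

# Assumed key centers: unit spacing, lower rows shifted right, y grows downward.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]
CENTERS = {
    ch: (c + OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}


def direction(first, second):
    """Angle of travel, in radians, from one key center to another."""
    (ax, ay), (bx, by) = CENTERS[first], CENTERS[second]
    return math.atan2(by - ay, bx - ax)


def angle_gap(a, b):
    """Smallest absolute difference between two angles."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def rank_by_vector(candidates, observed_angle):
    """Order words by agreement between their first move and the observed move."""
    return sorted(
        candidates,
        key=lambda word: angle_gap(direction(word[0], word[1]), observed_angle),
    )


if __name__ == "__main__":
    # The user starts at "t" and moves down and to the right.
    observed = math.atan2(1.0, 1.0)
    print(rank_by_vector(["tie", "toe", "the"], observed))
```

Here “tie” and “toe” both start with a purely rightward move, so “the” is ranked first for a downward-right movement.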
Vector direction will now be explained, and is more completely described in U.S. patent application Ser. No. 12/091,700 to Sharan et al., filed Jun. 14, 2007, entitled “DATA ENTRY SYSTEM AND METHOD OF ENTERING DATA”, the entire contents of which are herein incorporated by reference. The disclosed technology advantageously encodes the user inputs with regard to the translational movements, i.e., the directions of one key relative to another when actuated sequentially by the user.
The translational movements are preferably coded within a set of predetermined primary directions, these being a reduced set of all possible directions which can be taken between two consecutive key actuations. One example of the reduced set would be the four directions: “up”, “down”, “left” and “right”. Thus, rather than registering each translational movement according to the actual direction taken, the system preferably assigns one of the directions in the reduced set. For example, if the actual direction of movement between two actuations is 10° off vertical, the translational movement would be assigned, for example, to “up”, this being the closest of the primary directions to a real 10° heading.
Reducing possible directions from an infinite number to a small number, such that all translational directions are categorized within a reduced group, e.g. to “up”, “down”, “left” and “right”, means that many disparate translational directions will be assigned the same primary direction. For example, all movements with a bearing up to ±44° with respect to a direction straight up could be assigned the same primary direction “up”. Moreover, this will include translational movements with the same actual bearing, but with completely different starting points, such that the same vertical movement occurring at many different points on the keyboard could all be represented simply by “up”.
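The reduction to four primary directions can be sketched as follows. The function name `primary_direction`, the screen coordinate convention (y increasing downward) and the handling of the exact 45° boundaries are assumptions for illustration.

```python
import math

PRIMARY = ["up", "right", "down", "left"]


def primary_direction(start, end):
    """Assign the movement from start to end to one of four primary directions.

    Points are (x, y) screen coordinates with y increasing downward.
    A zero-length movement has no meaningful direction and should be
    filtered out by the caller.
    """
    dx = end[0] - start[0]
    dy = start[1] - end[1]  # flip the sign so that "up" is positive
    # Bearing in degrees: 0 is straight up, increasing clockwise.
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    # Each primary direction covers a 90 degree sector centered on it.
    return PRIMARY[int(((bearing + 45) % 360) // 90)]


if __name__ == "__main__":
    ten_degrees = math.radians(10)
    # A movement 10 degrees off vertical is coded as "up".
    print(primary_direction((0, 0), (math.sin(ten_degrees), -math.cos(ten_degrees))))
    # The same vertical movement from a different starting point is also "up".
    print(primary_direction((5, 5), (5, 3)))
```

Because only the difference between the two points is used, movements with the same bearing map to the same primary direction wherever they start on the keyboard.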
An exemplary predictive text algorithm for use in accordance with the microgesture text entry system of the present disclosed technology is described in U.S. patent application Ser. No. 12/186,425 referenced above. Such an algorithm utilized in accordance with the present disclosed technology addresses the predictive text requirements of the disclosed technology. The user interface will pass the algorithm a character (or string of characters associated with a particular jumper point and zone) and the algorithm will interpret that character (or characters) and return a result to the user interface in the form of one or more suggested word selections, or may return an indication of an inability to properly parse out a word selection, thereby indicating that the user has not yet entered enough letters. Of course, user context, history or the like may be employed to aid in such suggested word selection. Additionally, while the described predictive text algorithm may be employed, the present disclosed technology may be used with any appropriate text parsing or predictive text algorithm. As long as jump points can be associated with a number of different inputs, and these inputs can be differentiated from each other and used to select an intended input by a user, the disclosed technology may be employed.
Thus, using a matrix of probabilities, the algorithm knows that for each character pressed the user could have meant characters to the left, right, above or below, or the other characters in the zone indicated in accordance with the disclosed technology as described above. Once the algorithm knows which characters are probable (based upon the keys associated with a selected jumper point, or alternatively, a dynamic or other determination of user intention, i.e. which letters it is likely the user intended, even if a predefined zone is not provided), the algorithm then runs a search against a dictionary, as is more completely described in the '425 application referenced above. Any words that match the string of probable characters are designated and then preferably scored based on distance between characters, frequency, probability, and any number of other characteristics that may be indicative of a user intention. The system can also adapt over time to give a higher probability of selection to frequently used patterns of a particular user. Thus, the prediction mechanism may be personalized for a particular user.
In order to additionally aid in determining user intention, in accordance with the disclosed technology the predictive algorithm of the proposed system may, in addition to relying upon a traditional dictionary, attempt to further determine user intention by determining the context of a particular text input, such as determining whether the input is for an email, chat, search, or the like, or whether the text input is taking place during a particular television show, or while a music application is playing a song. To the extent a user is likely to use different words and/or other language differences, such context determination may aid in the accuracy of the predictive text algorithm. The algorithm may also receive information from stored content, such as the content of one or more databases, a user's contact list or the like, to improve the reliability of the prediction. Such context may be stored within a device being used for text entry, or may be stored remotely in a database accessible over the Internet, or other available network. The list of words is then sorted based on that score and returned to the user interface. Thus, the disclosed technology uses the redundancy in the input language or the possible input functionalities to map approximate input into non-ambiguous input per the user intention. Additionally, device features and functionality may be considered when determining user input, thus perhaps taking into account device limitations when determining such input.
While the present disclosed technology has been described in one embodiment employing the predictive text algorithm of the '425 application, it is possible to use other text parsing algorithms. One such alternative text entry system is described in U.S. patent application Ser. No. 12/500,676 to Sharan et al., filed Jul. 10, 2009, entitled “METHOD AND APPARATUS FOR TEXT INPUT”, the entire contents of which are herein incorporated by reference. The '676 application describes a system and method for dividing the keyboard into a plurality of zones, and then considering any text entry in a particular zone to be a selection of all of the letters in the zone. Thus, the user is able to be less precise when entering text. This system may be used in conjunction with the microgesture entry method described above, and thus, each determined starting point, ending point, and any vector directions or bend determination need only indicate zones that include the intended letter, thus further easing the difficulty of text entry for a user.
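As a rough illustration of zone-based entry, the sketch below treats each selected zone as a selection of every letter in that zone and returns the dictionary words consistent with the zone sequence. The three-zone split and the word list are hypothetical and are not taken from the '676 application.

```python
# Hypothetical three-zone split of the alphabet; a real layout would follow the keyboard.
ZONES = {
    1: set("qweasdzxc"),
    2: set("rtyfghvbn"),
    3: set("uiopjklm"),
}

WORDS = ["the", "rye", "toe", "rip", "cat"]  # hypothetical stand-in for a dictionary


def zone_of(letter):
    """Return the number of the zone that contains the letter."""
    for zone, letters in ZONES.items():
        if letter in letters:
            return zone
    raise ValueError(f"no zone for {letter!r}")


def words_for_zones(zone_sequence):
    """Return the words whose letters fall, in order, in the selected zones."""
    return [
        w for w in WORDS
        if len(w) == len(zone_sequence)
        and all(zone_of(ch) == z for ch, z in zip(w, zone_sequence))
    ]
```

Under these assumptions, the zone sequence 2, 2, 1 is ambiguous between "the" and "rye", which is the kind of ambiguity the scoring step would then resolve.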
Additionally, while the disclosed technology has been described for a user making contact with a displayed or touch screen keyboard, it would be equally applicable to a keyboard that did not require contact, but might allow for text entry upon a user drawing a finger or the like in proximity to a particular key or the like. A microgesture may comprise the finger moving above the keyboard to a final key, which might be considered entered when the user removes the finger. Thus, processing may proceed in accordance with the systems and methods described above, but relying on proximity detection rather than contact for key selection and entry.
While a primary use for the method and system of the disclosed technology may be for small mobile devices or other small electronic devices such as cameras, personal assistants, netbooks and the like, such an input system may be equally applicable for entry on, for example, set top boxes or televisions using a remote control, or other input device, such as a Wii controller or the like. Thus, a user may begin entry by pointing to a key (or zone), depressing a selection button, and holding down the button while the user moves an indicator through a second letter (if desired) to a third and final letter of the microgesture, indicated by release of the selection button. A more complete description of entry employing the zone input system and such a remote control is provided in the '676 application, previously incorporated herein by reference. Indeed, it is contemplated that the present disclosed technology may be applicable to any input scheme or technology noted in the '676 application, including but not limited to remote control, Wii remote, hand gestures, or other entry technique.
III. Suitable User Interface
The user interface comprises a touch screen or similar display that facilitates both tactile input from a user (e.g., by fingertip or by stylus) and visual output. As indicated by the illustrative axes, each point on the display may be described by a unique X-Y coordinate. Additionally, although not shown in
The sample display shown in
The user interface display includes at the bottom a virtual soft-key keyboard area 510 that is configured to display a virtual keyboard and receive tactile user input that indicates character selection from that virtual keyboard. As shown, a virtual keyboard has multiple defined areas arranged in a particular layout on the display. Each defined area circumscribes a key. Except for special function keys, each key in the virtual keyboard is associated with one or more characters, as described in greater detail herein. As used herein, the term “characters” may include alphabetic letters (with or without diacritic markers such as accents), digits, characters or symbols that represent a phonetic or sub-word component, including Japanese kana, Korean jamos, and Chinese zhuyin, or other linguistic and non-linguistic characters, such as digits and punctuation, that are contained in abbreviations, chat slang, emoticons, user IDs or URLs. Special function keys in a virtual keyboard are associated with the performance of special functions related to the input of textual information (e.g., capitalization of a letter, deletion of characters). Special function keys may include, but are not limited to, a shift key, delete key, numerical input key (e.g., to select a keyboard layout suitable for faster entry of numerical and symbolic characters), a multi-keyboard selection key (e.g., to select a different keyboard layout and/or virtual keyboard that supports a different character set), a return key, and a space key.
A user may provide different types of tactile input to the virtual keyboard area 510, as described in greater detail herein, in order to indicate his selection of a sequence of characters. The user may intend for a set of his tactile inputs to collectively result in the formation of a particular intended word in the output area 506 shown in the user interface display (or at another output). For clarity, unless the context clearly requires otherwise, the term “word” as used herein denotes any type of linguistic object, including, but not limited to, a string of characters, word stem, prefix, suffix, phrase, sentence, abbreviation, chat slang, emoticon, user ID, URL, ideographic character sequence, or icon.
Without system intervention, a user's raw tactile input to a virtual keyboard might result in an unintended word being added to the output area that is different from his intended word if, for example, (1) the user is sloppy or inaccurate with his tactile inputs, (2) the user misspells the word, (3) multiple keystrokes are required in order to select a particular character (e.g., if a diacritical marker is needed, or if the virtual keyboard has a half-QWERTY or 12-key layout, such as the one shown in
Thus, to facilitate an improved user experience, the disclosed user interface may provide user input received from the virtual keyboard area 510 (and other areas of the touch-sensitive display and/or other inputs) to a disambiguation system or module such as the disambiguation module disclosed herein. The disambiguation module resolves the received user input to generate a suggested list of word choices (a “word choice list”) that may be rank-ordered or otherwise organized in a logical fashion. The user interface may then display a subset of the word choice list generated by the disambiguation module in the word choice list area 508 (e.g., in an ordered or grouped fashion). Within the word choice list area, one or more default words may be highlighted (e.g., “are” as shown in
Although the user interface display is described herein primarily as a touch-sensitive touch screen, one having skill in the art will appreciate that the user interface and disambiguation systems and methods described herein may be implemented instead using laser projection keyboards, muscle sensing keyboards, fabric keyboards, motion-sensitive video systems (such as the Nintendo Wii®), virtual reality systems, etc. since these systems are capable of displaying a virtual keyboard and translating user actions into positional information within the displayed keyboard that is analogous to the tactile input of a touch screen.
Modes of User Input
In the example shown in
The user interface described herein permits a user to input different types of input features in any order (e.g., to input a trace followed by two taps, followed by three traces, etc.). Furthermore, the user interface permits a user to pause after inputting a first feature (e.g., a trace), and then resume his input (e.g., to input a second trace or to input a tap). After a pause, a disambiguation module may utilize both the input features detected before the pause and after the pause in order to generate a word choice list.
As illustrated by
As another example, the user interface may permit a user to select, via tactile input (e.g., a trace input), a word from a word choice list, wherein the word choice list was generated by audio input (e.g., a word choice list generated using voice recognition software), free form handwriting input (e.g., a word choice list generated using handwriting recognition software) or another mode of input. In the example of
The above examples demonstrate the more general ability of the user interface to mix multiple kinds of disambiguation in a complementary fashion. In these examples, a first modality of disambiguation (e.g. textual disambiguation or audio disambiguation) generates a first word choice list (e.g. an N-best list), the members of which share a common characteristic (e.g., similar character arrangements on a keyboard or a phonetic similarity). Next, when it comes time to disambiguate a second type of user input, the set of possible words that must be searched to disambiguate the second input (e.g., the “dictionary” searched) is limited to the members of the first word choice list (e.g., an N-best list).
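The complementary use of two modalities can be sketched as follows. The sketch assumes that a first modality (here, speech) has already produced an N-best list and that a second input (here, a tap) has been reduced to a set of probable first characters; both inputs and the function name are hypothetical.

```python
def disambiguate_second_input(n_best, probable_first_chars):
    """Search only the first modality's N-best list, keeping its rank order.

    n_best: rank-ordered words from the first modality (e.g., speech).
    probable_first_chars: characters the second input (e.g., a tap) may indicate.
    """
    return [w for w in n_best if w and w[0] in probable_first_chars]


# Hypothetical N-best list of phonetically similar words from a speech recognizer.
speech_n_best = ["bear", "bare", "pear", "pair"]

# Hypothetical probable characters for a tap landing near the "p" key.
tap_chars = {"p", "o", "l"}

narrowed = disambiguate_second_input(speech_n_best, tap_chars)
```

Because the "dictionary" searched for the tap is only the four-word N-best list, the second input needs to be just precise enough to separate "pear" and "pair" from "bear" and "bare".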
Alternate Keyboard Layouts and Other Features
The suitable user interface of
Similarly, unlike conventional systems, in the systems described herein, repeated or prolonged user input features (e.g., multiple input taps or hovers) are not required to select accented characters (or other alternate characters) that may not be visually displayed on a virtual keyboard. As an example, in conventional systems, to select the accented character é on a QWERTY keyboard layout such as the one shown in
Additionally, the user interfaces described herein do not require repeated or additional input features from a user to indicate repeated characters and/or repeated keystrokes. For example, to efficiently input the intended word “moonlight” using the full-QWERTY user interface shown in
Although
Aspects of the disclosed technology can be embodied in a special purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable instructions explained in detail herein. While aspects of the disclosed technology, such as certain functions, are described as being performed exclusively on a single device, the disclosed technology can also be practiced in distributed environments where functions or modules are shared among disparate processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
In one embodiment, the disclosed technology is implemented as a sequence of program instructions that are stored in a non-transitory computer readable medium or computer readable memory. The instructions are executed by processor electronics to perform the functions described herein.
Aspects of the disclosed technology may be stored or distributed on tangible computer-readable media, including magnetically or optically readable computer discs, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Alternatively, computer implemented instructions, data structures, screen displays, and other data under aspects of the disclosed technology may be distributed over the Internet or over other networks (including wireless networks), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analog or digital network (packet switched, circuit switched, or other scheme).
The input and output components may include a global positioning system (GPS), a digital camera, a wireless LAN (WiFi) interface, a motion sensor, accelerometer, a Bluetooth® interface, an external USB or similar port, a headphone or headset jack adapter, projection display, haptic output device (e.g., a vibrator), as well as other components or communication elements. For example, the Bluetooth® interface may communicate with an external wireless communications component, such as a wireless headset, to not only receive audio input but also provide audio output. In one implementation, the computing system 600 is a mobile phone, whereby the system also includes one or more radios coupled to the CPU 601 to permit wireless communications.
The output modalities of the system may include text-to-speech, audio, vibration, tactile, visual, etc. outputs. For instance the system can let the user know which input features, characters and/or letters have been recognized by adding a ‘vibrate’ or ‘sound’ feedback. Similarly, the system can provide the user with more detailed understanding of the features extracted and the level of uncertainty involved in its determinations.
The memory 610 may further comprise an operating system 611, a disambiguation module 613, and one or more application programs (shown as application programs 1 through N 614-616). The application programs can include word processing programs, slide presentation or image creation programs, email programs, SMS/MMS messaging programs, instant messaging programs, or other text-based communication or text-based document creation programs.
The disambiguation module 613 receives user input data from the operating system, including data related to a user's input with the touch screen 603, and uses the received input data to identify a user-selected word, which it may provide to an application program. For example, the disambiguation module 613 may receive a series of notifications from the operating system 611 of finger down events (e.g., when a user depresses a point on the touch screen), finger move events (e.g., when a user moves his finger while depressing the screen), and finger up events (e.g., when a user lifts his finger from the touch screen). (While generally described herein as using a finger, the user may use anything to touch or actuate the touch screen, such as a stylus.) Each of the received notifications may include a time series of position data, e.g., an X-Y set of coordinates (x, y) with a timestamp t (i.e., the coordinates (x, y, t)), which reflect the user's tactile input to the touch screen 603. Additionally, if the touch screen 603 comprises pressure sensors, the received notifications may include a time series of position and pressure data, e.g., a set of coordinates (x, y, t, p) that reflect the user's tactile input to the touch screen 603, including the applied pressure, p. The operating system 611 may also provide other kinds of input data to the disambiguation module 613, including multi-touch inputs (e.g., additional sets of simultaneous coordinates (x, y, t) representing a second finger simultaneously contacting the touch screen), audio inputs (e.g., voice inputs), visual inputs, accelerometer inputs, location data, sensor data, etc.
The disambiguation module may additionally receive and process dynamic context data from the operating system and/or application program(s) in order to facilitate its word identification. Context information is any information that may help inform the disambiguation module of the user's intent, and may include information such as sentence or word-pair context, language context, application-specific context (e.g., data fields/tags), user-specific context, location, time of day, etc., as described in greater detail herein. For example, the module may receive context information such as the last word selected by the user, other sentence-level context, the virtual keyboard layout utilized, the type of entry field into which the word will be entered (e.g., a “Contacts” field), location data (e.g., where the computing system includes a GPS receiver or accesses location data), calendaring data for the user, and/or context data or applications used or being used that may utilize the identified word. As another example, the disambiguation module may receive an “N-best” list from another mode of input. An N-best list is a rank-ordered list of suggested words, which may be capped at a fixed size (e.g., N). As one example, an N-best list may be generated from another type of disambiguation application program or module, such as a list of words generated from a speech recognition program (e.g., a list of homophones) or a handwriting-recognition program. As still another example, if a user is entering text in order to respond to an earlier email, the disambiguation module may receive a list of words used in the earlier email.
The disambiguation module may include a character selection submodule 618, a word list selection submodule 620, a keyboard mapping database 622, a vocabulary module 624, a user context database 626, an application context database 628 and a disambiguation parameter database 630. While depicted as separate modules, a single module, program or routine may incorporate two or more of the modules.
The character selection submodule 618 is configured to analyze received user input or input data to extract input features, e.g., taps, corners, segments, hovers, or other features. The character selection submodule 618 is further configured to analyze the extracted input features using a language-independent keyboard mapping database 622 to produce an input sequence that comprises an ordered set of feature objects (e.g., a rank-ordered set of letters for a word being spelled). Each feature object in the input sequence is associated with a set of characters associated with corresponding keys for one of multiple virtual keyboards or key layouts, a set of proximity weights, and one or more feature types. The proximity weights reflect the likelihood that a user intended to effectuate a particular keystroke via a particular input feature (e.g., a tap, corner, segment, or hover). This likelihood may be based on the proximity between the location(s) of the input feature and the location of the key on the virtual keyboard. During its operation the character selection submodule may utilize parameters, policies, and preferences retrieved from the disambiguation parameter database. Further details on aspects of the character selection submodule 618 may be found in the assignee's U.S. Patent Publication No. 2006/0274051 entitled “VIRTUAL KEYBOARD SYSTEMS WITH AUTOMATIC CORRECTION,” and incorporated by reference herein.
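A feature object of the kind described above might be represented as in the following sketch. The key centers, the cutoff radius, and the linear falloff used for the proximity weights are all assumptions made for illustration; the source states only that the weights may be based on proximity between the input feature and the key.

```python
import math
from dataclasses import dataclass, field

# Hypothetical key centers in screen coordinates.
KEY_CENTERS = {"q": (10.0, 10.0), "w": (30.0, 10.0), "e": (50.0, 10.0)}


@dataclass
class FeatureObject:
    feature_types: set
    characters: list = field(default_factory=list)
    proximity_weights: dict = field(default_factory=dict)


def make_feature_object(x, y, feature_type, radius=25.0):
    """Build a feature object for an input feature located at (x, y)."""
    weights = {}
    for char, (kx, ky) in KEY_CENTERS.items():
        dist = math.hypot(x - kx, y - ky)
        if dist <= radius:
            # Assumed weighting: linear falloff from 1.0 at the key center
            # to 0.0 at the cutoff radius.
            weights[char] = 1.0 - dist / radius
    # Character set ordered from most to least likely.
    ranked = sorted(weights, key=weights.get, reverse=True)
    return FeatureObject({feature_type}, ranked, weights)
```

For a tap at (32, 10), slightly right of the "w" center, this sketch ranks "w" first, then "e", then "q", since the tap is closer to "e" than to "q".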
The keyboard mapping database 622 permits the character selection submodule 618 to evaluate the likelihood that a user intended to effectuate a particular keystroke by inputting a particular input feature. For each type of virtual key layout utilized by the system and displayed to the user on the touch screen, such as a full QWERTY (e.g., 28-key, 32-key, or more), 14-key half-QWERTY keyboard, 12-key “numeric” keypad, etc., the keyboard mapping database 622 provides an area-to-key mapping that maps a particular key to a particular defined area or point on the touch screen display, and vice versa. For example, in a full 28-key QWERTY key layout, an area-key mapping may define the “E key” using the set of four points of the display: {(x1,y1), (x2,y2), (x3,y3), (x4,y4)}. As another example, in a 4×3, 12-key layout, such as the one shown in
Additionally, for each type of character layout utilized by the system (e.g., an American English full QWERTY character layout vs. a French full AZERTY character layout), the keyboard mapping database provides a key-character mapping that maps a particular key (e.g., the “E key”) to one or more characters (or for special function keys, to a particular function), and conversely, maps characters to a particular key. A key-character mapping may describe a one-to-many relationship, wherein a particular key is associated with more than one character, including characters that are not displayed on the virtual keyboard to the user of the touch screen. For example, in a key-character mapping related to a full QWERTY key layout, an “E key” may be mapped to two or more of the following characters: E, e, ë, ê, é, è, Ě, ė, Ė, ě, ē, Ē. As another example, in a key-character mapping that is related to a 12-key layout like the one shown in
In addition, keys may combine to form Romanization, transliteration or transcription inputs, allowing the user to enter foreign characters using a standard Latin-1 based keyboard. Users can type a word using the Latin-1 based keyboard the way it sounds, and the transliteration component will produce a word in its native script. For example, typing “serodija” results in the Russian word “”. One or more Latin-1 characters can combine to form one or more Russian characters. These mappings can be implemented in a variety of ways, using state machines or other commonly used methods. This translation, or transliteration, layer adds a layer of mapping or an additional submodule implemented as probabilities. In cases where there is a one-to-one mapping, the implementation is straightforward. When there are multiple interpretations, each mapping can be assigned a probability value. For example, in
Hindi, letters may have more than one valid mapping. For instance: the character has the following mappings: i, yi, ee and yi. Similarly the character maps to i and yi. Context or other parameters noted here may be used to further disambiguate or order the displayed list of options to the user.
The word list selection submodule 620 is configured to receive the input sequence generated by the character selection submodule. The word list selection submodule 620 may analyze the input sequence using a variety of other inputs in order to generate a rank-ordered or otherwise structured word choice list, such as those shown in
To facilitate word selection, the vocabulary module 624 provides one or more dictionaries that define a set or logical group of words. For example, an “American-English” dictionary may define a logical group of words in common usage by English-speaking Americans, while an “American-English-Msg” dictionary may define a logical group of words (including emoticons) commonly used by English-speaking Americans in text and multimedia messages (e.g., SMS, MMS or similar messages). The vocabulary module 624 may also provide additional language information, such as static or dynamic frequency information within a particular logical group of words (e.g., within a particular language) and/or particular context. For example, the additional language information may include frequency information in accordance with a linguistic model, which may include one or more of: frequency of occurrence of a word in formal and/or conversational written text; frequency of occurrence of a word when following a certain preceding word or words; proper or common grammar of the surrounding sentences; frequency of the word being utilized as a left-stem or right-stem (or prefix or suffix), etc. In addition to frequency, transition probabilities may be assigned to particular transitions, linking one class of prefixes to likely stems or stems with one or more classes of suffixes while rejecting prefixes with other words in the dictionary which would form illegal words. For example, the word “work” may have high transition probabilities with suffixes -er, -ers, -ing, -s and -ed, which often form valid words.
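The stem-to-suffix transitions mentioned above can be illustrated with a small sketch. The probability values, the threshold, and the "sheep" entry are hypothetical; only the "work" suffixes come from the text.

```python
# Hypothetical transition probabilities between a stem and candidate suffixes.
TRANSITIONS = {
    "work": {"er": 0.9, "ers": 0.8, "ing": 0.95, "s": 0.9, "ed": 0.9},
    "sheep": {"s": 0.0, "ish": 0.4},
}


def valid_forms(stem, min_probability=0.5):
    """Return stem+suffix forms whose transition probability meets the threshold."""
    suffixes = TRANSITIONS.get(stem, {})
    return sorted(stem + s for s, p in suffixes.items() if p >= min_probability)
```

With these assumed values, all five suffixes of "work" form accepted words, while "sheeps" is rejected because its transition probability is zero.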
Notably, the disambiguation module does not need to change its dictionary if the user changes his keyboard configuration. Furthermore, by using a keyboard mapping database 622, the disambiguation module may be utilized within any system regardless of the keyboard layout.
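One way to picture this separation is the sketch below, in which a key-character mapping is selected per layout while the dictionary stays the same. The layout names, key names, mapped characters, and dictionary contents are hypothetical.

```python
# Hypothetical key-character mappings for two layouts.
LAYOUTS = {
    "qwerty": {"E key": ["e", "é", "è"], "T key": ["t"]},
    "12-key": {"3 key": ["d", "e", "f", "é", "è"], "8 key": ["t", "u", "v"]},
}

# One dictionary shared by every layout (hypothetical contents).
DICTIONARY = ["te", "té", "ve"]


def match(layout, key_sequence):
    """Return dictionary words spellable from the characters mapped to each key."""
    char_sets = [LAYOUTS[layout][key] for key in key_sequence]
    return [
        w for w in DICTIONARY
        if len(w) == len(char_sets) and all(c in s for c, s in zip(w, char_sets))
    ]
```

Switching from the full layout to the 12-key layout changes only which mapping is consulted; the matching routine and the dictionary are untouched, and the more ambiguous layout simply yields more candidates.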
The disambiguation parameter database 630 may store various parameters, policies, and preferences for performing interaction or input feature extraction, character selection and/or word selection. For example, the disambiguation parameter database may include parameters and policies that aid in the selection and application of distance functions, proximity metrics, matching metrics and post-processing steps, all of which are described herein.
To facilitate improved character and/or word selection, the user context database 626 and the application context database 628 may provide context information to the character selection submodule 618 and/or the word list selection submodule 620 to inform their processing; these submodules may also receive dynamic context information from an application program and/or the operating system. The entry of text, the selection amongst alternatives, contextual information and multiple modalities are managed by the disambiguation module. The system may use inputs from the user, the context of the application, external sensors and context resources, and disambiguation and correction approaches to determine the user's intent as noted herein. Based on an understanding of the user's intent, the system maintains the context and state of the application, and manages the composition of inputs and synchronization across multiple input modalities. By interfacing with business logic, the system can produce multimodal output to be presented to the user. The interaction, the business logic and the context and state may be managed through an interaction script or more complex state machines (State Chart XML or SCXML) or languages which are used for the control of user interfaces such as XHTML+Voice Profile or SALT.
The user context database 626 may include any historical information relating to the user's prior character and/or word selection behavior, such as previous words utilized by the user, the frequency of words entered by the user, and common phrases employed by the user. Other examples of user context information include: word pairs entered by a user, a user's unique vocabulary in a specific application (text messaging vs. email, for example), a user's level of ‘sloppiness’ (increasing the character region size) while entering text into the system, etc. Additionally, user context information may reflect how frequently a user selects words from a particular language or what language the user primarily uses. For example, if the system utilizes both French and English language dictionaries for a bilingual user, the user context database 626 may track the relative frequency with which the user selects French words versus English words in order to determine the user's “primary language,” and then words from that dictionary may be preferred or presented first over words from the other dictionary.
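The bilingual example can be sketched as a simple counter over past selections. The class and method names are hypothetical, and a real user context database would likely track far more than a per-language count.

```python
from collections import Counter


class LanguageUsage:
    """Tracks how often the user selects words from each dictionary."""

    def __init__(self):
        self.counts = Counter()

    def record_selection(self, language):
        self.counts[language] += 1

    def primary_language(self):
        """Return the most frequently selected language, or None without history."""
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def order_candidates(self, candidates):
        """Sort (word, language) pairs so primary-language words come first.

        The sort is stable, so the existing order is kept within each group.
        """
        primary = self.primary_language()
        return sorted(candidates, key=lambda c: c[1] != primary)
```

For a user who has selected French words more often than English ones, a candidate list containing both would be reordered so that the French entries are presented first.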
Alternatively or additionally, some user context information may be reflected in a user-specific dictionary and/or other data structures in the vocabulary module 624. Additionally or alternatively, the user context database 626 may also provide user-specific policies and parameters that may override default disambiguation parameters stored in the disambiguation parameter database 630.
The application context database 628 may permit the disambiguation module 613 to access historical information related to character or word selection within a particular application program. The application context database 628 may also provide application-specific dictionaries and/or application-specific policies and parameters that may override default disambiguation parameters stored in the disambiguation parameters database 630. Context can also come from text fields in which the information is supposed to be entered. As an example, in some implementations, the disambiguation module may be designed to integrate with specific types of application programs that define application-specific fields, tags or labels that may be associated with a particular word entry. As an example, the text-field of a form (“Name” or “Email”) may be a label that could be utilized to provide context information. The application context database 628 may then correlate a user's historical word entries with application-specific field(s) to provide additional context information. The system may define, in the application context database, application-specific fields and rules (e.g., grammar rules, matching metrics and/or post-processing rules) that reflect how the field information and other context information may be utilized during word selection. These rules may even specify that word suggestions for a particular field can depend on what words a user has entered in other fields.
The system informs the disambiguation module of the intended textual information of the field (name, phone number, date, email address, password) and/or provides it with a grammar which describes the information. For instance a form which requires State, ZIP and City to be entered would use an approach that would be able to provide appropriate grammars based on the context and content of the other fields. If the State is “WA”, the grammar for cities is constrained by actual cities in the State of Washington and ZIP codes are similarly constrained. When City and State are entered, the range of ZIP codes is again constrained. Complex interactions between fields and grammars can simplify the entry of data in such “forms”. Alternatively, the system can learn the context for often used text fields.
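The interaction between the State, City and ZIP fields might look like the following sketch. The tiny lookup table is a hypothetical stand-in for a complete data source, and the function names are illustrative only.

```python
# Hypothetical, deliberately tiny gazetteer; a real system would use a complete source.
CITIES_BY_STATE = {
    "WA": {"Seattle": ["98101", "98104"], "Spokane": ["99201"]},
    "OR": {"Portland": ["97201"]},
}


def allowed_cities(state):
    """Return the city names permitted once the State field is known."""
    return sorted(CITIES_BY_STATE.get(state, {}))


def allowed_zips(state, city=None):
    """Return the ZIP codes permitted for the state, narrowed further by city."""
    cities = CITIES_BY_STATE.get(state, {})
    if city is not None:
        return list(cities.get(city, []))
    return sorted(z for zips in cities.values() for z in zips)
```

Entering "WA" limits the city grammar to the Washington entries, and entering the city as well narrows the ZIP grammar again, which mirrors the two-stage constraint described above.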
To illustrate the use of this context information, consider the example of form filling, where the disambiguation module may remember what words users have entered in particular field(s). To illustrate, in an order form a user may enter his work email into an email field and then the disambiguation module would predict that the shipping address field would contain his work address. In contrast, if he enters his personal email address into an email field, the disambiguation module would predict his home address in the shipping address field.
Other contextual information includes environmental parameters, such as time of day, location, calendar entries, user settings, application settings and system settings.
While shown as separate databases, two or more of the databases described herein may be combined, and may be implemented in any data structure or logical construct.
The character selection submodule 618 and/or the word list selection submodule 620 may keep the user context database 626 and/or the application context database 628 current by providing notifications about a user's present behavior. For example, the word list selection submodule 620 may return information to these databases when a user selects a word from a word selection list.
Additionally, as described previously, the disambiguation module 613 may also receive additional dynamic context data from the operating system and/or an application program. Although not shown, the disambiguation module may additionally comprise voice recognition software configured to disambiguate speech or audio inputs, as described herein.
Suitable Disambiguation Methods
The flow diagrams described herein do not show all functions or exchanges of data, but instead provide an understanding of commands and data exchanged under the system. Those skilled in the relevant art will recognize that some functions or exchange of commands and data may be repeated, varied, omitted, or supplemented, and other (less important) aspects not shown may be readily implemented. Further, although process steps, method steps, blocks, algorithms or the like may be described in a particular order, such processes, methods, blocks and algorithms may be configured to work in alternate orders. In other words, any sequence or order described herein does not necessarily indicate a requirement that the steps or blocks be performed in that order. The steps or blocks of processes and methods described herein may be performed in any order practical, and some steps may be performed simultaneously.
The period of time that the disambiguation module waits for and receives user input data may vary. For example, in one implementation, the disambiguation module may wait to receive an entire group of notifications corresponding to (1) a finger down notification, (2) the next finger up notification, and (3) all of the finger move notifications that occur in between these first two notifications, before proceeding to block 730. In some other implementations, at block 710, the disambiguation module 613 may continuously analyze incoming user input data to identify other types of logical groupings in the user input data and may proceed to block 730 only when a logical group is received in its entirety. For example, at block 710, the disambiguation module may continuously analyze received notifications as they arrive to determine if an input feature (e.g., a tap, a corner, a hover, etc.) or special function has occurred (using, e.g., the techniques described herein at block 802 and block 730, respectively) and when a logical group of input data that corresponds to a single input feature or special function is detected, processing of that logical group may then proceed at block 730.
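The first implementation described above, waiting for a finger down notification, the next finger up notification, and every finger move notification in between, can be sketched as a generator. The dictionary-based notification format and the "kind" values are assumptions; the source specifies only that notifications carry position data and timestamps.

```python
def group_notifications(notifications):
    """Yield lists of notifications spanning one finger-down through the next finger-up."""
    current = None
    for note in notifications:
        kind = note["kind"]
        if kind == "down":
            current = [note]          # start a new logical group
        elif current is not None:
            current.append(note)      # a move or an up inside an open group
            if kind == "up":
                yield current         # group is complete; hand it on for processing
                current = None


# Hypothetical notification stream: a trace, a tap, then an unfinished trace.
events = [
    {"kind": "down", "x": 10, "y": 10, "t": 0},
    {"kind": "move", "x": 20, "y": 12, "t": 15},
    {"kind": "move", "x": 35, "y": 14, "t": 30},
    {"kind": "up", "x": 40, "y": 15, "t": 45},
    {"kind": "down", "x": 60, "y": 10, "t": 200},
    {"kind": "up", "x": 60, "y": 10, "t": 240},
    {"kind": "down", "x": 80, "y": 10, "t": 400},
    {"kind": "move", "x": 85, "y": 11, "t": 415},
]

groups = list(group_notifications(events))
```

Only complete groups are yielded, so the trailing finger down and move with no matching finger up are held back until the rest of that group arrives.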
At decision block 730, the disambiguation module determines whether the user has provided input that indicates that the disambiguation module should perform a special function, such as word selection, character deletion, or a function related to a special function key (e.g., shift, caps lock, keyboard selection key, etc.). As shown in
At block 740, the disambiguation module utilizes the character selection submodule 618 to identify what input features (e.g., taps, corners, segments, hovers, audio inputs, etc.), if any, are indicated by the received input data. Furthermore, at block 740, the disambiguation module utilizes the character selection submodule to generate a feature object for each identified input feature. A feature object includes a character set, a set of feature types, and a proximity weight set. The character selection submodule may perform these tasks in accordance with the processes shown in
During block 740, the character selection submodule may also characterize each identified input feature using one or more feature types. For example, the character selection submodule may characterize each input feature as a tap, a corner, a segment, a hover, etc. Additionally, the character selection submodule may also determine whether an identified input feature is an “intentional input feature” (or “intentional feature”). An intentional input feature is an input feature that appears more likely to correspond to an intended keystroke by a user as compared to other types of input features. If a feature is not an intentional input feature, it is instead characterized as an “incidental input feature” (or “incidental feature”). When matching candidate words to input features, intentional input features may be treated differently (e.g. weighted higher) than incidental features, as described herein.
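As a rough illustration, a feature object holding a character set, a set of feature types, and a proximity weight set might be modeled as below. The class name, field names, and the default policy in `INTENTIONAL_TYPES` are assumptions for this sketch only; the description leaves the actual policy to disambiguation parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class FeatureObject:
    """Illustrative container; field names are assumptions."""
    feature_types: Set[str] = field(default_factory=set)        # e.g. {"tap"}
    proximity_weights: Dict[str, float] = field(default_factory=dict)  # char -> weight
    intentional: bool = False    # False means an incidental feature

    @property
    def character_set(self) -> Set[str]:
        # The character set is taken to be the characters that carry a weight.
        return set(self.proximity_weights)

# Assumed example policy: treat taps and corners as intentional features.
INTENTIONAL_TYPES = {"tap", "corner"}

def characterize(kind: str, weights: Dict[str, float]) -> FeatureObject:
    """Build a feature object and mark it intentional or incidental."""
    return FeatureObject({kind}, dict(weights), kind in INTENTIONAL_TYPES)
```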
To illustrate, in some implementations, the disambiguation module treats all taps and all corners as intentional input features, since these features indicate a distinct and abrupt change in the user's input behavior. As another illustration, the disambiguation module may treat all taps, corners and hovers as “intentional” input features. The disambiguation module may further refine which input features are characterized as intentional or incidental by employing other factors described herein. Characterization of input features is described in greater detail herein with respect to
At block 750, the disambiguation module may increment an estimated input length variable (L) to reflect the number of new intentional input features, if any, that were identified at block 740. As described herein, the disambiguation module may later use the estimated input length L to pre-filter dictionaries or for other purposes when generating a word choice list at block 770. Similarly, at block 750 the disambiguation module may additionally update other variables that reflect the composition of the input sequence (e.g., a variable that reflects the number of segment features present in the input sequence).
At block 760, the disambiguation module appends any newly generated feature objects to an input sequence. The input sequence reflects the ordered set of all input feature objects created since the user last selected a word. The various feature objects in the input sequence are ordered on the basis of time.
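The bookkeeping at blocks 750 and 760 can be sketched as follows. The class `InputSequence` and the pair-based feature record are hypothetical simplifications; a fuller implementation would store complete feature objects.

```python
from typing import List, Tuple

# Hypothetical minimal feature record: (feature_type, is_intentional).
Feature = Tuple[str, bool]

class InputSequence:
    """Time-ordered features entered since the last word selection."""

    def __init__(self) -> None:
        self.features: List[Feature] = []
        self.estimated_length = 0   # the variable L in the text
        self.segment_count = 0      # example of another composition variable

    def extend(self, new_features: List[Feature]) -> None:
        for kind, intentional in new_features:
            if intentional:
                self.estimated_length += 1   # block 750: count intentional features
            if kind == "segment":
                self.segment_count += 1
            self.features.append((kind, intentional))   # block 760: append in order
```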
At block 770, the disambiguation module utilizes the word list selection submodule 618 to create a word choice list using the current input sequence and to output this list to the user (e.g., via the word choice list area on a touch screen display). In this way, the disambiguation module gives the user an opportunity to select his intended word. The creation of a word choice list is described in greater detail herein with respect to
In one variant of the system, if the user selects a word in the word choice list, the system may respond by only showing words that start with the selected word. In addition to words, the selection list may also contain objects which are represented by the word. For instance the word love may have an icon in the shape of a heart associated with it. In addition, words which are retrieved from different vocabularies or providers, such as the contact dictionary, may offer additional options when selected. For instance, if a user has entered a name which matches a name in the contacts dictionary, the user may select to add the phone number, driving directions, e-mail address or any other related object, including an object which represents the contact in a machine readable format (vCard etc). Other words may be sent to ‘recognizers’ which may recognize keywords, phone numbers, potential calendar entries etc. When the user has typed: “Let's meet for Lunch at Joey's,” the recognizers could add an action: add to calendar, call Joey's, make reservation, insert directions and more. When the user selects any action, the system may launch an activity which at completion returns the user to the activity of text entry. Such activities may be launched with any relevant contextual parameters that may help the activity to be better informed about the intent of the user, including user history, user preferences. In another example, the user may want to add a song to the email. By typing the name of the song, artist, album or even such keywords as mp3, song, the user may launch an activity that allows the user to seamlessly attach the object. Optionally, the word(s) which were selected to cause the start of the activity may be automatically deleted.
At decision block 784, the disambiguation module determines if the input data indicates that the user has selected a word (e.g., from a word choice list). User selection of a word may occur, for example, via vocal selection (as described previously), if the user taps, traces or hovers near or over a special key (e.g., a space bar, tab key, enter key, etc.), or if the user taps, traces or hovers on or near a word in the word choice list area. Of course a user may select a word by any other suitable method, such as by shaking a device, pushing an input button, etc.
In one example, the disambiguation module 613 may provide “auto-spacing” functionality by analyzing the types of input features utilized by the user to determine whether the user's input indicates that she intends to end one discrete word and begin another. In some implementations, the disambiguation module 613 will interpret two consecutive traces in an input sequence (i.e., without another type of input feature like a tap or hover intervening) as an indication that the user intended to (1) select the default word determined for the last trace and the other previous features in the input sequence, and (2) begin a new word starting with the second trace. Thus, at decision block 784, the disambiguation module may determine whether the last two input features were traces. If they were, the system may (1) proceed to block 792 in order to process the user's selection of the last default word, and (2) restart process 700 with the components of the second trace (i.e., its corners and segments) as the first features in a new input sequence. Otherwise, the system may simply continue the process 700. Of course, any other suitable feature combination could be utilized to test for auto-spacing at decision block 784. More generally, the disambiguation module 613 may perform auto-spacing upon the occurrence of a particular feature composition within the input sequence; the triggering compositions may be set using disambiguation parameters and preferences in the disambiguation parameters database 630.
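The two-consecutive-traces test can be sketched in Python as below. For brevity, the sketch assumes the input has already been labeled stroke by stroke ("trace", "tap", "hover"); that stroke-level labeling and both function names are hypothetical, since the description stores a trace as its component corners and segments.

```python
from typing import List, Sequence, Tuple

def should_auto_space(stroke_kinds: Sequence[str]) -> bool:
    """True when the two most recent strokes are both traces."""
    return len(stroke_kinds) >= 2 and all(k == "trace" for k in stroke_kinds[-2:])

def split_on_auto_space(stroke_kinds: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (strokes of the word being closed, strokes that start the new word)."""
    if should_auto_space(stroke_kinds):
        # The second trace becomes the first stroke of a new input sequence.
        return list(stroke_kinds[:-1]), list(stroke_kinds[-1:])
    return list(stroke_kinds), []
```

A different triggering composition could be substituted by replacing the predicate in `should_auto_space`.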
Additionally, for certain languages (e.g., compounding languages) or contexts (e.g., URL entry), the auto-spacing feature may be disabled or implemented in a different fashion. For example, if two consecutive traces (or a different triggering condition) are detected, the system may (1) proceed to block 792 in order to process the user's selection of the last default word, and (2) restart process 700 with the components of the second trace (i.e., its corners and segments) as the first features in a new input sequence, and may set a flag that indicates that the last default word and the new word should be compounded when the new word is selected.
If the input data indicates the user has selected a word, processing proceeds to block 792 where the disambiguation module processes the selected word, otherwise processing proceeds directly to block 786. At block 792 the disambiguation module outputs the selected word to the desired output (e.g., to the output area of a window associated with a currently running application program). The nature of this output may be affected by context (e.g., sentence context), flags, parameters, or other factors. For example, if the immediately previous word in a sentence is a common left stem in a compounding language, and the currently selected word is a common right stem in the same compounding language, the disambiguation module may determine whether to insert a space before the currently selected word, or whether to compound the newly selected word to the immediately previous word. To make such a decision, the disambiguation module may utilize language information from the vocabulary module 624 and/or grammar module. As another example, the disambiguation module may eliminate spaces between words if it can infer from the context (e.g., the previous entry of “google.com”) that a user is entering a URL.
Additionally, at block 792, the disambiguation module may update the vocabulary module to reflect the selection. For example, the disambiguation module may update frequency and recency information related to the selected word and/or may add a user-created word to a user-specific dictionary or temporary dictionary until the user has entered the same word several times. This avoids the accidental addition of garbage words to the user database. The system may even take notice of later edits to the word to remove such accidental additions to the dictionary without further user intervention. In addition, the user may indicate that a user generated word should be added immediately to the user dictionary through a particular action, such as a long press on the word. In addition to the word, the system may also include information about the textual context of the word such as the preceding word/words to further improve future predictions (e.g. with word pairs, as noted herein).
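One way to picture the temporary-dictionary behavior is the sketch below, in which a user-created word is held back until it has been selected several times or the user forces the addition (for example, via a long press). The class name and the default threshold of three selections are assumptions; the description says only "several times."

```python
from typing import Dict, Set

class UserVocabulary:
    """Holds new words in a temporary dictionary until they prove themselves."""

    def __init__(self, promote_after: int = 3) -> None:
        self.promote_after = promote_after      # assumed threshold
        self.temporary: Dict[str, int] = {}     # word -> times selected so far
        self.user_dictionary: Set[str] = set()

    def record_selection(self, word: str, force: bool = False) -> None:
        if word in self.user_dictionary:
            return
        count = self.temporary.get(word, 0) + 1
        if force or count >= self.promote_after:
            self.temporary.pop(word, None)
            self.user_dictionary.add(word)      # promoted to the user dictionary
        else:
            self.temporary[word] = count        # still on probation
```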
As another example, the disambiguation module may update a dictionary that is both application- and user-dependent (e.g., if the system maintains different user dictionaries for email vs. text messaging). As yet another example, the disambiguation module may retrieve the previous word entered in order to update the user context database 626 with information that relates to word pairs utilized by a user. For example, if the user frequently used the word pair “school book”, and in a current entry a previous word was “school”, then the word “book” may receive a higher probability and be promoted within the word choice list. As another example, in some implementations, the disambiguation module may update the application context database 628, e.g., by associating the selected word with an applicable field label.
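The word-pair promotion in the "school book" example can be sketched as follows. The additive boost and its size are assumptions chosen for illustration; the description does not specify how pair information is combined with a word's base score.

```python
from collections import Counter
from typing import Dict, List

def promote_by_word_pairs(previous_word: str,
                          candidates: Dict[str, float],
                          pair_counts: Counter,
                          boost: float = 0.1) -> List[str]:
    """Order candidates by base score plus an assumed per-use pair bonus."""
    def score(word: str) -> float:
        # Counter returns 0 for pairs the user has never entered.
        return candidates[word] + boost * pair_counts[(previous_word, word)]
    return sorted(candidates, key=score, reverse=True)

# Example: "book" overtakes "boom" only when the previous word was "school".
pairs = Counter({("school", "book"): 3})
ranked = promote_by_word_pairs("school", {"boom": 0.5, "book": 0.45}, pairs)
```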
Additionally, at block 792, the disambiguation module may analyze current and/or historical received input data, current and/or historical input sequences and current and/or historical selected words in order to adjust its future behavior. For example, the disambiguation module may compare the location of the keys corresponding to a user's selected word to the location of detected features in an input sequence to determine if it can discern any trends in how the user inputs data to the user interface (i.e., whether the system can learn the user's patterns or habits based on a set of similar user inputs). If trends are discerned, the disambiguation module may update user-specific or other disambiguation parameters to reflect any observed trend. As examples, the disambiguation module may adjust its character selection parameters if a user always overshoots or undershoots a character at a corner (or does so in certain cases), if a user wiggles slightly during a tap in an effort to apparently provide extra emphasis or indicate a repeated letter, if the user typically pauses mid-segment in order to indicate an intended keystroke, etc. As another specific example, the disambiguation module may adjust applicable distance function and/or proximity metric parameters (e.g., those used to generate a corner oval, as described herein) to reflect how a user is particularly precise or imprecise in their input entry. In many instances, the disambiguation module employs a feedback loop, wherein the module stores the series of user inputs associated with the same word the user selected from the word list, compares those inputs, and makes adjustments to the ordered list of words on subsequent word lists, disambiguation of weights to inputs, etc. Different adaptation models may exist when the system detects a single finger entry (such as the thumb when holding the device in one hand) versus the entry through multiple fingers when holding the keyboard with both hands.
In addition, the models can take into consideration other contextual parameters such as speed of the user, the availability of environmental light, etc.
As another example, the disambiguation module may analyze the selected word to determine the primary or preferred language of the user by determining the frequency with which the user selects words from a particular language, so that the user need not explicitly set his preferred language.
Additionally, at block 792, the disambiguation module may also update the application context database to reflect the frequency or recency of the word or phrase in the context of a particular application program (e.g., a text message program) and/or a particular application field (e.g., a “name” field). If appropriate, selected fields such as password fields may be remembered, but can be protected from being retrieved until the user enters a secret passcode, gesture, shake, speech command which unlocks the secured memory, vault or location containing these sensitive words for a period of time which may depend on various context parameters. For instance, when a device has entered a ‘sleep’ mode or a ‘screen saver’ mode, the vault may auto lock. The behavior of the vault may be defined by the user, if necessary on a per case basis. Furthermore, the user may have defined various ‘personas’ or guises and based upon explicit or implicit activation, usernames, passwords and other sensitive data may change. For instance, during working hours, the persona would be ‘work’ while after hours or during the weekends the persona could be ‘leisure’ or ‘home’ or ‘family’, and differing dictionaries, databases and preferences of the system would be used depending upon the persona.
Finally, at block 792, the disambiguation module may clear the input sequence and restart a new process 700 as shown in
At block 786, the disambiguation module determines whether the input data indicates the deletion of characters. For example, the disambiguation module may determine whether a delete special function key has been depressed (or traced), or whether the user shook the device or otherwise performed a deletion gesture. If a deletion event is detected, processing proceeds to blocks 794 and 796 before block 788. Otherwise processing proceeds directly to block 788. At block 794, the disambiguation module removes one or more feature objects from the input sequence. Typically, for a single depression (e.g., tap) or trace of a delete key, the disambiguation module may remove the last intentional feature object and any subsequent incidental feature objects from the input sequence. However, the disambiguation module may remove a different subset of feature objects from the input sequence; the number, nature and order of the features removed at block 794 may depend on the precise nature of the input data received (e.g., whether a “delete all” key or gesture was utilized instead of a “delete one” key or gesture; whether a user's deletion gesture or input indicated that a particular feature should be deleted, etc.). At block 794, the disambiguation module may additionally decrement the estimated input length variable L to reflect the deletion of one or more intentional feature objects from the input sequence. At block 796, the disambiguation module creates a new word list using the updated input sequence and outputs this new list to the user (e.g., as described in
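The typical "delete one" behavior at block 794 (remove the last intentional feature object and any incidental feature objects after it) can be sketched as below. The pair-based feature record and the choice to leave the sequence untouched when it contains no intentional feature are assumptions of this sketch.

```python
from typing import List, Tuple

# Hypothetical minimal feature record: (feature_type, is_intentional).
Feature = Tuple[str, bool]

def delete_last_keystroke(sequence: List[Feature]) -> Tuple[List[Feature], int]:
    """Return (trimmed sequence, amount by which to decrement L)."""
    for i in range(len(sequence) - 1, -1, -1):
        if sequence[i][1]:
            # Drop the last intentional feature and everything after it.
            return sequence[:i], 1
    return list(sequence), 0   # assumed: nothing to delete, L unchanged
```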
At block 788, the disambiguation module determines whether the input data indicates a special function that is related to how the disambiguation module should select candidate characters and/or words when generating a word choice list. For example, the disambiguation module may determine whether the user has tapped or traced over a shift key, a caps lock key, a keyboard selection key, or a similar key that modifies how the disambiguation module should select characters and/or words. In another example, the keyboard may contain a ‘Command’ key which allows the user to quickly enter system commands. For instance, by tracing the word “COPY” over a Command key, or by tracing from the Command key and then tracing the word COPY on the keyboard, the user can copy selected text. Similarly, a CTRL key could be used to map to commonly used functions: CTRL-A “Select All”, CTRL-C/V to copy and paste, CTRL-E to center text etc. In addition, the user may be able to define commands and short cuts. As another example, the disambiguation module may determine whether the user has indicated a similar intent via other input means (e.g., by tracing off-keyboard (e.g. to indicate an undisplayed character is desired); by shaking, rubbing or other physical gesture; or by voice command). If a special function is indicated, the disambiguation module takes appropriate responsive action at block 798, for example, by adjusting a character determination flag (such as a capitalization flag, an area-key mapping selection flag, a key-character mapping selection flag, and/or a diacritical marker flag), or a word determination flag (such as a capitalization flag), by adjusting the contents of the input sequence, and/or by taking any other suitable action.
At block 790, the disambiguation module determines whether the input data indicates another function. For example, the disambiguation module may determine that the user has tapped on a menu button. If another function is indicated, at block 799 the disambiguation module takes suitable steps needed to perform the other indicated function before the process 715 returns. Otherwise, the process 715 returns immediately.
The process 800 begins at block 802, where the character selection submodule processes the raw input data (e.g., a time series of finger down, finger move and finger up notifications or events associated with position measurements) to identify the types of input feature(s) present and the location of the identified input features. To identify each input feature, the character selection submodule may analyze a group of input data corresponding to a single depression of a user's finger, i.e., a time series of input data that starts with a finger down event, any number of finger move events, and a finger up event. The character selection submodule may then analyze this group to determine movement and/or pressure metrics from the related time series of position/pressure data, X−Y={x(t),y(t), and optionally, p(t)}. For example, the character selection submodule may calculate first-order values such as: the total time of the depression, the total distance traveled (e.g., by summing all of the line segments reflected in X−Y), the range of (x(t), y(t)) or some combination thereof, the net difference in pressure during the depression, etc. As another example, the character selection submodule may perform higher-order and/or time-dependent calculations, such as determining the velocity of the user's finger along a trace or performing well-known geometric analysis algorithms that may be used to identify corners, inflection points (e.g., using second derivatives), and/or other interest points.
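A few of the first-order values listed above (total time, total distance traveled, and the range of x and y) can be computed from the position time series as in the sketch below. The `(x, y, t)` sample shape and the function name are assumptions; pressure is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]   # assumed shape: (x, y, t)

def first_order_metrics(samples: List[Sample]) -> Dict[str, float]:
    """Compute simple movement metrics for one finger depression."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    # Total distance is the sum of the line segments between consecutive samples.
    distance = sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1, _), (x2, y2, _) in zip(samples, samples[1:]))
    return {
        "total_time": samples[-1][2] - samples[0][2],
        "total_distance": distance,
        "x_range": max(xs) - min(xs),
        "y_range": max(ys) - min(ys),
    }
```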
At decision block 805, the character selection submodule determines whether the input data indicates a tap feature. To identify a tap feature, the character selection submodule may simply determine whether the total depression time falls under a threshold tap time value and/or the total range of movement during the time series also falls under a threshold tap distance value. If these conditions are satisfied, the character selection submodule may identify a suitable location (e.g. the initial, average, or final (x,y) coordinate seen during the depression window) to utilize as the location of the tap feature. However, the character selection submodule may apply any other suitable method to identify a tap, to distinguish a tap from other types of features and/or to determine the location of the tap. For example, the character selection submodule may ignore low-pressure taps.
If a tap is detected at decision block 805, processing next continues at block 850, where the character selection submodule generates a feature object for the tap by characterizing the tap, identifying a related character set associated with the tap, and calculating proximity weights for each character in the character set. Typically, a character selection submodule will characterize a tap feature as an intentional feature as described previously, but under certain circumstances dictated by disambiguation parameters (e.g., a low-pressure tap), the character selection submodule may instead characterize a tap as an incidental feature. Additionally, the character selection submodule may characterize the tap feature as both a “finger-down” and a “finger-up” event. Suitable methods for identifying a related character set and calculating proximity weights for a detected tap feature are described in greater detail herein with respect to
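A minimal sketch of the threshold test follows. It requires both conditions (short duration and small movement) and uses the average coordinate as the tap location; both choices, along with the threshold values and units, are assumptions picked from the options the text allows.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float, float]   # assumed shape: (x, y, t in seconds)

def detect_tap(samples: List[Sample],
               max_time: float = 0.2,      # assumed threshold tap time
               max_range: float = 10.0     # assumed threshold tap distance
               ) -> Optional[Tuple[float, float]]:
    """Return the tap location, or None if the depression is not a tap."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    total_time = samples[-1][2] - samples[0][2]
    movement = max(max(xs) - min(xs), max(ys) - min(ys))
    if total_time < max_time and movement < max_range:
        return (sum(xs) / len(xs), sum(ys) / len(ys))   # average coordinate
    return None
```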
At block 810, the character selection submodule determines whether the input data contains a corner feature. The character selection submodule may detect a starting corner and an ending corner of a trace simply by utilizing the first and last members or data points of the X-Y set of a trace. To identify a mid-trace corner feature, the character selection submodule may utilize any suitable curvature analysis method to detect an abrupt change in the user's direction. As one example, the character selection submodule may repeatedly evaluate a series of coordinates (e.g. three consecutive points) on the trace to determine if the central point falls more than a threshold distance from the line connecting the other two points. If it does not, the evaluation repeats for the next set of points on the trace. If it does, the central point is identified as a corner location. This process is then iterated to find corners. As another example, to identify a mid-trace corner feature, the character selection submodule may evaluate the pressure, velocity and/or acceleration behavior along a trace. However, the character selection submodule may apply any other suitable method to identify a corner, to distinguish a corner from other types of features and/or to determine the location of the corner.
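The three-point test for mid-trace corners can be sketched as below: for each interior point, measure its distance from the line through its two neighbors and flag it when the distance exceeds a threshold. The function names, the `step` parameter, and the handling of coincident neighbors are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:                       # neighbors coincide: fall back to point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - (a[0] - p[0]) * dy) / length

def find_corners(points: List[Point], threshold: float, step: int = 1) -> List[int]:
    """Return indices of interior points that deviate from their neighbors' line."""
    corners = []
    for i in range(step, len(points) - step):
        if distance_to_line(points[i], points[i - step], points[i + step]) > threshold:
            corners.append(i)
    return corners
```

On densely sampled traces, several adjacent points near one bend may all exceed the threshold, so a practical implementation would likely merge neighboring detections into a single corner.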
If a corner is detected at decision block 810, processing next continues at block 860, where the character selection submodule generates a feature object for the corner by characterizing the corner, identifying a related character set associated with the corner and calculating proximity weights for each character in the character set. Typically, a character selection submodule will characterize a corner feature as an intentional feature as described previously, but under certain circumstances dictated by disambiguation parameters (e.g., a low-pressure corner or a mid-trace corner with a lower curvature), the character selection submodule may instead characterize a corner as an incidental feature. Additionally, the character selection submodule may characterize a starting corner feature as a finger-down feature and characterize an ending corner feature as a finger-up feature. Suitable methods for identifying a related character set and calculating proximity weights for a detected corner feature are described in greater detail herein with respect to
At decision block 820, the character selection submodule determines whether a segment feature is detected in the input data. The character selection submodule may detect a segment and select the location data for a segment feature simply by selecting the X-Y trace data that falls between two adjoining corners (where, again, a “corner” can include an endpoint of the trace). The submodule may also determine whether the trace is larger than the area of a key on the virtual keyboard, which would indicate a potential segment between two features. However, the character selection submodule may apply any other suitable method to identify a segment, to distinguish a segment from other types of features and/or to determine the location of the segment.
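Selecting the trace data between adjoining corners, with the trace endpoints counted as corners, can be sketched as follows. The function name and the choice to include both bounding corner points in each segment are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def split_into_segments(points: Sequence[Point],
                        mid_corner_indices: Sequence[int]) -> List[List[Point]]:
    """Return the trace data lying between each pair of adjoining corners."""
    # The first and last points of the trace are treated as corners too.
    boundaries = sorted({0, len(points) - 1, *mid_corner_indices})
    return [list(points[a:b + 1]) for a, b in zip(boundaries, boundaries[1:])]
```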
If a segment is detected at decision block 820, processing next continues at block 870, where the character selection submodule generates a feature object for the segment by characterizing the segment, identifying a related character set associated with the segment and calculating proximity weights for each character in the character set. Typically, the character selection submodule will characterize a segment feature as an incidental feature unless otherwise dictated by disambiguation parameters. Suitable methods for identifying a related character set and calculating proximity weights for a detected segment feature are described in greater detail herein with respect to
At decision block 830, the character selection submodule determines whether another type of feature is detected in the input data. For example, the character selection submodule may identify any hovering or anomalous localized pressure that occurred during an X-Y trace. Examples of other features that the character submodule may detect include: spoken commands (e.g., to select a letter), a second finger tapping the screen during a trace, a moderate change in curvature, and a constant change in curvature (e.g., suggesting a circling motion). If another type of feature is detected, processing continues at block 880, where the character selection submodule generates a feature object for the feature by characterizing the feature, identifying a related character set associated with the feature and calculating proximity weights for each character in the character set. The character selection submodule may characterize the identified feature as an intentional feature if the feature is a clear manifestation of the user's intent to effectuate a particular keystroke. For example, a clearly spoken letter may be characterized as an intentional feature.
During the process 800, the character selection submodule may adjust the location of an input feature to offset known peculiarities or trends in a user's input style by evaluating various user-specific disambiguation parameters. For example, the character selection submodule may adjust the position of a tap feature to the left of its actual detected location if the user has a tendency to tap to the right of keys.
At block 806, the character selection submodule loads an area-key mapping from the keyboard mapping database 622 that reflects the key layout (e.g., 12-key, 28-key, etc.) of the virtual keyboard being displayed to and utilized by the user.
Then, starting at block 809, the character selection submodule performs a loop for each key in the area-key mapping. The loop begins at block 812, where the character selection submodule determines the location of the key by using the area-key mapping, which typically is a language-independent mapping. Next, at block 815, the character selection submodule calculates the key's distance score by evaluating the applicable distance function using the location of the key and the location of the input feature, as described in greater detail herein with respect to
At block 827, the character selection submodule uses one or more key-character mappings to identify all characters associated with the key, even characters that may not be displayed to the user on the virtual keyboard (e.g., an accented letter). The submodule may determine which key-character mappings to use at block 827 by evaluating various flags (e.g., a keyboard selection flag), dynamic context information (e.g., a user's language settings received from the operating system), user preferences, and/or disambiguation parameters. For example, the submodule may receive an indication that the user interface has a full QWERTY keyboard with a French character layout, and may use this information to select a suitable key-character mapping. The character selection submodule may utilize more than one key-character mapping at this block (e.g., to accommodate a multi-lingual user who may wish to continuously enter text without switching between different on-screen character layouts).
At block 833, the character selection submodule adds each character identified at block 827 to the feature object's character set and associates each identified character with the calculated key proximity weight. At block 836, the character selection submodule determines if there are more keys in the area-key mapping to process. If so, the loop repeats starting at block 812, otherwise the process 890 returns.
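The per-key loop can be sketched as below. The Euclidean distance, the linear falloff used to turn a distance into a proximity weight, the cutoff radius, and the skipping of zero-weight keys are all assumptions made for illustration; the description defers the actual distance function and proximity metric to other passages. The two mapping dictionaries are hypothetical stand-ins for an area-key mapping and a key-character mapping.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical area-key mapping: key id -> key center (language independent).
AREA_KEYS: Dict[str, Tuple[float, float]] = {
    "key_q": (15.0, 20.0), "key_w": (45.0, 20.0), "key_e": (75.0, 20.0)}
# Hypothetical key-character mapping, including undisplayed accented letters.
KEY_CHARS: Dict[str, List[str]] = {
    "key_q": ["q"], "key_w": ["w"], "key_e": ["e", "é", "è"]}

def build_character_weights(feature_xy: Tuple[float, float],
                            area_keys: Dict[str, Tuple[float, float]],
                            key_chars: Dict[str, List[str]],
                            radius: float = 40.0) -> Dict[str, float]:
    """Map each candidate character to the proximity weight of its key."""
    weights: Dict[str, float] = {}
    for key, (kx, ky) in area_keys.items():           # loop over every key
        d = math.hypot(feature_xy[0] - kx, feature_xy[1] - ky)
        weight = 1.0 - d / radius                      # assumed linear falloff
        if weight <= 0:
            continue                                   # assumed: far keys contribute nothing
        for ch in key_chars.get(key, []):
            weights[ch] = max(weight, weights.get(ch, 0.0))
    return weights
```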
As shown in
Similar to the methods described previously with respect to
At block 915, the word submodule may pre-filter the dictionaries using some portion of the input sequence, the estimated input length L and/or the context to generate a subgroup of words. As a first example, the word submodule may pre-filter the dictionary on the basis of whether a word matches the first “finger-down” feature and/or the last “finger-up” feature (described previously), since these features provide possibly the strongest indication of the user's intent to effectuate a particular keystroke. As a second example, the word submodule may pre-filter the dictionary on the basis of estimated input length L. For example, the word submodule may filter out any words that are shorter than L minus X, where X is determined using disambiguation parameters. As a third example, the word submodule may pre-filter the dictionary to a list of words generated by voice recognition software. As a fourth example, if the dynamic context indicates that the user is typing in a name field, it may filter the dictionary to only include proper nouns. Of course, the word submodule may use a combination of these filtering techniques.
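Two of these filters, the length filter (drop words shorter than L minus X) and a filter on the characters near the first and last features, can be combined as in the sketch below. The function and parameter names are assumptions, with `slack` standing in for X.

```python
from typing import Iterable, List, Optional, Set

def prefilter(dictionary: Iterable[str],
              estimated_length: int,              # L in the text
              slack: int,                         # X in the text
              first_chars: Optional[Set[str]] = None,
              last_chars: Optional[Set[str]] = None) -> List[str]:
    """Return the subgroup of words that survive the pre-filters."""
    min_len = estimated_length - slack
    subgroup = []
    for word in dictionary:
        if not word or len(word) < min_len:
            continue                              # shorter than L minus X
        if first_chars is not None and word[0] not in first_chars:
            continue                              # does not match the first feature
        if last_chars is not None and word[-1] not in last_chars:
            continue                              # does not match the last feature
        subgroup.append(word)
    return subgroup
```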
At block 920, the word submodule selects the next word in the subgroup on the basis of language data, context, word length, and/or any other suitable factors. For example, the word submodule may sort the subgroup on the basis of frequency of use and present these words in that order.
At block 925, the word submodule determines the matching score for the selected word by evaluating a match metric using input sequence, context, word length, and language data. A suitable method for determining a matching score using a match metric is shown in
At decision block 930, the word submodule may determine if the word has a matching score that exceeds a minimum threshold value that is determined using disambiguation parameters. If it does, the word is added to the word choice list at block 935 before the process proceeds to block 940. Otherwise processing proceeds directly to block 940. The word submodule may eliminate poorly matching words and provide a word list of fixed or minimum size. Alternatively block 930 may be eliminated from the word submodule.
At block 940, the word submodule determines if there are more words to process in the subgroup. If there are, the processing returns to block 920 so that another word may be evaluated.
Otherwise, the process proceeds to block 945, where the word choice list is post-processed. For example, the word submodule may sort the list by matching score and/or by selecting one or more default words that have the highest matching score. As another example, after sorting, the word submodule may group or adjust the ordering of words in a logical fashion to facilitate improved user browsing behavior. For example, the word submodule may rearrange the list to group together words that have the same root (e.g., “interest,” “interesting,” and “interestingly”) to permit a user to scan the word choice list more rapidly. As yet another example, dynamic or static context data may be utilized at block 950 to adjust the list, e.g., to promote or demote particular words in the list and/or to otherwise adjust the order of the list. For example, if a user has utilized a particular word in the same application program before, that word may be promoted. As another example, the word submodule may promote words that end with a punctuation mark over words that have embedded punctuation marks. As yet another example, if more than one language dictionary was utilized to generate a word list or the disambiguation module is otherwise providing bilingual integration (e.g., to accommodate a bilingual user), the post-processing may promote words in the primary language of the user and/or demote words in a secondary language of the user. The “primary language” of the user may be determined by language parameters or flags and/or by user context data from the user context database 626.
As still another example of post-processing, the disambiguation module may promote or demote particular words based on application-specific field information and/or rules or grammar retrieved from the application context database.
In another example of post-processing, the disambiguation module may determine whether the user is attempting to utilize a word that may not be listed or found in a standard dictionary, such as a neologism like “podcast,” or an uncommon proper noun. If it determines, based on typical character sequences in a language, that the user may be attempting to enter an “unlisted” word that does not appear in the dictionary, it will make the unlisted word the default word, and provide the word choice list as an alternative suggestion. Otherwise, it may utilize a high-scoring word selected from the dictionary as the default word.
To illustrate, if the user crisply taps out the letters “podcast,” the disambiguation module may keep “podcast” as the default word, despite the fact that it may not be in a standard English dictionary, since the character sequences (the combinations of “po,” “ca,” and “st,” etc.) are frequently seen in the English language. On the other hand, if the user crisply taps out the letters “Pfsx,” the disambiguation module may assume that the user has made a misspelling (or typed sloppily) and will not select “pfsx” as the default word, since the combinations of “pf,” “fs,” and “sx” are all infrequently seen in the English language. Instead, the disambiguation module will select a dictionary word with a high matching score found during blocks 905-940.
To make this determination, the disambiguation module may first generate a “stroke sequence” by selecting, for each intentional feature (e.g., tap) in the input sequence, the associated character that has the best proximity score. Next, the disambiguation module may evaluate the composition of the stroke sequence by using a table or database within the vocabulary module 624 that provides the probabilities or frequencies of various character sequences (character bigrams, 3-grams, 4-grams, . . . n-grams, etc.) in a given language (or other logical group of words) to derive a composition score. The composition score may be the product of various probabilities/frequencies of bigrams/etc., or another higher-order function of the various probabilities/frequencies. If the composition score exceeds a threshold value, the disambiguation module may use the stroke sequence as the default word. In some implementations, the stroke sequence may be defined using a different subset of the input sequence (e.g., tap features only, not corners).
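A composition score of this kind might be sketched as follows. The bigram table, the floor probability for unseen bigrams, and the threshold are made-up illustrative values, not data from any real language model. The sketch also assumes a length-normalized product (a geometric mean) rather than a raw product, so that longer stroke sequences are not penalized merely for containing more bigrams; this is one possible choice among the functions contemplated above.

```python
import math

# Hypothetical bigram probabilities; a real table would be derived from a corpus.
BIGRAM_PROB = {"po": 0.02, "od": 0.01, "ca": 0.03, "as": 0.03, "st": 0.05}
UNSEEN_PROB = 0.0005  # assumed floor for bigrams missing from the table


def composition_score(stroke_sequence):
    """Geometric mean of the bigram probabilities in the stroke sequence."""
    s = stroke_sequence.lower()
    bigrams = [s[i:i + 2] for i in range(len(s) - 1)]
    if not bigrams:
        return 0.0  # a single character carries no bigram evidence
    log_sum = sum(math.log(BIGRAM_PROB.get(b, UNSEEN_PROB)) for b in bigrams)
    return math.exp(log_sum / len(bigrams))


def use_stroke_as_default(stroke_sequence, threshold=0.005):
    """True when the stroke sequence looks plausible enough to be the default word."""
    return composition_score(stroke_sequence) > threshold
```

Under these toy values, "podcast" clears the threshold even though one of its bigrams ("dc") is unseen, while "Pfsx" consists only of unseen bigrams and does not.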
At block 950 the word choice list is output to the user, e.g., via a word choice list area on a display or otherwise. Alternatively, the word choice list and/or a default word may be provided to another process (e.g., an application program) that does not need user intervention to select a word.
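Blocks 920 through 945 amount to a score, filter, and sort loop followed by optional regrouping. The following Python sketch is illustrative only: `build_choice_list` and `group_by_root` are hypothetical names, the scoring function is passed in by the caller, and the root lookup table stands in for whatever stemming or morphological data an implementation might actually use.

```python
def build_choice_list(subgroup, score_fn, min_score=0.0):
    """Score each word, keep those above the threshold, and sort by score."""
    choices = []
    for word in subgroup:                  # blocks 920 and 940: iterate the subgroup
        score = score_fn(word)             # block 925: matching score
        if score > min_score:              # decision block 930: threshold test
            choices.append((word, score))  # block 935: add to the word choice list
    # Block 945: order the list so the best match comes first.
    choices.sort(key=lambda item: item[1], reverse=True)
    return choices


def group_by_root(choices, roots):
    """Move words that share a root next to the best-scoring member of the group."""
    ordered, seen = [], set()
    for word, _ in choices:
        if word in seen:
            continue
        root = roots.get(word, word)
        for other, other_score in choices:
            if other not in seen and roots.get(other, other) == root:
                ordered.append((other, other_score))
                seen.add(other)
    return ordered
```

For example, if "interest," "internal," and "interesting" score 0.9, 0.8, and 0.7, and the root table maps "interesting" to "interest," the regrouped order becomes "interest," "interesting," "internal."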
The process 990 begins at block 972 when the word submodule sets a word's matching score to a default score, which may be a unity value of 1. A loop for each character in the word then begins, as shown at block 974. At block 975, the word submodule retrieves the character set and related proximity weight set that corresponds to the next intentional input feature in the input sequence, or if this is the first iteration of the character loop, the first intentional input feature in the input sequence. For example, the word submodule retrieves the character set and proximity weights associated with the next tap or corner feature in the input sequence.
At block 976, the word submodule identifies the proximity weight for the character. To do so, the submodule determines whether the character appears in the retrieved character set, and if it does, it uses the retrieved proximity weight associated with the character. Otherwise, if the character does not appear in the retrieved character set, the submodule may use a default proximity weight (e.g., a low weight) specified by a disambiguation module, terminate the loop 974, or end the process 990 by returning a default matching score (e.g., a score of zero or other default matching score).
At block 978, the word submodule adjusts the matching score to reflect the identified proximity weight for the character. In one example, the submodule adjusts the matching score by multiplying it by the retrieved proximity weight. However, the matching score may be adjusted in any manner to reflect the retrieved proximity weight. For example, the previous matching score and the retrieved proximity weight may be combined by any suitable function, including simple addition, a weighted summation, multiplication, division, and higher-order functions.
At decision block 979, the word submodule tests whether there are any more characters to process in the word and whether there are more intentional features in the input sequence. If both conditions are satisfied, the character loop repeats starting at block 975. Otherwise the process proceeds to block 980.
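The character loop of blocks 972 through 979 can be sketched in a few lines of Python. The function name, the data layout (one dictionary of character-to-weight per intentional feature), and the default weight are assumptions made for illustration. Of the alternatives described at block 976, this sketch takes only the first, substituting a low default weight when a character is absent from the retrieved character set.

```python
def proximity_score(word, features, default_weight=0.05):
    """Multiply together the proximity weights of the word's characters.

    `features` holds one {character: proximity weight} mapping per
    intentional input feature (e.g., tap or corner), in input order.
    """
    score = 1.0  # block 972: default score of unity
    # Loop 974: zip stops when either the word or the features run out,
    # which mirrors the two-part test at decision block 979.
    for ch, weights in zip(word.lower(), features):
        weight = weights.get(ch, default_weight)  # block 976
        score *= weight                           # block 978
    return score
```

Given three tap features near "c/x," "a/s," and "t/r," a call such as `proximity_score("cat", features)` multiplies the three weights recorded for "c," "a," and "t," so a word whose letters sit closer to the taps receives a higher score than a word that relies on the default weight.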
At block 980, the word submodule adjusts the matching score to reflect context data and/or language data (e.g., frequency of use). In one implementation, the word submodule scales the matching score by the frequency of use. As another example, the word submodule may increase a matching score based on context data, for example, the matching score may be increased if the user has utilized the same word in the same type of data field and/or the same application. As yet another example, if the surrounding sentence context (e.g., previous words entered by the user) suggests that the word is more or less likely, the matching score may be adjusted upward or downward. These examples are not intended to be exhaustive. The previous matching score may be adjusted in any suitable manner (e.g., via higher order functions) that reflects how the context and/or frequency of use increases or decreases the likelihood that the word was the intended entry of the user.
At block 982, by utilizing known edit distance techniques, the word submodule determines the type of edits needed to match a character sequence reflected in the intentional features of the input sequence. For example, the word submodule may determine what characters would need to be added to the word, or what characters would need to be removed from the word so that it would match a sequence of characters reflected in the sequence of intentional features.
When determining the types of edit(s) needed, the word submodule may only attempt to match a first portion of the word against the intentional features in the input sequence. For example, the word submodule may ignore any mismatches that occur after the Lth character in the word, where L is the estimated input length. The cutoff point could alternatively be chosen as L−1, L+2, L+3, etc. In this way, the word submodule improves the chances that a longer word having a good initial match to the user's initial input will be returned.
At block 984, for each edit needed, the word submodule determines an edit penalty based on the estimated input length and the input sequence, including the feature types reflected in the input sequence and the proximity weights. As utilized herein, a “low penalty” is one that is lower than a standard or default penalty assessed by a standard edit cost function for a similar edit.
As a first example, the word submodule may assign a zero or low penalty for character changes needed after a first portion of the word. For example, there may be no penalty or a low penalty assessed for character changes needed after the Lth position in the word. This technique similarly improves the chances that a longer word that has a good initial match to the user's initial input will be returned.
As a second example, the word submodule may assign a zero or low penalty for character additions or deletions involving characters that are reflected by the full input sequence, including both intentional and incidental features. For example, if the only difference between the word and the input sequence of intentional features is a missing “E” between the third and fourth intentional features, the word submodule may assign a zero or low penalty if an “E” character appears in a character set associated with an incidental feature (e.g., a segment) that falls between the third and fourth intentional features in the input sequence. In some implementations, the penalty assessed may be a function of the proximity weight associated with the missing character. To illustrate, returning to the previous example, if the segment feature that provides the missing “E” cuts directly through the center of the E key (i.e., the proximity weight for the E character is near its maximum possible value), the penalty assessed may be lower than if the segment runs further away from the E key. By reducing or eliminating edit penalties in these circumstances, the word submodule permits a user to indicate her selection of a letter simply by tracing through the letter.
As a third example, the word submodule may assign a zero or low penalty for character additions or deletions caused by repeating characters or repeating keystrokes. Stated another way, there is no penalty or a lower penalty assessed if the word would match the input sequence if only one or more input features were repeated (one or more times) within the sequence. By reducing or eliminating edit penalties in these circumstances, the word submodule permits a user to indicate her selection of a word that has repeating characters or repeating keystrokes with a reduced number of input features. If a low penalty is assessed, it may be a function of the type of input features that would need to be repeated (e.g., intentional vs. incidental), the number of times each feature would need to be repeated, and/or the proximity metrics associated with the repeated input features. To illustrate, if the word is “WEED” and the only difference between the word and the input sequence of intentional features is that the second “E” is missing between the second and third intentional features, the word submodule may assign a zero or low penalty.
As a fourth example, the word list selection submodule may assign a zero or low penalty for character changes that reflect the addition or deletion of punctuation marks and/or similar symbols.
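The four penalty examples above could be combined into a single cost function along the following lines. This is a sketch under stated assumptions: it handles only the case in which a character of the candidate word is missing from the intentional features, the constants `STANDARD_PENALTY` and `LOW_PENALTY` are arbitrary, and `incidental_weights` is a hypothetical mapping from characters touched by an incidental feature (e.g., a segment) to their proximity weights in the range 0 to 1.

```python
import string

STANDARD_PENALTY = 1.0  # assumed cost of an ordinary edit
LOW_PENALTY = 0.1       # assumed reduced cost


def insertion_penalty(word, pos, est_len, incidental_weights=None):
    """Penalty for word[pos] having no matching intentional feature."""
    ch = word[pos].lower()
    # First example: no penalty for changes after the Lth position.
    if pos >= est_len:
        return 0.0
    # Second example: the character was traced through by an incidental
    # feature; the closer the trace, the lower the penalty.
    if incidental_weights and ch in incidental_weights:
        return STANDARD_PENALTY * (1.0 - incidental_weights[ch])
    # Third example: the character repeats the one before it.
    if pos > 0 and word[pos - 1].lower() == ch:
        return LOW_PENALTY
    # Fourth example: the character is a punctuation mark.
    if ch in string.punctuation:
        return LOW_PENALTY
    return STANDARD_PENALTY
```

For the word "WEED" with the second "E" missing, `insertion_penalty("WEED", 2, est_len=3)` returns the low penalty because the character repeats its predecessor.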
At block 986, the word submodule adjusts the matching score to reflect any calculated penalties. In some cases, the penalty is subtracted from the matching score. However, in other implementations, higher-order adjustments may be utilized. Then the process 990 returns.
IV. Application to Security Systems
In accordance with another aspect of the disclosed technology, a password or other text entry system and method are provided that allow a user to leverage familiarity with a QWERTY keyboard (or other common keyboard layouts).
This password or other text input is achieved by providing the user with preferably nine, or other appropriate number of defined zones, each anchored by a distinct key that acts as a zone output key. When the user selects one key from a particular zone, the zone output key of the selected zone is considered as having been selected. Thus, while a user is able to enter a password on a standard keyboard in a standard manner, depending on the zone definitions, a sequence of zone output keys will be output representative of the input password. If a receiving device is aware of the zone definition used when generating the sequence of output keys corresponding to the input password, the receiving device is able to decipher the sequence of output keys to determine whether the proper password has been entered.
A plurality of zone definitions may be available. Thus, by transmitting a zone definition identifier and sequence of zone output keys, a receiving device, that is aware of the plurality of zone definitions and identifiers associated therewith, can properly interpret the transmission while maintaining the security of the password.
Additionally, the zone definition need not necessarily include keys that are contiguous with each other. By defining non-contiguous zones, the user must be more accurate in key entry, but the difficulty for a hacker or other thief in breaking the code may be greatly increased, and the number of possible available zone definitions is similarly greatly increased.
Additionally, rather than using only a standard keyboard for text entry of the password characters, characters may be input using directional input on a joystick, a 4 or 8 directional keypad (such as the ones found in TV remote controls, telephones and mobile phones), various combinations of numeric keys on such a remote control or other available numeric keypad, or pointing functionality of such a remote control or other device, such as a NINTENDO® Wii® remote or other wand-like input device, body movement entry solutions, or the like. When using such a non-traditional input device, the user may move the joystick or other device in the direction of a desired key to be input. When performing this movement, rather than moving between keys, the selector moves between adjacent zones of keys; all keys contained in the zone moved to will be selected, and a zone output key associated with that zone may be further delineated. Thus, when a key is desired to be selected, a joystick or the like may be used to move to a zone that contains the desired key. By selection of such a zone, the associated zone output key may be presented as part of the output sequence of characters. By way of example, if the user wishes to select the letter “R” as part of a password, the user moves to a zone containing the letters R, T, Y, U, and I that uses Y as its zone output key; selection of that zone causes the letter “Y” to be output as part of the output stream. In addition to providing a coded password transmission, this approach requires the user only to select among the zones, without the precision needed to select each individual key, in that selection of any key in a particular zone will result in the same zone output key being output as part of the password data stream.
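The mapping from a selected key to its zone output key can be sketched in Python as a simple lookup. The nine-zone layout below is hypothetical; only the zone containing R, T, Y, U, and I with output key Y is taken from the example above, and the remaining zones were chosen merely to cover a QWERTY letter layout. Characters outside every zone are passed through unchanged, on the assumption that such keys require direct selection.

```python
# Hypothetical zone definition: zone output key -> keys belonging to that zone.
ZONES = {
    "W": "QWE", "Y": "RTYUI", "O": "OP",
    "S": "ASD", "G": "FGH", "K": "JKL",
    "X": "ZXC", "V": "VB", "N": "NM",
}

# Invert the definition so each key maps to its zone output key.
KEY_TO_OUTPUT = {key: out for out, keys in ZONES.items() for key in keys}


def encode(text):
    """Replace each typed key with the output key of the zone that contains it."""
    return "".join(KEY_TO_OUTPUT.get(ch.upper(), ch) for ch in text)
```

Under this layout, `encode("R")` yields "Y," and `encode("secret")` yields "SWXYWY," so the transmitted stream never contains the typed letters unless they happen to be zone output keys.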
The disclosed technology is also applicable to any keyboard configuration, any language, and any other symbol grouping. Furthermore, the zones assigned to an input device need not be visible to a user; rather, they may work only in the background to encrypt the data to be transmitted.
Referring first to
Furthermore, while it is contemplated that all of the available symbol keys, in this case letters, be included within at least one zone, this need not be the case. Indeed, any particular key or keys may be left alone, requiring direct user selection. Such a configuration may be desirable for rarely-used keys, or keys with special significance.
As can be understood from this description referring to
When employing the eight direction joystick as noted above, the user can move between the zones through movement of the joystick. Two or more schemes are possible. In a first, a valid movement of the controller may be Up, Down, Left, Right, Up-Right, Up-Left, Down-Right, and Down-Left. These eight distinct movements allow the user to quickly move between the zone output keys on the keyboard, moving from zone to zone. A similar scheme may be employed with a less versatile four direction joystick. Without diagonal movements available, however, more jumps may be necessary for moving from one zone to another. In a second scheme, each of the eight directional keys and a central “neutral” position may each correspond to the zones, allowing a user to directly access each of the zones without having to navigate from one to the other. Alternatively, when, for example, a nine key number pad is provided, each number may correspond to a particular zone.
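Both joystick schemes can be illustrated with a short Python sketch. It assumes, purely for illustration, that the nine zones are arranged in a 3x3 grid and numbered 0 through 8 in row-major order; the direction names and the `move` function are likewise hypothetical.

```python
# First scheme: relative movement between adjacent zones (row delta, column delta).
MOVES = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "up-left": (-1, -1), "up-right": (-1, 1),
    "down-left": (1, -1), "down-right": (1, 1),
}

# Second scheme: each direction, plus a neutral position, selects a zone directly.
DIRECT = {
    "up-left": 0, "up": 1, "up-right": 2,
    "left": 3, "neutral": 4, "right": 5,
    "down-left": 6, "down": 7, "down-right": 8,
}


def move(zone, direction):
    """Return the zone reached from `zone`; stay put at the edge of the grid."""
    row, col = divmod(zone, 3)
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < 3 and 0 <= new_col < 3:
        return new_row * 3 + new_col
    return zone
```

With diagonals available, the center zone reaches any other zone in one movement. A four-direction device would use only the first four entries of `MOVES`, so reaching a corner from the center would take two movements.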
While the most common option would be to assign these keys to zones based upon corresponding locations, in an alternative embodiment of the disclosed technology the zones may be configured in any contiguous or non-contiguous manner. If the user is actually entering the password keys, rather than simply selecting one of a number of zones, the zone locations will likely not affect the user experience, while the multiple available zone configurations will assist in protecting the security of the input password. In either of these entry situations, it is not necessary that the zones be visible to a user. Rather, a user may be prompted to enter a password as normal on a keyboard. Background processing may assign a zone definition (as will be described below) and prepare a data output stream for transmission.
Referring next to
Once the zone definition has been selected, a user may start to enter characters that comprise the password. Such an input character is received at step 1215. At step 1220, the zone output key associated with the zone comprising the key selected by the user is defined as the output character associated with the input character. This output character is then provided as a first character to the eventual output data stream to be transmitted. Then processing passes to step 1225 where it is inquired whether all input characters have been received. If this inquiry is answered in the negative, and it is therefore determined that not all of the desired input characters have been received, control returns to step 1215 and a next input character is received.
If, however, the inquiry at step 1225 is answered in the affirmative, and it is therefore determined that all desired input characters have been received, then control passes to step 1230, where the utilized zone definition and the generated output data stream representing the input password are transmitted to a receiving device. Processing then ends at step 1235. While the embodiment has been described as transmitting the zone layout definition along with the output data stream, such zone layout definition may be transmitted separately. It is contemplated that such zone layout definition preferably not comprise a complete listing of zones, but rather a zone definition indicator, thus allowing a receiving device to select from one of a plurality of predefined zone layout definitions. In such a manner, only an otherwise meaningless indicator is transmitted, thus not giving away the zone layout and increasing the security of the system. Further, it is possible that no zone definition is transmitted, and that the transmitting and receiving devices are synchronized and simply cycle through a predetermined number of zone layout definitions in a predetermined manner. Further, it may be possible to use any other indicator or the like as an input to an equation, algorithm, or the like to select from the plurality of zone layout definitions, as long as both the transmitting and receiving devices are able to process the same algorithm to then utilize the same zone layout definition for both generation and interpretation of the password.
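The transmission of a zone definition indicator together with the output data stream might look like the following Python sketch. Everything here is an illustrative assumption: the two predefined zone layouts (one contiguous, one non-contiguous), the convention that the first key listed in each zone is its zone output key, the counter-based cycling rule, and the receiver-side check, which simply re-encodes an expected password under the indicated definition and compares the result.

```python
import string

# Hypothetical predefined layouts shared by both devices.
# The first key in each zone string serves as that zone's output key.
ZONE_DEFINITIONS = {
    0: ["WQE", "YRTUI", "OP", "SAD", "GFH", "KJL", "XZC", "VB", "NM"],
    1: [string.ascii_uppercase[i::9] for i in range(9)],  # non-contiguous zones
}


def encode(text, def_id):
    """Map each key to the output key of its zone under the given definition."""
    lookup = {key: zone[0] for zone in ZONE_DEFINITIONS[def_id] for key in zone}
    return "".join(lookup.get(ch.upper(), ch) for ch in text)


def transmit(password, def_id):
    """Step 1230: send only the indicator and the zone output keys."""
    return def_id, encode(password, def_id)


def next_definition(counter):
    """Synchronized alternative: both sides derive the indicator from a shared counter."""
    return counter % len(ZONE_DEFINITIONS)


def verify(message, expected_password):
    """Receiver side: compare against the expected password's zone sequence."""
    def_id, stream = message
    return stream == encode(expected_password, def_id)
```

Under these layouts, `transmit("secret", 0)` produces `(0, "SWXYWY")` and `transmit("secret", 1)` produces `(1, "AECIEB")`. Note that in this sketch any entry falling in the same sequence of zones as the expected password also verifies, which follows from only zone output keys being compared.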
Referring next to
If it is determined at step 1330 that all of the password characters have been received, then processing passes to step 1335 where the system recognizes that a valid password has been entered, and the user is so notified. Processing then ends at step 1340. Therefore, in accordance with the charts displayed in
In an alternative embodiment of the disclosed technology, it may be possible to provide two consecutive zone definitions that divide the keyboard in a manner so that upon entry of a password twice by a user, the system transmits zone indications for the two different zone definitions, allowing the selected keys to be determined precisely at the receiving end even if the keys themselves are never transmitted. For example, if the letter “A” is first associated with zone 1, and then in a second zone definition is the only letter previously in zone 1 that is now associated with zone 4, a sequence of Z1, Z4 for the first letter in two consecutive password entries would result in a guaranteed determination of the letter “A”, even though this letter was never transmitted. Thus, through the use of multiple iterations of this password scheme, further precision can be achieved.
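The two-definition idea can be demonstrated by intersecting the zones indicated in each pass. In the Python sketch below, the two zone definitions are hypothetical and deliberately constructed so that any zone of the first shares at most one letter with any zone of the second: the alphabet is laid out in rows of six, with the first definition using the rows and the second using the columns. The sketch assumes the input contains letters only.

```python
import string

LETTERS = string.ascii_uppercase

# First definition: consecutive groups of six letters (rows).
ROWS = [set(LETTERS[i:i + 6]) for i in range(0, 26, 6)]
# Second definition: every sixth letter (columns).
COLS = [set(LETTERS[i::6]) for i in range(6)]


def zone_stream(text, zones):
    """Return the index of the zone containing each letter of the text."""
    return [
        next(i for i, zone in enumerate(zones) if ch in zone)
        for ch in text.upper()
    ]


def recover(stream_a, stream_b):
    """Intersect the zones indicated by the two passes to recover each letter."""
    letters = []
    for zone_a, zone_b in zip(stream_a, stream_b):
        common = ROWS[zone_a] & COLS[zone_b]
        # A single shared letter is a guaranteed determination; else mark unknown.
        letters.append(common.pop() if len(common) == 1 else "?")
    return "".join(letters)
```

Encoding "secret" once under each definition and passing both zone streams to `recover` reproduces "SECRET," although no letter of the password appears in either stream.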
An embodiment of the disclosed technology employing a traditional nine key keypad is shown in
In an additional embodiment of the disclosed technology, rather than making an indication on a remote device as described above to indicate zone selection, it may be possible to employ a pointing device, such as a remote control utilized with, for example, a NINTENDO® Wii® controller, body movement sensing to determine a user input, or the like. Thus, as a user points such a controller at a displayed virtual keyboard, or otherwise indicates zone selection, a zone pointed at by the controller is highlighted or otherwise indicated as being designated. A user can select such a zone by pressing an indicator on the remote, or by other method as is known to one of ordinary skill in the art. The use of zones in accordance with the disclosed technology is particularly useful in such a situation in that the ability to select each individual key may be difficult in such a configuration. Accuracy may be difficult. However, in accordance with the present disclosed technology, a user need only select between, for example, one of nine zones, thus making this ability to select much easier for a user. Thus, a user will likely be able to move through character selection in a much faster and easier manner.
As is shown in
In addition to providing a visual indication of the remote pointing device moving from one displayed zone to another, in an alternative embodiment of the disclosed technology, a haptic or other tactile response is also provided to a user to provide an additional, non-visual indication of movement and selection. Thus, as a user moves the pointing device to designate a second zone, a vibration or other non-verbal indicator may be provided to the user. Thus, if as in the movement between FIGS. 14A and 14B a user moves the pointer two zones, such a haptic response may be provided to a user twice, once each time the border between zones is crossed. Such haptic response may also be modified, thus providing a different response for different zones. Thus, each zone may have a different haptic signature, or a zone determined by a parsing algorithm to be the most likely selected zone in the next selection may be provided a haptic indicator different from the other zones. The zones may be differentiated in any other manner including such a haptic indicator.
Referring once again to
As noted above, the displayed virtual keyboard represents a full QWERTY keyboard, but may comprise any other desired configuration. The zone output keys may be preferably highlighted in a different color on the keyboard, or designated using some other distinguishing characteristic. When the input device is used to go to a zone corresponding to a particular zone output key, the other characters in the zone are preferably highlighted as well to provide to the user visual feedback on the set of letters that are grouped with the selection of a particular zone.
While the disclosed technology has been described related to the transmission of a password, the features of the disclosed technology may be applied to a more lengthy encrypted data transmission.
CONCLUSION
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the terms “connected,” “coupled,” or any variant thereof means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
The above Detailed Description of examples of the disclosed technology is not intended to be exhaustive or to limit the disclosed technology to the precise form disclosed above. While specific examples for the disclosed technology are described above for illustrative purposes, various equivalent modifications are possible within the scope of the disclosed technology, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative implementations may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternative or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed or implemented in parallel, or may be performed at different times. Further any specific numbers noted herein are only examples: alternative implementations may employ differing values or ranges.
The teachings of the disclosed technology provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various examples described above can be combined to provide further implementations of the disclosed technology. Some alternative implementations of the disclosed technology may include not only additional elements to those implementations noted above, but also may include fewer elements.
Any patents and applications and other references noted above, including any that may be listed in accompanying filing papers, are incorporated herein by reference. Aspects of the disclosed technology can be modified, if necessary, to employ the systems, functions, and concepts of the various references described above to provide yet further implementations of the disclosed technology.
These and other changes can be made to the disclosed technology in light of the above Detailed Description. While the above description describes certain examples of the disclosed technology, and describes the best mode contemplated, no matter how detailed the above appears in text, the disclosed technology can be practiced in many ways. Details of the system may vary considerably in its specific implementation, while still being encompassed by the disclosed technology disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the disclosed technology should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the disclosed technology with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the disclosed technology to the specific examples disclosed in the specification, unless the above Detailed Description section explicitly defines such terms.
Claims
1. A computer system, comprising:
- one or more processors;
- a memory configured to store programmed instructions;
- a first input device;
- a second input device different from the first input device; and
- processor electronics configured to execute the programmed instructions to perform operations of receiving text inputs from a user by: receiving a first input in a first modality through the first input device; receiving a second input, through the second input device, in a second modality different from the first modality, wherein a combination of the first input in the first modality and the second input in the second modality are indicative of text to be entered; analyzing the combination of the first input in the first modality and the second input in the second modality to select one or more word choices; and using the selected word choices to determine the text to be entered.
2. The computer system of claim 1,
- wherein the first input device is a keyboard;
- wherein the second input device is a microphone;
- wherein receiving the first input in the first modality comprises receiving text input via the keyboard;
- wherein receiving the second input in the second modality comprises receiving audio input via the microphone; and
- wherein analyzing the combination of the first input in the first modality and the second input in the second modality to determine the one or more word choices corresponding to the text to be entered is performed by: using the text input to determine multiple potential word choices; presenting the multiple potential word choices to the user; and using the audio input to select one of the multiple potential word choices.
3. The computer system of claim 1,
- wherein the first input device is a keyboard;
- wherein the second input device is a microphone;
- wherein receiving the first input in the first modality comprises receiving audio input via the microphone;
- wherein receiving the second input in the second modality comprises receiving text input via the keyboard; and
- wherein analyzing the combination of the first input in the first modality and the second input in the second modality to determine the one or more word choices corresponding to the text to be entered is performed by: using the audio input to determine multiple potential word choices; presenting the multiple potential word choices to the user; and using the text input to select one of the multiple potential word choices.
4. The computer system of claim 1, wherein analyzing the combination of the first input in the first modality and the second input in the second modality to determine the one or more word choices corresponding to the text to be entered is performed by:
- using the first input in the first modality to determine multiple potential word choices;
- presenting the multiple potential word choices to the user; and
- using the second input in the second modality to select one of the multiple potential word choices.
5. The computer system of claim 1,
- wherein the first input device is a keyboard;
- wherein the second input device is a microphone;
- wherein receiving the first input in the first modality comprises receiving input via the keyboard; and
- wherein receiving the second input in the second modality comprises receiving input via the microphone.
6. The computer system of claim 1,
- wherein the first input device is a tactile input;
- wherein the second input device is a microphone;
- wherein receiving the first input in the first modality comprises receiving input via the tactile input; and
- wherein receiving the second input in the second modality comprises receiving input via the microphone.
7. The computer system of claim 1,
- wherein the operations of receiving text inputs from the user are further performed by: determining recent actions taken by the user; and selecting the one or more word choices based on a determined correspondence between the one or more word choices and the recent actions taken by the user.
8. The computer system of claim 1,
- wherein the operations of receiving text inputs from the user are further performed by: determining a location of the user; and selecting the one or more word choices based on a determined correspondence between the one or more word choices and the location of the user.
9. The computer system of claim 1,
- wherein the first input device is a virtual keyboard;
- wherein the second input device is a microphone;
- wherein receiving the first input in the first modality comprises receiving a swipe across the virtual keyboard;
- wherein receiving the second input in the second modality comprises receiving a sequence of one or more taps to the virtual keyboard;
- wherein the operations of receiving text inputs from the user are further performed by receiving a third input in a third modality via the microphone; and
- wherein the combination of the first input in the first modality and the second input in the second modality is further combined with the third input in the third modality to collectively determine the one or more word choices corresponding to the text to be entered.
10. The computer system of claim 1,
- wherein the first input device is an accelerometer; and
- wherein receiving the first input in the first modality comprises receiving input via the accelerometer.
11. The computer system of claim 1,
- wherein the operations of receiving text inputs from the user further include, after receiving the first input in the first modality, determining an ambiguity in the first input; and
- wherein the second input is received in response to determining the ambiguity in the first input.
12. A computer-readable storage medium, excluding a transitory signal, storing instructions that, when executed by a computing system, cause the computing system to perform operations for receiving text inputs from a user, the operations comprising:
- receiving a first input in a first modality;
- receiving a second input in a second modality different from the first modality, wherein a combination of the first input in the first modality and the second input in the second modality is indicative of text to be entered;
- analyzing the combination of the first input in the first modality and the second input in the second modality to select one or more word choices; and
- using the selected word choices to determine the text to be entered.
13. The computer-readable storage medium of claim 12,
- wherein the computing system includes a keyboard and a microphone;
- wherein receiving the first input in the first modality comprises receiving audio input via the microphone;
- wherein receiving the second input in the second modality comprises receiving text input via the keyboard; and
- wherein analyzing the combination of the first input in the first modality and the second input in the second modality to determine the one or more word choices corresponding to the text to be entered is performed by: using the audio input to determine multiple potential word choices; presenting the multiple potential word choices to the user; and using the text input to select one of the multiple potential word choices.
14. The computer-readable storage medium of claim 12,
- wherein the operations for receiving text inputs from the user further include, after receiving the first input in the first modality, determining an ambiguity in the first input; and
- wherein the second input is received in response to determining the ambiguity in the first input.
15. The computer-readable storage medium of claim 12,
- wherein the operations for receiving text inputs from the user are further performed by: determining recent actions taken by the user; and selecting the one or more word choices based on a determined correspondence between the one or more word choices and the recent actions taken by the user.
16. The computer-readable storage medium of claim 12,
- wherein the computing system includes a virtual keyboard and a microphone;
- wherein receiving the first input in the first modality comprises receiving a swipe across the virtual keyboard;
- wherein receiving the second input in the second modality comprises receiving a sequence of one or more taps to the virtual keyboard;
- wherein the operations for receiving text inputs from the user are further performed by receiving a third input in a third modality via the microphone; and
- wherein the combination of the first input in the first modality and the second input in the second modality is further combined with the third input in the third modality to collectively select the one or more word choices.
17. A method for receiving text inputs from a user to a computing system, the method comprising:
- receiving a first input in a first modality;
- receiving a second input in a second modality different from the first modality, wherein a combination of the first input in the first modality and the second input in the second modality is indicative of text to be entered;
- analyzing the combination of the first input in the first modality and the second input in the second modality to select one or more word choices; and
- using the selected word choices to determine the text to be entered.
18. The method of claim 17,
- wherein the computing system includes a keyboard and a microphone;
- wherein receiving the first input in the first modality comprises receiving audio input via the microphone;
- wherein receiving the second input in the second modality comprises receiving text input via the keyboard; and
- wherein analyzing the combination of the first input in the first modality and the second input in the second modality to determine the one or more word choices corresponding to the text to be entered is performed by: using the audio input to determine multiple potential word choices; presenting the multiple potential word choices to the user; and using the text input to select one of the multiple potential word choices.
19. The method of claim 17, further comprising:
- after receiving the first input in the first modality, determining an ambiguity in the first input;
- wherein the second input is received in response to determining the ambiguity in the first input.
20. The method of claim 17,
- wherein the computing system includes a virtual keyboard and a microphone;
- wherein receiving the first input in the first modality comprises receiving a swipe across the virtual keyboard;
- wherein receiving the second input in the second modality comprises receiving a sequence of one or more taps to the virtual keyboard;
- wherein the method further comprises receiving a third input in a third modality via the microphone; and
- wherein the combination of the first input in the first modality and the second input in the second modality is further combined with the third input in the third modality to collectively select the one or more word choices.
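By way of illustration only, and without limiting the claims, the flow recited above (a first input yielding multiple potential word choices, a determination that the first input is ambiguous, and a second input in a different modality selecting among those choices) might be sketched as follows. The zone layout, the toy lexicon, the function names, and the use of a word-to-confidence mapping as a stand-in for microphone input are all assumptions made for this sketch; none of them is taken from the specification.

```python
# Hypothetical zone layout: each zone stands for several characters,
# so a sequence of zone selections is inherently ambiguous.
ZONES = {
    "1": "qwert",
    "2": "yuiop",
    "3": "asdfg",
    "4": "hjkl",
    "5": "zxcvb",
    "6": "nm",
}
CHAR_TO_ZONE = {ch: zone for zone, chars in ZONES.items() for ch in chars}

# Toy lexicon, assumed for illustration only.
LEXICON = ["tap", "rap", "wag", "the", "type"]


def zone_sequence(word):
    """Map a word to the sequence of zones a user would select to enter it."""
    return "".join(CHAR_TO_ZONE[ch] for ch in word)


def candidates_from_zones(zones):
    """First modality: return every lexicon word matching the zone sequence."""
    return [word for word in LEXICON if zone_sequence(word) == zones]


def is_ambiguous(candidates):
    """The first input is ambiguous when it matches more than one word."""
    return len(candidates) > 1


def select_with_audio(candidates, audio_scores):
    """Second modality: pick the candidate with the highest audio confidence.

    audio_scores is a hypothetical mapping of word -> recognizer confidence;
    words absent from the mapping are treated as having zero confidence.
    """
    return max(candidates, key=lambda word: audio_scores.get(word, 0.0))


def enter_text(zones, audio_scores=None):
    """Combine both modalities to determine the text to be entered."""
    candidates = candidates_from_zones(zones)
    if not candidates:
        return None
    if is_ambiguous(candidates) and audio_scores is not None:
        return select_with_audio(candidates, audio_scores)
    # Unambiguous input, or no second input available: take the first match.
    return candidates[0]


if __name__ == "__main__":
    print(candidates_from_zones("132"))
    print(enter_text("132", {"rap": 0.9, "tap": 0.2}))
```

In this sketch the zone sequence "132" matches both "tap" and "rap", so the audio scores decide between them. A production system would presumably weight the two modalities jointly rather than applying them strictly in sequence.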
Type: Application
Filed: Jun 24, 2015
Publication Date: Oct 15, 2015
Inventors: David Kay (Seattle, WA), Pim van Meurs (Kenmore, WA), Erland Unruh (Seattle, WA), Santosh Sharan (Acton, MA), Gordon Kessler (Mount Kisco, NY), Mayank Agarwal (Ashland, MA), Greg Sigel (Kentfield, CA)
Application Number: 14/749,621