Encoding Method which Encodes Codes in Consideration of Shape
An encoding method includes a mapping for mapping a plurality of encoding sequences to a plurality of decoding sequences. Each of the encoding sequences includes at least one encoding symbol chosen from an encoding symbol set. Each of the decoding sequences includes at least one decoding symbol chosen from a decoding symbol set. The encoding method is characterized in that at least one of the encoding sequences includes at least two encoding symbols, and a predetermined shape changing type of a formal symbol in the encoding sequence is denoted by a latter symbol neighboring to the formal symbol, wherein the shape changing type includes at least one of shape rotating, shape mirroring, shape deflating, stroke removal, cutting and notching.
This application claims priority from U.S. Provisional Patent Application No. 60/914,760, filed on Apr. 30, 2007.
BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention relates to an encoding method. More particularly, the present invention relates to an encoding method which encodes codes in consideration of shape.
2. Description of Related Art
Today, computers are powerful and small enough to be carried around in the form of many tiny devices (cellphones, MP3 players, etc.). The Man Machine Interface (MMI) becomes a challenge due to the limited size of such a device. For text input, current solutions include traditional multitap, predictive text input, and small QWERTY keyboards.
The traditional multitap, known as the ABC input method, is simple but inefficient and hard to master. The small QWERTY keyboard assumes that users are familiar with the traditional QWERTY keyboard and that this familiarity transfers well to the tiny QWERTY keyboard. The first assumption holds only for PC users, and many teenagers may master text input on a cellphone before they master the QWERTY keyboard. Although the QWERTY arrangement may help computer users find characters more easily than a scattered or alphabetic arrangement, the much smaller key size reduces usability considerably.
On the other hand, predictive text input greatly reduces the key presses required per character compared with multitap, but introduces new problems: (a) the predicted text changes unpredictably while the user is typing; (b) users can hardly check their typing correctness during input, and typo recovery is irritating; (c) when a word is missing from the dictionary, users are required to switch to another input method (typically multitap) and start over; and (d) the overall behavior is unpredictable and unreliable in terms of both perception and performance.
Furthermore, given a mapping which maps encoding sequences to decoding sequences, wherein each encoding sequence is a token, the mapping is called “spatially ambiguous” if there are multiple mappings for a token. An example of spatial ambiguity is shown in
After mapping to the domain of the encoding symbol set, it is impossible to distinguish the grouped symbols purely based on knowledge of this domain. This process can be viewed as a lossy encoding. To recover the lost information, all possible combinations may be generated so that the user can choose the correct one.
Multitap resolves spatial ambiguity at character level and lets users roll and choose the intended symbol in the group for each input.
Furthermore, traditional predictive text inputs resolve spatial ambiguity at word level. Take an input of “HOME” for example and illustrated in
Typically, linguistic knowledge can be used to greatly reduce the possible outcomes. A common practice is to provide a word dictionary and output the matching words for the user to choose from. However, even when linguistic knowledge is introduced, the input sequence “4663” can still be interpreted in many ways, such as “HOME”, “GOOD”, “GONE”, “HOOD”, “HOOF”, “HONE”, “GOOF”, “IMME”, “ENNL”, “HOND”, “INOF” and “GOOE”. In other words, there are too many possible results for users to input words effectively with traditional input methods and apparatus.
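The word-level spatial ambiguity described above can be illustrated with a minimal sketch. It assumes the conventional 12-key telephone layout (letters grouped on keys 2 to 9); the word list WORDS and the function names are hypothetical and serve only as an example.

```python
# Conventional telephone keypad grouping (assumed for illustration).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def encode(word):
    """Map a word to the key sequence typed on the keypad (lossy)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.upper())


def raw_combinations(keys):
    """Number of letter strings a key sequence could stand for."""
    total = 1
    for key in keys:
        total *= len(KEYPAD[key])
    return total


def candidates(keys, word_list):
    """Dictionary words whose encoding equals the typed key sequence."""
    return [w for w in word_list if encode(w) == keys]


# Hypothetical mini-dictionary; a real one would hold thousands of words.
WORDS = ["HOME", "GOOD", "GONE", "HOOD", "HOOF", "HONE", "GOOF", "CAT"]

print(encode("HOME"))              # key sequence for HOME
print(raw_combinations("4663"))    # letter strings before dictionary filtering
print(candidates("4663", WORDS))   # words the user must still choose among
```

Even with the dictionary filter, several words share one key sequence, which is the burden the text attributes to traditional predictive input.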
Therefore, what is needed is to provide a better encoding scheme and corresponding decoding method that can be easily realized by and accommodated to the users.
SUMMARY OF THE INVENTION
One of the objects of the invention is to provide an encoding method such that mapping between encoding symbols and decoding symbols can be easily realized by the users.
To at least achieve the above and other objects, the invention provides an encoding method, which includes a mapping for mapping a plurality of encoding sequences, each of which includes at least one encoding symbol chosen from an encoding symbol set, to a plurality of decoding sequences, each of which includes at least one decoding symbol chosen from a decoding symbol set. The encoding method is characterized in that at least one of the encoding sequences includes at least two encoding symbols, and a predetermined shape changing type of a formal symbol in the encoding sequence is denoted by a latter symbol neighboring to the formal symbol, wherein the shape changing type includes at least one of shape rotating, shape mirroring, shape deflating, stroke removal, cutting and notching.
In one embodiment, the mapping is predetermined in consideration of at least one of correlating to frequency and token enumeration, wherein the token enumeration lists possible mappings between each decoding symbol in the decoding symbol set and encoding symbols in the encoding symbol set.
In another embodiment, the mapping is predetermined by at least one of the following: (a) a first mapping, which assigns a first unique encoding sequence to a first unique decoding symbol in consideration of frequency of each decoding symbol; and (b) a second mapping, which assigns a second unique encoding sequence to a second unique decoding symbol in consideration of possible mappings listed in the token enumeration.
According to the encoding method mentioned above, the mapping is generated in accordance with shape changing concepts such that users can easily realize how the decoding sequences are encoded by the encoding sequences.
The present invention further provides an encoding method, which includes a mapping for mapping a plurality of encoding sequences, each of which includes at least one encoding symbol chosen from an encoding symbol set, to a plurality of decoding sequences, each of which includes at least one decoding symbol chosen from a decoding symbol set. The encoding method is characterized in that the encoding symbol set is composed of numerals, and each of substantial number of the decoding symbols is mapped to one encoding symbol sequence such that the shape of the decoding symbol mapped by the encoding sequence substantially conforms to shape composition of the encoding sequence.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, and are intended to provide further explanation of the invention as claimed.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Refer to
The keyboard 200 may be an input device with a plurality of keys, each of which denotes an encoding symbol chosen from an encoding symbol set. For the reduced keyboard problem, we need to map a keyboard alphabet of size K (i.e. encoding symbol set) to a letter alphabet of size N (i.e. decoding symbol set) wherein K<N. For clarity of discussion, and without limiting the scope of this invention, the keyboard 200 in
Consider the mapping 232, which maps each of a plurality of encoding sequences composed of encoding symbols chosen from the encoding symbol set to a corresponding decoding sequence composed of at least one decoding symbol chosen from the decoding symbol set, wherein each encoding sequence is a token. The mapping is called “spatially ambiguous” if there are multiple mappings for a token, and “temporally ambiguous” if there are multiple temporal interpretations of an input sequence when generating tokens. The spatial ambiguity is discussed in the prior art, and the temporal ambiguity will be discussed in more detail below.
For fixed-length coding, every M consecutive encoding symbols in an encoding sequence represent a token and map to a decoding sequence of decoding symbols. For variable-length coding, a prefix-free code is usually used so that the decoder can unambiguously identify each token. If a non-prefix-free code is used, a pre-determined timeout or a timeout kill signal (delimiting key) is needed to clearly identify each intended token. The timeout kill method introduces additional key presses, which is an overhead. Although the timeout method avoids additional key presses, it requires the user to wait and is less efficient.
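Whether a variable-length code needs a timeout or delimiting key can be checked mechanically. The following is a minimal sketch of a prefix-free test; the function name and the sample code sets are hypothetical and are not taken from the coding scheme of the embodiment.

```python
def is_prefix_free(codes):
    """Return True if no code in the set is a prefix of another code.

    After lexicographic sorting, a code that is a prefix of any other code
    is always a prefix of its immediate successor, so only neighbouring
    pairs need to be compared.
    """
    ordered = sorted(codes)
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


# A prefix-free set: a decoder can cut tokens without any delimiter.
print(is_prefix_free({"0", "10", "11"}))

# Not prefix-free: "1" is a prefix of "10", so "10" alone is ambiguous
# unless a timeout or a delimiting key separates the tokens.
print(is_prefix_free({"0", "1", "10"}))
```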
The temporal ambiguity exists when there are multiple temporal interpretations of an input sequence. To introduce temporal ambiguity into a traditional variable-length coding scheme, we omit the requirement of both timeout and timeout kill signal. This can be demonstrated in
An input sequence can be segmented according to temporal ambiguity such that the two adjacent symbols on either side of a segment boundary together form a token (also known as an encoding sequence) that has no mapping in the coding scheme. Each segment has multiple temporal interpretations if its length is greater than one. Take
The procedure of temporal segmentation can be performed by successively examining each symbol of the input sequence and combining it with the next symbol to check whether the pair is a mapped token. If it is, the current symbol is accumulated. Otherwise, the current symbol together with the previously accumulated symbols forms a temporally ambiguous segment. We define ambiguity length as the length of a temporally ambiguous segment. There is no temporal ambiguity in a segment if its ambiguity length equals one. Furthermore, some input sequences can be disambiguated purely by knowledge of the coding scheme. Take the coding scheme of an embodiment shown in
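The segmentation procedure described above can be sketched as follows. The set of mapped two-symbol tokens passed in is a hypothetical stand-in for the coding scheme of the embodiment, and the function name is likewise an assumption made for illustration.

```python
def segment(input_sequence, two_symbol_tokens):
    """Split an input sequence into temporally ambiguous segments.

    A boundary is placed between two neighbouring symbols whenever the
    pair they form is not a mapped token in the coding scheme.
    """
    segments = []
    accumulated = ""
    for i, symbol in enumerate(input_sequence):
        accumulated += symbol
        is_last = i + 1 == len(input_sequence)
        # Combine the current symbol with the next one and test the pair.
        if is_last or symbol + input_sequence[i + 1] not in two_symbol_tokens:
            segments.append(accumulated)
            accumulated = ""
    return segments


# Hypothetical scheme in which only "10" and "01" are mapped pairs.
TOKENS = {"10", "01"}
for seg in segment("1012", TOKENS):
    print(seg, "ambiguity length =", len(seg))
```

With this hypothetical token set, "1012" splits into the segment "101" (ambiguity length 3) and the segment "2" (ambiguity length 1, hence no temporal ambiguity).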
In the preferred embodiment, the coding scheme is variable-length such that a decoding symbol is encoded by either one numeral or two numerals. In most cases, the 10×10 combinations of two numerals are sufficient to encode all the decoding symbols. If three-numeral encodings are needed, they can be kept few in number and handled specifically.
A legal combination of a temporally ambiguous segment is a disambiguated input sequence that can be unambiguously decoded. Enumerating all legal combinations of a temporally ambiguous segment in such a coding scheme is equivalent to enumerating all combinations in which, for a segment of ambiguity length N, each symbol may be connected to the previous one or the next one but not both. Two connected symbols form a two-numeral encoding, while a symbol without any connection forms a one-numeral encoding. If a symbol is connected in both directions, it is a three-numeral encoding, which is illegal in this enumeration and should be handled specifically.
For example, let “1” denote a symbol, “−” denote a bar without connection and “+” denote a connected bar. A sequence of “1111” can be enumerated in 5 combinations: “1−1−1−1”, “1−1−1+1”, “1−1+1−1”, “1+1−1−1”, and “1+1−1+1”. The enumeration accumulates the combinations with K connections as follows:
0 connection: C(N,0)
1 connection: C(N-1, 1)
2 connections: C(N-2, 2)
. . .
K connections: C(N-K, K)
. . .
floor(N/2) connections: C(N-floor(N/2), floor(N/2))
For N=10, the combinations are C(10,0)+C(9,1)+C(8,2)+C(7,3)+C(6,4)+C(5,5)=1+9+28+35+15+1=89. In comparison, with traditional spatial ambiguity the combinations for length=10 are 3^10=59049, which is obviously impractical to enumerate. For a given sequence with temporal ambiguity of length N (N is unlikely to be greater than 8, as calculated against a dictionary of 57,000 words; hence the maximum length of a temporally ambiguous sequence is fixed at 8 and the rare exceptions are resolved by dictionary lookup), all 2^(8-1) combinations can be enumerated and the illegal ones, in which some symbol has 2 connections, omitted.
In terms of the number of combinations, the temporal ambiguity proposed by the present invention is clearly a better solution than the prior ones.
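The counts above can be reproduced with a short sketch. It assumes a non-empty segment, uses an ASCII "-" in place of the unconnected bar, and the function names are hypothetical. The first function evaluates the sum of C(N-K, K); the second walks all 2^(N-1) bar patterns and drops those in which a symbol would be connected on both sides.

```python
from itertools import product
from math import comb


def count_legal(n):
    """Number of legal combinations for ambiguity length n: sum of C(n-k, k)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


def enumerate_legal(segment):
    """List every legal combination of a non-empty segment.

    '-' marks an unconnected bar and '+' a connected bar between two
    neighbouring symbols. Two adjacent '+' bars would connect the middle
    symbol in both directions, which is illegal here.
    """
    results = []
    for bars in product("-+", repeat=len(segment) - 1):
        if any(a == "+" and b == "+" for a, b in zip(bars, bars[1:])):
            continue  # some symbol has 2 connections
        text = segment[0]
        for bar, symbol in zip(bars, segment[1:]):
            text += bar + symbol
        results.append(text)
    return results


print(enumerate_legal("1111"))
print(count_legal(10), 3 ** 10)
print(len(enumerate_legal("1" * 8)), "legal out of", 2 ** 7)
```

For the maximum ambiguity length of 8, the brute-force walk visits only 128 patterns, so filtering after enumeration stays cheap.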
In one embodiment, the temporal ambiguity can be resolved by lookup against a stored dictionary, as with traditional spatial ambiguity. Temporal ambiguity is less ambiguous than traditional spatial ambiguity. In the current implementation, against about 39,000 dictionary words, fewer than 50 encoded sequences map to two words and no encoding maps to more than two words. In other words, 99.87% of these dictionary words can be uniquely identified. The present invention greatly reduces the mental processing overhead of traditional spatial ambiguity by providing very limited alternatives, in this case only one. For example, with traditional spatial ambiguity, up to 12 alternative words that match the encoding of “HOME” must be examined, while only one dictionary word matches the present invention's encoding of “HOME”. Even in the worst case, only one alternative dictionary word is possible. Given this, users can quickly select the alternative word when an unintended word is presented, without careful examination.
In the very rare case that there are more than two candidate words, the user is required to select one of them to resolve the ambiguity. One solution is to use a special function key, such as ‘*’, to roll over the candidates and let the user choose one. Another solution is to display them simultaneously, since the candidates are few. The order in which candidates appear can be based on word frequency or a linguistic score calculation.
It is theoretically possible to apply linguistic resolution on traditional spatial ambiguity but impractical due to its exponential combinations. Instead, linguistic resolution can be applied on temporal ambiguity introduced by this invention to reduce dictionary lookup. This is illustrated in
Refer to
Refer to
Accordingly, the linguistic temporal ambiguity resolution mentioned above returns a disambiguated input sequence EnumMAX composed of encoding symbols, which can be decoded into a sequence of decoding symbols unambiguously. One may check the prediction for a given word by first encoding the word into a sequence of encoding symbols and then performing linguistic temporal ambiguity resolution. If the prediction for a word's encoding is identical to the word, it is called a hit and no dictionary lookup is needed. Hit ratio is defined as the number of hit words divided by the total number of words in a set of words. In addition to avoiding dictionary lookup, the words that can be linguistically resolved can also be omitted from the stored dictionary. This can greatly reduce the size of the stored dictionary. In one embodiment, only about 3100 words are needed in the stored dictionary, with an 81.3% hit ratio on the 36,000 least frequent words and a 100% hit ratio on the rest of an English dictionary of 57,000 words.
It is possible to further improve the hit ratio with knowledge of the target language. The linguistic score can be compensated according to this knowledge to provide a better guess. For example, 3 successive consonants (excluding an ending ‘S’ used with plural nouns) are very rare. A linguistic score penalty is applied in such conditions to avoid some false candidate words. The penalty can be carefully tuned for a better hit ratio. Words that do contain 3 successive consonants and are falsely rejected by this adjustment can still be found in the stored dictionary.
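The consonant rule above can be sketched as a small score adjustment. This is only an illustration under stated assumptions: the vowel set treats every letter other than A, E, I, O, U as a consonant, a higher score is taken to mean a better candidate so the penalty is subtracted, and the penalty value and function names are hypothetical.

```python
VOWELS = set("AEIOU")  # assumption: 'Y' is counted as a consonant


def has_three_successive_consonants(word):
    """True if the word contains 3 consonants in a row, ignoring a final 'S'."""
    letters = word.upper()
    if letters.endswith("S"):
        letters = letters[:-1]  # exclude an ending 'S' such as a plural marker
    run = 0
    for ch in letters:
        run = run + 1 if ch.isalpha() and ch not in VOWELS else 0
        if run >= 3:
            return True
    return False


def adjusted_score(word, base_score, penalty=1.0):
    """Subtract a tunable penalty from the score of an unlikely candidate."""
    if has_three_successive_consonants(word):
        return base_score - penalty
    return base_score


for candidate in ("HOME", "CARTS", "WORLD"):
    print(candidate, adjusted_score(candidate, base_score=5.0, penalty=1.5))
```

A legitimate word such as "WORLD" is penalised by this rule, which is why the text falls back on the stored dictionary for falsely rejected words.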
Optimistic temporal ambiguity resolution is a linguistic temporal ambiguity resolution that utilizes some heuristics without evaluating all combinations. The present invention applies a greedy approach to provide an optimistic resolution, i.e. should any ambiguity occur we prefer maximum matching sequences. For example, “10” can be “1:0” as “IO” or be “10” as “D” in coding of
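The greedy, maximum-matching preference can be sketched as a longest-match decoder. The three table entries below ("1" for "I", "0" for "O", "10" for "D") follow the example in the text; the function name, the limit of two symbols per token, and the error handling are assumptions made for illustration.

```python
def optimistic_decode(sequence, table, max_len=2):
    """Decode greedily, always taking the longest mapped token first."""
    decoded = []
    i = 0
    while i < len(sequence):
        # Try the longest possible token at this position, then shorter ones.
        for length in range(min(max_len, len(sequence) - i), 0, -1):
            token = sequence[i:i + length]
            if token in table:
                decoded.append(table[token])
                i += length
                break
        else:
            raise ValueError(f"no mapping for symbol at position {i}")
    return "".join(decoded)


TABLE = {"1": "I", "0": "O", "10": "D"}  # partial table, for illustration only
print(optimistic_decode("10", TABLE))    # the pair is preferred over "1" then "0"
print(optimistic_decode("01", TABLE))    # no pair "01" in this partial table
```

Because the decoder never backtracks, it evaluates a single combination per segment rather than all of them, which is what makes the resolution optimistic.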
When the optimistic temporal ambiguity resolution is applied, a disambiguated input sequence should be generated by optimistic resolution first. The linguistic score of this disambiguated input sequence is evaluated by way of those showing in
A plurality of mappings that map one encoding symbol set to one decoding symbol set can be used in the above-mentioned coding scheme. One of the mappings is a shape-based mapping. Although a plurality of shape-based mappings are disclosed in the prior art, such as U.S. Pat. No. 4,008,793, U.S. Pat. No. 4,877,405, U.S. Pat. No. 5,307,267, U.S. Pat. No. 6,837,633, U.S. Pat. No. 6,874,960, U.S. Pat. No. 7,098,919, U.S. Pat. No. 4,173,753, U.S. Pat. No. 5,305,207, U.S. Pat. No. 5,790,055, U.S. Pat. No. 6,362,752, U.S. Pat. No. 6,686,907, U.S. Pat. No. 6,766,179, U.S. Pat. No. 5,982,303 and U.S. Pat. No. 6,753,794, the present invention provides a new shape-based mapping different from that prior art. There is a great amount of research on symbol encoding, but none of it utilizes shape-changing operations in its coding scheme. In contrast, according to one embodiment of the present invention, one encoding sequence may include at least two encoding symbols, and a predetermined shape changing type of a formal symbol in the encoding sequence is denoted by a latter symbol neighboring to the formal symbol.
Some operations of the shape-based mapping provided in the present invention may be categorized into non-operation, transformative operation, constructive operation, destructive operation and deformative operation. The non-operation is used when the shape of the encoding symbol is substantially identical to that of the decoding symbol. The transformative operation includes shape rotating, shape mirroring, shape deflating, etc. The constructive operation includes stroke-based construction or combination of shapes. The destructive operation includes stroke removal, cutting (i.e., dividing without obeying stroke construction) and notching (i.e., breaking a closed area). The deformative operation is used when the shapes of one encoding symbol and the mapped decoding symbol can be correlated through intermediate shapes.
The coding scheme shown in
According to the invention, users no longer hunt-and-peck letters from the specific labels on a keyboard. Instead, one can use numerals on the keyboard to quickly enter mental mnemonics for input. The manufacturers of cellphone or keyboard can seamlessly adapt this invention since no special labels are needed.
During input, users can intuitively check whether the intended button was indeed selected. Take decoding ‘Q’ for example: if, after entering the first code, the display presents something other than a shape correlated with ‘Q’, such as ‘0’, users are aware that the previous key press was wrong and a correction is needed. After correctly entering ‘0’ and ‘1’, a ‘Q’ decoding can be expected, but a sequence of “OI” is also possible according to this coding scheme. In either case, the appearances are all correlated with the input “01” instead of jumping around as in traditional predictive text input. This makes the invention reliable and predictable in terms of perception.
For each decoding symbol, all possible encoding sequences of encoding symbol that is accepted by design decisions are generated. Every option has an associated weight that is based on design decisions. Generally we give more weight to preferred options. A shape-based token enumeration is a token enumeration with shape-based design decisions, and a multi-level token enumeration is a hierarchy of different design decisions. In one embodiment of a multi-level token enumeration, all possible encoding sequences of encoding symbol for each decoding symbol are generated. The possible encoding sequences may be divided into several logical interpretations such as shape association, cultural association or any other association acceptable.
After token enumeration, such as multi-level token enumeration mentioned above, the result is evaluated to determine the acceptable assignments.
In one embodiment, a linguistic unigram table and a 2-gram table are generated from a corpus of the target language, and letters with higher frequency are assigned to single encoding symbols based on the linguistic unigram table. If a preferred shape assignment belongs to a letter of relatively low frequency, that letter is re-assigned to a double-tap and the single encoding symbol is assigned to another decoding symbol with higher frequency. For example, while the initial assignments of ‘Z’ and ‘N’ are (‘Z’, “2”) and (‘N’, “2@”), they can be re-assigned as (‘Z’, “22”) and (‘N’, “2”) to take advantage of the higher frequency of the letter ‘N’. The 2-gram table may be utilized to avoid assignments that introduce more ambiguity than others. For example, “12” is an option to encode “D” in TABLE 1 shown below. After consulting the bit 95 2-gram table, which is disclosed by William Soukoreff and Scott MacKenzie, Linguistic Diagram Frequency Tables (http://dynamicnetservices.com/˜will/academic/bit95.tables.html), it can be found that the 2-gram “IN”, which is also entered as “1” followed by “2”, has a relatively high frequency, so this assignment should be avoided if possible. If “12” is nevertheless chosen according to a certain design decision, linguistic score compensation can be utilized to reduce the ambiguity introduced by this assignment.
As shown in TABLE 1, possible mappings between encoding symbols and decoding symbols are categorized into a plurality of relationships including a substantially identical relationship, a transform relationship, a deformative relationship, a composition relationship and a destructive relationship. Furthermore, in the TABLEs shown in the specification below, the symbol “^” stands for a destructive operation, the symbol “@” stands for a transformation operation of rotating, the symbol “%” stands for a transformation operation of mirroring, the symbol “/” stands for “OR”, and an entry surrounded by the symbol “−” stands for a deprecated assignment.
It is important to note that the assignment of the coding can be made by the two kinds of mappings mentioned above, wherein one of the mappings assigns an encoding sequence to a decoding symbol in consideration of the frequency of each decoding symbol, and the other assigns an encoding sequence to a decoding symbol in consideration of the possible mappings provided in the token enumeration. The two mappings can be applied in any order to decide the assignment of the coding. For example, the result of code assignment and conflict resolution based on unigram statistics, which are provided by First-Order Statistics (Statistical Distributions of English Text, http://www.data-compression.com/english.html), is provided in TABLE 2. It can be seen that some of the decoding symbols with higher frequency are assigned to unigram (or single-symbol) encoding sequences. These decoding symbols include “A”, “E”, “I”, “O”, “R”, “S”, and “T”. Some decoding symbols, such as “B” and “G”, are assigned to unigram encoding sequences due to their close likeness to the encoding symbols “8” and “6”, respectively. The decoding symbol “Z” is first assigned to the unigram encoding symbol “2”. However, since the unigram encoding symbol “2” is also a possible mapping of the decoding symbol “N”, and the decoding symbol “N” has a much higher frequency than the decoding symbol “Z” according to linguistic statistics, the unigram encoding symbol “2” is re-assigned to the decoding symbol “N” and the decoding symbol “Z” is assigned to the double-tap encoding sequence “22”.
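The re-assignment of “2” from “Z” to “N” can be sketched as a simple conflict resolution step. This is a simplified illustration, not the procedure that produced TABLE 2: it assumes each decoding symbol names one preferred single encoding symbol, that at most two decoding symbols contend for the same one, and that the loser falls back to a double-tap. The frequency values and function name are hypothetical.

```python
def resolve_single_symbol_conflicts(preferred, frequency):
    """Give each contested single symbol to the most frequent letter.

    preferred: decoding symbol -> single encoding symbol it would like.
    frequency: decoding symbol -> relative unigram frequency.
    The less frequent contender is moved to a double-tap of that symbol.
    """
    contenders = {}
    for letter, code in preferred.items():
        contenders.setdefault(code, []).append(letter)

    assignment = {}
    for code, letters in contenders.items():
        if len(letters) > 2:
            raise ValueError(f"more than two letters contend for {code!r}")
        letters.sort(key=lambda letter: frequency[letter], reverse=True)
        assignment[letters[0]] = code            # winner keeps one key press
        for loser in letters[1:]:
            assignment[loser] = code * 2         # double-tap, e.g. "22"
    return assignment


# Illustrative frequencies only (percent of letters in English text).
FREQ = {"Z": 0.07, "N": 6.7, "B": 1.5}
print(resolve_single_symbol_conflicts({"Z": "2", "N": "2", "B": "8"}, FREQ))
```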
Another kind of the possible mappings are taken into consideration. In one embodiment, the assignments are considered in the order of substantially identical relationship, transform relationship, deformative relationship, and composition or destructive relationship. The results of the assignments are shown in TABLE 3.
Combinations of encoding sequences from TABLE 3 provide 72 choices because of the possible mappings of the decoding symbols “C”, “D”, “H”, “K”, and “X”. The operators used for denoting the operation performed on the formal encoding symbol preceding the operator can be chosen on any basis. For example, the operator standing for the mirror operation can be chosen from the numerals 8 and 0, the operator standing for the rotating operation can be chosen from the numerals 6 and 9, and the operator standing for the destructive operation can be chosen from any other numeral. In one embodiment, the numeral “8” stands for the mirror operation because the numeral “8” looks like two mirrored “o”; the numeral “6” stands for the rotating operation because the numeral “6” looks like a whirl; and the numeral “4” stands for the destructive operation because the numeral “4” looks like a stab. Accordingly, one of the final mappings can be shown in TABLE 4 below.
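Once the operator numerals are chosen, the abstract notation used in the TABLEs can be turned into key sequences mechanically. The sketch below assumes the operator choice of the embodiment described above (8 for mirroring, 6 for rotating, 4 for a destructive operation), with the destructive marker written as "^". The function name is hypothetical, and the sample inputs other than "2@" are arbitrary strings rather than entries taken from TABLE 4.

```python
# Operator markers of the TABLE notation mapped to the numerals chosen in
# the embodiment: '%' mirror, '@' rotate, '^' destructive.
OPERATORS = {"%": "8", "@": "6", "^": "4"}


def to_key_sequence(notation):
    """Replace each operator marker with its numeral; keep numerals as-is."""
    return "".join(OPERATORS.get(ch, ch) for ch in notation)


print(to_key_sequence("2@"))   # a rotated "2" becomes two key presses
print(to_key_sequence("22"))   # a double-tap contains no operator
```

Swapping in another operator choice, for example 0 for mirroring or 9 for rotating, only requires editing the OPERATORS table.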
It is important that mapping between encoding symbols and decoding symbols may contain assignments other than those shown in TABLE 4. However, the assignment can be determined by enumerating all acceptable combinations of possible mappings, determining operators and at least one criterion selected from the group consisting of shape-based mapping score, size of a stored dictionary to disambiguate input, ambiguity length distribution of decoding sequences, hit ratio of optimistic temporal ambiguity resolution, hit ratio of linguistic temporal ambiguity resolution, hit ratio of cascade of optimistic and linguistic temporal ambiguity resolution, hit ratio distribution according to frequency of decoding sequence, keys per character calculated from unigram statistics, optimization across a set of natural languages; and temporal ambiguity measures. For all enumerated acceptable combinations, a weighted score of pre-determined criteria described above is calculated, or alternatively, a preferred assignment is manually selected and accepted if it satisfies the pre-determined criteria.
The shape-based mapping and temporally ambiguous coding scheme can work independently or, in another way, work together to form a text input system.
If the linguistic score calculated in step 711 against the result from step 709 is greater than a pre-determined value Q2, the disambiguated input sequence is accepted and is decoded in step 720. Otherwise, linguistic temporal ambiguity resolution module 807 is activated to elect the best disambiguated input sequence (step 713). All the disambiguated input sequences generated from the temporally ambiguous segments are concatenated in step 722 to generate prediction result 824 and output to the display 801 via the output control module 809. If all the segments are processed and the execution is branched from step 707 to step 715, linguistic score of the input sequence is calculated in step 715. If the result of step 715 is smaller than another predetermined value Q1, the prediction is assumed to be not acceptable and dictionary lookup is performed for better prediction (step 719) by outputting a prediction query 823 to the dictionary module 808 from the output control module 809. Otherwise, request for alternatives from user in step 717 is checked. If no such request, the prediction is implicitly accepted by user, else dictionary lookup is activated (step 719). After dictionary lookup, one of the matches is determined to replace the original prediction in step 724 and user can utilize a special function key such as ‘*’ to select and confirm intended word in step 726.
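The control flow described above can be outlined as a sketch. The step numbers in the comments are those cited in the text; everything else is an assumption made for illustration: the function and parameter names are hypothetical, the resolution, scoring and lookup routines are passed in as plain callables, and the first dictionary match simply replaces the prediction instead of being confirmed with a function key.

```python
def predict(segments, optimistic, linguistic, score, lookup,
            wants_alternatives, q1, q2):
    """Resolve each segment, then decide whether a dictionary lookup is needed."""
    parts = []
    for seg in segments:
        guess = optimistic(seg)              # step 709: optimistic resolution
        if score(guess) > q2:                # step 711: compare against Q2
            parts.append(guess)              # step 720: accept
        else:
            parts.append(linguistic(seg))    # step 713: linguistic resolution
    prediction = "".join(parts)              # step 722: concatenate

    # Step 715: score the whole sequence; step 717: check the user's request.
    if score(prediction) < q1 or wants_alternatives():
        matches = lookup(prediction)         # step 719: dictionary lookup
        if matches:
            prediction = matches[0]          # step 724: replace the prediction
    return prediction


# Toy stand-ins, purely to exercise the control flow.
result = predict(
    segments=["10", "2"],
    optimistic=lambda seg: seg + "a",
    linguistic=lambda seg: seg + "b",
    score=lambda text: text.count("a"),
    lookup=lambda text: ["W"],
    wants_alternatives=lambda: False,
    q1=1, q2=0,
)
print(result)
```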
In another embodiment, the output control module 809 may be directly coupled to the input sequence divider 803 or even the keyboard 802 for receiving the text input 820 and control input 822 directly. The received text input 820 and control input 822 may be sent to the dictionary module 808 to look up a matching encoding sequence and the corresponding decoding sequence for output.
It is noted that the components of this invention can be arranged in a client-server model such that the coupling is over a communication channel and messages are transmitted in a pre-determined protocol. The temporal ambiguity segmentation module may be duplicated on both the client and server sides, depending on whether a module that requires its output as input exists on that side. The output of the temporal ambiguity segmentation module may be directed to consumer modules in parallel, as depicted in the preferred embodiment, or alternatively in a waterfall manner. The keyboard of this invention has a plurality of keys responsive to user activation, wherein the keys are responsive to user activation mediated by one of the physical senses of sight, hearing, touch, taste, and smell. The keyboard of this invention can also be any source that outputs sequences of keyboard symbols. Moreover, the display can be selected from the group consisting of visual, auditory, tactile, gustatory and olfactory displays.
Although all temporal ambiguity resolutions are used in the preferred embodiment demonstrated in
Refer to
In
When the user further input a fifth encoding symbol “8”, the originally shown intermediate form of “R” is changed to the intermediate form of “P” as shown in
Entering ‘*’ consecutively will sequentially enumerate all matches. In this case, only one candidate word is valid, so it may be immediately confirmed as the intended word.
However, the false prediction can be corrected in other ways. For example, delimiting control is shown in
Once the user finds out that a false prediction occurs at this time, a special function key, such as “#”, can be used for delimiting control. In the embodiment, the user may input encoding sequence “##” to activate as delimiting control encoding sequence. Accordingly, when the user inputs encoding symbols “##” after the input sequence “36098”, the delimiting control encoding sequence “##” is separated into control input as shown in
For traditional predictive text input, confirmation is required to choose among possible candidates, or the information is too ambiguous to read. Additionally, a typo made during input cannot be recovered and may make a short message completely unreadable. In summary, traditional predictive text input is not applicable to feedbackless typing. The present invention can be applied to feedbackless typing because it utilizes mental mnemonics instead of table lookup, and no timeout or timeout kill is needed to separate a token from the previous input token. Furthermore, the candidate words are correlated with visual appearance; hence delayed feedback typing can be applied, accepting unintended predictions during input and having the user resolve them manually later. In an extreme condition, all inputs are logged without any prediction and later resolved by this invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing descriptions, it is intended that the present invention covers modifications and variations of this invention if they fall within the scope of the following claims and their equivalents.
Claims
1. An encoding method including a mapping for mapping a plurality of encoding sequences to a plurality of decoding sequences, each of the encoding sequences including at least one encoding symbol chosen from an encoding symbol set, each of the decoding sequences including at least one decoding symbol chosen from a decoding symbol set, wherein the encoding method is characterized in that at least one of the encoding sequences includes at least two encoding symbols, and a predetermined shape changing type of a formal symbol in the encoding sequence is denoted by a latter symbol neighboring to the formal symbol, wherein the shape changing type includes at least one of shape rotating, shape mirroring, shape deflating, stroke removal, cutting and notching.
2. The encoding method of claim 1, wherein the mapping is predetermined in consideration of at least one of correlating to frequency and token enumeration, wherein the token enumeration lists possible mappings between each decoding symbol in the decoding symbol set and encoding symbols in the encoding symbol set.
3. The encoding method of claim 2, wherein the possible mappings are categorized into a plurality of relationships including at least one of a substantially identical relationship, a transform relationship, a deformative relationship, a composition relationship and a destructive relationship.
4. The encoding method of claim 2, wherein the mapping being predetermined in consideration of at least one of correlating to frequency and token enumeration is performed by at least one of:
- a first mapping, which assigns a first unique encoding sequence to a first unique decoding symbol in consideration of frequency of each decoding symbol; and
- a second mapping, which assigns a second unique encoding sequence to a second unique decoding symbol in consideration of possible mappings provided in token enumeration.
5. The encoding method of claim 4, wherein the first mapping includes:
- assigning single-symbol encoding sequences to the decoding symbols having higher frequency; and
- assigning multi-symbol encoding sequences to the rest of the decoding symbols.
6. The encoding method of claim 5, wherein the first mapping further includes:
- applying the second mapping on one decoding symbol which is already assigned one single-symbol encoding sequence to change the assigned one single-symbol encoding sequence to another encoding sequence.
7. The encoding method of claim 4, wherein the second mapping assigns encoding sequences to the decoding symbols in accordance with a predetermined order of possible mappings listed in the token enumeration.
8. The encoding method of claim 7, wherein the predetermined order of possible mappings is selected from a substantially identical relationship, a transform relationship, a deformative relationship, a composition relationship and a destructive relationship.
9. The encoding method of claim 7, wherein the second mapping further includes:
- applying the first mapping on one decoding symbol which is already assigned by the second mapping to change the assigned encoding sequence to another encoding sequence.
10. The encoding method of claim 2, wherein the mapping is predetermined further by at least one predetermined criterion selected from the group consisting of:
- shape-based mapping score;
- size of a stored dictionary to disambiguate input;
- ambiguity length distribution of decoding sequences;
- hit ratio of optimistic temporal ambiguity resolution;
- hit ratio of linguistic temporal ambiguity resolution;
- hit ratio of cascade of optimistic and linguistic temporal ambiguity resolution;
- hit ratio distribution according to frequency of decoding sequence;
- keys per character calculated from unigram statistics;
- optimization across a set of natural languages; and
- temporal ambiguity measures.
11. An encoding method including a mapping for mapping a plurality of encoding sequences to a plurality of decoding sequences, each of the encoding sequences including at least one encoding symbol chosen from an encoding symbol set, each of the decoding sequences including at least one decoding symbol chosen from a decoding symbol set, wherein the encoding method is characterized in that the encoding symbol set is composed of numerals, and each of a substantial number of the decoding symbols is mapped to one encoding sequence such that a shape of the decoding symbol mapped by the encoding sequence substantially conforms to a shape composition of the encoding sequence.
12. The encoding method of claim 11, wherein the mapping is predetermined in consideration of at least one of correlating to frequency and token enumeration, wherein the token enumeration lists possible mappings between each decoding symbol in the decoding symbol set and encoding symbols in the encoding symbol set.
13. The encoding method of claim 12, wherein the possible mappings are categorized into a plurality of relationships including at least one of a substantially identical relationship, a transform relationship, a deformative relationship, a composition relationship and a destructive relationship.
14. The encoding method of claim 12, wherein the mapping being predetermined in consideration of at least one of correlating to frequency and token enumeration is performed by at least one of:
- a first mapping, which assigns a first unique encoding sequence to a first unique decoding symbol in consideration of frequency of each decoding symbol; and
- a second mapping, which assigns a second unique encoding sequence to a second unique decoding symbol in consideration of possible mappings provided in token enumeration.
15. The encoding method of claim 14, wherein the first mapping includes:
- assigning single-symbol encoding sequences to the decoding symbols having higher frequency; and
- assigning multi-symbol encoding sequences to the rest of the decoding symbols.
16. The encoding method of claim 15, wherein the first mapping further includes:
- applying the second mapping on one decoding symbol which is already assigned one single-symbol encoding sequence to change the assigned one single-symbol encoding sequence to another encoding sequence.
17. The encoding method of claim 14, wherein the second mapping assigns encoding sequences to the decoding symbols in accordance with a predetermined order of possible mappings listed in the token enumeration.
18. The encoding method of claim 17, wherein the predetermined order of possible mappings is selected from a substantially identical relationship, a transform relationship, a deformative relationship, a composition relationship and a destructive relationship.
19. The encoding method of claim 17, wherein the second mapping further includes:
- applying the first mapping on one decoding symbol which is already assigned by the second mapping to change the assigned encoding sequence to another encoding sequence.
20. The encoding method of claim 12, wherein the mapping is predetermined further by at least one predetermined criterion selected from the group consisting of:
- shape-based mapping score;
- size of a stored dictionary to disambiguate input;
- ambiguity length distribution of decoding sequences;
- hit ratio of optimistic temporal ambiguity resolution;
- hit ratio of linguistic temporal ambiguity resolution;
- hit ratio of cascade of optimistic and linguistic temporal ambiguity resolution;
- hit ratio distribution according to frequency of decoding sequence;
- keys per character calculated from unigram statistics;
- optimization across a set of natural languages; and
- temporal ambiguity measures.
21. The encoding method of claim 11, wherein the numerals are selected from a group consisting of ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, and ‘0’.
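By way of illustration only, the frequency-based first mapping recited in the claims (single-symbol encoding sequences for the decoding symbols having higher frequency, multi-symbol encoding sequences for the rest) and the criterion of keys per character calculated from unigram statistics may be sketched in Python as follows. The function names, the `n_single` parameter, the sample frequencies, and the choice to reserve the remaining numerals as lead symbols of two-symbol sequences (so that no timeout is needed to separate tokens) are assumptions made for this sketch and are not taken from the specification.

```python
from itertools import product


def first_mapping(freq, symbols="0123456789", n_single=6):
    """Assign numeral sequences to decoding symbols by descending frequency.

    The first `n_single` numerals serve as single-symbol sequences; the
    remaining numerals only ever start two-symbol sequences, so the
    resulting code is prefix-free (an assumption of this sketch).
    """
    ranked = sorted(freq, key=lambda s: (-freq[s], s))
    singles = list(symbols[:n_single])
    leads = symbols[n_single:]
    doubles = (a + b for a, b in product(leads, symbols))
    mapping = {}
    for sym in ranked:
        if singles:
            mapping[sym] = singles.pop(0)
        else:
            code = next(doubles, None)
            if code is None:
                raise ValueError("not enough encoding sequences")
            mapping[sym] = code
    return mapping


def keys_per_character(freq, mapping):
    """Average sequence length weighted by unigram frequency."""
    total = sum(freq.values())
    return sum(freq[s] * len(mapping[s]) for s in freq) / total


if __name__ == "__main__":
    # Hypothetical unigram counts, for illustration only.
    sample = {"e": 12, "t": 9, "a": 8, "q": 1}
    mapping = first_mapping(sample, n_single=2)
    print(mapping, keys_per_character(sample, mapping))
```

A second mapping based on token enumeration could then overwrite individual entries of the returned dictionary where a shape-based sequence is preferred, and the keys-per-character figure could be recomputed to compare candidate mappings.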
Type: Application
Filed: Apr 28, 2008
Publication Date: Oct 30, 2008
Inventor: Jen-Te Chen (Shengang Township)
Application Number: 12/110,377
International Classification: H03M 11/00 (20060101);