SYSTEM FOR RECOGNIZING MULTIPLE OBJECT INPUT AND METHOD AND PRODUCT FOR SAME
Methods, systems, and computer program products are provided for the recognition of input of multiple objects into a computing device, wherein the computing device has a processor and at least one application for recognizing the input under control of the processor. The application is configured to determine at least one geometrical feature of a plurality of elements of the input, and compare the determined at least one geometrical feature with at least one pre-determined geometrical threshold to determine a positive or negative result. If the comparison yields a negative result, the application considers the elements as belonging to one object in the recognition of the input. If the comparison yields a positive result, the application considers the elements as belonging to multiple objects in the recognition of the input.
This application claims priority to European Application No. 15290183.1 filed on Jul. 10, 2015, the entire contents of which is incorporated by reference herein.
TECHNICAL FIELD
The present invention relates generally to the field of computing device interfaces capable of recognizing input of multiple handwritten objects.
BACKGROUND
The ubiquity of computing devices in daily life continues to grow. They take the form of personal and professional desktops, laptops, hybrid laptops, tablet PCs, e-book readers, mobile phones, smartphones, wearable computers, global positioning system (GPS) units, enterprise digital assistants (EDAs), personal digital assistants (PDAs), game consoles, and the like.
Computing devices generally consist of at least one processing element, such as a central processing unit (CPU), some form of memory, and input and output devices. The variety of computing devices and their subsequent uses necessitate a variety of input devices. One such input device is a touch sensitive surface such as a touch screen or touch pad wherein user input is received through contact between the user's finger or an instrument such as a pen or stylus and the touch sensitive surface. Another input device is an input surface that senses gestures made by a user above the input surface. Either of these methods of input can be used generally for drawing or inputting so-called digital ink to express text, symbols, etc., which the computing device interprets using handwriting recognition systems or methods. Other systems for handwriting input to computing devices include electronic or digital pens which interact with paper, encoded surfaces or digitizing surfaces in order to have their movement relative to the surface tracked by a computing device, such as the systems provided by Anoto AB., Leapfrog Enterprises, Inc., and Livescribe, Inc.
Regardless of the input method used, handwriting recognition systems and methods typically involve determining the initiation of a digital ink stroke, such as when first contact with a touch sensitive surface is made (pen-down event); the termination of the stroke, such as when contact with the touch sensitive surface is ceased (pen-up event); and any movement (gestures or strokes) made between stroke initiation and termination. These determined strokes are processed to interpret the input which is usually performed in several stages including preprocessing, segmentation, recognition, and interpretation. Generally, the preprocessing stage involves discarding irrelevant input data and normalizing, sampling, and removing noise from relevant data. The segmentation stage specifies the different ways to break down the input data into individual elements to be recognized depending on the type of input, e.g., characters, words, symbols, objects, or shapes. The recognition stage generally includes a feature extraction stage, which characterizes the different input segments, and a classification stage which associates the segments with possible recognition candidates. The interpretation stage generally involves identifying the elements associated with the candidates. Less, more, or different stages are also possible.
The type of computing device or input surface can also determine the type of handwriting recognition system or method utilized. For instance, if the input surface is large enough (such as a tablet), the user can handwrite input anywhere on or above the input surface, as if writing on paper. This, however, adds complexity to the recognition task, because the separate elements to be recognized may be related depending on their relative positions, or may be unrelated regardless of their relative positions.
For example, one desired use for handwriting recognition is in note-taking for the capture of mathematical equations or expressions, physics concepts, chemistry formulas, musical notation, etc., during education sessions, such as classes or lectures. That is, a student may wish to write multiple equations over several lines to express the working of a mathematical problem which the educator has demonstrated (which could also be in digital ink) or which the student is required to solve as an assignment or assessment, or the educator may wish to prepare a worksheet for students involving a list of non-related equations that define a set of problems to be solved by the student, either manually or automatically by the computing device, or the capture of a system of equations or vector/matrix may be desired. The need for the entry of multiple connected or un-connected expressions may also occur in enterprise settings, such as during budget setting meetings, technology research and development sessions, technical documentation, etc., or in personal settings, such as a consumer writing a long addition over several lines whilst grocery shopping in order to calculate the total amount.
Systems for the recognition of handwritten mathematical equations are known. These systems concentrate on determining the elements of input equations through matching against databases/lexicons containing known mathematical symbols and relationships. These systems generally recognize the elements without any consideration of the actual content or structure of the equations themselves. As such when multiple equations are entered, say in a vertical list, it is possible that the recognition may consider elements of intended separate equations to belong to the same equation, or at least the recognition element will form and test hypotheses with respect to this. This of course substantially increases recognition processing and time, and decreases recognition accuracy.
Some known systems relate to providing calculations or likely solutions of input equations, and therefore may take the content into account. However, these systems do not recognize multiple equations input either, rather they recognize the input of mathematical operators, such as the equals sign or a result line, or user gestures, to determine when a solution is to be provided to the currently input equation, such that the next input is inherently another separate equation or an edit to the current equation, see for example European Patent No. 0 676 065.
Other known systems provide recognition of systems of equations and tabular structures, such as matrices, involving equations. However, these systems rely on indicative elements for recognition, such as brackets or spatial alignment, e.g., within rows and columns, and therefore do not recognize multiple equations as such, but rather a structure involving multiple inputs of any type, see for example U.S. Pat. No. 7,447,360.
What is required is a system that recognizes multiple equation inputs independent of links between the equations, that does not rely on the input of specific designation elements or gestures, and that does not significantly increase the processing time or complexity of recognizing the equations themselves, whilst retaining sufficient recognition accuracy.
SUMMARY
The examples of the present method, system, and computer program product are described herein below as providing the recognition of input of multiple objects into a computing device, wherein the computing device has a processor and at least one method or system for recognizing the input under control of the processor.
In an aspect of the disclosed method, system, and computer program product the disclosed system and method determines at least one geometrical feature of a plurality of elements of the input, and compares the at least one geometrical feature with at least one pre-determined geometrical threshold to determine a positive or negative result. If the comparison yields a negative result, the disclosed method or system considers the elements as belonging to one object in the recognition of the input. If the comparison yields a positive result, the method or system considers the elements as belonging to multiple objects in the recognition of the input.
The at least one geometrical feature may include one or more distances between pairs of elements of the plurality of elements. The one or more distances may be between one or more factors of the content of each element of each pair of elements. The one or more factors may include at least one of a factor common to the elements of each pair of elements and a geometrical boundary including each element.
Each element of each pair of elements may represent one or more handwritten strokes, such that the common factor is the barycenter of the one or more strokes, the at least one pre-determined geometrical threshold is a barycenter distance threshold, and the comparison yields a positive result if the barycenter distance determined for a pair of elements is greater than the barycenter distance threshold, such that the elements of the pair of elements are considered as belonging to different objects.
The at least one pre-determined geometrical threshold may be a geometrical boundary distance threshold, such that the comparison yields a positive result if the geometrical boundary distance determined for a pair of elements is greater than the geometrical boundary distance threshold, such that the elements of the pair of elements are considered as belonging to different objects.
The comparison may include comparing a first distance with a first pre-determined distance threshold and a second distance with a second pre-determined distance threshold for each pair of elements. In this case, the comparison yields a positive result for a pair of elements if both the first and second distances are greater than the respective first and second pre-determined distance thresholds, such that the elements of the pair of elements are considered as belonging to different objects. For each pair of elements, the first distance may be the distance between the common factor of the elements and the second distance may be the distance between the geometrical boundary of the elements, such that the first pre-determined distance threshold is a common factor distance threshold, and the second pre-determined distance threshold is a geometrical boundary threshold.
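By way of non-limiting illustration, the two-threshold comparison described above may be sketched as follows. The sketch assumes that each element is given as a list of (x, y) stroke points, that the common factor is the barycenter of those points, that the geometrical boundary is an axis-aligned bounding box, and that distances are Euclidean. The function names and these choices are hypothetical and are not prescribed by the present disclosure.

```python
from math import hypot

def barycenter(points):
    """Mean position of the stroke points of one element."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around one element."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def box_gap(a, b):
    """Shortest distance between two boxes; 0.0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def different_objects(p, q, barycenter_threshold, boundary_threshold):
    """Positive result only if BOTH distances exceed their thresholds."""
    (px, py), (qx, qy) = barycenter(p), barycenter(q)
    first_distance = hypot(px - qx, py - qy)
    second_distance = box_gap(bounding_box(p), bounding_box(q))
    return (first_distance > barycenter_threshold
            and second_distance > boundary_threshold)
```

In this sketch a pair of elements is treated as belonging to one object whenever either distance fails to exceed its threshold, which corresponds to the negative result described above.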
The elements of each pair of elements may be geometrically adjacent, and the method or system may be configured to determine at least one of a positional and temporal order of input of the elements of the plurality of elements.
The at least one geometrical threshold may be pre-determined with consideration of the determined temporal order of input of the elements.
For at least one pair of the pairs of elements, the method or system may be configured to determine the at least one geometrical feature by determining the geometrical boundary distances between pairs of elements which each contain a first element having a first positional order relationship with one element of the at least one pair and a second element having a second positional order relationship with the other element of the at least one pair, and determining the minimum distance of the determined geometrical boundary distances. In this case, the at least one pre-determined geometrical threshold includes a geometrical boundary distance threshold, such that the comparison includes comparing the determined minimum geometrical boundary distance with the geometrical boundary distance threshold, and the comparison yields a positive result if the determined minimum geometrical boundary distance is greater than the geometrical boundary distance threshold, such that the elements of the at least one pair are considered as belonging to different objects.
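One possible reading of the minimum geometrical boundary distance determination may be sketched as follows. The sketch assumes that the positional order is vertical, that each element is represented by a bounding box (x_min, y_min, x_max, y_max) with y increasing downward, that the first elements are those at or before a given position in the order, and that the second elements are those after it. These assumptions and the function names are hypothetical.

```python
def vertical_gap(upper, lower):
    """Vertical empty space between two boxes; 0.0 if they overlap in y."""
    return max(lower[1] - upper[3], 0.0)

def min_gap_across(boxes, index):
    """Minimum boundary distance between elements up to `index` in the
    positional order and the elements after it."""
    ordered = sorted(boxes, key=lambda box: box[1])  # order by top edge
    first_elements = ordered[:index + 1]
    second_elements = ordered[index + 1:]
    return min(vertical_gap(u, l)
               for u in first_elements for l in second_elements)

def is_object_boundary(boxes, index, boundary_threshold):
    """Positive result if the minimum gap exceeds the threshold."""
    return min_gap_across(boxes, index) > boundary_threshold
```

Taking the minimum over all such pairs means that a single element extending into the gap, such as a tall symbol, is enough to yield a negative result in this sketch.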
The positional order may be directional, with the first and second directional relationships being first and second directions from the elements of the at least one pair, respectively.
The pairs of first and second elements may contain first elements within a geometrical area of the second element. In this case, each element of each pair of elements represents one or more handwritten strokes, and the geometrical area is based on a characteristic of the one or more handwritten strokes.
The multiple objects may be one or more geometrically separated handwritten mathematical equations, with the elements being handwritten characters, symbols and operators of each of the multiple mathematical equations.
The present system and method will be more fully understood from the following detailed description of the examples thereof, taken together with the drawings. In the drawings like reference numerals depict like elements. In the drawings:
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those of ordinary skill in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings. Reference to and discussion of directional features such as up, down, above, below, lowest, highest, horizontal, vertical, etc., are made with respect to the Cartesian coordinate system as applied to the input surface on which the input to be recognized is made.
The various technologies described herein generally relate to multiple handwritten object recognition. The system and method described herein may be used to recognize a user's natural writing or drawing style input to a computing device through the processes of pre-processing and recognition. The user's input to the computing device can be made via an input surface, such as a touch sensitive screen, connected to, or of, the computing device or via an input device, such as a digital pen or mouse, connected to the computing device. Whilst the various examples are described with respect to recognition of handwriting input using so-called online recognition techniques, it is understood that application is possible to other forms of input for recognition, such as offline recognition (ICR) in which images rather than digital ink are recognized.
The computing device 100 has at least one display 102 for outputting data from the computing device such as images, text, and video. The display 102 may use LCD, plasma, LED, iOLED, CRT, or any other appropriate technology that is or is not touch sensitive as known to those of ordinary skill in the art. At least some of display 102 is co-located with at least one input surface 104. The input surface 104 may employ technology such as resistive, surface acoustic wave, capacitive, infrared grid, infrared acrylic projection, optical imaging, dispersive signal technology, acoustic pulse recognition, or any other appropriate technology as known to those of ordinary skill in the art to receive user input. The input surface 104 may be bounded by a permanent or video-generated border that clearly identifies its boundaries.
In addition to the input surface 104, the computing device 100 may include one or more additional I/O devices (or peripherals) that are communicatively coupled via a local interface. The additional I/O devices may include input devices such as a keyboard, mouse, scanner, microphone, touchpads, bar code readers, laser readers, radio-frequency device readers, or any other appropriate technology known to those of ordinary skill in the art. Further, the I/O devices may include output devices such as a printer, bar code printers, or any other appropriate technology known to those of ordinary skill in the art. Furthermore, the I/O devices may include communications devices that communicate both inputs and outputs such as a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, or any other appropriate technology known to those of ordinary skill in the art. The local interface may have additional elements to enable communications, such as controllers, buffers (caches), drivers, repeaters, and receivers, which are omitted for simplicity but known to those of skill in the art. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the other computer components.
The computing device 100 also includes a processor 106, which is a hardware device for executing software, particularly software stored in the memory 108. The processor can be any custom made or commercially available general purpose processor, a central processing unit (CPU), a semiconductor based microprocessor (in the form of a microchip or chipset), a macroprocessor, microcontroller, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, state machine, or any combination thereof designed for executing software instructions known to those of ordinary skill in the art. Examples of suitable commercially available microprocessors are as follows: a PA-RISC series microprocessor from Hewlett-Packard Company, an 80x86 or Pentium series microprocessor from Intel Corporation, a PowerPC microprocessor from IBM, a Sparc microprocessor from Sun Microsystems, Inc., a 68xxx series microprocessor from Motorola Corporation, DSP microprocessors, or ARM microprocessors.
The memory 108 can include any one or a combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, or SDRAM)) and nonvolatile memory elements (e.g., ROM, EPROM, flash PROM, EEPROM, hard drive, magnetic or optical tape, memory registers, CD-ROM, WORM, DVD, redundant array of inexpensive disks (RAID), another direct access storage device (DASD)). Moreover, the memory 108 may incorporate electronic, magnetic, optical, and/or other types of storage media. The memory 108 can have a distributed architecture where various components are situated remote from one another but can also be accessed by the processor 106. Further, the memory 108 may be remote from the device, such as at a server or cloud-based system, which is remotely accessible by the computing device 100. The memory 108 is coupled to the processor 106, so the processor 106 can read information from and write information to the memory 108. In the alternative, the memory 108 may be integral to the processor 106. In another example, the processor 106 and the memory 108 may both reside in a single ASIC or other integrated circuit.
The software in memory 108 includes an operating system 110, applications 112 and a handwriting recognition (HWR) system 114, which may each include one or more separate computer programs, each of which has an ordered listing of executable instructions for implementing logical functions. The operating system 110 controls the execution of the applications 112 and the HWR system 114. The operating system 110 may be any proprietary operating system or a commercially available operating system, such as WEBOS, WINDOWS®, MAC and IPHONE OS®, LINUX, and ANDROID. It is understood that other operating systems may also be utilized.
The applications 112 may be related to handwriting recognition as described herein, different functions, or both. The applications 112 include programs provided with the computing device 100 upon manufacture and may further include programs uploaded or downloaded into the computing device 100 after manufacture. Some examples include a text editor, telephone dialer, contacts directory, instant messaging facility, computer-aided design (CAD) program, email program, word processing program, web browser, and camera.
The HWR system 114, with support and compliance capabilities, may be a source program, executable program (object code), script, application, or any other entity having a set of instructions to be performed. When a source program, the program needs to be translated via a compiler, assembler, interpreter, or the like, which may or may not be included within the memory, so as to operate properly in connection with the operating system. Furthermore, the handwriting recognition system with support and compliance capabilities can be written in (a) an object oriented programming language, which has classes of data and methods; (b) a procedure programming language, which has routines, subroutines, and/or functions, for example but not limited to C, C++, Pascal, Basic, Fortran, Cobol, Perl, Java, Objective C, Swift, and Ada; or (c) a functional programming language, for example but not limited to Hope, Rex, Common Lisp, Scheme, Clojure, Racket, Erlang, OCaml, Haskell, Prolog, and F#. Alternatively, the HWR system 114 may be a method or system for communication with a handwriting recognition system remote from the device, such as a server or cloud-based system, which is remotely accessible by the computing device 100 through communications links using the afore-mentioned communications I/O devices of the computing device 100.
Strokes entered on or via the input surface 104 are processed by the processor 106 as digital ink. A user may enter a stroke with a finger or some instrument such as a pen or stylus suitable for use with the input surface. The user may also enter a stroke by making a gesture above the input surface 104 if technology that senses motions in the vicinity of the input surface 104 is being used, or with a peripheral device of the computing device 100, such as a mouse or joystick. A stroke is characterized by at least the stroke initiation location, the stroke termination location, and the path connecting the stroke initiation and termination locations. Because different users may naturally write the same object, e.g., a letter, a shape, or a symbol, with slight variations, the present system accommodates a variety of ways in which each object may be entered whilst being recognized as the correct or intended object.
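As a minimal sketch of the stroke characterization just described, a stroke may be modeled as an ordered path of sampled points whose first and last points are the initiation and termination locations. The class and attribute names below are hypothetical, and the representation of a path as (x, y) tuples is an assumption rather than a requirement of the present system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Stroke:
    # Sampled points from stroke initiation to stroke termination, in input order.
    path: List[Point]

    @property
    def initiation(self) -> Point:
        """Location at which the stroke was initiated (e.g., pen-down)."""
        return self.path[0]

    @property
    def termination(self) -> Point:
        """Location at which the stroke was terminated (e.g., pen-up)."""
        return self.path[-1]
```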
The recognition stage 118 may include different processing elements or experts.
The segmentation expert 122 defines the different ways to segment the input strokes into individual element hypotheses, e.g., alphanumeric characters and mathematical operators, text characters, individual shapes, or sub expression, in order to form expressions, e.g., mathematical equations, words, or groups of shapes. For example, the segmentation expert 122 may form the element hypotheses by grouping consecutive strokes of the original input to obtain a segmentation graph where each node corresponds to at least one element hypothesis and where adjacency constraints between elements are handled by the node connections.
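For illustration only, the grouping of consecutive strokes into a segmentation graph may be sketched as follows. The sketch assumes that each node is a half-open range (start, end) of consecutive stroke indices, that an element hypothesis spans at most `max_group` strokes, and that two nodes are connected when one begins where the other ends. The function names and the `max_group` parameter are hypothetical.

```python
def segmentation_graph(n_strokes, max_group=3):
    """Nodes are (start, end) ranges of consecutive strokes; edges link a
    node to every node that begins where it ends (adjacency constraint)."""
    nodes = [(s, e)
             for s in range(n_strokes)
             for e in range(s + 1, min(s + max_group, n_strokes) + 1)]
    edges = {node: [m for m in nodes if m[0] == node[1]] for node in nodes}
    return nodes, edges

def segmentations(n_strokes, max_group=3):
    """Yield every path through the graph that covers all strokes once."""
    nodes, edges = segmentation_graph(n_strokes, max_group)

    def walk(path):
        if path[-1][1] == n_strokes:
            yield list(path)
            return
        for nxt in edges[path[-1]]:
            yield from walk(path + [nxt])

    for start in (node for node in nodes if node[0] == 0):
        yield from walk([start])
```

Each yielded path is one candidate way of segmenting the input. The number of paths grows quickly with the number of strokes, which is why restricting the strokes that may be grouped together reduces the number of hypotheses to be tested.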
The recognition expert 124 provides classification of the features extracted by a classifier 128 and outputs a list of element candidates with probabilities or recognition scores for each node of the segmentation graph. Many types of classifiers exist that could be used to address this recognition task, e.g., Support Vector Machines, Hidden Markov Models, or Neural Networks such as Multilayer Perceptrons, Deep, Convolutional or Recurrent Neural Networks. The choice depends on the complexity, accuracy, and speed desired for the task.
The language expert 126 generates linguistic meaning for the different paths in the segmentation graph using language models (e.g., grammar or semantics). The expert 126 checks the candidates suggested by the other experts according to linguistic information 130. The linguistic information 130 can include a lexicon, regular expressions, etc. The language expert 126 aims at finding the best recognition path. In one example, the language expert 126 does this by exploring a language model such as a finite state automaton (deterministic FSA) representing the content of linguistic information 130. In addition to the lexicon constraint, the language expert 126 may use statistical information modeling how frequently a given sequence of elements appears in the specified language or is used by a specific user to evaluate the linguistic likelihood of the interpretation of a given path of the segmentation graph.
The system and method described herein makes use of the HWR system 114 in order to recognize multiple equations. Multiple equations are defined as the layout of several equations on one or more pages. The equations can be linked (e.g., a sequence or system of equations, showing different steps of a demonstration) or not (e.g., several exercises on a topic).
If only the first handwritten equation 401 was present, say, the recognition stage 118 would create and test multiple hypotheses using its experts. The candidates for each element within the equation 401, i.e., ‘3’, ‘5’, ‘÷’, ‘7’, ‘=’ and ‘5’ would be considered and scored to provide the output 120 of the recognized typesetted versions of the elements, i.e., ‘35÷7=5’. In order to create the hypotheses, many candidates are considered including various concatenations of the individual segmented strokes of each element.
However, since the subsequent handwritten equations 402-404 are present in the example of
This also applies to any handwritten input of a 2D system of objects, such as drawings, geometrical sketches, charts, graphs, diagrams, tables, circuits, music, chemical formulas, etc., in which multiple objects are input and need to be separately recognized. Detecting the presence of multiple 2D objects prior to creating extraneous hypotheses therefore assists in reducing recognition processing overhead (e.g., time and memory resources) and in increasing recognition accuracy (e.g., unintended but otherwise probable candidates based on the expert models are not tested).
One method of detecting multiple input objects would be to require specific user action, such as creating a new writing area, tapping a new object/line physical or virtual button, inputting certain handwritten gestures—e.g., a downward stroke—or other interactions—e.g., a tap on the input surface in order to indicate the end of input of an object. However, such specific user actions would not provide a satisfactory experience for the users as they are not included in natural handwriting. The recognition process of multiple 2D objects should not be affected in a way that reduces the users' experience of the system or which places constraints on the type of input that will be recognized. For example, the recognition of mathematical operator symbols in the equations, such as the equals symbol ‘=’ in
Another example of the recognition stage 118 for detection of multiple 2D objects input as a vertical list includes the determination of empty vertical space (i.e., space of the input surface 104 on which handwritten strokes are not input) between vertically displaced handwritten objects using geometrical features of elements of the input objects. Vertical empty space greater than a predetermined size, e.g., a threshold, is detected. This is a process of ‘cutting’ the input into sections and classifying or filtering those cuts above the threshold as being a space between individual objects, such as equations. Elements on opposite sides of such a cut are then not considered in subsequent processing by the experts of the recognition stage 118 as belonging to the same equation. This significantly cuts down the number of hypotheses created and tested. The classification is performed by allocating one or more geometrical costs. The threshold is expressed as one or more (vertical) geometrical costs and is adjustable, so that the filtering process can be optimized or trained to determine suitable threshold levels that optimize multiple object detection without returning a substantial number of false positives. This example is similarly applicable in the horizontal direction.
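The detection of vertical empty space may be sketched as follows, assuming each group of elements is represented by a bounding box (x_min, y_min, x_max, y_max) with y increasing downward, and assuming the geometrical cost of a cut is simply the height of the empty space. Both assumptions, and the function name, are hypothetical.

```python
def find_vertical_cuts(boxes, threshold):
    """Return the y positions of cuts whose empty vertical space exceeds
    the threshold, scanning the boxes from top to bottom."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda box: box[1])  # order by top edge
    cuts = []
    bottom = ordered[0][3]  # lowest edge seen so far
    for box in ordered[1:]:
        gap = box[1] - bottom  # empty space above this box
        if gap > threshold:
            cuts.append((bottom + box[1]) / 2)  # place cut mid-gap
        bottom = max(bottom, box[3])
    return cuts
```

Tracking the lowest edge seen so far, rather than only the previous box, prevents a tall group from being cut through when a shorter group lies beside it.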
The input of handwritten strokes is received (step 701) and the elements are determined (step 702). It is then determined if groups of the elements can be established (step 703). If they cannot, the elements are sent to the next stage to create hypotheses using all input strokes (step 704). If the elements can be grouped, a bounding box is established about each group (step 705). In
Then it is determined if adjacent groups are present (step 707). If so, ‘cut’ lines at adjacent bounding box edges are established in the y and/or x directions depending on the application (step 708). In
If the geometrical cost is above the threshold, the first and second groups represent two separate 2D objects, such as two equations, and hypotheses involving elements of both adjacent objects are not created or tested by the next recognition stage (step 711). This simplifies recognition and increases recognition speed. On the other hand, if the geometrical cost is below the threshold, hypotheses for all elements of the determined groups have to be created and tested by the next recognition stage (step 712).
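The decision of steps 711 and 712 may be sketched as follows, assuming the groups are already ordered and that `costs[i]` holds the geometrical cost of the cut between group i and group i + 1. The function name is hypothetical, and a cost exactly equal to the threshold is treated here as not exceeding it, which the description above leaves open.

```python
def split_into_objects(groups, costs, threshold):
    """Partition ordered groups into objects: a new object starts at each
    cut whose geometrical cost is above the threshold."""
    if not groups:
        return []
    objects = [[groups[0]]]
    for group, cost in zip(groups[1:], costs):
        if cost > threshold:
            objects.append([group])    # separate object (cf. step 711)
        else:
            objects[-1].append(group)  # hypotheses span both (cf. step 712)
    return objects
```

Each inner list can then be passed to the next recognition stage on its own, so that no hypothesis combines elements from two different lists.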
The example shown in
The setting of the threshold described in
In the example described above, vertically separated input is searched by establishing groups of horizontally displaced elements. These groups may first be established by performing a search for horizontal empty space between horizontally adjacent elements until there are no more horizontally adjacent elements. Those elements are considered a group, and a bounding box is established about the boundary thereof to contain all elements in both the x and y directions. This grouping may be refined by setting a horizontal distance threshold or horizontal geometrical cost, so elements or groups of elements that are horizontally displaced by a large distance are not grouped together. For example, mathematical equations may not be expected to have large gaps in a single equation, whereas in drawing input large gaps between drawing elements may be intended for geometrical information.
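The horizontal grouping with a distance threshold may be sketched as follows. The sketch assumes the elements are bounding boxes (x_min, y_min, x_max, y_max) that have already been found to lie on the same line, so that only horizontal empty space needs to be examined. The function name and the `max_gap` parameter are hypothetical.

```python
def group_horizontally(boxes, max_gap):
    """Merge left-to-right neighbours whose horizontal gap is at most
    max_gap; return one bounding box per resulting group."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[0]):  # order by left edge
        if groups and box[0] - groups[-1][2] <= max_gap:
            g = groups[-1]
            # Grow the group's bounding box to contain the new element.
            groups[-1] = (min(g[0], box[0]), min(g[1], box[1]),
                          max(g[2], box[2]), max(g[3], box[3]))
        else:
            groups.append(box)
    return groups
```

A small `max_gap` suits input such as equations, where large gaps are not expected within one object, whereas a larger value may be appropriate for drawing input.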
Grouping may also be made based on positional relationships other than a vertical relationship, including horizontal relationships for horizontally displaced elements which overlap vertically—e.g., elements 601-604—or general positional relativity such as geometrical features. This grouping may be based on elements sharing a common trend-line or virtual center of gravity in the y-direction. This could be based on the common geometrical features of the strokes themselves (e.g., the barycenter) or common or non-common geometrical features of the elements (e.g., edges of the bounding boxes, mean center-lines of the y-direction extents of each bounding box, the horizontal lines a and a′ in
The above steps of the recognition stage 118 can be considered as ‘pre-processing’ or ‘filtering’ based on a threshold with the purpose of reducing the number of possibilities (and so the processing time) to cut the handwritten input into individual objects. Accordingly, this process may be carried out in either preprocessing 116 or the recognition stage 118, depending on the computer device 100. The confidence applied to this filtering depends on the application and type of handwritten input, as well as the desired level of accuracy. Further, this filtering may also be part of the decision making process for recognizing multiple objects, such as a vertical list of equations. The recognition stage 118 may be supplemented with consideration of other factors to create and explore different segmentation hypotheses, for example, using the grammar and language models. Accordingly, the threshold may factor in more than the geometrical cost for optimization.
For instance, the threshold may factor in the temporal or time-order entry of the strokes/elements. Taking the time-order of input into account enhances the success of the filtering. For instance, the horizontal input of a group of elements followed by the input of a second group of one or more elements vertically displaced from the first group indicates potential multiple object entry, such as multiple equations. By contrast, the vertical input of a group of elements followed by the input of one or more elements horizontally displaced from the first group may indicate a single equation having vertical functions, e.g., divisions.
For example, element 604 is vertically displaced from element 605, and element 604 is horizontally displaced by a relatively large horizontal distance from element 605. A similar horizontal distance exists between elements 601 and 604, but there are intervening elements 602 and 603. All of this relative geometrical information could be used to determine whether the element 605 belongs to a new object or at least does not belong to the element group 610. However, the geometrical cost may be below, but near, the geometrical cost threshold such that the system is not confident of multiple equation detection. If the system uses time-order information to determine that element 604 was input at time t4 directly before the input of the element 605 at time t5, the time-order difference, being the time cost, could be used to boost the geometrical cost to be above the geometrical cost threshold. A combined threshold could be set in which the geometrical and time costs are compared. A weighted threshold could also be used to apply different weightings to the geometrical cost threshold based on the time-order.
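One possible way of using the time cost to boost a geometrical cost that lies just below its threshold is sketched below. The multiplicative `boost` factor and the function name are assumptions made for illustration only; the description does not prescribe how the geometrical and time costs are combined.

```python
def is_new_object(geo_cost, geo_threshold, consecutive_in_time, boost=1.25):
    """Return True when the (possibly boosted) geometrical cost exceeds the threshold.

    consecutive_in_time: True if the two elements were input directly one
    after the other (e.g., at times t4 and t5).
    boost: hypothetical weighting applied to the geometrical cost in that case.
    """
    effective_cost = geo_cost * boost if consecutive_in_time else geo_cost
    return effective_cost > geo_threshold
```

With a threshold of 10, a geometrical cost of 9 alone gives a negative result, whereas the same cost for two elements entered consecutively in time is boosted to 11.25 and gives a positive result.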
Other factors can also be used to hone the multiple object detection through the setting of combined thresholds and/or adjustment of calculated costs for comparison with the threshold(s). These factors include absolute and relative locations of the elements (e.g., determination of barycenter or relative distances between adjacent elements), which may be most useful when the time-order is not aligned with the position-order (e.g., if in
The above described recognition of a vertical list of objects based on grouping of horizontally displaced elements has the purpose of reducing the processing by not considering the geometrical relationship between all vertically separated objects when creating hypotheses for recognition in subsequent recognition processing. While processing speed is important to the user experience, accuracy of the recognition is just as important. Typically, handwriting recognition systems compromise between speed and accuracy: increasing accuracy typically increases processing time, and decreasing processing time typically decreases accuracy. In order to provide an effective system, a balance between these factors needs to be found. In the alternative, the following example includes multiple geometrical costs to provide a better balance between time and accuracy.
The input 1000 includes multiple elements 1001-1006 illustrated as boxes containing one or more handwritten input strokes. The boxes may be bounding boxes around the extent of each stroke or set of strokes of an element. Each of the elements 1001-1006 has been written with horizontal (designated as the ‘x’ direction) and/or vertical (designated as the ‘y’ direction) displacement with respect to each of the other elements 1001-1006. In order to recognize whether elements 1001-1006 belong to one or more vertically displaced objects, process 1300 is performed as illustrated in the flow diagram of
This process begins with the determination of the number n of elements present (i.e., n=6 in the present example) representing the input of multiple handwritten strokes (step 1301). The y-order, being a positional or directional order of entry in the y-direction, of these n elements is then determined (step 1302) as Y1 to Yn. In the present example as depicted in
Tests 1 and 2 are performed iteratively for each consecutive y-order element (beginning with the first element, e.g., Y1, as designated by step 1303). The geometrical cost determinations of Tests 1 and 2 are considered together (step 1306) in order to allow a decision as to whether a ‘cut’ line should be created at the lowest (in the y-direction) edge of the current y-order element, i.e., between that element and the elements below, thereby defining a boundary between vertically displaced objects (step 1307). This cut line creation may include defining a bounding box about the elements considered to possibly belong to the same object, thereby grouping elements for which the different geometrical costs were determined to be below the corresponding confidence thresholds in the sequential y-order iteration. Once this decision is made, the parameter i is incremented to i+1 (step 1308) and processing returns to implement Tests 1 and 2 for the next consecutive element until the final element of the input is tested.
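The iteration over consecutive y-order elements may be sketched as follows, by way of example only. The sketch assumes that y increases downward, that the y-order is given by the upper edge of each element, that Test 1 compares the barycenter distance of consecutive elements, that Test 2 compares the gap to the bounding box of all elements below, and that a cut is created only when both tests are positive. The `Element` type, the function names, and the threshold parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One element of the input (hypothetical type); y increases downward."""
    top: float           # upper edge of the bounding box
    bottom: float        # lower edge of the bounding box
    barycenter_y: float  # y coordinate of the stroke barycenter


def barycenter_cost(ordered, i):
    """First geometrical cost: barycenter distance of consecutive elements."""
    return abs(ordered[i + 1].barycenter_y - ordered[i].barycenter_y)


def gap_cost(ordered, i):
    """Second geometrical cost: gap between element i and everything below it."""
    top_of_rest = min(e.top for e in ordered[i + 1:])
    return top_of_rest - ordered[i].bottom


def find_cuts(elements, barycenter_threshold, gap_threshold):
    """Return the y positions of the cut lines between vertically displaced objects."""
    ordered = sorted(elements, key=lambda e: e.top)  # assumed y-order
    cuts = []
    # The final element has nothing below it, so it needs no test.
    for i in range(len(ordered) - 1):
        test1 = barycenter_cost(ordered, i) > barycenter_threshold
        test2 = gap_cost(ordered, i) > gap_threshold
        if test1 and test2:
            # Cut at the lowest edge of the current y-order element.
            cuts.append(ordered[i].bottom)
    return cuts
```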
Test 1 involves determining whether a (first) geometrical cost is more than a (first) pre-determined threshold. As shown in
The current first geometrical cost ei is then compared to a (first) pre-determined (barycenter) threshold distance as a first test of confidence for there being multiple objects (step 1404). The result of this step is a ‘yes’ or ‘no’ determination. The setting of the first threshold may be done so a high level of confidence is achieved if a large barycenter separation of consecutive elements is present. For example, in
Test 2 involves determining whether a (second) geometrical cost is more than a (second) pre-determined threshold. As shown in
For example, in
The current second geometrical cost fi is then compared to a (second) pre-determined (gap) threshold distance as a second test of confidence for multiple objects (step 1504). The result of this step is a ‘yes’ or ‘no’ determination. The setting of the second threshold may be done so a high level of confidence is achieved if a large gap between consecutive elements is present. For example, in
Returning to
On the other hand, for the example of
Depending on the input, the examples of
For example, a large number of elements may be present (i.e., step 1301 determines n is a relatively large number) while few or no cuts are created after performing the tests on all (or a statistically relevant number) of the elements. In that case, the present system may consider that more cuts should have been created, since the large number of elements may indicate the presence of multiple objects, particularly if the y-direction extent of the elements and relative sizes of the elements are also taken into account. The present system may then adjust one or both of the barycenter and gap thresholds to allow more positive results to occur. This process should be performed with caution, as the return of too many false positives will lead to inaccurate recognition in the next stage. Such decisions of adjustment can be assisted through the training of the present system using a statistically large number of input samples in an initialization exercise for setting the thresholds. This training can also be performed on an ongoing basis, particularly if the HWR system 114 is hosted on a remote server.
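A cautious adjustment of this kind may be sketched as follows. The limits `many_elements` and `few_cuts` and the scaling `factor` are hypothetical values chosen for illustration; in practice they would be set through the training described above.

```python
def relax_thresholds(n_elements, n_cuts, barycenter_threshold, gap_threshold,
                     many_elements=20, few_cuts=1, factor=0.9):
    """Lower both thresholds slightly when many elements yielded few cuts.

    Returns the (barycenter_threshold, gap_threshold) pair to use next.
    A factor close to 1 keeps the adjustment small, limiting false positives.
    """
    if n_elements >= many_elements and n_cuts <= few_cuts:
        return barycenter_threshold * factor, gap_threshold * factor
    # Otherwise the thresholds are left unchanged.
    return barycenter_threshold, gap_threshold
```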
As described earlier in relation to steps 1405 and 1505, the measured/calculated first and second geometrical costs may also be output by the separate tests. As illustrated in
As discussed earlier, a further consideration is determining how the results of the separate tests affect the balance between speed and accuracy. In the present example, the first test (Test 1) considers the barycenter distances between consecutive y-order elements. The determination of stroke barycenters, or alternatively or additionally other stroke factors, and measurement/calculation of distances therebetween is performed relatively fast, e.g., within units of microseconds. However, taking just these distances into account to decide where to ‘cut’ may result in many false positives, since stroke factors, such as barycenter displacement, alone are not accurate indicators of multiple objects, e.g., the strokes within a single equation may have many varied sizes such that barycenters of adjacent strokes are distributed widely but the gaps between the strokes are relatively small, such as in
On the other hand, the second test (Test 2) considers the gap or inter-element distances between consecutive y-order elements, and its use of distances between bounding boxes or other grouping mechanisms of strokes provides a reasonably accurate indicator of multiple objects. However, the establishment of those groups may be performed relatively slowly, e.g., within tens to hundreds of microseconds. Thus, Test 2 may be considered to provide relatively high recognition accuracy but relatively low processing speed. As such, the combination of the two tests allows a balance between speed and accuracy, and this balance can be adjusted (e.g., by adjusting or weighting the corresponding thresholds) based on the needs of the system, e.g., speed is favored over accuracy, or vice-versa.
More precisely, the test which determines and compares displacement of common geometrical features of the content of the elements, e.g., the barycenter of the strokes, may result in a relatively high number of cuts, but an unacceptable proportion of these may be false positives, leading to higher inaccuracy. By contrast, the test that determines and compares displacement of common or non-common geometrical features of the elements themselves, e.g., the same or different edges of the bounding boxes of the elements, provides a relatively lower number of false positives but not enough cuts, leading to higher processing time.
Further improvement of the accuracy can be provided by an alternative example of the second geometrical test 1305, illustrated in
First a non-incremental parameter j is set to i (step 1701) for use in later steps, and the incremental parameter i is set to i+1 (step 1702) to allow iteration. After these initializations, it is determined whether a next consecutive y-order element (i.e., Yi due to the increment of the parameter i) from the y-order element being tested (i.e., Yj due to the setting of the parameter j) is present (step 1703). So long as further elements are found, a geometrical area as a search zone is determined for the next consecutive y-order element Yi (step 1704). The extent of this search zone is adjustable and used to determine the minimum distance between the next element(s) (in the y-order) and the current element under consideration (i.e., Yj) and any prior elements in the y-order (i.e., Y1-Yj-1) to determine the second geometrical cost.
Upon establishment of the search zone, it is determined if there are any elements from the current element being tested and prior elements (i.e., Y1-Yj) within the search zone (step 1705). If none of the y-order elements Y1-Yj are within the search zone, the processing returns to step 1702 to iterate to a next element to be searched. If at least one of the y-order elements Y1-Yj is within the search zone, the vertical distance(s) fim (where m=the number of the elements Y1-Yj in the search zone) between the current next element and each of the elements Y1-Yj above that element is calculated/measured and the minimum of these distances is determined (step 1706). For example, in
In order to determine whether this minimum distance should be considered in later processing as the second geometrical cost (described later), it is first determined if a minimum distance f has been previously stored, for example, in the memory 108 (step 1707). If so, the determined minimum distance is compared with this previously stored minimum distance f (step 1708), and if not, the determined minimum distance is stored as the minimum distance f (step 1709). For example, in the processing of element 1004 in
In the comparison of step 1708, if the current minimum distance is greater than the stored minimum distance, it is discarded and processing returns to step 1702, since this means that there is another element present which is closer to the potential first object than the element presently under consideration. On the other hand, if the current determined minimum distance is less than the stored minimum distance, processing moves to step 1709 so that the current determined minimum distance is stored in place of the currently stored minimum distance f, and then processing returns to step 1702. For example, from
Once all elements under the current test element (e.g., element 1003 in
It was earlier described in relation to
In this situation, the simpler Test 2 of the first example in which the bounding box of all elements below the element being tested is used to create the second geometrical cost will yield a ‘no’ determination. In this case, a ‘cut’ would not subsequently be created beneath element 1003. This would lead to bounding box 1109 being defined around all of elements 1001-1006 as illustrated in
The alternative test may result in a more accurate determination of there being separate 2D objects in example input 1000. As such, the alternative tests for the second geometrical cost may both be performed to provide adjustment of the results returned from Test 2 and/or adjustment of the first and second geometrical cost thresholds.
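The alternative determination of the second geometrical cost, using a search zone for each element below the element being tested, may be sketched as follows. The sketch assumes that y increases downward, that the elements are already in y-order, and that the search zone is the horizontal extent of the lower element widened by an adjustable `margin`; the shape of the zone, the `Box` type, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one element (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def min_gap_below(ordered, j, margin):
    """Second geometrical cost for the element at index j (0-based), or None.

    ordered: elements in y-order; ordered[:j + 1] are the current and prior
    elements, ordered[j + 1:] are the next elements to be searched.
    """
    upper = ordered[:j + 1]
    stored_min = None  # plays the role of the stored minimum distance f
    for nxt in ordered[j + 1:]:
        zone_left, zone_right = nxt.x_min - margin, nxt.x_max + margin
        # Current and prior elements that fall within the search zone.
        in_zone = [u for u in upper
                   if u.x_max >= zone_left and u.x_min <= zone_right]
        if not in_zone:
            continue  # nothing in the zone: iterate to the next element
        distance = min(nxt.y_min - u.y_max for u in in_zone)
        if stored_min is None or distance < stored_min:
            stored_min = distance  # keep only the smallest distance found
    return stored_min
```

The returned value would then be compared with the gap threshold in the same manner as the second geometrical cost of the simpler test; a result of `None` indicates that no element below has a current or prior element in its search zone.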
As with the earlier example described with respect to
The above described examples may include recognition performed during incremental recognition of multiple object elements, such as the strokes in equations, in which the HWR system 114 includes a device, such as an incrementer, which continuously parses the input strokes/elements to the recognition engine (after preprocessing if present) upon input (or short delay thereafter; typically measured by strokes) so that recognition of the strokes is performed and the recognized elements are stored (cached) in the memory 108. In this way, the multiple object testing can be performed in parallel to the recognition so as soon as a separate object is determined, the recognition processing of the strokes within that object has already been performed such that these strokes need not be processed again (i.e., re-recognized) when processing the next and subsequent objects, thereby further optimizing speed of the recognition process.
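The caching of recognized elements during incremental recognition may be sketched as follows. The class name, the method names, and the `recognize` callable are hypothetical stand-ins for the recognition engine; the sketch only illustrates that an element already recognized is not recognized again when a later object is processed.

```python
class IncrementalRecognizer:
    """Caches per-element recognition results (hypothetical sketch)."""

    def __init__(self, recognize):
        self._recognize = recognize  # stand-in for the recognition engine
        self._cache = {}             # element id -> recognized result

    def add_element(self, element_id, element):
        """Recognize an element on input, unless it is already cached."""
        if element_id not in self._cache:
            self._cache[element_id] = self._recognize(element)
        return self._cache[element_id]

    def object_result(self, element_ids):
        """Collect cached results for the elements of one separated object."""
        return [self._cache[element_id] for element_id in element_ids]
```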
The above-described example application of the methods and systems described herein is to a handwritten input of vertically displaced objects, such as mathematical equations. As described earlier, any handwritten input of a 2D system of objects, such as drawings, diagrams, music, chemical formulas, etc., in which multiple objects are input and need to be separately recognized is also applicable as the described methods and systems provide recognition of multiple text, symbols and objects at any orientation. Further, as described earlier, not only vertical lists, for example, can be recognized, but the described methods and systems also provide recognition of horizontal lists and arbitrarily placed objects.
Further, as mentioned earlier, the various examples described herein can be applied to forms of input for recognition other than handwriting, such as offline recognition in which images rather than digital ink are recognized. For example, the elements may be input as an image captured as a photograph of writing on paper or a whiteboard, digitally captured on an interactive smartboard, etc.
The described methods and systems increase processing and recognition speed of multiple objects, such as a vertical list of multiple mathematical equations, as multiple object recognition is performed independent of the recognition of the objects themselves. Further, complex multiple object input, such as complex systems of abstract arithmetical equations, is enabled without confused recognition results. Furthermore, writing of multiple objects, such as equations, does not require specific user action for recognition, such as creating a new writing area, tapping a new line button, etc. Further still, matching of strokes/elements to artificial structure, such as tables, is not required for recognition of multiple objects. Further, no learning or training of the algorithm is required; however, this could be performed to improve results.
While the foregoing has described what is considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that they may be applied in numerous other applications, combinations, and environments, only some of which have been described herein. Those of ordinary skill in the art will recognize that the disclosed aspects may be altered or amended without departing from the true spirit and scope of the subject matter. Therefore, the subject matter is not limited to the specific details, exhibits, and illustrated examples in this description. It is intended to protect any and all modifications and variations that fall within the true scope of the advantageous concepts disclosed herein.
Claims
1. A method of recognizing input of multiple objects to a computing device, the computing device comprising a processor and at least one application for recognizing the input under control of the processor, the method comprising the steps of:
- determining, with the application, at least one geometrical feature of a plurality of elements of the input;
- comparing, with the application, the determined at least one geometrical feature with at least one pre-determined geometrical threshold to determine a positive or negative result;
- if the comparison yields a negative result, considering the elements as belonging to one object in the recognition of the input; and
- if the comparison yields a positive result, considering the elements as belonging to multiple objects in the recognition of the input.
2. A method as claimed in claim 1, wherein the at least one geometrical feature includes one or more distances between pairs of elements of the plurality of elements.
3. A method as claimed in claim 2, wherein the one or more distances is between one or more factors of the content of each element of each pair of elements.
4. A method as claimed in claim 3, wherein the one or more factors includes at least one of a factor common to the elements of each pair of elements and a geometrical boundary including each element.
5. A method as claimed in claim 4, wherein:
- each element of each pair of elements represents one or more handwritten strokes;
- the common factor is the barycenter of the one or more strokes;
- the at least one pre-determined geometrical threshold is a barycenter distance threshold; and
- the comparison yields a positive result if the barycenter distance determined for a pair of elements is greater than the barycenter distance threshold, such that the elements of the pair of elements are considered as belonging to different objects.
6. A method as claimed in claim 4, wherein:
- the at least one pre-determined geometrical threshold is a geometrical boundary distance threshold; and
- the comparison yields a positive result if the geometrical boundary distance determined for a pair of elements is greater than the geometrical boundary distance threshold, such that the elements of the pair of elements are considered as belonging to different objects.
7. A method as claimed in claim 4, wherein the comparison includes comparing a first distance with a first pre-determined distance threshold and a second distance with a second pre-determined distance threshold for each pair of elements.
8. A method as claimed in claim 7, wherein the comparison yields a positive result for a pair of elements if both the first and second distances are greater than the respective first and second pre-determined distance thresholds, such that the elements of the pair of elements are considered as belonging to different objects.
9. A method as claimed in claim 7, wherein, for each pair of elements:
- the first distance is the distance between the common factor of the elements;
- the second distance is the distance between the geometrical boundary of the elements;
- the first pre-determined distance threshold is a common factor distance threshold; and
- the second pre-determined distance threshold is a geometrical boundary threshold.
10. A method as claimed in claim 9, wherein:
- each element of each pair of elements represents one or more handwritten strokes; and
- the common factor is the barycenter of the one or more strokes.
11. A method as claimed in claim 2, wherein the elements of each pair of elements are geometrically adjacent.
12. A method as claimed in claim 2, further comprising determining, with the application, at least one of a positional and temporal order of input of the elements of the plurality of elements.
13. A method as claimed in claim 12, wherein the at least one geometrical threshold is pre-determined with consideration of the determined temporal order of input of the elements.
14. A method as claimed in claim 12, wherein, for at least one pair of the pairs of elements:
- the determining of the at least one geometrical feature includes: determining, with the application, the geometrical boundary distances between pairs of elements which each contain a first element having a first positional order relationship with one element of the at least one pair and a second element having a second positional order relationship with the other element of the at least one pair; and determining, with the application, the minimum distance of the determined geometrical boundary distances;
- the at least one pre-determined geometrical threshold includes a geometrical boundary distance threshold;
- the comparison includes comparing the determined minimum geometrical boundary distance with the geometrical boundary distance threshold; and
- the comparison yields a positive result if the determined minimum geometrical boundary distance is greater than the geometrical boundary distance threshold, such that the elements of the at least one pair are considered as belonging to different objects.
15. A method as claimed in claim 14, wherein the positional order is directional, the first and second directional relationships being first and second directions from the elements of the at least one pair, respectively.
16. A method as claimed in claim 14, wherein the pairs of first and second elements contain first elements within a geometrical area of the second element.
17. A method as claimed in claim 16, wherein:
- each element of each pair of elements represents one or more handwritten strokes; and
- the geometrical area is based on a characteristic of the one or more handwritten strokes.
18. A system for determining input of multiple objects to a computing device, the computing device comprising a processor and at least one application for recognizing the input under control of the processor, the at least one system application configured to:
- receive the input of a plurality of elements;
- determine at least one geometrical feature of the plurality of elements;
- compare the determined at least one geometrical feature with a pre-determined geometrical threshold to determine if the elements belong to one object or to multiple objects.
19. A computer program product, comprising a computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement a method for recognizing input of multiple objects to a computing device, the computing device comprising a processor and at least one application for recognizing the input under control of the processor, the method comprising the steps of:
- determining, with the application, at least one geometrical feature of a plurality of elements of the input;
- comparing, with the application, the determined at least one geometrical feature with at least one pre-determined geometrical threshold to determine a positive or negative result;
- if the comparison yields a negative result, considering the elements as belonging to one object in the recognition of the input; and
- if the comparison yields a positive result, considering the elements as belonging to multiple objects in the recognition of the input.
Type: Application
Filed: Sep 30, 2015
Publication Date: Jan 12, 2017
Patent Grant number: 9904847
Inventor: Sébastien ONIS (Nantes Cedex)
Application Number: 14/870,735