PARALLEL PROCESSING OF EXTRACTED ELEMENTS

The invention concerns a method for recognizing handwriting input from handwriting strokes of digital ink, on a computing device, the computing device comprising a processor with at least two processing units configured to process data in parallel, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method comprising: receiving the handwriting strokes of digital ink; performing element extraction from said strokes to extract a plurality of elements; recognizing the plurality of elements in parallel by: sending at least two elements of the extracted plurality of elements to at least two processing units, respectively; sending successively the remaining elements of the extracted plurality of elements to the processing units as the processing units become available; and compiling the plurality of recognized elements to generate the recognized handwriting input.

DESCRIPTION
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application Ser. No. 17/867,440 filed on Jul. 18, 2022, which is a continuation-in-part of U.S. patent application Ser. No. 16/715,951 filed on Dec. 16, 2019, which claims priority to European Application No. 19189346.0, filed on Jul. 31, 2019, the entire contents of which are incorporated herein for all purposes.

TECHNICAL FIELD

The present disclosure relates generally to the field of computing device interfaces capable of recognizing user input of text handwriting. In particular, the present disclosure concerns computing devices and corresponding methods for recognizing extracted elements from strokes of digital ink by parallel processing.

BACKGROUND

Computing devices continue to become more ubiquitous to daily life. They may take various forms such as computer desktops, laptops, tablet PCs, hybrid computers (2-in-1s), e-book readers, mobile phones, smartphones, wearable computers (including smartwatches, smart glasses/headsets), global positioning system (GPS) units, enterprise digital assistants (EDAs), personal digital assistants (PDAs), game consoles, and the like. Further, computing devices are being incorporated into vehicles and equipment, such as cars, trucks, farm equipment, manufacturing equipment, building environment control (e.g., lighting, HVAC), and home and commercial appliances.

Various forms of computing devices are known for inputting and recognizing input elements hand-drawn or handwritten by a user, such as text content (e.g., alphanumeric characters) or non-text content (e.g. shapes, drawings). To this end, known computing devices are usually equipped with a touch sensitive surface or the like to enable users to input handwriting content in the form of strokes of digital ink which may be displayed on a display screen.

A user may typically use an input surface (or any appropriate user interface) to handwrite on a computing device input strokes in a free handwriting format (or free handwriting mode), that is, without any handwriting constraint of position, size and orientation of the text handwriting input. In a free handwriting mode, no line pattern is imposed to the user for the purpose of handwriting. A free handwriting format affords complete freedom to the user during handwriting input, which is sometimes desirable for instance to take quick and miscellaneous notes or make mixed input of text and non-text.

FIG. 1A shows an example of a computing device 1 comprising a display device which displays ink input elements hand-drawn or handwritten by a user in a free handwriting mode using an appropriate user interface. The handwritten ink input elements may be text content, such as text content 4 and 6 in FIG. 1A, or non-text content such as non-text content 8, 10 and 12 in FIG. 1A.

In the present case, the computing device 1 detects and displays text content 4 and 6 and non-text content 8, 10 and 12. Each of these elements is formed by one or more strokes of digital ink. Input elements may comprise for instance text handwriting, diagrams, musical annotations, and so on. In this example, the shape 8 is a rectangle or the like which constitutes a container (a box) containing text content 6 so that both elements 6 and 8 can be selected and manipulated together.

Further, handwriting recognition may also be performed by a computing device by implementing various known techniques. The user handwriting input is typically interpreted using a real-time handwriting recognition system or method. Either on-line systems (recognition carried out using a cloud-based solution or the like) or off-line systems may be used. Once recognized, the computing device may convert the input strokes into a typeset version, as depicted in this example in FIG. 1B.

Accurately detecting and identifying the type of content is a first step in the recognition of text content. Disambiguating between text and non-text content is one step; another is the accurate extraction of text lines and text blocks. There is thus a need for a solution allowing efficient and reliable text line extraction and text block extraction in a computing device, in particular for text handwriting which is input in a free handwriting mode, so that input strokes are not associated with an inappropriate text line.

SUMMARY

The examples of the present invention that are described herein below provide computing devices, methods and corresponding computer programs for performing text line extraction (TLE) and text block extraction (TBE). In a page of strokes of digital ink, a multi-step process may take place to identify and output text lines and text blocks.

Text line extraction is one key step in text handwriting recognition. This operation aims at recognizing different text lines from text content input by a user in a free handwriting format. In other words, text line extraction allows a computing device to determine to which text line various input strokes belong. While text line extraction may be relatively straightforward in some cases, it may also become particularly complex and cause errors in others, in particular when a user does not handwrite in a chronological order. In many cases, users handwrite text in a logical temporal order, such that a computing device may rely on the temporal order of each input stroke to identify the beginning and end of each text line. The difficulty however increases drastically when users handwrite delayed strokes, i.e. in a non-temporal order. A user may for instance decide to handwrite a group of characters along a certain direction without diacritics to save time and decide later to supplement the whole group of characters with the missing diacritics. Some languages are particularly prone to such a non-chronological handwriting input. For instance, FIGS. 2A and 2B show examples of handwriting input in the Arabic and Vietnamese languages. As can be seen, a great number of diacritics, of various forms and styles, are attached to characters. In such languages, the issue of non-chronological handwriting input becomes critical. It may be particularly difficult for a known computing device to determine whether a given diacritic is attached at the top of a character (which means that the diacritic belongs to the text line underneath) or is attached at the bottom of another character (which means that the diacritic belongs to the text line above). Similarly, punctuation marks may be added in packets after handwriting a full sentence or the like, thereby giving rise to more uncertainty. A diacritic may for instance be easily confused with a comma or the like, rendering the task of text line extraction even more complex.

More generally, any delayed stroke for correcting or completing previously input text handwriting may lead to a break in the temporal order, thereby increasing the risk of errors in the process of text line extraction.

Considering that text handwriting is sometimes poorly input by users (e.g. because of a too high handwriting speed or a handwriting style difficult to recognize), known handwriting recognition systems are subject to non-reliable text line extraction. In particular, poor positioning of diacritics, punctuation marks or the like (i.e. by associating a stroke to a wrong text line) may negatively affect text handwriting recognition, and thus undermine the global user experience.

Further, text block extraction is a sequential gathering process and can be considered as a bottom-up approach: it starts from the smallest entities (the lines) and gathers (groups) them until reaching the biggest entities (the text blocks). This sequence can be described as an iterative step of spatially gathering text lines to create text block hypotheses. The process iterates the gathering of text lines until the number of text lines is stable.
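By way of illustration only, the iterative bottom-up gathering described above may be sketched as follows. This sketch is not part of the claimed method: the grouping criterion used here (a hypothetical vertical-proximity threshold `max_gap`) merely stands in for whatever spatial criterion the recognizer actually applies.

```python
def extract_text_blocks(lines, max_gap=20):
    """Iteratively gather text lines into blocks until the grouping is stable.

    `lines` is a list of (top, bottom) vertical extents of extracted text
    lines. Initially every line forms its own block hypothesis.
    """
    blocks = [[line] for line in lines]
    while True:
        merged = []
        changed = False
        for block in blocks:
            if merged and _gap(merged[-1], block) <= max_gap:
                merged[-1] = merged[-1] + block  # gather: grow the hypothesis
                changed = True
            else:
                merged.append(block)
        blocks = merged
        if not changed:  # the number of blocks is stable: stop iterating
            return blocks

def _gap(block_a, block_b):
    """Vertical gap between the lowest line of block_a and the highest line of block_b."""
    bottom_a = max(b for (_, b) in block_a)
    top_b = min(t for (t, _) in block_b)
    return top_b - bottom_a
```

Running the sketch on three lines at vertical extents (0, 10), (15, 25) and (100, 110) with the hypothetical threshold gathers the first two lines into one block and leaves the distant third line as a separate block.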

According to a particular aspect, the invention provides a method implemented by a computing device for processing text handwriting, comprising: displaying, in a display area, strokes of digital ink which are input substantially along a handwriting orientation; performing text line extraction to extract text lines from said strokes, said text line extraction comprising: slicing said display area into strips extending transversally to the handwriting orientation, wherein adjacent strips partially overlap with each other so that each stroke is contained in at least two adjacent strips; ordering, for each strip, the strokes at least partially contained in said strip to generate a first timely-ordered list of strokes arranged in a temporal order and at least one first spatially-ordered list of strokes ordered according to at least one respective spatial criterion, thereby forming a first set of ordered lists; forming, for each strip, a second set of ordered lists comprising a second timely-ordered list of strokes and at least one second spatially-ordered list of strokes by filtering out strokes below a size threshold from said first timely-ordered list and from said at least one first spatially-ordered list respectively; performing a neural net analysis to determine as a decision class, for each pair of consecutive strokes in each ordered list of said first and second sets, whether the strokes of said pair belong to a same text line, in association with a probability score for said decision class; selecting, for each pair of consecutive strokes included in at least one ordered list of said first and second sets, the decision class determined with the highest probability score during the neural net analysis; defining text lines by combining strokes into line hypotheses based on the decision class with highest probability score selected for each pair of consecutive strokes; identifying a first number of available processing units at the processor; and sending a corresponding number of defined text lines to the identified processing units to be processed in parallel.

In a particular embodiment, the method further comprises: sending successively the remaining defined text lines to the processing units to be processed for recognition as the processing units become available; and compiling in order the recognized text lines when all the text lines are recognized.
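The dispatch-and-compile behaviour described above may be illustrated, purely as a sketch, using Python's standard `concurrent.futures` executor as a stand-in for the processing units. The `recognize_line` function is a hypothetical placeholder for the actual per-line recognizer, which the patent does not specify in code form.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_line(line):
    # Hypothetical placeholder for per-line handwriting recognition.
    return line.upper()

def recognize_lines_in_parallel(text_lines, num_units=2):
    """Send the first lines to `num_units` workers; the executor hands the
    remaining lines to workers as they become available, and the results are
    compiled back in the original line order."""
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        # Executor.map() preserves input order, so compiling "in order"
        # is simply collecting the results.
        recognized = list(pool.map(recognize_line, text_lines))
    return " ".join(recognized)
```

Note the design choice: `Executor.map` already implements the "send as units become available" scheduling, while still yielding results in submission order, which matches the compile-in-order requirement.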

As indicated earlier, line extraction is a key step in text recognition and it may not always produce satisfactory results, especially regarding some types of strokes such as diacritics, punctuation marks and the like. More generally, errors may arise during text line extraction when text handwriting is input in a non-chronological order. The present invention allows for an efficient and reliable text line extraction when handwriting recognition is performed on text handwriting by a computing device.

The various embodiments defined above in connection with the method of the present invention apply in an analogous manner to the computing device, the computer program and the non-transitory computer readable medium of the present disclosure.

For each step of the method of the present invention as defined in the present disclosure, the computing device may comprise a corresponding module configured to perform said step.

In a particular embodiment, the disclosure may be implemented using software and/or hardware components. In this context, the terms “unit” and “module” can refer in this disclosure to a software component, as well as a hardware component or a plurality of software and/or hardware components.

The present invention also relates to a method for recognizing handwriting input from handwriting strokes of digital ink, on a computing device, the computing device comprising a processor with at least two processing units configured to process data in parallel, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method comprising: receiving the handwriting strokes of digital ink; performing element extraction from said strokes to extract a plurality of elements; recognizing the plurality of elements in parallel by: sending at least two elements of the extracted plurality of elements to at least two processing units, respectively; sending successively the remaining elements of the extracted plurality of elements to the processing units as the processing units become available; and compiling the plurality of recognized elements to generate the recognized handwriting input.

In a particular embodiment, the elements are text or non-text elements.

In a particular embodiment, said text elements are words, lines, paragraphs, or mathematical expressions.

In a particular embodiment, the non-text elements are shapes, drawings or image data including characters, strings or symbols used in non-text contexts.

In a particular embodiment, sending elements to the processing units comprises sending semantic groups of elements to the processing units.

In a particular embodiment, the method comprises grouping the plurality of elements to generate the semantic groups of elements according to semantic predefined rules, wherein the grouping of the plurality of elements comprises: merging at least two elements according to merging predefined rules to update the plurality of elements; and/or splitting at least one element according to splitting predefined rules to update the plurality of elements.

In a particular embodiment, applying one merging predefined rule to at least two consecutive elements of a sequence of text lines comprises: detecting one text line of the sequence of text lines that includes a junction pattern; and generating a merged text line comprising the detected text line merged with a subsequent text line of the sequence of text lines.

In a particular embodiment, the junction pattern is a merging punctuation mark as the last symbol of the one text line, such as a hyphen.
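As a non-limiting illustration of this merging rule, the sketch below joins each text line whose last symbol is the junction pattern (here a hyphen) with the subsequent line. The function name and the decision to drop the hyphen when joining are illustrative assumptions, not taken from the claims.

```python
def merge_hyphenated_lines(text_lines, junction="-"):
    """Merge each line ending in the junction pattern with the next line."""
    merged = []
    i = 0
    while i < len(text_lines):
        line = text_lines[i]
        # Keep absorbing subsequent lines while the junction pattern persists.
        while line.endswith(junction) and i + 1 < len(text_lines):
            line = line[: -len(junction)] + text_lines[i + 1]
            i += 1
        merged.append(line)
        i += 1
    return merged
```

For example, the lines "recog-" and "nition is" would be merged into a single element "recognition is" before being sent to a processing unit.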

In a particular embodiment, applying one merging predefined rule to at least two elements comprises: detecting a special formatting of a first element; detecting the special formatting of at least a second element in the vicinity of the first element; and generating a merged element comprising the first and the at least second elements.

In a particular embodiment, the special formatting of the first and at least second elements is bolding, italicizing, underlining, or coloring.

In a particular embodiment, applying one splitting predefined rule on one element comprises: detecting a split pattern of the one element; and generating a first split element and a second split element according to the split pattern.

In a particular embodiment, the split pattern is a splitting punctuation mark or a line break.
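This splitting rule may be sketched, again purely by way of illustration, with a regular expression that matches a splitting punctuation mark or a line break. The particular pattern and function name are assumptions for the example only.

```python
import re

def split_element(text, split_pattern=r"[.!?]|\n"):
    """Split one element at each split pattern occurrence (a splitting
    punctuation mark or a line break), yielding the resulting split elements."""
    parts = re.split(split_pattern, text)
    return [p.strip() for p in parts if p.strip()]
```

For instance, an element containing a sentence-ending period or an embedded line break would be split into two separate elements, each of which can then be dispatched to a processing unit independently.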

In a particular embodiment, applying another splitting predefined rule on one element comprises: detecting a first and a second formatting within the one element; and generating at least two split groups of elements, wherein a first split group comprises contiguous elements of the first formatting and at least a second split group comprises contiguous elements of the second formatting.

In a particular embodiment, the sending of the at least two elements of the plurality of elements comprises: identifying the total number of elements; identifying the available number of processing units; if the total number of elements is higher than the available number of processing units: sending a first number of the plurality of elements to the available processing units, the first number of elements being equal to the available number of processing units.

In a particular embodiment, if the total number of elements is lower than, or equal to, the number of available processing units: sending the plurality of elements to the available processing units simultaneously.
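The dispatch logic of the two embodiments above (more elements than units, versus elements fitting in the available units) can be sketched as a simple batching step. This is an illustrative sketch only; the function name is hypothetical.

```python
def dispatch_batches(elements, available_units):
    """Return (first_batch, remaining): the first batch fills the available
    processing units (or takes all elements if there are fewer elements than
    units); the remaining elements are queued to be sent successively as
    units become available."""
    first_batch = elements[:available_units]
    remaining = elements[available_units:]
    return first_batch, remaining
```

With five elements and two available units, the first batch holds exactly two elements and three remain queued; with one element and two units, the single element is sent immediately and nothing remains.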

In a particular embodiment, the recognizing of the plurality of elements in parallel comprises: calculating a complexity score for each element of the plurality of elements; ordering each element according to the complexity score; sending the plurality of elements to the processing units using the ordering sequence.
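The complexity-ordering embodiment may be illustrated as follows. The complexity score used here (element length, standing in for, e.g., stroke count) is a hypothetical proxy chosen for the sketch; the patent does not define the scoring function.

```python
def order_by_complexity(elements, complexity=len):
    """Calculate a complexity score for each element, then order the elements
    so the most complex are sent to the processing units first; this avoids
    a long element straggling at the end of the parallel run."""
    return sorted(elements, key=complexity, reverse=True)
```

Scheduling the most complex elements first is a common load-balancing choice: short elements can then fill the gaps left on whichever unit frees up first.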

According to a particular aspect, the invention provides a method for processing text handwriting on a computing device, the computing device comprising a processor having multiple processing units, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method comprising: displaying, in a display area, strokes of digital ink which are input substantially along a handwriting orientation; performing text line extraction to extract text lines from said strokes, said text line extraction comprising: slicing said display area into strips extending transversally to the handwriting orientation, wherein adjacent strips partially overlap with each other so that each stroke is contained in at least two adjacent strips; ordering, for each strip, the strokes at least partially contained in said strip to generate a first timely-ordered list of strokes arranged in a temporal order and at least one first spatially-ordered list of strokes ordered according to at least one respective spatial criterion, thereby forming a first set of ordered lists; forming, for each strip, a second set of ordered lists comprising a second timely-ordered list of strokes and at least one second spatially-ordered list of strokes by filtering out strokes below a size threshold from said first timely-ordered list and from said at least one first spatially-ordered list respectively; performing a neural net analysis to determine as a decision class, for each pair of consecutive strokes in each ordered list of said first and second sets, whether the strokes of said pair belong to a same text line, in association with a probability score for said decision class; selecting, for each pair of consecutive strokes included in at least one ordered list of said first and second sets, the decision class determined with the highest probability score during the neural net analysis; defining text lines by combining strokes into line hypotheses based on the decision class with highest probability score selected for each pair of consecutive strokes; identifying a first number of available processing units at the processor; and sending a corresponding number of defined text lines to the identified processing units to be processed in parallel.

In another embodiment, the method further comprises: sending successively the remaining defined text lines to the processing units to be processed for recognition as the processing units become available; and compiling in order the recognized text lines when all the text lines are recognized.

BRIEF DESCRIPTION OF THE DRAWINGS

Other characteristics and advantages of the present disclosure will appear from the following description made with reference to the accompanying drawings which show embodiments having no limiting character. In the figures:

FIGS. 1A-1B represent a process of text handwriting recognition.

FIGS. 2A-2B show examples of text handwriting in different languages.

FIG. 3 depicts schematically a computing device according to a particular embodiment of the present disclosure.

FIG. 4 represents text handwriting input on a computing device.

FIG. 5 is a block diagram representing schematically modules implemented by the computing device of FIG. 3, according to a particular embodiment of the present disclosure.

FIG. 6 is a flow diagram representing schematically steps of a method according to a particular embodiment of the present disclosure.

FIGS. 7 and 8 represent schematically the step of slicing text handwriting, according to particular embodiments of the present disclosure.

FIG. 9 represents a first set of ordered lists of vectors generated during a text line extraction, according to a particular embodiment of the present disclosure.

FIG. 10 represents schematically a stroke with some geometric descriptors thereof, according to a particular embodiment of the present disclosure.

FIG. 11 represents a second set of ordered lists of vectors generated during a text line extraction, according to a particular embodiment of the present disclosure.

FIG. 12 is a block diagram representing schematically steps performed during a text line extraction, according to a particular embodiment of the present disclosure.

FIG. 13 represents text lines identified during a text line extraction, according to a particular embodiment of the present disclosure.

FIG. 14 is a block diagram representing schematically steps performed during a text line extraction, according to a particular embodiment of the present disclosure.

FIG. 15A is a flow diagram representing schematically steps of a method according to a first example of the present disclosure.

FIG. 15B is a flow diagram representing schematically steps of a method according to a second example of the present disclosure.

FIG. 16 represents schematically line hypotheses which are generated during a text line extraction, according to a particular embodiment of the present disclosure.

FIGS. 17A-17D illustrate schematically how the present disclosure can limit vertical chaos ordering in accordance with a particular embodiment.

FIGS. 18A-18B illustrate schematically how the present disclosure can bring variability of stroke context in accordance with a particular embodiment.

FIG. 19 is a flow diagram representing schematically steps of a method according to particular embodiments of the present disclosure.

FIG. 20 is a flow diagram representing schematically steps of a method according to particular embodiments of the present disclosure.

FIGS. 21A-21E represent schematically text block hypotheses which are generated during a text block extraction, according to a particular embodiment of the present disclosure.

FIG. 22 is a flow diagram representing schematically steps of a method according to particular embodiments of the present disclosure.

FIGS. 23A-23C show an example of handwriting input recognition processed according to the method described in FIG. 22, according to a particular embodiment of the present disclosure.

FIGS. 24A-24J show an example of handwriting input recognition of a text block processed according to the method described in FIG. 22, according to a particular embodiment of the present disclosure.

The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the present disclosure. For simplicity and clarity of illustration, the same reference signs will be used throughout the figures to refer to the same or analogous parts, unless indicated otherwise.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well-known methods, procedures, and/or components are described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.

The following description of the exemplary embodiments refers to the accompanying drawings. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims. In various embodiments as illustrated in the figures, a computing device, a corresponding method and a corresponding computer program are discussed.

The use of the term “text” in the present description is understood as encompassing all characters (e.g. alphanumeric characters or the like), and strings thereof, in any written language and any symbols used in written text.

The term “non-text” in the present description is understood as encompassing freeform handwritten or hand-drawn content (e.g. shapes, drawings, etc.) and image data, as well as characters, and strings thereof, or symbols which are used in non-text contexts. Non-text content defines graphic or geometric formations in linear or non-linear configurations, including containers, drawings, common shapes (e.g. arrows, blocks, etc.) or the like. In diagrams for instance, text content may be contained in a shape (a rectangle, an ellipse, an oval shape, etc.) called a container.

Furthermore, the examples shown in these drawings are in a left-to-right written language context, and therefore any reference to positions can be adapted for written languages having different directional formats.

The various technologies described herein generally relate to processing handwritten text content on portable and non-portable computing devices, more particularly for the purpose of text line extraction. The systems and methods described herein may utilise recognition of a user's natural handwriting style input to a computing device via an input surface, such as a touch sensitive screen (as discussed later). Whilst the various embodiments are described with respect to recognition of digital ink handwriting input using so-called online recognition techniques, it is understood that other forms of input for recognition may be applied, such as offline recognition involving a remote device or server to perform recognition.

The terms “hand-drawing” and “handwriting” are used interchangeably herein to define the creating of digital contents (handwriting input) by users through use of their hands (or fingers) or an input device (hand-held stylus or digital pen, mouse . . . ) on or with an input surface. The term “hand” or the like is used herein to provide concise description of the input techniques, however the use of other parts of a user's body for similar input is included in this definition, such as foot, mouth and eye.

As described in more detail below, an aspect of the present invention involves detecting strokes of digital ink and performing text line extraction to extract text lines from the detected strokes. These strokes may be displayed in a display area. The text line extraction involves slicing the digital strokes into strips (or slices, or bands), ordering for each strip the strokes into ordered lists which collectively form a first set of ordered lists, forming for each strip a second set of ordered lists by filtering out from the ordered lists of the first set the strokes which are below a given size threshold, and performing a neural net analysis based on said first and second sets to determine for each stroke a respective text line to which it belongs.
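The slicing and filtering steps of this pipeline may be sketched as below. The sketch is illustrative only: the strip height, overlap and size threshold are hypothetical parameters, and strokes are reduced to simple (id, top, bottom) vertical extents rather than full ink paths. Note that with an overlap of half the strip height, every point of the display area lies in two adjacent strips, which is how each stroke ends up contained in at least two strips.

```python
def slice_into_strips(page_height, strip_height=100, overlap=50):
    """Slice the display area into strips transversal to the handwriting
    orientation; consecutive strips partially overlap so that every stroke
    falls into at least two adjacent strips."""
    strips = []
    top = 0
    step = strip_height - overlap
    while top < page_height:
        strips.append((top, top + strip_height))
        top += step
    return strips

def strokes_in_strip(strokes, strip):
    """Strokes at least partially contained in the strip; strokes are given
    as (stroke_id, top, bottom) vertical extents."""
    top, bottom = strip
    return [s for s in strokes if s[2] > top and s[1] < bottom]

def filter_small(strokes, size_threshold):
    """Build the second set of lists: the same strokes with those below the
    size threshold (e.g. diacritics, punctuation marks) filtered out."""
    return [s for s in strokes if (s[2] - s[1]) >= size_threshold]
```

With these hypothetical parameters, a stroke spanning rows 60 to 80 is contained in both the strip (0, 100) and the overlapping strip (50, 150), and a 2-pixel-tall stroke (such as a dot) is removed from the size-filtered lists.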

FIG. 3 shows a block diagram of a computing device 100 according to a particular embodiment of the present invention. The computing device (or digital device) 100 may be a computer desktop, laptop computer, tablet computer, e-book reader, mobile phone, smartphone, wearable computer, digital watch, interactive whiteboard, global positioning system (GPS) unit, enterprise digital assistant (EDA), personal digital assistant (PDA), game console, or the like. The computing device 100 includes at least one processing element, some form of memory, and input and output (I/O) devices. The components communicate with each other through inputs and outputs, such as connectors, lines, buses, links, networks, or others known to the skilled person.

More specifically, the computing device 100 comprises an input surface 104 for handwriting (or hand-drawing) text content, or possibly mixed content (text and non-text), as described further below. More particularly, the input surface 104 is suitable to detect a plurality of input strokes of digital ink entered on (or using) said input surface. As also discussed further below, these input strokes may be input in a free handwriting format (or in a free handwriting mode), that is, without any handwriting constraint of position, size and orientation in an input area.

The input surface 104 may employ technology such as resistive, surface acoustic wave, capacitive, infrared grid, infrared acrylic projection, optical imaging, dispersive signal technology, acoustic pulse recognition, or any other appropriate technology as known to the skilled person to receive user input in the form of a touch- or proximity-sensitive surface. The input surface 104 may be a non-touch sensitive surface which is monitored by a position detection system.

The computing device 100 also comprises at least one display unit (or display device) 102 for outputting data from the computing device such as text content. The display unit 102 may be a screen or the like of any appropriate technology (LCD, plasma . . . ). The display unit 102 is suitable to display strokes of digital ink input by a user.

The input surface 104 may be co-located with the display unit 102 or remotely connected thereto. In a particular example, the display unit 102 and the input surface 104 are parts of a touchscreen.

As depicted in FIG. 3, the computing device 100 further comprises a processor 106 and a memory 108. The computing device 100 may also comprise one or more volatile storing elements (RAM) as part of the memory 108 or separate thereof.

The processor 106 is a hardware device for executing software, particularly software stored in the memory 108. The processor 106 can be any custom made or general purpose processor, a central processing unit (CPU), a semiconductor based microprocessor (in the form of a microchip or chipset), a microcontroller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, or any combination thereof, and more generally any appropriate processor component designed for executing software instructions as known to the skilled person.

The memory 108 is a non-transitory (or non-volatile) computer readable medium (or recording medium) in accordance with a particular embodiment of the disclosure. The memory 108 may include any combination of non-volatile storing elements (e.g. ROM, EPROM, flash PROM, EEPROM, hard drive, magnetic or optical tape, memory registers, CD-ROM, WORM, DVD, or the like).

The memory 108 may be remote from the computing device 100, such as at a server or cloud-based system, which is remotely accessible by the computing device 100. The non-volatile memory 108 is coupled to the processor 106, so that the processor 106 is capable of reading information from and writing information to the memory 108. As an alternative, the memory 108 is integral to the computing device 100.

The memory 108 includes an operating system (OS) 110 and a handwriting application (or computer program) 112. The operating system 110 controls the execution of the application 112. The application 112 constitutes (or comprises) a computer program (or computer-readable program code) according to a particular embodiment of the invention, this computer program comprising instructions to implement a method according to a particular embodiment of the invention.

In the present embodiment, the application 112 includes instructions for detecting and managing strokes of digital ink handwritten by a user using the input surface 104 of the computing device 100, as discussed further below.

The application 112 may comprise a handwriting recognition (HWR) module (or HWR system) 114 for recognizing text handwriting input to the computing device 100. The HWR 114 may be a source program, an executable program (object code), script, application, or any other component having a set of instructions to be performed. In the present example depicted in FIG. 3, the application 112 and the HWR module 114 are combined in a single application (the HWR module 114 is part of the application 112). Alternatively, the HWR module 114 may be a module, method or system for communicating with a handwriting recognition system remote from the computing device 100, such as a server (or cloud-based system) SVI as depicted in FIG. 3 which is remotely accessible by the computing device 100 through an appropriate communication link. The application 112 and the HWR module 114 may also be separate components stored in the memory 108 (or in different memories) of the computing device 100, whereby the application 112 and the HWR module 114 operate together accessing information processed and stored in the memory 108.

A user may enter an input stroke with a hand or finger, or with some input instrument such as a digital pen or stylus suitable for use with the input surface 104. The user may also enter an input stroke by making a gesture above the input surface 104, if the computing device 100 comprises means configured to sense motions in the vicinity of the input surface 104, or with a peripheral device of the computing device 100, such as a mouse, a joystick or the like.

Each ink input element (letters, symbols, words etc.) is formed by one or a plurality of input strokes or at least by a portion of a stroke. A stroke (or input stroke) is characterized by at least a stroke initiation location (corresponding to a “pen down” event), a stroke terminal location (corresponding to a “pen up” event), and the path connecting the stroke initiation and the stroke terminal locations. Because different users may naturally write or hand-draw a same object (e.g. letter, shape, symbol, etc.) with slight variations, the HWR module 114 accommodates a variety of ways in which each object may be entered whilst still being recognized as the correct or intended object.
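By way of a non-limiting illustration, the stroke characterization above may be sketched as a simple data structure; the field names and timestamp representation are assumptions of this example only, not part of the disclosure:

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Stroke:
    """One input stroke: the "pen down" point, the "pen up" point, and the
    path connecting them. The layout is illustrative, not prescribed."""
    points: List[Point]   # sampled path, in input order
    t_start: float        # timestamp of the "pen down" event
    t_end: float          # timestamp of the "pen up" event

    @property
    def initiation(self) -> Point:
        return self.points[0]

    @property
    def terminal(self) -> Point:
        return self.points[-1]

# The dot of an "i" is a very short stroke, but still a stroke ST by itself:
dot = Stroke(points=[(12.0, 3.0), (12.1, 3.1)], t_start=0.50, t_end=0.52)
```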

The handwriting application 112 allows the user to generate handwritten or hand-drawn text content in digital ink form and to have this content faithfully recognized using the HWR module 114. In particular cases, the application 112 may be configured to detect and recognize text content within mixed content which contains both text and non-text content (e.g., diagrams, charts, etc.).

The nature and implementation of the recognition process performed by the HWR module 114 may vary depending on each case. Text recognition may be performed either fully locally on the computing device 100 or at least partially remotely using for instance the remote server SVI (FIG. 3). An example of implementing handwriting recognition can for instance be found in US Patent Application No. 2017/0109578 A1. In particular, as is known to the skilled person, text recognition may be performed based on any one of language model(s) (e.g., grammar, semantics), linguistic information including text-based lexicon(s) (regular expressions, etc.) or the like, and statistical information modelling how frequently a given sequence of elements appears in the specified language or is used by a specific user.

In the present embodiment, the computing device 100 is configured to detect and display text handwriting which is input using the input surface 104 in a free handwriting format (or free handwriting mode). The free handwriting mode allows a user to handwrite input elements in a free environment (e.g. in a blank zone) in an unstructured or unguided fashion, that is, without any handwriting constraint of position, size and orientation of the text handwriting input (no line pattern to follow, no limitation of size or orientation, no constraint of interline, margin or the like, etc.). This free handwriting format affords complete freedom to the user during handwriting input, which is sometimes desirable, for instance to take quick and miscellaneous notes or to make mixed input of text and non-text.

As shown in FIG. 4, the display unit 102 of the computing device 100 is configured to display, in a display area (or input area) 200, text handwriting IN formed by a plurality of strokes (or input strokes) ST of digital ink. In the examples described hereafter, it is assumed that the detected strokes ST are input along (or substantially along) a same handwriting orientation X (e.g. the horizontal orientation in the present case). Variations of handwriting orientations, e.g. deviations from an intended orientation within the same line, may however be possible in some cases. Text handwriting IN may of course take many different forms and styles, depending on each case. It will be assumed in the following examples that the handwritten characters corresponding to the phrase “Vertical ordering can bring chaos” are detected and displayed as text handwriting input in the display area 200, although numerous other types and content of text handwriting are possible, notably in terms of language, style etc.

In the following examples, it is further assumed that the text handwriting IN is input in the free handwriting mode (or format) as described above.

As shown in FIG. 5 according to a particular embodiment, when running the application 112 stored in the memory 108 (FIG. 3), the processor 106 implements a line extraction unit (also called line extractor) MD2 comprising a number of processing modules, that is: a slicing module MD4, an ordering module MD6, a filtering module MD8, a neural net analysis module MD10, a selecting module MD12 and a line definition module MD14.

The application 112 comprises instructions configuring the processor 106 to implement these modules in order to perform steps of a method of the invention, as described later in particular embodiments. The line extraction unit MD2 is suitable to define text lines LN such that each input stroke ST detected by the computing device 100 is associated with a respective text line LN.

More particularly, the slicing module MD4 is configured to slice a display area (i.e. display area 200 as shown in FIG. 4) into strips (also called slices or bands) SP extending transversally to the handwriting orientation X. This slicing may be performed such that adjacent strips SP partially overlap with each other so that each stroke ST is contained in at least two adjacent strips SP.

The ordering module MD6 is configured to order, for each strip SP, the strokes ST at least partially contained in said strip SP to generate a first timely-ordered list of strokes arranged in a temporal order and at least one first spatially-ordered list of strokes ordered according to at least one respective spatial criterion, thereby forming a first set SLa of ordered lists. As discussed further below, various spatial criteria may be used to generate one or more first spatially-ordered list of strokes.

The forming module (or filtering module) MD8 is configured to form, for each strip SP, a second set SLb of ordered lists comprising a second timely-ordered list of strokes and at least one second spatially-ordered list of strokes, by filtering out strokes ST below a size threshold from respectively the first timely-ordered list and from the at least one first spatially-ordered list of the first set SLa.

The neural net module MD10 is configured to perform a neural net analysis to determine as a decision class, for each pair of consecutive strokes ST in each ordered list of said first set SLa and second set SLb, whether the strokes ST of said pair belong to a same text line, in association with a probability score for the decision class.

The selecting module MD12 is configured to select, for each pair of consecutive strokes ST included in at least one ordered list of said first set SLa and second set SLb, the decision class determined with the highest probability score during the neural net analysis.

The line definition module MD14 is configured to define text lines LN by combining strokes ST into line hypotheses based on the decision class with highest probability score selected for each pair of consecutive strokes.

The selecting module MD12 and the line definition module MD14 may form part of a decoder (or decoding module) implemented by the processor 106 when running the application 112. A decoder is an algorithm that aims to translate input information into different output information. In the present context, the decoder (MD12, MD14) may use the local information that a pair of strokes belongs to a same text line with a probability P to gradually construct line hypotheses, as further described below. The decoding process may define these probabilities P as local rules to construct the line hypotheses and a decision process (combining locally a set of probabilities P) to control the validity of the line hypothesis construction rules. After combining all the local probabilities, the final line hypotheses are the final text lines.
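The gradual construction of line hypotheses from local pairwise decisions may be illustrated by the following simplified sketch, which merges strokes connected by a “same line” decision using a union-find structure. This is an illustration of the general principle only, not the claimed decoding process; the stroke indexing and the decision tuple layout are assumptions of the example:

```python
def build_lines(num_strokes, decisions):
    """Group stroke indices into line hypotheses by merging every pair
    linked by a "same line" decision class.
    decisions: iterable of ((a, b), cl, p) with a, b stroke indices,
    cl the decision class and p its probability score (illustrative)."""
    parent = list(range(num_strokes))

    def find(i):
        # Find the representative of i's group, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), cl, p in decisions:
        if cl == "same line":
            parent[find(a)] = find(b)   # merge the two line hypotheses

    lines = {}
    for i in range(num_strokes):
        lines.setdefault(find(i), []).append(i)
    return list(lines.values())
```

With four strokes and decisions linking 0-1 and 2-3 (but not 1-2), two text lines {0, 1} and {2, 3} result.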

The configuration and operation of the modules MD4-MD14 of the computing device 100 will be more apparent in the particular embodiments described hereinbelow with reference to the figures. It is to be understood that the modules MD4-MD14 as shown in FIG. 5 represent only an example embodiment of the present invention, other implementations being possible.

For each step of the method of the present invention, the computing device 100 may comprise a corresponding module configured to perform said step.

A method implemented by the computing device 100 illustrated in FIGS. 3-5 is now described with reference to FIGS. 6-17, in accordance with particular embodiments of the present invention. More specifically, the computing device 100 implements this method by executing the application 112 stored in the memory 108.

An example scenario is considered where a user inputs handwriting text IN as shown in FIG. 4 on the computing device 100. Processing is then performed by the computing device 100, including line extraction as described below.

More specifically, in a detecting step S2, the computing device 100 detects text handwriting IN input by a user using the input surface 104 of the computing device 100. As shown in FIG. 4, the handwriting input IN comprises a plurality of input strokes ST of digital ink which are input along (or substantially along) a handwriting orientation X using the input surface 104. As already indicated, each input stroke ST is characterized by at least a stroke initiation location, a stroke terminal location and the path connecting the stroke initiation and the stroke terminal locations. Accordingly, the dot positioned for instance at the top of the character “i” (in the word “Vertical”) constitutes a single stroke ST by itself.

In the present example, the handwriting digital ink IN is input in an input area 200 of the display 102, according to the free handwriting format as previously described. Without any handwriting constraint of lines, size, orientation or the like to comply with, the user is allowed to handwrite text content IN in a free and easy manner. The size, orientation and position of each handwritten character or each handwritten word may vary arbitrarily depending on the user's preferences.

As shown in FIG. 4, the computing device 100 displays (S4, FIG. 6) the plurality of input strokes ST of the handwriting input IN on the display unit 102 in accordance with the free handwriting format (or mode).

The computing device 100 then performs (S10, FIG. 6) a text line extraction to extract text lines from the strokes ST detected in the text handwriting IN. As shown in FIG. 6, the text line extraction S10 comprises the steps S12-S24 as described further below in the present example.

For a matter of simplicity, it is assumed in the present example that the entire handwriting input IN detected by the computing device 100 is text. In other cases, handwriting input IN may however comprise text and non-text content. A disambiguation process may thus be performed during text recognition by a classifier according to any suitable technique known to the skilled person to distinguish text from non-text content.

More specifically, in a slicing step S12 (FIG. 7), the computing device 100 slices the display area 200 into strips SP extending transversally to the handwriting orientation X. The slicing S12 is carried out such that adjacent strips SP partially overlap with each other, causing each stroke ST to be contained in at least two adjacent strips SP. As can be seen, many configurations of the strips SP may be adopted by the skilled person. Some implementations of the slicing S12 are provided herebelow as mere examples.

In the example depicted in FIG. 7, the slicing S12 is performed such that the strips SP extend along a same strip orientation Y. As a result, the strips SP are parallel to each other. The orientation Y may be perpendicular to the handwriting orientation X (e.g. X is horizontal and Y is vertical) as shown in FIG. 7, although other configurations are possible.

The computing device 100 may thus assign each stroke ST of the text handwriting IN to at least two respective adjacent strips SP in which said stroke is at least partially contained.

As further discussed below, the slicing S12 facilitates the forthcoming neural net analysis and allows achieving an efficient text line extraction by taking decisions in different contexts for a same stroke ST.

FIG. 8 shows a particular embodiment where strips SP1-SP4 are defined during the slicing step S12. For a matter of simplicity, only the first stroke ST corresponding to the first character “V” is shown. Each strip SP extends in the Y orientation, perpendicular to the handwriting orientation X. Each strip SP is formed with a respective width WD1-WD4 (referred to collectively as WD) in the X orientation. In the present example, the width WD of each strip SP is identical, although other implementations are possible. In particular, embodiments are possible where the width WD is not the same for all the strips SP.

As can be seen in FIG. 8, the strips SP partially overlap with each other such that the input stroke ST forming the first character “V” is contained at least partially in the strips SP1, SP2 and SP3. In other words, this input stroke corresponding to “V” belongs to the adjacent strips SP1, SP2 and SP3.

As discussed further below, the slicing S12 may be configured based on the scale or size of the input strokes ST of the text handwriting IN. As used herein, the term “scale” refers to an approximation of the average size or height of characters, of input strokes or of parts of input strokes. The skilled person may also adapt the proportion of overlap between each pair of adjacent strips SP to achieve a desired result in the text line extraction process. By increasing the strip overlap, results of the text line extraction process may be improved, but at a higher cost in terms of resources and time.
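The overlapping slicing described above may be sketched as follows, with strips represented as intervals on the X axis. The parameterization by a width and a step (a step smaller than the width yields the overlap) is an assumption of this example, not a requirement of the disclosure:

```python
def make_strips(x_min, x_max, width, step):
    """Overlapping strips along the handwriting orientation X.
    Each strip is an interval (left, right); with step < width,
    adjacent strips partially overlap."""
    strips = []
    left = x_min
    while left < x_max:
        strips.append((left, left + width))
        left += step
    return strips

def strips_containing(stroke_xs, strips):
    """Indices of the strips that contain at least part of the X extent
    of a stroke (stroke_xs: the X coordinates of its path points)."""
    lo, hi = min(stroke_xs), max(stroke_xs)
    return [i for i, (l, r) in enumerate(strips) if lo < r and hi > l]
```

With `make_strips(0, 100, 40, 20)`, for example, a stroke spanning X from 10 to 30 falls in the first two strips, satisfying the property that each stroke is contained in at least two adjacent strips.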

The computing device 100 then orders or sorts (S14, FIGS. 6 and 12), for each strip SP, the strokes ST at least partially contained in said strip SP to generate a first timely-ordered list of strokes ST arranged in a temporal order and at least one first spatially-ordered list of strokes ST ordered according to at least one respective spatial criterion, thereby forming a first set SLa of ordered lists. As discussed below, the number and type of spatial criteria used, and thus the content of the first set SLa of ordered lists, may vary depending on each case.

As shown in FIG. 9, it is considered in the present example that, in the ordering step S14, the computing device 100 orders for each strip SP the strokes ST at least partially contained in said strip SP to generate a first timely-ordered list L1a of strokes ST arranged in a temporal order (referred to as TO) and 3 first spatially-ordered lists L2a, L3a and L4a of strokes ST ordered each according to a respective spatial criterion CR, thereby forming a first set SLa of ordered lists. As a result, a first set SLa of 4 ordered lists (L1a, L2a, L3a and L4a) is generated for each strip SP previously defined in the slicing step S12, as further described below.

The first timely-ordered list L1a comprises each stroke ST of a respective strip SP, these strokes being ordered according to their relative temporal order TO. In other words, in this list L1a, the strokes ST are arranged in a temporal sequence which is a function of the time at which each stroke ST was input.

The spatial criteria CR that may be used in the ordering step S14 (FIG. 6) to generate the first spatially-ordered lists for each strip SP are illustrated with reference to FIG. 10 which shows, by way of an example, the stroke ST corresponding to the first character “V” of the text handwriting IN detected in S2.

The first spatially-ordered list L2a is a list of strokes ST of a respective strip SP, said strokes being ordered according to the position of their respective barycentre BY along the strip orientation Y (spatial criterion CR1). As illustrated for instance in FIG. 10, the barycentre BY of the stroke ST corresponding to the first character “V” is determined. The position along the strip orientation Y of the barycentre BY is defined by the coordinate BYy along the Y axis. For each strip SP, the coordinate BYy of the barycentre BY of each associated stroke ST is taken into account to order the strokes ST of said strip. The same operation is performed for each strip SP to generate a respective first spatially-ordered list L2a. The ordered list L2a may for instance list the strokes ST in an increasing (or decreasing) order of their respective position in the strip orientation Y (spatial criterion CR1).

As also illustrated in FIG. 10, other spatial criteria CR2 and CR3 based on the position along the strip orientation Y of some specific points of each stroke ST may be used for generating the first spatially-ordered lists L3a and L4a.

The spatially-ordered list L3a is a list of strokes ST of a respective strip SP, the strokes being ordered according to their outermost coordinate PT1y in a first direction D1 along the strip orientation Y (spatial criterion CR2). In other words, in the list L3a, the outermost point PT1 of each stroke ST in the first direction D1 along the strip orientation Y is determined and the coordinate PT1y of this outermost point PT1 on the Y axis is determined and used to generate the spatially-ordered list L3a.

The spatially-ordered list L4a is a list of strokes ST of a respective strip SP, the strokes being ordered according to their outermost coordinate PT2y in a second direction D2, opposite the first direction D1, along the strip orientation Y (spatial criterion CR3). In other words, in the list L4a, the outermost point PT2 of each stroke ST in the second direction D2 along the strip orientation Y is determined and the coordinate PT2y of this outermost point PT2 on the Y axis is determined and used to generate the spatially-ordered list L4a.
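The four orderings above, i.e. the temporal order TO and the spatial criteria CR1 (barycentre), CR2 and CR3 (outermost points in opposite directions along Y), can be sketched as follows. The dict-based stroke representation and the convention that D1 points toward smaller Y values and D2 toward larger Y values are assumptions of this example:

```python
def barycentre_y(stroke):
    """Y coordinate BYy of the barycentre BY of a stroke's path."""
    ys = [p[1] for p in stroke["points"]]
    return sum(ys) / len(ys)

def ordered_lists_for_strip(strokes):
    """First set SLa for one strip SP: a timely-ordered list L1a plus three
    spatially-ordered lists L2a-L4a. Strokes are dicts
    {"points": [(x, y), ...], "t": start_time} for brevity."""
    L1a = sorted(strokes, key=lambda s: s["t"])                          # temporal order TO
    L2a = sorted(strokes, key=barycentre_y)                              # CR1: barycentre BYy
    L3a = sorted(strokes, key=lambda s: min(p[1] for p in s["points"]))  # CR2: outermost point in D1
    L4a = sorted(strokes, key=lambda s: max(p[1] for p in s["points"]))  # CR3: outermost point in D2
    return [L1a, L2a, L3a, L4a]
```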

As indicated above, the computing device 100 generates in the present example the 3 first spatially-ordered lists L2a, L3a and L4a as described above, along with the first timely-ordered list L1a, in the ordering step S14. However, in the ordering step S14, the computing device 100 may generate any one of the first spatially-ordered lists L2a, L3a and L4a as described above, or a combination thereof (e.g. only L2a, or only L3a and L4a), along with the first timely-ordered list L1a. It has been observed that high performance of the text line extraction process is achieved when a temporal order TO and at least one spatial criterion CR are used to generate respective ordered lists of strokes.

As discussed further below, by generating different orders of strokes for each strip, the problem of text line definition can be efficiently analyzed and broken down through different points of view, using different complementary criteria (temporal and spatial), to find for various pairs of strokes the best decision in the text line extraction process. Combining a temporal criterion TO with at least one spatial criterion CR significantly improves the performance of the text line extraction.

Once the ordering step S14 is completed, the computing device 100 forms (S16, FIGS. 6 and 12), for each strip SP, a second set SLb of ordered lists comprising a second timely-ordered list of strokes and at least one second spatially-ordered list of strokes by filtering out strokes ST below a size threshold from respectively the first timely-ordered list L1a and from each first spatially-ordered list generated in the ordering step S14.

As already described above, it is considered in the present example that the first timely-ordered list L1a and the first spatially-ordered lists L2a, L3a and L4a are generated in the ordering step S14. As a result, as shown in FIG. 11, the computing device 100 forms (S16) for each strip SP a second set SLb of ordered lists comprising a second timely-ordered list L1b of strokes and 3 second spatially-ordered lists L2b, L3b and L4b of strokes by filtering out strokes ST below a size threshold from respectively the first timely-ordered list L1a and from the first spatially-ordered lists L2a, L3a and L4a generated in S14.

In a particular embodiment illustrated in FIG. 10, during the forming (or filtering) step S16 (FIG. 6), the computing device 100 performs the following, for each strip SP defined in S12:

    • evaluating a first size of each stroke ST of said strip SP based on a height (or maximum distance) H in the strip orientation Y of said stroke and a second size of each stroke ST of said strip SP based on the length LG of said stroke ST; and
    • removing, from the first timely-ordered list L1a and from the at least one first spatially-ordered list generated in S14 (i.e. the spatially-ordered lists L2a-L4a in the present example), each stroke ST when either its first or second size is below a size threshold, thereby generating respectively the second timely-ordered list L1b and at least one second spatially-ordered list (i.e. the spatially-ordered lists L2b-L4b in the present example).

In other words, each stroke ST is excluded from the second timely-ordered list L1b and from the second spatially-ordered lists L2b-L4b if at least one of its respective first size and second size does not reach a size threshold.

As shown in FIG. 11, in the forming step S16, a plurality of strokes ST (or at least one, assuming at least one stroke does not reach the aforementioned size threshold) are thus removed from the ordered lists of the first set SLa to obtain the ordered lists of the second set SLb. The strokes ST which are filtered out from the ordered lists L1a-L4a of the first set SLa are selected based on their respective size: each stroke below a predefined size threshold is removed. In other words, all the strokes which do not meet a predetermined condition of size (defined by a size threshold relative to the scale) are discarded from the ordered lists of the first set SLa to obtain the second set SLb.

In a particular example, the computing device 100 evaluates only one of the first size and second size to decide which strokes ST should be filtered out from the first set SLa in the forming step S16.

This step S16 of filtering out is designed to remove all the relatively small strokes from the ordered lists of the first set SLa, such as diacritics, punctuation marks, apostrophes, etc. which may cause problems or errors in the process of text line identification. A diacritic (also diacritical sign or accent) is a glyph (sign, mark, etc.) added or attached to a letter or character to distinguish it from another of similar form, to give it a particular phonetic value, to indicate stress, etc. (as a cedilla, tilde, circumflex, or macron). By generating a second set SLb of ordered lists devoid of such relatively small strokes ST, the performance of the text line extraction process can be improved. As already indicated, it can be difficult to determine to which text line the relatively small strokes corresponding to diacritics, punctuation marks or the like belong. By using this second set SLb in combination with the first set SLa, reliable decisions can be made during text line extraction regarding these small strokes.
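The size-based filtering of step S16 may be sketched as follows, evaluating for each stroke a first size (height H along the strip orientation Y) and a second size (path length LG), and dropping the stroke when either falls below the threshold; the dict-based stroke representation is an assumption of the example:

```python
import math

def stroke_height(points):
    """First size: height H of the stroke along the strip orientation Y."""
    ys = [p[1] for p in points]
    return max(ys) - min(ys)

def stroke_length(points):
    """Second size: length LG of the stroke's path."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def filter_small(ordered_list, size_threshold):
    """Second-set list (e.g. L1b from L1a): drop every stroke whose height
    or length is below the threshold (diacritics, dots, punctuation, ...)."""
    return [s for s in ordered_list
            if stroke_height(s["points"]) >= size_threshold
            and stroke_length(s["points"]) >= size_threshold]
```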

As shown in FIGS. 9 and 11, each of the ordered lists L1a-L4a and L1b-L4b of the first and second sets SLa, SLb comprises a sequence of strokes ST which form pairs PR of consecutive strokes (also referred to hereafter as pairs PR). A given pair PR can be defined as a duplet (STa, STb) of two consecutive strokes, referred to more specifically as STa and STb, in one of the ordered lists of the first and second sets SLa, SLb (FIG. 6). A same pair PR may be present in more than one ordered list within the sets SLa, SLb.
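The pairs PR of consecutive strokes in an ordered list can be enumerated directly, as in the following minimal sketch:

```python
def consecutive_pairs(ordered_list):
    """Pairs PR = (STa, STb) of consecutive strokes in one ordered list.
    A list of n strokes yields n - 1 pairs."""
    return list(zip(ordered_list, ordered_list[1:]))
```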

Once the ordering step S14 and the forming step S16 are completed, the computing device 100 performs, for each strip SP, a neural net analysis S18 (also called inter-stroke analysis) to determine as a decision class CL, for each pair PR of consecutive strokes ST in each ordered list of said first set SLa and second set SLb of said strip SP, whether the two strokes ST of said pair PR belong to a same text line LN, in association with a probability score P (FIGS. 6 and 12). As a result of the neural net analysis S18, the computing device 100 may thus form, for each ordered list of the first and second sets SLa, SLb a respective so-called probability list (or probability sequence) PL of duplets (CL, P) assigned to respective pairs PR of consecutive strokes ST, namely:

    • a probability list PL1a of duplets (CL, P) determined for each pair PR of consecutive strokes in the temporally-ordered list L1a;
    • a probability list PL2a of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L2a;
    • a probability list PL3a of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L3a;
    • a probability list PL4a of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L4a;
    • a probability list PL1b of duplets (CL, P) determined for each pair PR of consecutive strokes in the temporally-ordered list L1b;
    • a probability list PL2b of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L2b;
    • a probability list PL3b of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L3b; and
    • a probability list PL4b of duplets (CL, P) determined for each pair PR of consecutive strokes in the spatially-ordered list L4b.

In other words, in the neural net analysis S18, a first set PLa of probability lists (PL1a-PL4a) is derived from the first set SLa of strokes and a second set PLb (PL1b-PL4b) is derived from the second set SLb of strokes. This neural net analysis S18 is performed for each strip previously identified in the slicing step S12. As a result, a first set PLa of probability lists and a second set PLb of probability lists are formed in an analogous manner for each strip SP.

In the present example, the decision class CL thus represents a result as to whether the two strokes ST of a pair PR of consecutive strokes in one of the ordered lists in the first and second sets SLa, SLb belong to a same text line LN. The decision class CL for a pair PR may for instance be assigned either a first value (e.g. “same line”) meaning that the two strokes of said pair PR are considered to be in a same text line LN, or a second value (e.g. “break line”) meaning that the two strokes of said pair PR are considered to be in different text lines LN.

The probability score P (also called inter-stroke probability) represents the probability or level of confidence that the associated result CL is correct (i.e. that CL represents the correct result for said pair PR). Accordingly, a decision class CL in association with a probability score P are produced in the neural net analysis S18 for each pair PR of consecutive strokes ST in each of the ordered lists L1a-L4a (set SLa) and L1b-L4b (set SLb) obtained in S14 and S16, respectively. As a result, a list or sequence of duplets (CL, P) corresponding to each pair PR of consecutive strokes ST is generated (S18) for each ordered list of the first and second sets SLa, SLb (FIG. 12). As already indicated, this process S18 of generating lists of duplets (CL, P) is repeated for each strip SP.

In the present example, the neural net analysis S18 is performed by one or more artificial neural nets (ANNs), also called neural nets. Neural nets (or neural networks) are well known to the skilled person and will therefore not be described in detail in the present disclosure.

In each of the first and second sets SLa, SLb of ordered lists, the timely-ordered list L1a (respectively L1b) may be analyzed by a first specialized neural net and each spatially-ordered list L2a-L4a (respectively L2b-L4b) may be analyzed by a distinct, second specialized neural network. The first neural net may be dedicated to temporally-ordered lists while the second neural net may be dedicated to spatially-ordered lists. Each specialized neural net may comprise two sub-neural nets which process in parallel the respective ordered-lists starting from the two ends, respectively.

In a particular embodiment, the neural net analysis S18 (FIGS. 6 and 12) comprises:

    • computing, by at least one artificial neural net, probability scores P representing the probability that the strokes ST, in each pair PR of consecutive strokes ST included in the ordered lists of the first and second sets SLa, SLb, belong to a same text line LN; and
    • determining, as a decision class CL for each pair PR of consecutive strokes, that the strokes ST of said pair PR belong to a same text line LN if the probability score P reaches at least a probability threshold.
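The two-step decision above may be sketched as follows. The 0.5 threshold and the convention that the score of a “break line” decision is the complement of the “same line” probability are assumptions of this example; the disclosure only requires a probability threshold:

```python
def classify_pair(p_same_line, threshold=0.5):
    """Decision class CL and probability score P for one pair PR,
    from the net's computed probability that the two strokes belong
    to a same text line. Threshold value is illustrative."""
    if p_same_line >= threshold:
        return ("same line", p_same_line)
    # Otherwise the pair is classed as a line break, with the
    # complementary confidence (an assumption of this sketch).
    return ("break line", 1.0 - p_same_line)
```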

The neural net analysis S18 may be based on feature extractions which are performed to characterize each pair PR according to various criteria, including temporal and spatial criteria. For instance, the computing device 100 may use at least one of a temporal criterion and a spatial criterion, or a combination thereof. More particularly, the feature extractions performed in S18 may be based on a temporal order in which the two strokes of each pair PR of consecutive strokes in the ordered lists of the sets SLa, SLb have been input and/or based on the inter-stroke space (or inter-stroke distance) between the two strokes ST in each pair PR of consecutive strokes in the ordered lists of the sets SLa, SLb. Various implementations of feature extractions may be contemplated to achieve the neural net analysis S18.

During the neural net analysis S18, metric values may be computed (e.g., barycentre distances, global shapes, stroke size and area, length, main orientation) and used to compute the decision class CL and associated probability score P for each pair PR. Before being used, these metric values may be normalized based on various local (e.g., stroke size) and/or global (e.g., strip width) criteria.

In a particular embodiment, in the neural net analysis S18, the one or more artificial neural nets sequentially analyze each pair PR of consecutive strokes ST in each ordered list of said first and second sets SLa, SLb to determine the respective decision class CL and probability score P, based on spatial and temporal information related to the strokes ST contained in the ordered list of said pair PR.

In a selection step S20 (FIGS. 6 and 12), the computing device 100 then selects, for each individual pair PR of consecutive strokes ST included in at least one ordered list of said first and second sets SLa, SLb generated (S14, S16) for all the strips SP, the decision class CL determined with the highest probability score P for said pair PR during the neural net analysis S18. In the present example, this selection S20 is thus made based on the probability lists PL generated in the neural net analysis S18 for all the strips SP. For instance, if a specific pair PR of consecutive strokes (STa, STb) occurs only once overall within the probability lists PL obtained in S18, then the associated decision class CL obtained in S18 for this pair PR is selected (S20). If, however, a specific pair PR of consecutive strokes (STa, STb) has a plurality of occurrences within the probability lists PL obtained in S18 for all the strips SP, then the decision class CL with the highest probability score P is selected (S20) for said pair PR from the probability lists PL.

The computing device 100 may thus compare the decision classes CL obtained for a same pair PR of consecutive strokes ST using different ordering criteria (temporal order TO and spatial criteria CR) during the ordering step S14, either from SLa or from SLb, and may only retain the best decision class CL having the highest probability score P, namely the decision class CL which is the most likely to represent the correct result for the pair PR. In particular, the computing device 100 may compare the probability score P obtained for a same pair PR present in at least two different strips SP to determine the decision class CL obtained with the highest probability score. By selecting only the best decision class CL among various probability lists obtained based on different (temporal and spatial) criteria, efficient text line extraction can be achieved.

Various implementations are possible to perform the selection S20 of the decision classes CL of highest probability score P. In the present example depicted in FIG. 12, in the selecting step S20, the computing device 100 compiles into a probability matrix PM the selected decision class CL, in association with the respective probability score P, for each pair PR of consecutive strokes ST included (or present) in at least one ordered list of the first and second sets SLa, SLb generated in S14 and S16 for all the strips SP. This global probability matrix PM is thus common for all the strips SP. This means that the entries of the probability matrix PM define duplets (CL, P) representing each pair PR of consecutive strokes ST having at least one occurrence in the ordered lists L1a-L4a and L1b-L4b produced for all the strips SP.

In a particular example, the probability matrix PM may contain more generally an entry (identified by an index) for each possible pair of strokes in a given strip SP (including pairs of strokes which are not adjacent strokes in at least one of the ordered lists of the first and second sets SLa, SLb). In this case, each entry of the probability matrix PM may remain at (CL=0, P=0) if it corresponds to a pair of strokes which has no occurrence as a pair PR of consecutive strokes in at least one of the ordered lists L1a-L4a and L1b-L4b generated for all the strips SP.
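The selection S20 and the compilation of the global probability matrix PM can be sketched as follows (representing PM as a dictionary keyed by stroke pairs is an assumption; pairs absent from the dictionary play the role of the (CL=0, P=0) entries):

```python
# Hedged sketch of selection step S20: each probability list (one per
# ordered list per strip) maps a pair PR of consecutive strokes to a duplet
# (CL, P); the global matrix keeps, per pair, the duplet of highest P.
def compile_probability_matrix(probability_lists):
    """Compile the duplets (CL, P) of highest probability score across all
    probability lists into one dictionary acting as the matrix PM."""
    matrix = {}
    for plist in probability_lists:
        for pair, (cl, p) in plist.items():
            if pair not in matrix or p > matrix[pair][1]:
                matrix[pair] = (cl, p)
    return matrix
```

A pair occurring once keeps its single duplet; a pair occurring in several strips keeps only its best-scoring decision, as required by the selection step.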

After the selecting step S20, the computing device 100 defines (S22, FIGS. 6 and 12) text lines LN by combining strokes ST into line hypotheses LH based on the decision class CL with highest probability score P selected in the selection step S20 for each pair PR of consecutive strokes ST present in at least one of the ordered lists L1a-L4a and L1b-L4b.

As shown in FIG. 13, the computing device 100 determines in S22 the respective text line LN to which each stroke ST detected in S2 belongs. In the present example, two text lines LN are recognized, i.e. the text lines LN1 and LN2 corresponding respectively to the phrases “Vertical ordering” and “can bring chaos”. These text lines LN1, LN2 correspond to two distinct line hypotheses LH obtained during the text line definition step S22.

Various implementations can be contemplated to define the line hypotheses LH (S22). In a particular embodiment described herebelow, the text line definition step S22 comprises a transformation step S22a and a line hypothesis analysis S22b, as described below.

More particularly, during the text line definition step S22, the computing device 100 may transform (S22a, FIG. 12) the probability matrix PM generated in S20 into a vector list LT of entries defining (or including) the decision class CL and associated probability score P (i.e. a duplet (CL, P)) for respectively each pair PR of consecutive strokes ST included in said probability matrix PM. As already indicated, each duplet (CL, P) included in the probability matrix PM corresponds to the decision class CL of the highest probability score P that was obtained for a particular pair PR of consecutive strokes during the neural net analysis S18 of all the strips SP.

The vector list LT may be arranged according to an order of decreasing values of the probability scores P of each pair PR. In a particular example, only entries of the probability matrix PM corresponding to pairs PR of consecutive strokes ST which have at least one occurrence in the first and second sets SLa, SLb of all the strips are retained in the vector list LT. In this case, any other entry of the probability matrix PM (e.g. an entry with the values (CL=0, P=0)) corresponding to a pair of strokes which are not adjacent in any of the ordered lists L1a-L4a and L1b-L4b generated for each strip SP is not included in the vector list LT.
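The transformation S22a into a sorted vector list LT can be sketched as follows (the dictionary representation and the (0, 0.0) sentinel are assumptions carried over from a dictionary-based probability matrix PM):

```python
# Hedged sketch of transformation step S22a: flatten the probability
# matrix PM into a vector list LT of (pair, (CL, P)) entries, dropping
# unused (CL=0, P=0) entries and sorting by decreasing probability P.
def to_vector_list(probability_matrix):
    """Return the entries of PM as a list ordered by decreasing P."""
    entries = [(pair, clp) for pair, clp in probability_matrix.items()
               if clp != (0, 0.0)]
    entries.sort(key=lambda entry: entry[1][1], reverse=True)
    return entries
```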

Still during the text line definition step S22, the computing device 100 may perform a line hypothesis analysis S22b (FIG. 12) to determine sequentially for each pair PR of consecutive strokes ST in the vector list LT, from the highest to lowest associated probability score P, a respective line hypothesis LH assigned to each stroke ST of said pair PR. Each line hypothesis LH constitutes a group of at least one stroke ST of a same text line LN. Each line hypothesis LH which is finally obtained, once all strokes ST of the vector list LT have been assigned (S22) to a respective line hypothesis LH, defines a respective text line LN as a result of the text line extraction S10 (FIG. 6). In other words, once the strokes ST present in all pairs PR of consecutive strokes ST in the first and second sets SLa, SLb generated for all the strips SP have been assigned to a respective line hypothesis LH, the resulting line hypotheses constitute text lines LN which collectively form the text handwriting IN detected in S2.

In a particular example, during the text line definition step S22, the computing device 100 combines the two strokes ST of a pair PR of consecutive strokes ST included in the vector list LT into a same line hypothesis LH corresponding to a same text line LN if the decision class CL previously selected in S20 with the highest probability score P for said pair PR indicates that the two consecutive strokes ST belong to a same text line LN and if the associated probability score P reaches at least (i.e. is equal to or higher than) a final threshold TH1. This way, line hypotheses LH can be gradually built (S22b) by deciding sequentially, for each of the two strokes ST of each pair PR in the vector list LT, whether or not the two strokes ST should be assigned to a same line hypothesis LH and by determining the allocated line hypotheses LH based on this decision and on the content of any previously generated line hypothesis LH during this step S22b.

An example is now described below with reference to FIGS. 14-15 to show how the line hypothesis analysis S22b (FIG. 12) may be performed according to a particular embodiment. Other implementations may however be contemplated.

In the present example, the computing device 100 determines (S22b, FIG. 12) sequentially for each pair PR of consecutive strokes ST in the vector list LT, from the highest to lowest associated probability score P, a respective line hypothesis LH assigned to each stroke ST of said pair PR. At the beginning of the line hypothesis analysis S22b, it is considered that each stroke ST constitutes a separate line hypothesis LH, although other implementations are possible. At this initial stage, it is assumed in this example that at least 3 strokes ST1, ST2 and ST3 constitute 3 respective initial line hypotheses LH1, LH2 and LH3. These strokes ST1, ST2 and ST3 each have at least one occurrence in the pairs PR of consecutive strokes ST present in the vector list LT.

It is first assumed that the computing device 100 starts analyzing the vector list LT and selects (S23, FIG. 14) a first pair PR, noted PR1, of consecutive strokes (ST1, ST2) having the highest associated probability score P in the vector list LT. The computing device 100 then performs the following steps S24-S28 to determine for this current pair PR1 a respective line hypothesis LH to be assigned to each of the strokes ST1 and ST2 of the pair PR1. The strokes ST1 and ST2 may remain in their separate initial line hypotheses LH1, LH2 or be merged into a global line hypothesis depending on the probability score P associated with said pair (ST1, ST2).

In the present example, the computing device 100 determines (S24, FIG. 14) whether the following condition A) is met for the current pair PR1:

    • A) the decision class CL previously selected in S20 with the highest probability score P for the current pair PR indicates that the two consecutive strokes ST of said pair PR belong to a same text line LN with a probability score P reaching at least a final threshold TH1 (condition A).

In the present case, the condition A) is thus met if the duplet (CL, P) present in the vector list LT for the current pair PR1 indicates that the two consecutive strokes ST1, ST2 of the current pair PR belong to a same text line LN with a probability score P equal or above the final threshold TH1. If the condition A) is met, the method proceeds with step S26 (FIG. 14). Otherwise, the method proceeds with step S25.

In step S25, it is determined that the strokes ST1, ST2 of the current pair PR1 do not belong to the same text line LN and thus remain in their separate line hypotheses LH1, LH2 respectively. In other words, if the condition A) is not met, the existing line hypotheses LH remain unchanged and the method proceeds with step S23 to select a next current pair PR to be processed in the vector list LT.

In the present case, it is assumed for instance that the duplet (CL, P) for the current pair PR1 indicates that the two consecutive strokes ST1 and ST2 belong to a same text line LN with a probability score P of 95%. Assuming that the final threshold TH1 is set at 60% for instance, it is determined (S24) that the probability score P is above the final threshold TH1 and, therefore, the method proceeds with step S26.

In step S26 (FIG. 14), the computing device 100 determines whether the following condition B) is met for the current pair PR1:

    • B) at least one stroke ST of the current pair PR is already in a line hypothesis LH comprising at least two strokes ST (condition B).

In the present case, the condition B) is thus met in step S26 if either stroke ST1 or stroke ST2 (or both) are already in a line hypothesis LH comprising at least two strokes ST. If the condition B) is not met, the method proceeds with the merging step S28, otherwise the method proceeds with a decision process in step S27 to determine whether the merging step S28 should be executed (FIG. 14).

In the present example, it is considered at this stage that the strokes ST1 and ST2 are contained respectively in the distinct line hypotheses LH1 and LH2 which are both line hypotheses of a single stroke ST. Accordingly, the decision process S27 is not necessary and the method proceeds directly with the merging step S28.

In the merging step S28, the computing device 100 determines that the strokes ST1 and ST2 both belong to a same line hypothesis noted LH5 which is obtained by merging the line hypotheses LH1 and LH2 (LH5=ST1, ST2). The method then proceeds with step S23 to select a next current pair PR to be processed in the vector list LT.

The computing device 100 thus goes on with analyzing (steps S23-S28) successively each pair PR of consecutive strokes ST of the vector list LT in a decreasing order of probability score P. Line hypotheses LH are gradually built by assigning the two consecutive strokes ST of each successive pair PR to a respective line hypothesis LH based on the decision class CL and probability score P associated with the pair PR and also based on the line hypotheses LH previously created during the line hypothesis analysis S22b.
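Under the assumption that line hypotheses are represented as stroke sets and that the decision process S27 is supplied as a callback, the S23-S28 loop described above might be sketched as follows (names and data shapes are illustrative, not the patented implementation):

```python
# Hedged sketch of the line hypothesis analysis S22b: pairs are scanned in
# decreasing probability order; each stroke starts as its own singleton
# line hypothesis, and hypotheses are merged step by step.
def build_line_hypotheses(vector_list, th1, decide_merge):
    """vector_list: [(pair, (CL, P))] sorted by decreasing P;
    th1: final threshold; decide_merge: callback standing in for S27."""
    hyp = {}  # stroke -> frozenset of strokes (its current line hypothesis)

    def get(stroke):
        return hyp.setdefault(stroke, frozenset([stroke]))

    for (a, b), (cl, p) in vector_list:
        ha, hb = get(a), get(b)
        if cl != "same line" or p < th1:  # condition A fails (S24 -> S25)
            continue
        if ha == hb:                      # already in the same hypothesis
            continue
        # condition B (S26): run the decision process S27 only when one of
        # the hypotheses already holds at least two strokes
        if (len(ha) > 1 or len(hb) > 1) and not decide_merge(ha, hb):
            continue
        merged = ha | hb                  # merging step S28
        for stroke in merged:
            hyp[stroke] = merged
    return set(hyp.values())
```

Each frozenset remaining at the end stands for one line hypothesis LH, i.e. one extracted text line LN.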

In the present example, it is assumed that the computing device 100 now selects (S23) a new, so-called current, pair PR2 of consecutive strokes (ST2, ST3) within the vector list LT, moving still in a decreasing order of probability score P from the previously analyzed pair PR1 (ST1, ST2) in the vector list LT. At this stage, the line hypothesis LH5 contains the strokes ST1 and ST2 while the line hypothesis LH3 contains the single stroke ST3 (FIG. 16).

It is assumed in this example that the computing device 100 detects in step S24 that the condition A) is met for the pair PR2 and thus proceeds with step S26 (FIG. 14). In step S26, the computing device 100 detects that the condition B) is met (since stroke ST2 is already part of line hypothesis LH5 which contains more than one stroke, i.e. the two strokes ST1 and ST2). As a result, the method now proceeds with the decision process in step S27.

This decision process S27 is configured to determine whether two existing line hypotheses (i.e. LH5 and LH3 in this case) should be combined when it is detected that conditions A) and B) are met for a current pair PR of consecutive strokes. Various ways of performing the decision process S27 are possible. Some examples are provided below for illustrative purposes only.

A first example of implementing the decision process, referred to more specifically as S27a in this example, is now described with reference to FIGS. 15A and 16. In this first example, the decision process S27a is based on a computation of line scores LS. More particularly, in the present example, the computing device 100 performs the steps S30, S32, S34 and S36 during the decision process S27a, as described below.

Different implementations of the computing of the line scores LS are possible. As indicated further below, the line score may for instance be calculated using the logarithm of the probability scores (PL) of each pair PR of strokes ST present in a given line hypothesis LH and the logarithm of the inverse probability scores (1−PL=PB) of each pair PR for which only one of the two constitutive strokes ST belongs to the LH.
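A hedged sketch of such a line score computation, assuming pair_probs maps each stroke pair to its "same line" probability PL:

```python
import math

# Hedged sketch of a log-likelihood line score LS: pairs fully inside the
# line hypothesis contribute log(PL); pairs with exactly one stroke inside
# contribute log(1 - PL), i.e. the log of the break-line probability PB.
def line_score(hypothesis, pair_probs):
    """hypothesis: set of strokes; pair_probs: {(a, b): PL}."""
    score = 0.0
    for (a, b), pl in pair_probs.items():
        inside = (a in hypothesis) + (b in hypothesis)
        if inside == 2:
            score += math.log(pl)          # pair belongs to the hypothesis
        elif inside == 1:
            score += math.log(1.0 - pl)    # pair crosses the hypothesis edge
    return score
```

The closer the score is to zero, the better the constitutive strokes fit together as one complete text line under this formulation.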

In step S30, the computing device 100 computes a first line score LS5 of the first line hypothesis LH5 based on the probability scores P of each pair PR (i.e. PR1) of consecutive strokes ST already assigned to the first line hypothesis LH5, this first line score LS5 representing a likelihood that each stroke ST (i.e. ST1 and ST2) of this first line hypothesis LH5 is part of a same text line LN and that this text line LN is defined as complete by said line hypothesis LH5.

In this context, a text line LN is defined as complete by a line hypothesis LH if all the strokes ST that should belong to the same text line LN according to the probability scores P are effectively in said line hypothesis LH. In other words, a line score LS ensures that the probability scores P for each pair PR of consecutive strokes belonging to the same line hypothesis LH are associated with a decision class CL=“same line” and that all other pairs PR involving only one stroke ST belonging to this line hypothesis LH are associated with a class CL=“break line”. In the present example, the line scores LS computed by the computing device 100 are values which represent a likelihood as mentioned above.

In step S32 (FIG. 15A), the computing device 100 computes a second line score LS3 of the second line hypothesis LH3 based on the probability scores P of each pair PR of consecutive strokes ST already assigned to the second line hypothesis LH3, this second line score LS3 representing a likelihood that each stroke ST (i.e. ST3) of this second line hypothesis LH3 is part of a second text line LN. At this stage, the line hypothesis LH3 only contains a single stroke, namely ST3. Although there is no pair of strokes having both strokes STa and STb in the same line hypothesis, there are pairs of strokes that involve stroke ST3 with other strokes outside the respective line hypothesis LH. Here the relevant pairs would be (ST3, ST1) and (ST3, ST2). When computing the line score LS3 of the line hypothesis LH3, there are no pairs that contribute to this same line hypothesis (i.e. no calculation of log PL) but there are still pairs that can be used to calculate the "break line" (or different line) hypothesis, i.e. the log PB, and more particularly the log PB (ST3, STx) with STx belonging to LH5.

In step S34, the computing device 100 computes a third line score LS6 based on the probability scores P of each pair PR (i.e. PR1, PR2) of consecutive strokes ST assigned to a third line hypothesis LH6 combining each stroke ST of the first and second line hypotheses LH5, LH3, this third line score LS6 representing the likelihood that each stroke of these first and second line hypotheses LH5, LH3 is part of a third text line LN.

In step S36, the computing device 100 determines whether the first and second line hypotheses LH5, LH3 should be merged into this third line hypothesis LH6 based on a comparison of a sum S1 of the first line score LS5 and second line score LS3 (S1=LS5+LS3) with the third line score LS6.

The line scores LS5, LS3 and LS6 represent how well the constitutive strokes ST of each respective line hypothesis LH5, LH3 and LH6 fit together to form collectively a text line LN. The line scores LS5, LS3 and LS6 mentioned above may be calculated in different manners, implementation details being at the discretion of the skilled person. The computing device 100 merges the first and second line hypotheses LH5, LH3 into the third line hypothesis LH6 corresponding to a third text line if it is determined during the decision process S27a that the third line score LS6 exceeds the total S1 of the first and second line scores LS5, LS3 (i.e. if LS6>S1, or in other words, if the ratio LS6/S1>1). To be more accurate, the first and second line hypotheses LH5, LH3 may be merged into the third line hypothesis LH6 if LS6>S1−CP, where CP is a common part in the score computation shared by the first and second line hypotheses LH5, LH3. This common part CP corresponds to the line score subpart resulting from pairs PR having one stroke ST in the first line hypothesis LH5 and another in the second line hypothesis LH3. These stroke pair contributions are computed in LS5 and LS3 but only once in LS6.
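The comparison of step S36 reduces to a one-line rule, sketched here under the assumption that the line scores and the common part CP have already been computed:

```python
# Hedged sketch of the merge test of step S36: merge the two line
# hypotheses when the merged score exceeds the sum of the individual
# scores minus their shared contribution CP.
def should_merge(ls_merged, ls_first, ls_second, common_part=0.0):
    """Return True when LS6 > LS5 + LS3 - CP."""
    return ls_merged > ls_first + ls_second - common_part
```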

The probability scores P used in the computation of LS5, LS3 and LS6 can be derived from the probability matrix PM obtained in the selecting step S20.

If it is determined in S36 that the line hypotheses LH5, LH3 should be merged, the computing device 100 merges these line hypotheses (FIG. 14). The method then proceeds once again with step S23 to select a next current pair PR in the vector list LT, and the process S23-S28 is repeated until all the pairs PR of the vector list LT have been processed to build the line hypotheses LH.

In another example, the decision process S27 (FIG. 14), referred to more specifically as S27b in this example, is now described with reference to FIG. 15B. As already mentioned, the decision process S27b determines whether the line hypotheses LH3 and LH5 should be merged. In step S30b, a merge score LSa between LH3 and LH5 is computed. In step S32b, a no-merge score LSb between LH3 and LH5 is computed. The decision of merging or not two line hypotheses (i.e. LH5 and LH3 in this example) relies only on the pairs PR for which the first stroke STa belongs to the first line hypothesis LH (here LH3) and the second stroke STb belongs to the second line hypothesis LH (here LH5). In the present example, only the probabilities P of pairs PR (ST1, ST3) and (ST2, ST3) are relevant for determining whether LH3 and LH5 should be merged. Accordingly, the two following line scores are computed: the merge score LSa, which defines how well the two probability scores P associated with the two pairs PR suit a merge; and the no-merge score LSb, which defines how well the two probability scores associated with the two pairs PR suit a merge refusal (thus having better line hypotheses with LH3 and LH5 than a merged line hypothesis LH6). The merge score LSa is defined as the combination of the logarithms of the probabilities P for class CL="same line" (so-called PL) for all relevant pairs PR (here (ST1, ST3) and (ST2, ST3)). The no-merge score LSb is defined as the combination of the logarithms of the probabilities P for class CL="break line" (so-called PB) for all relevant pairs PR (here (ST1, ST3) and (ST2, ST3)). In step S36b, the two scores LSa and LSb are then compared to decide whether the line hypotheses LH5 and LH3 should be merged or not. In this example, if the merge score LSa is greater than the no-merge score LSb, this means that the line hypotheses LH3 and LH5 should be merged into a better line hypothesis LH6.
If the decision is yes, then a majority of (ideally all) the probability scores P for pairs PR involving the strokes ST1, ST2 and ST3 (and only those three) should be associated with a decision class CL=“same line”. If however the decision is that the line hypotheses LH3 and LH5 should not be merged, then a majority of (ideally all) the probability scores P of pairs PR involving one stroke from the first line hypothesis LH3 and one other stroke from the second line hypothesis LH5 should be associated with a decision class CL=“break line”.

It should be noted that if the pairs (ST1, ST3) and (ST2, ST3) both exist in the probability matrix PM, then the computation of the merge score LSa involves the combination of two probabilities (PL (ST1, ST3) and PL (ST2, ST3)) and the computation of the no-merge score LSb involves the combination of two probabilities as well (PB (ST1, ST3) and PB (ST2, ST3)), with PL = 1 − PB for each pair. This can be seen as another way of describing the computation of the line scores mentioned earlier.
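This pairwise variant S27b can be sketched as follows (assuming the "same line" probabilities PL of the pairs crossing the two hypotheses are given as a list; the function name is an assumption):

```python
import math

# Hedged sketch of decision process S27b: compare a merge score (sum of
# log PL over the crossing pairs) against a no-merge score (sum of log PB,
# with PB = 1 - PL over the same pairs).
def decide_merge_s27b(cross_pair_probs):
    """cross_pair_probs: PL values of the pairs with one stroke in each of
    the two candidate line hypotheses. Returns True to merge."""
    ls_merge = sum(math.log(pl) for pl in cross_pair_probs)
    ls_no_merge = sum(math.log(1.0 - pl) for pl in cross_pair_probs)
    return ls_merge > ls_no_merge
```

For two crossing pairs with PL well above 0.5, the merge score dominates and the hypotheses are combined; with PL well below 0.5, the merge is refused.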

It should be noted that two types of probability score P may be used in the present invention:

    • a “same line” probability score-noted PL-representing the probability that a pair PR of consecutive strokes ST belong to a same text line LN (e.g. probability score associated with the decision class CL=“same line”); and/or
    • a “break line” probability score-noted PB-representing a probability that a pair PR of consecutive strokes ST do not belong to a same text line LN (e.g. probability score associated with the decision class CL=“break line”).

In one example the line score is calculated using the logarithm of the probabilities (PL) of each pair PR of strokes ST present in a given line hypothesis LH and the logarithm of the inverse probability (1−PL=PB) of each pair PR for which only one of the two constitutive strokes ST belongs to the LH.

In the present example, the entries included in the probability matrix PM may define either same line probability scores PL or break line probability scores PB, but it is the same line probability scores PL which are used to compute the line scores. Accordingly, any break line probability score PB which may be derived from the probability matrix PM is converted into a corresponding same line probability score PL (PL = 1 − PB). Various implementations are possible, using either same line probability scores PL, or break line probability scores PB, or a combination of the two in the probability matrix PM.

As shown in FIG. 6, once the generation S22 of the line hypotheses LH is completed, the computing device 100 may perform (S24) any appropriate post-processing depending on the implemented configuration. This post-processing step S24 may be used for instance to fix some obvious mistakes that may happen in the text line extraction S10 due to very particular stroke contexts, for instance when one neural net makes a mistake that cannot be compensated for or corrected by another neural net.

The present invention allows for an efficient and reliable text line extraction when handwriting recognition is performed on text handwriting by a computing device. As indicated earlier, line extraction is a key step in text recognition and it may not always produce satisfactory results, especially regarding some types of strokes such as diacritics, punctuation marks and the like. More generally, errors may arise during text line extraction when text handwriting is input in a non-chronological order.

The invention relies on several aspects which functionally interact with each other to achieve efficient text line extraction, as described earlier in particular embodiments. In particular, slicing the text handwriting IN allows the computing device 100 to make decisions in different contexts with respect to each stroke ST of digital ink. The slicing step facilitates processing during the neural net analysis. If no slicing of the text input into multiple strips were performed, all the text strokes ST of the text handwriting IN would be contained in a single region as shown in FIG. 17A. The temporal and spatial reordering would thus be carried out globally on the entire text as a unique region. Temporal ordering would follow the user's natural handwriting order, as shown for instance in FIG. 17B. The spatial ordering along the Y orientation would result in a more chaotic path, in particular regarding the position of consecutive strokes ST in the handwriting orientation X, as shown in FIG. 17C. The spatial sequences of strokes would appear random in terms of X positions.

Text slicing as performed in the present invention leads to a less chaotic spatial ordering, as shown in FIG. 17D. Slicing a document reduces the span in the handwriting orientation X along which the random pattern mentioned above appears.

As can be seen in FIGS. 17B and 17C, line breaks, noted LB, occur rarely when temporal or spatial ordering is performed without text slicing (typically only one line break LB between one pair of strokes). During a neural net analysis, a neural net would thus have only one opportunity to detect this break (or separation) between two text lines LN. Slicing the text handwriting IN into K slices (K>1) gives rise to at most K chances for the neural net to detect the break between two text lines LN, as shown in FIG. 17D where slicing into 5 strips leads to 5 different line breaks LB that may each be detected during neural net analysis.

Another advantage of text slicing is that it brings variability of stroke context for some strokes ST. Without slicing, a large stroke ST for instance may only be linked to one stroke in a text line above and to one stroke in a text line below. By slicing the document, this large stroke ST can be included in several slices, while other smaller strokes will not appear in all the same slices.

FIG. 18A illustrates for instance a case where a large stroke ST10 (a fraction bar) extends horizontally in the handwriting orientation X. By dividing the text handwriting IN into multiple slices SP as shown in FIG. 18B, this long stroke ST10 can be more efficiently processed during text line extraction as it will be included in different slices SP and thus treated in different stroke contexts. Assuming that each digit 1-9 is made of one stroke ST, it can be seen that, without slicing, a vertical ordering gives the stroke order [1, 2, 3, 4, 5, ST10, 6, 7, 8, 9] (FIG. 18A). The fraction bar ST10 will be detected and treated in the text line extraction process only in two pairs PR of consecutive strokes, namely: (5, ST10) and (ST10, 6). However, with a slicing in 3 strips as shown in FIG. 18B, 3 spatial stroke orders can be generated, namely: [1, 2, ST10, 6, 7]; [3, 4, ST10, 8, 9]; and [5, ST10, 9]. The fraction bar ST10 can thus be detected and treated in 6 different pairs of strokes ST during the text line extraction process.
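The pair-counting argument of this example can be checked with a small sketch (stroke labels are those of the example; the function name is an assumption):

```python
# Hedged sketch: count the consecutive-stroke pairs PR that involve a given
# stroke across several ordered lists (one per strip), as in the
# fraction-bar example.
def pairs_involving(stroke, ordered_lists):
    """Return all pairs of consecutive strokes containing `stroke`."""
    pairs = []
    for lst in ordered_lists:
        for a, b in zip(lst, lst[1:]):  # consecutive pairs in this order
            if stroke in (a, b):
                pairs.append((a, b))
    return pairs
```

Without slicing, the fraction bar appears in 2 pairs; with the 3 strip orders of the example it appears in 6, matching the count given above.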

Finally, generating stroke orders in restricted strips allows limiting line break oscillations between two text lines LN. A stroke order without break oscillation is a stroke order where the strokes of each text line LN are grouped in the ordered lists (all strokes from text line LN1, then all strokes from text line LN2, and so on). Oscillations occur for instance when a stroke from a previous text line LN appears in an ordered list in the middle of another text line LN. For example, oscillation occurs in an ordered list comprising successively some strokes ST from a line LN1, then one or several strokes ST from a line LN2 and again some strokes ST from text line LN1, and so on. Such oscillating orders are more difficult to analyze by a neural net. By slicing the text handwriting as described earlier, oscillations in the ordered lists can be limited.

By configuring the strips SP so that they overlap with each other as described earlier, the process of text line extraction can be improved even further. Implementations where the strips SP do not overlap are however also possible. Setting for instance a 75% overlap between each pair PR of adjacent strips SP ensures that each stroke ST will be found in several different stroke contexts by the computing device 100 during the text line extraction (FIGS. 7-8).

As shown in FIG. 19, the width WD of the strips SP may be defined based on the scale of the strokes ST contained in the handwriting input IN. In the particular embodiment shown in FIG. 19, the computing device 100 determines (S52) during the slicing step S12 (FIGS. 6-8) a width WD of the strips SP based on the scale (or size) of the strokes ST forming the text handwriting IN. The scale of the strokes ST is previously determined (S50) according to any suitable technique known to the skilled person. The computing device 100 then slices (S54) the display area containing the strokes ST, as already described, and assigns each stroke ST to at least two respective strips SP.

In a particular embodiment, the slicing S12 (FIGS. 6-8) is configured so that each pair PR of adjacent strips SP partially overlap with each other to share between 50% and 85% of their respective area.
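A sketch of generating such overlapping strip intervals (the 75% default echoes the example given earlier; the function name and the choice of slicing a one-dimensional extent are assumptions):

```python
# Hedged sketch of the slicing step S12: split an extent along the
# handwriting orientation into strips of a given width WD, with adjacent
# strips sharing `overlap` of their area.
def strip_intervals(extent, width, overlap=0.75):
    """Return (start, end) intervals covering [0, extent]."""
    step = width * (1.0 - overlap)  # offset between adjacent strips
    intervals, start = [], 0.0
    while True:
        end = min(start + width, extent)
        intervals.append((start, end))
        if end >= extent:
            break
        start += step
    return intervals
```

With a width of 4 over an extent of 10 and a 75% overlap, adjacent strips are offset by 1 unit, so each stroke can appear in up to 4 strips and thus in several different stroke contexts.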

By generating multiple stroke orders per slice in an overlapping slice environment, it is highly probable that a pair of consecutive strokes ST will be found several times by the computing device 100, thereby producing as many probability scores for a same pair PR of consecutive strokes ST during the neural net analysis. By selecting only the neural net decision that gives the highest probability score P, efficient text line extraction can be achieved.

Further, as described earlier, the computing device 100 may generate a first set SLa of ordered lists of strokes during the ordering step S14 (FIGS. 6, 9 and 12). By generating for each strip SP multiple ordered lists according to various criteria (temporal order and spatial criteria), the line breaks LB can be identified even more easily since one given line break can be identified in a particular pair PR of consecutive strokes in each ordered list. Generating different stroke orders can be seen as analyzing the problem of text line extraction through different points of view to assist the computing device in finding, for each pair PR of consecutive strokes, the best stroke context that will result in the best decision.

More specifically, by generating a temporal order of strokes for each vertical slice, temporal orders that are easier to process than a global one can be generated. It limits the gap caused by delayed strokes. Additionally, strokes from user corrections or the like may be processed temporally closer to their stroke context. The spatial analysis is also facilitated in a sliced environment, since reordering strokes based on a spatial order helps discover the local gaps between strokes that can be inter-line spaces. The stroke distribution on the X axis (along the handwriting orientation) may sometimes be chaotic. The text slicing performed in the present invention allows limiting this stroke distribution chaos and facilitates processing by the neural net.

The more slices and the more ordered lists per slice, the more likely the computing device 100 will detect the same pair PR of consecutive strokes ST several times during the process of text line extraction. A trade-off should however be achieved between the number of opportunities to identify line breaks LB and the resources and time required to implement the text line extraction. It has been observed for instance that generating 4 different ordered lists per strip according to four different criteria affords good results. It has also been observed that generating a temporally-ordered list of strokes and at least one spatially-ordered list of strokes, as described earlier, allows for a highly efficient text line extraction, although other implementations are possible.

Still further, as described earlier, the computing device may also generate a second set SLb of ordered lists by filtering out relatively small strokes ST from the ordered lists of the first set SLa (step S16, FIGS. 6, 11 and 12). As already discussed, relatively small strokes such as diacritics and the like can cause errors during text line extraction. Removing these relatively small strokes from the ordered lists during the text line extraction allows comparing the decision classes and associated probability scores obtained for each pair PR of consecutive strokes with and without these relatively small strokes ST. The decision classes with the best level of confidence (i.e., with the highest probability scores) can be retained and used for building text line hypotheses, thereby allowing for an efficient and reliable text line extraction.
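As a hedged illustration, the per-pair selection described above can be sketched as follows (the function name, the stroke-pair representation and the sample scores are hypothetical; in practice the scores come from the neural net analysis applied to each ordered list):

```python
def select_best_decisions(list_results):
    """For each pair of consecutive strokes found in any ordered list,
    keep the decision class with the highest probability score."""
    best = {}  # pair -> (decision_class, probability)
    for pair, decision, prob in list_results:
        if pair not in best or prob > best[pair][1]:
            best[pair] = (decision, prob)
    return best

# The same stroke pair may be scored in several ordered lists
# (with and without small strokes); only the most confident
# decision is retained.
results = [
    (frozenset({1, 2}), "same_line", 0.62),   # e.g. from a temporally ordered list
    (frozenset({1, 2}), "line_break", 0.91),  # e.g. from a spatially ordered list
    (frozenset({2, 3}), "same_line", 0.80),
]
best = select_best_decisions(results)
```

In this sketch, the pair of strokes 1 and 2 is classified as a line break because that decision carries the highest probability score among the lists in which the pair appears.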

In the present invention, one or more neural nets can be used to deal with the temporal and spatial aspects of handwriting, as described earlier. The system may automatically decide to follow the temporal or the spatial aspect depending on the stroke context.

As also described, two specialized neural networks can be used to deal respectively with temporal and spatial orderings, although this is only one example among the possible implementations. Recurrent Neural Networks (RNN) may be particularly well suited in some cases to perform the neural net analysis.

It should be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, or blocks may be executed in an alternative order, depending upon the functionality involved. For instance, the line scores contemplated with reference to FIGS. 14, 15A and 15B may be computed in different orders.

In a particular embodiment, in addition to the ordered lists L1a-L4a and L1b-L4b generated (S14, S16, FIG. 6) for each strip SP, the computing device 100 is configured to also generate two additional temporally ordered lists L5a and L5b for the entire area formed by all the strips SP. More particularly, the computing device 100 orders together all the strokes ST of the text handwriting IN (FIG. 4) to generate a so-called third timely-ordered list L5a of strokes ST arranged in a temporal order (TO) and also forms a so-called fourth timely-ordered list L5b by filtering out from the third timely-ordered list L5a the strokes ST below a size threshold. These timely-ordered lists L5a and L5b are generated in an analogous manner to the respective timely-ordered lists L1a and L1b as described earlier, with the difference that the lists L5a and L5b correspond to all the strips SP together instead of corresponding to a single one of the strips SP. The neural net analysis S18 (FIGS. 6 and 12) is also applied to these timely-ordered lists L5a and L5b. Namely, during the neural net analysis S18, the computing device 100 also determines as a decision class CL, for each pair PR of consecutive strokes ST in the third and fourth timely-ordered lists L5a, L5b, whether the strokes of said pair belong to a same text line LN, in association with a probability score P for said decision class. These timely-ordered lists L5a and L5b are also taken into account (together with all the ordered lists L1a-L4a and L1b-L4b generated for each strip SP) when selecting for each pair PR a decision class CL with the highest probability score P during step S20 (FIGS. 6 and 12).
In other words, during the selection step S20, the computing device 100 may select the decision class CL determined with the highest probability score P during the neural net analysis S18 for each pair of consecutive strokes included (or present) in at least one of the ordered lists L1a-L4a, the ordered lists L1b-L4b, the third timely-ordered list L5a and the fourth timely-ordered list L5b. This particular embodiment makes it possible to further improve the definition of text lines LN in some specific cases.

Further, text block extraction is a sequential gathering process. It can be considered as a bottom-up approach including spatially gathering text lines to create text block hypotheses and assessing the most coherent text block set according to a cost calculation, as further detailed below.

Following the text line extraction (S10, FIG. 6) to extract text lines from the strokes ST detected in the text handwriting IN, the text block extraction process receives as input the extracted text lines. The text block extraction (S60, FIG. 20) comprises the steps S62-S74 as described further below in the present example.

The computing device 100 then performs an iterative method to extract text blocks by generating all possible text block hypotheses and evaluating resulting text block sets according to a calculated cost. A text block is a structured text section containing at least one text line arranged according to a guideline pattern. The guideline pattern comprises a plurality of guidelines (or base lines) along which the text lines are positioned. The guideline pattern may impose constraints of position, orientation and size on the text input displayed in the display area. A set of text blocks includes at least one text block hypothesis or a combination of text block hypotheses resulting from combining and/or including the extracted text lines into several text block hypotheses.

All the extracted text lines need to be ordered to define an input sequence.

More specifically, in an ordering step S62, the extracted text lines are ordered vertically based on the vertical position of the base lines of each text line.

As an iterative process, the text block extraction includes preliminary steps (S62-S70, FIG. 20) for initializing a current text block and a current text block set, which are further updated and evaluated in the process.

In a generating step S64, an initial text block is implemented as including a first text line of the ordered text lines.

In a generating step S66, an initial text block set is implemented as including the initial text block.

In a setting step S68, the current text block is initialized with the initial text block.

In a setting step S70, the current text block set is initialized with the initial text block set.

Iteratively, a next text line is added to the text block sets until the last ordered text line is reached. All possible text block sets are evaluated according to a cost function and sorted according to cost criteria.

More specifically, in an updating step S72, the current text block set is updated by generating a certain number of next text block sets, wherein the certain number of next text block sets is the number of the at least one current text block set plus the number of the at least one current text block of the at least one current text block set.

In a generating step S722, the next text block sets are generated by combining the next text line with each of the at least one current text block of the at least one current text block set and including the next text line as one next text block in one of the next text block sets.

In a calculating step S724, a cost of each next text block set is calculated, wherein the cost of a next text block set comprises one or more calculated sub-costs. Calculating the sub-costs may for example include one or a combination of the following: calculating a global alignment of the combined text lines; calculating a text height coherence of the combined text lines; calculating interline distances between the combined text lines; calculating gap distances between the combined text lines with respect to the average text height of the combined text lines.

The process evaluates the possible combinations of merging spatially ordered text line hypotheses by computing the cost of all possible next text block combinations defining the certain number of next text block sets. The current text block sets are then replaced by the next text block sets fulfilling one or more cost criteria.

In a replacing step S726, the at least one current text block set is replaced by the at least one next text block set of the certain number of next text block sets that fulfils one or more cost criteria. The one or more cost criteria comprise, for example, value thresholds for each sub-cost, a value threshold for the cost of the next text block set, and/or a classification of the next text block sets in ascending order of cost so as to select the sets with the lowest costs, for example the first ten sets with the lowest costs.

The updating of the current text block sets is completed when the last ordered text line has been combined with or included in the next text block sets.
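The iterative expansion and cost-based pruning described in steps S72-S726 can be sketched as a beam-search-like procedure. This is an illustrative sketch only: the function names and the toy cost function are hypothetical, and a real implementation would use the sub-costs detailed below rather than this simplified gap penalty.

```python
def extend_block_sets(current_sets, next_line, cost_fn, beam=10):
    """One update iteration (cf. steps S72-S726): a current set with k
    blocks yields k+1 next sets (the next line merged into each block,
    or opened as a new block); the cheapest `beam` sets are kept.
    A block is a tuple of lines; a set is a tuple of blocks."""
    next_sets = []
    for block_set in current_sets:
        for i, block in enumerate(block_set):
            # Combine the next line with an existing block of the set.
            next_sets.append(block_set[:i] + (block + (next_line,),) + block_set[i + 1:])
        # Or include the next line as one new block.
        next_sets.append(block_set + ((next_line,),))
    next_sets.sort(key=cost_fn)
    return next_sets[:beam]

def extract_blocks(ordered_lines, cost_fn, beam=10):
    # Initialization (cf. S62-S70): one set holding one block
    # that contains the first ordered text line.
    current = [((ordered_lines[0],),)]
    for line in ordered_lines[1:]:
        current = extend_block_sets(current, line, cost_fn, beam)
    # Extraction (cf. S74): the current set with the lowest cost.
    return min(current, key=cost_fn)

def toy_cost(block_set):
    # Hypothetical cost: penalize large baseline gaps inside a block,
    # plus a small per-block penalty to discourage over-splitting.
    c = 5 * len(block_set)
    for block in block_set:
        for a, b in zip(block, block[1:]):
            c += max(0, (b - a) - 50)
    return c
```

With baselines 10, 30 and 200, for instance, the lowest-cost hypothesis groups the first two lines into one block and the distant third line into another, mirroring the two-block outcome of the FIG. 21 example.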

In an extracting step S74, the text blocks are extracted from one of the current text block sets, namely the current text block set having the lowest cost of the at least one current text block set.

FIGS. 21A-21E schematically illustrate an example of a text block extraction of three extracted text lines TL1, TL2, TL3 according to the method explained in FIG. 20.

FIG. 21A shows the three lines TL1, TL2 and TL3 representing text content as displayed on the display area. The three text lines are displayed apart from each other, and the base line of each text line is vertically ordered along the (Y) orientation of the page.

FIG. 21B illustrates the outcome of the initialization steps (S62-S70, FIG. 20). An initial text block TB1 is created including the first ordered text line TL1, and an initial text block set S1 is created including the initial text block TB1. Additionally, a current text block CTB is set as the initial text block TB1 and a current text block set CS is set as the initial text block set S1. From this initial configuration an iterative procedure is triggered for evaluating all possible combinations of text block hypotheses and selecting a final configuration to be displayed as an optimized set of text blocks.

FIG. 21C shows the outcome of a first iteration of updating the current text block set S1 of FIG. 21B with the second ordered text line TL2. The second text line TL2 is combined with each of the text blocks of the current text block set S1, which contains one current text block TB1, and the second text line TL2 is also included as one text block, to generate a certain number of next text block sets S10 and S20. The certain number of text block sets is deduced from the number of current text block sets, i.e. one text block set S1 of FIG. 21B, plus the number of current text blocks of the current text block set, i.e. one text block TB1 of FIG. 21B. The first iteration of updating the current text block set S1 of FIG. 21B therefore generates two next text block sets as shown in FIG. 21C. A cost for each of the next text block sets S10 and S20 is calculated, and when the cost of a next text block set fulfils the cost criteria, that next text block set replaces the current text block set S1. In this example the costs of the next text block sets S10 and S20 each fulfil the cost criteria; therefore the current text block set S1 is replaced by the two text block sets S10 and S20. In another example, only one of the next text block sets may have an acceptable cost, for example a cost lower than a pre-defined threshold, in which case only that next text block set replaces the current text block set.

FIG. 21D shows the outcome of a second iteration of updating the current text block sets S10 and S20 of FIG. 21C with the third ordered text line TL3. The third text line TL3 is combined with each of the text blocks of the current text block sets S10 and S20, which contain one text block TB10 and two text blocks TB21 and TB22, respectively. The third text line is also included as one text block, to generate the certain number of next text block sets S101, S102, S201, S202 and S203. The certain number of text block sets is deduced from the number of current text block sets, i.e. two text block sets S10 and S20 of FIG. 21C, plus the number of current text blocks of the current text block sets, i.e. three text blocks TB10, TB21 and TB22. The second iteration of updating the current text block sets S10 and S20 of FIG. 21C therefore generates five next text block sets as shown in FIG. 21D. The cost of each next text block set S101, S102, S201, S202 and S203 is calculated, and when the cost of a next text block set fulfils the cost criteria, that next text block set replaces the current text block sets. In this example the costs of the next text block sets S101, S102, S201, S202 and S203 each fulfil the cost criteria; therefore the current text block sets S10 and S20 are replaced by the five text block sets. In another example, only the next text block sets with a cost lower than a pre-defined threshold are considered acceptable, in which case only the cost-acceptable next text block sets replace the current text block sets. A maximum number of cost-acceptable next text block sets may be kept, for example only up to ten next text block sets with the lowest costs are kept to extract the best text block set hypothesis.

As the last ordered text line TL3 is reached, the iterative process ends and the method further extracts the final text blocks from one of the last iterated current text block sets S101, S102, S201, S202 and S203. For example, the extracted text blocks result from the current text block set with the lowest cost. In the present example, the text blocks extracted from the three text lines are two text blocks TB104 and TB105 of the current text block set S201, which has the lowest cost.

The cost calculation may comprise the calculation of several sub-costs that evaluate whether a text block set is acceptable. For example, such sub-costs may assess how the text lines are globally aligned (on the left side or on the right side); how a text line overlaps with the previous one; a text height coherence of the text lines; an interline distance coherence of the text lines; a gap coherence between the text lines with respect to the average text height of the combined text lines. Additionally, the non-text strokes may have an impact on the text block construction, and two text lines shall not be combined in a same text block if there is a non-text stroke in between.

More specifically, a first sub-cost may calculate an alignment as a function of a left alignment and a right alignment with a border of a line inside a block hypothesis. Such function may keep a minimum value between the left and right alignment since paragraphs are normally aligned on one side only. A left or right alignment may be measured as an offset of the line from a left or right border, respectively. If, however, the horizontal overlap (vertical projection) between the last added line in the hypothesis and the rest of the paragraph is big enough (e.g. bigger than 75%) then the alignment cost may be forced to zero.

A second sub-cost may comprise a calculation of coherence of the text heights as a function of a maximum height, a minimum height and an average or mean height inside the text block hypothesis. For example, such calculation may be equal to the difference between the maximum and the minimum height divided by the mean height.

A third sub-cost may comprise calculation of interline distances, i.e. distance between two consecutive baselines. For example, such calculation may be equal to the difference between the maximum and minimum interline distance divided by the mean interline distance between two consecutive lines.

Another sub-cost may comprise a calculation of a “gap” or space between two lines. Such calculation may be equal to the maximum vertical space divided by the mean height of the line. The vertical space may be calculated as the vertical distance from the baseline to the highest (closest) point of the line below, assuming the lines are horizontally parallel.

When multiple sub-costs are taken into account, the global paragraph cost may be a function of these sub-costs, e.g. it may be equal to the square root of the sum of the multiple sub-costs.
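Under the assumption that each text line is summarized by its baseline ordinate and height, the sub-cost combination described above might be sketched as follows. The exact measures are illustrative stand-ins, not the claimed formulas; in particular, the vertical space is approximated here from the interline distance and the lower line's height.

```python
import math

def block_cost(lines):
    """Illustrative cost for one text block; `lines` is a list of
    (baseline_y, height) pairs ordered top to bottom."""
    heights = [h for _, h in lines]
    mean_h = sum(heights) / len(heights)
    # Text height coherence: (max - min) height divided by mean height.
    height_cost = (max(heights) - min(heights)) / mean_h
    if len(lines) < 2:
        return math.sqrt(height_cost)
    # Interline distances: differences between consecutive baselines.
    gaps = [b2 - b1 for (b1, _), (b2, _) in zip(lines, lines[1:])]
    mean_gap = sum(gaps) / len(gaps)
    interline_cost = (max(gaps) - min(gaps)) / mean_gap
    # Gap sub-cost: maximum vertical space (approximated as interline
    # distance minus the lower line's height) over mean line height.
    spaces = [g - h for g, (_, h) in zip(gaps, lines[1:])]
    gap_cost = max(spaces) / mean_h
    # Global cost: square root of the sum of the sub-costs.
    return math.sqrt(height_cost + interline_cost + gap_cost)
```

A block of three evenly spaced lines of equal height thus incurs no height or interline penalty, and its cost reduces to the gap term alone.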

The present invention having been described in particular embodiments, it is clear that it is susceptible to numerous modifications and embodiments within the ability of those skilled in the art, in accordance with the scope of the appended claims. In particular, the skilled person may contemplate any and all combinations and variations of the various embodiments described in this document that fall within the scope of the appended claims.

In another aspect of the invention, a method for recognizing handwriting input from handwriting strokes of digital ink is proposed to increase the overall speed of the recognition processing. The method makes it possible to extract handwritten input elements such that the input elements are processed by different processing units in parallel. Parallel processing of the recognition, as allowed by the present method, improves the execution speed and accuracy of the handwriting recognition. The improved method is illustrated in FIG. 22 and further described below.

The flow diagram of FIG. 22 shows a receiving step S210, an extracting step S220, a grouping step S230, a recognizing step S240 and a compiling step S250.

In the receiving step S210, the computing device is receiving handwriting input representing text content (e.g., alphanumeric characters) or non-text content (e.g. shapes, drawings).

The handwriting digital ink IN is input in the input area of the display. The computing device displays the plurality of input strokes of the handwriting input IN on the display unit.

Each text or non-text input is formed by one or more strokes of digital ink defining different handwriting input elements. Input elements may comprise for instance text elements in text context such as words, text lines and text blocks and non-text elements in graphic or geometric context. Non-text content defines graphic or geometric formations in linear or non-linear configurations, including containers, drawings, common shapes (e.g. arrows, blocks, etc.) or the like. In an unconstrained canvas for instance, text content may be contained in containers or shapes (a rectangle, ellipse, oval shape . . . ). Non-text elements include drawing or diagram elements such as containers and connectors and text elements in non-text context such as diagram annotations, mathematical equations and so on.

In the element extraction step S220, the computing device is performing element extraction from the plurality of input strokes detected from the digital ink to extract a plurality of elements. The element extraction comprises a disambiguation process to distinguish between text and non-text content according to any suitable technique known to the skilled person.

In a particular example, the computing device DV may be configured to apply, to e.g. a page of strokes of digital ink, a two-step process to identify and output text blocks. As a first step of this two-step process, a text versus non-text classification may be performed to attribute a label to each stroke indicating whether it is a textual stroke or a non-textual stroke. Textual strokes are intended to be recognized and transcribed at some point. Non-textual strokes are any other strokes that do not correspond to text, and can be of any type, such as drawings, table structures, recognizable shapes, etc.

In a preprocessing stage, the HWR system 114 is configured to perform the disambiguation process. The preprocessor does this by classifying the elements of the digital ink into different classes or categories, being non-text (i.e., shape), text and a mixture of shape and text. The classified digital ink is then parsed to the recognizer for suitable recognition processing depending on the classification. The present system and method automatically detect and differentiate the input of the different handwritten objects of shapes and text, so that they are processed further by the HWR system as described below with suitable recognition techniques, e.g., the strokes of the detected shapes are processed using a shape language model and the strokes of the detected text are processed using a text language model.


The text elements may be words, text lines or text blocks. Text line extraction and text block extraction may be performed according to the methods presented in the flow charts of FIG. 6 and FIG. 20 as described above. The non-text elements may be diagrams including containers and shapes, drawings, image data, as well as characters, strings or symbols used in non-text contexts.

In another embodiment, the element extraction step S220 comprises analyzing the plurality of elements (step S230) to generate groups of elements.

The grouping of the plurality of elements S230 establishes semantically meaningful groups of elements.

The semantic groups establish a language intelligible context for a group of elements. The identified semantic groups may extend over a line, a paragraph or even the complete handwriting content depending on each case. The identified semantic groups may truncate an extracted text line or an extracted drawing.

Semantic connections are identified by applying predefined semantic rules to each element of the plurality of elements.

The predefined semantic rules thus allow discovering semantically related text or non-text elements. The predefined semantic rules may be applied to all elements identified in S220.

The predefined semantic rules aim at reconstituting meaningful sentences, diagrams or drawings. For example, a meaningful sentence is built according to semantic patterns of language convention including, for example, capitalization, punctuation, paragraphing and indentation. In another example, a meaningful diagram is built according to geometrical conventions of shapes, connectors and spatial arrangements.

Therefore, the generation of semantic groups of elements according to the predefined semantic rules may comprise merging elements according to predefined merging rules and splitting elements according to predefined splitting rules.

In one embodiment, a first merging predefined rule is applied to at least two extracted text lines ordered according to the input sequence wherein a junction pattern is detected as the last symbol of one text line. Consequently, a merged text line is generated by merging the one text line with the subsequent extracted text line of the input sequence.

In one embodiment, the junction pattern is a punctuation mark that splits a word into separate units, for example across subsequent text lines, such as a hyphen.
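The first merging rule can be sketched as follows, taking the hyphen as the junction pattern. This is an illustrative sketch on recognized text line strings; the function name is hypothetical and a real implementation would operate on the extracted text line elements.

```python
def merge_hyphenated(lines):
    """Merge a text line whose last symbol is the junction pattern
    (here a hyphen) with the subsequent text line of the input
    sequence, rejoining the word split across the line break."""
    merged, i = [], 0
    while i < len(lines):
        line = lines[i]
        while line.rstrip().endswith("-") and i + 1 < len(lines):
            # Drop the hyphen and append the next line's text.
            line = line.rstrip()[:-1] + lines[i + 1].lstrip()
            i += 1
        merged.append(line)
        i += 1
    return merged
```

The inner loop allows a chain of hyphenated lines to collapse into a single merged text line.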

In another embodiment, a second merging predefined rule is applied to at least two extracted text lines wherein a special formatting of a first element is detected and at least a second element having the same special formatting is detected in the vicinity of the first element. Special formatting of text refers to text styled or arranged in a particular way to enhance its visual appearance. Special formatting of text is usually performed to put emphasis on meaningful text. Special formatting includes for example bolding, italicizing, underlining, or highlighting. Consequently, a merged element of the specially formatted text is generated by merging the first and the at least second elements detected in close vicinity, e.g. one element right before or after the other. A first and a second text element in close vicinity are, for example, consecutive text elements of the input sequence.

In one embodiment, the detected special formatting of the first and the at least second elements may be bolding, italicizing, underlining or coloring of the elements.

In another embodiment, a splitting predefined rule is applied on an element wherein a splitting pattern is detected within the element. Consequently, a first split element and a second split element are generated by splitting the element according to the splitting pattern.

For example, detecting a splitting pattern comprises detecting of a punctuation mark within one text line. In another example, detecting a splitting pattern comprises detecting of a line break within one text block.
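A splitting rule based on punctuation detection might be sketched as follows. The pattern set is a hypothetical example (the punctuation marks used in the FIG. 24 example); the function name is illustrative.

```python
import re

# Hypothetical splitting patterns: a point, comma, colon, semicolon,
# question mark or exclamation mark followed by whitespace.
SPLIT_PATTERN = re.compile(r"(?<=[.,:;?!])\s+")

def split_text_line(text_line):
    """Split one recognized text line at each detected splitting
    pattern, keeping the punctuation mark with the left element."""
    parts = SPLIT_PATTERN.split(text_line.strip())
    return [p for p in parts if p]
```

A line with no detected pattern is returned unchanged as a single element.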

In another embodiment, a splitting predefined rule is applied to one element wherein a first and a second formatting is detected within one element. The first and the second formatting may each be a special formatting such as bolding, italicizing, underlining, highlighting or coloring. One of the first or the second formatting may be a non-formatted element distinguished from the special formatting of a contiguous formatted element.

In the recognizing step S240, the computing device is sending at least two elements of the plurality of elements to at least two processing units, respectively for recognition in parallel of the elements by the handwriting recognition module 114 as described above.

Therefore, the computing device may identify the total number of elements and the available number of processing units. If the total number of elements is higher than the available number of processing units, the computing device may send a first number of the plurality of elements to the available processing units, the first number of elements being equal to the available number of processing units.

If the total number of elements is lower than, or equal to, the number of available processing units, the computing device is sending the plurality of elements to the available processing units simultaneously.

Subsequently, the computing device may send successively the remaining elements of the plurality of elements to the processing units for recognition in parallel, as the processing units become available.
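This dispatch behavior corresponds to a fixed-size worker pool: the first batch of elements starts immediately and each remaining element is sent as soon as a unit frees up. A minimal sketch, assuming a hypothetical per-element recognition function standing in for the HWR module:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_element(element):
    # Hypothetical stand-in for the per-element recognition
    # performed by the handwriting recognition module.
    return element.upper()

def recognize_in_parallel(elements, num_units=2):
    """Dispatch elements to a pool of `num_units` workers: the first
    `num_units` elements start immediately, and each remaining element
    is sent as soon as a processing unit becomes available."""
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        # map() preserves input order, which keeps the final
        # compilation of recognized elements straightforward.
        return list(pool.map(recognize_element, elements))
```

With three elements and two units, the third element is processed on whichever unit finishes first, and the compiled output still follows the input order.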

In one embodiment, the computing device may calculate a complexity score for each element of the extracted plurality of elements for sending the elements of the plurality of elements to the processing units in an orderly and efficient manner. The complexity score may be any model of scoring allowing the classification of the elements and taking into account, for example, a total number of strokes, a scale, a number of curvature measures and derivatives of the curvature, a total number of characters, a total number of words and/or any features describing the elements spatially or linguistically. The complexity score may be implemented with appropriate neural networks.

The complexity score calculated for each element may be used for ordering the elements; the computing device may then send the ordered elements of the plurality of elements to the processing units as the processing units become available. Sending the ordered elements to the processing units allows the processing time of each simultaneous recognition to be minimized, and the total processing time is therefore optimized.
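A hedged sketch of such an ordering: the feature weights and the element representation below are hypothetical (the document leaves the scoring model open, up to and including neural networks), but the scheduling intuition holds regardless of the model.

```python
def complexity_score(element):
    """Hypothetical complexity score weighing a few of the features
    mentioned above; `element` is a dict of feature counts."""
    return 1.0 * element["strokes"] + 0.2 * element["chars"]

def order_by_complexity(elements):
    # Dispatch the most complex elements first, so that long-running
    # recognitions start early and the overall makespan shrinks.
    return sorted(elements, key=complexity_score, reverse=True)
```

Sorting by descending complexity is the classic longest-processing-time-first heuristic for minimizing total parallel completion time.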

In one embodiment, the sent elements are semantic groups of elements.

Each element or group of elements is then parsed to the HWR module 114 for suitable recognition processing depending on the classification.

The processor 106 may be a multicore processor with at least two separate processing units or cores, each of which reads and executes program instructions. The processor 106 can run instructions on separate cores at the same time, increasing overall speed of the recognition of the handwriting input. The handwriting input which is processed according to semantic groups of elements can be processed by the separate cores in parallel, therefore speeding up the recognition processing time of the overall handwriting input.

The recognition of the handwriting input IN involves running the recognizing steps of the different groups of elements simultaneously. In fact, the recognition of handwriting input involves the analysis of context-dependent information which may impact the outcome of the recognition.

For example, when processing digital ink classified as text, the HWR module 114 employs a segmentation expert to segment individual strokes of the text to determine segmentation graphs, the recognition expert to assign probabilities to the graph nodes using a classifier, and a language expert to find the best path through the graphs using, for example, a text-based lexicon of linguistic information.

The language expert generates linguistic meaning for the different paths in the segmentation graph using language models (e.g., grammar or semantics). The language expert checks the candidates suggested by the other experts according to linguistic information. The linguistic information can include one or more lexicons, regular expressions, etc. The language expert aims at finding the best recognition path. In one example, the language expert does this by exploring a language model such as a finite state automaton (i.e., a deterministic FSA) representing the content of the linguistic information. In addition to the lexicon constraint, the language expert may use statistical information modeling how frequently a given sequence of elements appears in the specified language, or is used by a specific user, to evaluate the linguistic likelihood of the interpretation of a given path of the segmentation graph.
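The interplay between the classifier scores and the linguistic information can be sketched as follows. This is a deliberately crude illustration: lexicon membership stands in for the FSA or statistical language model, and the names, penalty value and candidate scores are all hypothetical.

```python
def best_interpretation(candidates, lexicon, penalty=5.0):
    """Rank candidate recognition paths by the classifier score plus
    a simple linguistic score (here, a penalty for any word outside
    the lexicon), and return the best-scoring candidate."""
    def score(cand):
        text, logprob = cand
        out_of_lexicon = any(w not in lexicon for w in text.split())
        return logprob - (penalty if out_of_lexicon else 0.0)
    return max(candidates, key=score)
```

In this sketch, a candidate with a slightly worse classifier score but fully in-lexicon words can outrank a candidate containing an out-of-lexicon word, which is the role the language expert plays in selecting the best path.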

On the other hand, when processing digital ink classified as non-text, the HWR module 114 employs the segmentation expert to segment the strokes of the shape, the recognition expert to determine segmentation graphs using the classifier, and the language expert to find the best path through the graphs using a shape-based lexicon of the linguistic information.

The mixed content classification is treated as ‘junk’ and will result in low probability of recognition when parsed to the HWR module 114. Shapes that are parsed to the recognizer and not recognized because, for example, they are out-of-lexicon shapes are treated as doodles, being unrecognized content.

Therefore, parallelization improves the processing time of recognition only if the recognition context of each group of elements is preserved to optimize the operation of the language expert resources.

In the compiling step S250, the computing device is compiling the plurality of recognized elements to generate the recognized handwriting input.

FIG. 23 shows an example processed according to the method described above and in FIG. 22.

FIG. 23A shows a handwriting input IN1 as displayed on the input area 200 of the computing device. The handwriting input IN1 comprises a text block TB1 and a non-text block, here a block diagram BD. The plurality of ink strokes of the handwriting input IN1 is detected by the computing device. The text block TB1 and the block diagram BD are extracted according to the method described above.

FIG. 23B shows the sending of the two elements, the first text block TB1 and the block diagram BD, to two processing units PU1 and PU2 of the computing device for executing the handwriting recognition. In this example, the computing device comprises a processor 106 with two available processing units PU1 and PU2; therefore the two elements TB1 and BD can be sent simultaneously to the first and the second processing unit respectively. The processing time for recognizing the entire handwriting input IN1 is thereby reduced by having two parallel processing threads available for recognizing two elements simultaneously.

The outcome of the recognition of the two elements TB1 and BD is shown in FIG. 23C as a recognized text RT and a recognized diagram RD. The resulting text and diagram are displayed as converted text and non-text elements on the display 102 of the computing device.

FIGS. 24A-24J show an example processed according to the method described above and in FIG. 22.

FIG. 24A shows a handwriting input IN2 as displayed on the input area 200 of the computing device. The handwriting input is composed of six text line elements TL1, TL2, TL3, TL4, TL5 and TL6. The plurality of ink strokes of the handwriting input is detected by the computing device and the six text lines are extracted according to the method described above.

Further, FIG. 24B shows updated text line elements of FIG. 24A analyzed according to the method of FIG. 22 and stored in the memory 108 of the computing device, wherein the computing device applies semantic predefined rules to each extracted text line TL1-TL6. In this example, the extracted text lines TL1-TL6 are split according to split predefined rules to generate updated split text lines shown as TL21, TL22, TL31, TL32, TL41, TL42, TL51 and TL52. The split predefined rules comprise the detection of a split pattern including the detection of specific punctuation marks such as a point, a comma, a colon, a semicolon, a question mark or an exclamation mark. A first punctuation mark P1, being a point, is detected in the text line TL2; therefore the text line TL2 is split to generate a first split text line TL21 and a second split text line TL22. A second punctuation mark P2, being a comma, is detected in the text line TL3; therefore the text line TL3 is split to generate a third split line TL31 and a fourth split line TL32. A third punctuation mark P3, being a semicolon, is detected in the text line TL4; therefore the text line TL4 is split to generate a fifth split line TL41 and a sixth split line TL42. A fourth punctuation mark P4, being a semicolon, is detected in the text line TL5; therefore the text line TL5 is split to generate a seventh split line TL51 and an eighth split line TL52. No split pattern is detected within the text lines TL1 and TL6.
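The split predefined rules described above may be sketched as follows; the split_text_line helper and its regular expression are illustrative assumptions, not the disclosed implementation:

```python
# Illustrative sketch of the split predefined rules: a text line is
# split after each of the listed punctuation marks (point, comma,
# colon, semicolon, question mark, exclamation mark).
import re

SPLIT_MARKS = ".,:;?!"

def split_text_line(line):
    """Split a text line after each splitting punctuation mark."""
    # Zero-width lookbehind keeps the mark attached to the left part;
    # \s* consumes the whitespace that follows it.
    parts = re.split(r"(?<=[" + re.escape(SPLIT_MARKS) + r"])\s*", line)
    return [p for p in parts if p]  # drop empty trailing fragments
```

For instance, a line containing a point followed by further text, like TL2 in the figure, yields two split lines, one ending at the punctuation mark and one starting after it.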

FIG. 24C shows the updated text line elements of FIG. 24B analyzed according to the method of FIG. 22 and stored in the memory 108, wherein the computing device applies semantic predefined rules to each updated text line TL1, TL21, TL22, TL31, TL32, TL41, TL42, TL51, TL52 and TL6. In this example, the updated text lines are merged according to merging predefined rules to generate updated text lines shown as SL1, SL2, SL3 and SL4. The merging predefined rules are applied onto a sequence of the updated text lines. For example, the text line TL1 and the updated split text line TL21 make a first semantic group of elements S1, here a first sentence, starting from a first capital letter ‘S’ of the first text line TL1 and ending at the punctuation mark of the updated split text line TL21, here a first point. The updated split text lines TL22, TL31, TL32 and TL41 make a second semantic group of elements S2, here an independent clause of a second sentence, starting from a second capital letter ‘T’ of TL22 and ending at a punctuation mark of TL41, here a first semicolon. The updated split text lines TL42 and TL51 make a third semantic group of elements S3, here another independent clause of the second sentence, starting from the updated split text line TL42 and ending at a punctuation mark of TL51, here a second semicolon. The updated split text lines TL52 and TL6 make a fourth semantic group of elements S4, here a final independent clause of the second sentence, starting from the updated split text line TL52 and ending at a punctuation mark of TL6, here a second point.
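A minimal sketch of the merging predefined rules follows, under the assumption that a semantic group ends when a line ends with one of the listed group-ending punctuation marks; merge_semantic_groups and END_MARKS are hypothetical names:

```python
# Illustrative sketch of the merging predefined rules: consecutive
# updated text lines are accumulated until a line ends with a
# group-ending punctuation mark (point, semicolon, question mark,
# exclamation mark), forming one semantic group per accumulation.
END_MARKS = ".;?!"

def merge_semantic_groups(lines):
    """Merge consecutive lines into semantic groups of elements."""
    groups, current = [], []
    for line in lines:
        current.append(line)
        if line.rstrip()[-1:] in END_MARKS:
            # The group ends at this punctuation mark (e.g. S1 ends
            # at the first point of TL21 in the figure).
            groups.append(" ".join(current))
            current = []
    if current:
        # Trailing lines without an ending mark form a final group.
        groups.append(" ".join(current))
    return groups
```

Applied to the ten updated lines of FIG. 24B, such a rule would reproduce the four semantic groups S1 to S4 of FIG. 24C, each ending at a point or semicolon.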

FIG. 24D shows the sending of two elements of the updated text lines SL1, SL2, SL3 and SL4 to two available processing units PU1 and PU2 of the computing device for executing the handwriting recognition of the input elements in parallel. In this example, the computing device comprises a processor with two available processing units, PU1 and PU2; therefore two text lines SL1 and SL2 are sent to the first and the second processing unit simultaneously. The recognition of SL1 and SL2 is executed in parallel. The recognition of the first text line SL1 is completed first. The outcome of the first recognized text line is shown in FIG. 24E, wherein the first recognized text line RL1 is stored in the memory 108. The recognition of the element SL2 is still in progress, therefore the second text line is still shown as unrecognized element SL2 at this stage, and the third and fourth text lines, which have not yet been sent to the processing units, remain as unrecognized elements SL3 and SL4 in the memory.

Following completion of the recognition of SL1, the processing unit PU1 becomes available, therefore a next text line SL3 is sent to the PU1. FIG. 24F shows the text line SL3 sent to the first processing unit PU1 for recognition, whereas the text line SL2 is still in process for recognition at the second processing unit PU2.

FIG. 24G shows converted text lines RL1, RL2 and RL3 and the text line SL4 stored in the memory 108. RL1, RL2 and RL3 are the outcome of the parallel recognition of SL1, SL2 and SL3 sent successively to the two processing units as each was becoming available. In this example as shown in FIG. 24D, the first and the second text lines SL1 and SL2 were initially sent to the available processing units PU1 and PU2. Then, the recognition of SL1 was completed first at the first processing unit, making the first processing unit PU1 available. The availability of PU1 triggered the sending of the third text line SL3 to the available processing unit PU1, as shown in FIG. 24F. The recognition of SL2 was completed second at the second processing unit, making the second processing unit PU2 available. The availability of PU2 triggered the sending of the fourth text line SL4 to the available processing unit PU2, shown in FIG. 24H. The recognition of SL3 is completed third at the first processing unit. Therefore, at this transitory stage, FIG. 24G shows, as stored in the memory 108, the recognized text lines RL1, RL2 and RL3 as recognized elements, whereas the text line SL4, still under processing at the processing unit PU2, remains unrecognized.

FIG. 24H shows the sending of the fourth text line SL4 to the processing unit PU2 of the computing device for executing the handwriting recognition. The recognition of SL1, SL2 and SL3 was previously executed in parallel, as described above.
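The successive sending of the remaining elements to the processing units as they become available may be modelled, purely for illustration, with a shared queue drained by worker threads; run_parallel_recognition is a hypothetical helper, not the disclosed implementation:

```python
# Illustrative sketch of FIGS. 24D-24H: elements SL1-SL4 are queued,
# each of the two "processing units" (threads) takes the next element
# as soon as it finishes its current one, and results are compiled in
# the original order.
import queue
import threading

def run_parallel_recognition(elements, recognize, n_units=2):
    pending = queue.Queue()
    for idx, element in enumerate(elements):
        pending.put((idx, element))
    results = [None] * len(elements)

    def processing_unit():
        while True:
            try:
                idx, element = pending.get_nowait()
            except queue.Empty:
                return  # no remaining elements; the unit stops
            # The unit is busy until recognition of this element ends,
            # then loops back to fetch the next pending element.
            results[idx] = recognize(element)

    units = [threading.Thread(target=processing_unit) for _ in range(n_units)]
    for unit in units:
        unit.start()
    for unit in units:
        unit.join()
    return results  # compiled in the original input order
```

With four semantic lines and two units, the first two lines start immediately and the third and fourth are taken up only as a unit frees, matching the succession shown in FIGS. 24D, 24F and 24H.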

FIG. 24I shows, after completion of the recognition of the fourth text line SL4, the outcome of the parallel recognition of four text lines as stored in the memory 108 as recognized text lines RL1, RL2, RL3 and RL4.

Therefore, the processing time for recognizing the entire handwriting input IN2 is reduced by having two parallel processing threads available for the recognizing of the extracted semantic text lines. Additionally, the processing time may be optimized further by calculating complexity scores for each of the semantic text lines. The complexity scores of each semantic text line allow the computing device to order the semantic text lines SL1, SL2, SL3 and SL4 according to an ordering sequence defined by the complexity scores. The first two semantic text lines SL1 and SL2 are similarly complex according to, for example, the number of words in each text line. Therefore, SL1 and SL2 are processed simultaneously by the two processing units. After the completion of the recognition of the handwriting text lines SL1 and SL2, the following two semantic text lines SL3 and SL4 of the sequence of ordered elements may be processed by the two processing units as they become available.
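The complexity-score ordering may be sketched as follows, under the assumption suggested above that the word count of a line serves as its complexity score; order_by_complexity is a hypothetical helper:

```python
# Illustrative sketch of the complexity-score ordering: the word count
# stands in for the complexity score, and lines are ordered so that
# similarly complex lines are adjacent in the dispatch sequence.
def order_by_complexity(lines):
    """Order text lines by descending complexity score (word count)."""
    return sorted(lines, key=lambda line: len(line.split()), reverse=True)
```

Dispatching the ordered sequence two at a time to the two processing units then tends to pair similarly complex lines, so both units finish at roughly the same time.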

The recognition steps of the ordered text lines may thus be parallelized, with similarly complex handwritten inputs processed together. The outcome of the recognition of text lines SL1, SL2, SL3 and SL4 is shown in FIG. 24I as recognized text lines RL1, RL2, RL3 and RL4.

As shown in FIG. 24J, the resulting text lines RL1, RL2, RL3 and RL4 are compiled and displayed as one recognized text block RTB on the display area 102.

The preservation of the linguistic context of each semantic line SL1, SL2, SL3 and SL4, which are sent individually to the available processing units, makes it possible to maintain a good recognition accuracy rate. The linguistic contribution of the language expert to the recognition process of each semantic text line is equivalent to its linguistic contribution to the whole text block. The recognition accuracy rate of the compiled text lines is equivalent to the recognition rate of the text block sent as one element, while the recognition processing time of the compiled text lines is faster than that of the text block sent as one element.

The present invention having been described in particular embodiments, it is clear that it is susceptible to numerous modifications and embodiments within the ability of those skilled in the art, in accordance with the scope of the appended claims. In particular, the skilled person may contemplate any and all combinations and variations of the various embodiments described in this document that fall within the scope of the appended claims.

Claims

1. A method for recognizing handwriting input from handwriting strokes of digital ink, on a computing device, the computing device comprising a processor with at least two processing units configured to process data in parallel, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method, comprising:

receiving the handwriting strokes of digital ink;
performing element extraction from said strokes to extract a plurality of elements;
recognizing the plurality of elements in parallel by: sending at least two elements of the extracted plurality of elements to at least two processing units, respectively; sending successively the remaining elements of the extracted plurality of elements to the processing units as the processing units become available;
compiling the plurality of recognized elements to generate the recognized handwriting input.

2. The method of claim 1, wherein the elements are text or non-text elements.

3. The method of claim 2, wherein the text elements are words, lines, paragraphs, or mathematical expressions.

4. The method of claim 3, wherein the non-text elements are shapes, drawings or image data including characters, strings or symbols used in non-text contexts.

5. The method of claim 1, wherein sending elements to the processing units comprises sending semantic groups of elements to the processing units.

6. The method of claim 5, comprising grouping the plurality of elements to generate the semantic groups of elements according to semantic predefined rules, wherein the grouping of the plurality of elements comprises:

merging at least two elements according to merging predefined rules to update the plurality of elements; and/or
splitting at least one element according to splitting predefined rules to update the plurality of elements.

7. The method of claim 6, wherein applying one merging predefined rule, to at least two consecutive elements of a sequence of text lines, comprises:

detecting one text line of the sequence of text lines including a junction pattern of the one text line;
generating a merged text line comprising the detected text line merged with a subsequent text line of the sequence of text lines.

8. The method of claim 7, wherein the junction pattern is a merging punctuation mark, such as a hyphen, as the last symbol of the one text line.

9. The method of claim 6, wherein applying one merging predefined rule to at least two elements comprises:

detecting a special formatting of a first element;
detecting the special formatting of at least a second element in the vicinity of the first element;
generating a merged element comprising the first and the at least second elements.

10. The method of claim 9, wherein the special formatting of the first and at least second elements is bolding, italicizing, underlining, or coloring.

11. The method of claim 6, wherein applying one splitting predefined rule on one element, comprises:

detecting a split pattern of the one element;
generating a first split element and a second split element according to the splitting pattern.

12. The method of claim 11, wherein the split pattern is a splitting punctuation mark or a line break.

13. The method of claim 6, wherein applying another splitting predefined rule on one element comprises:

detecting a first and a second formatting within the one element;
generating at least two split groups of elements, wherein a first split group comprises contiguous elements of the first formatting and at least a second split group comprises contiguous elements of the second formatting.

14. The method of claim 1, wherein the sending of the at least two elements of the plurality of elements comprises:

identifying the total number of elements;
identifying the available number of processing units;
if the total number of elements is higher than the available number of processing units: sending a first number of the plurality of elements to the available processing units, the first number of elements being equal to the available number of processing units.

15. The method of claim 14, wherein

if the total number of elements is lower than, or equal to, the number of available processing units:
sending the plurality of elements to the available processing units simultaneously.

16. The method of claim 1, wherein the recognizing of the plurality of elements in parallel comprises:

calculating a complexity score for each element of the plurality of elements;
ordering each element according to the complexity score;
sending the plurality of elements to the processing units using the ordering sequence.

17. A method for performing text block extraction to extract text blocks from handwriting strokes on a computing device, the computing device comprising:

a processor having multiple processing units, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method, comprising: displaying, in a display area, the handwriting strokes of digital ink which are input substantially along a handwriting orientation;
performing text line extraction to extract a number of text lines from said strokes;
ordering the extracted text lines vertically;
generating an initial text block including the first ordered text line;
generating an initial text block set including the initial text block;
setting at least one current text block set as the initial text block set;
setting at least one current text block as the initial text block;
updating iteratively, until the last ordered text line, the at least one current text block set by: generating a certain number of next text block sets, wherein the certain number of next text block sets is the number of the at least one current text block set plus the number of the at least one current text block of the at least one current text block set, by: combining the next text line with each of the at least one current text block of the at least one current text block set to generate a first subset of the certain number of the next text block sets; and including the next text line as one next text block in one of the next text block sets to generate a second subset of the certain number of next text block sets;
calculating costs of the certain number of next text block sets;
replacing, the at least one current text block set, with the at least one next text block set of the certain number of the next text block sets that fulfils one or more cost criteria;
extracting the text blocks from one of the at least one current text block sets;
identifying a first number of available processing units at the processor;
sending a corresponding number of extracted text blocks to the identified processing units to be processed in parallel for recognition.

18. The method of claim 17, further comprising:

sending successively the remaining extracted text blocks to the processing units to be processed for recognition as the processing units become available.

19. A method for processing text handwriting on a computing device, the computing device comprising:

a processor having multiple processing units, a memory and at least one non-transitory computer readable medium for recognizing input under control of the processor, the method comprising: displaying, in a display area, strokes of digital ink which are input substantially along a handwriting orientation; performing text line extraction to extract text lines from said strokes, said text line extraction comprising: slicing said display area into strips extending transversally to the handwriting orientation, wherein adjacent strips partially overlap with each other so that each stroke is contained in at least two adjacent strips; ordering, for each strip, the strokes at least partially contained in said strip to generate a first timely-ordered list of strokes arranged in a temporal order and at least one first spatially-ordered list of strokes ordered according to at least one respective spatial criterion, thereby forming a first set of ordered lists; forming, for each strip, a second set of ordered lists comprising a second timely-ordered list of strokes and at least one second spatially-ordered list of strokes by filtering out strokes below a size threshold from said first timely-ordered list and from said at least one first spatially-ordered list respectively; performing a neural net analysis to determine as a decision class, for each pair of consecutive strokes in each ordered list of said first and second set, whether the strokes of said pair belong to a same text line, in association with a probability score for said decision class; selecting, for each pair of consecutive strokes included in at least one ordered list of said first and second sets, the decision class determined with the highest probability score during the neural net analysis; and defining text lines by combining strokes into line hypotheses based on the decision class with highest probability score selected for each pair of consecutive strokes; identifying a first number of available processing units at the processor; sending a corresponding number of defined text lines to the identified processing units to be processed in parallel.

20. The method of claim 19, further comprising:

sending successively the remaining defined text lines to the processing units to be processed for recognition as the processing units become available;
compiling in order the recognized text lines when all the text lines are recognized.
Patent History
Publication number: 20240331428
Type: Application
Filed: Feb 29, 2024
Publication Date: Oct 3, 2024
Inventors: Stéphane Guyetant (Nantes), David Hébert (Nantes)
Application Number: 18/591,818
Classifications
International Classification: G06V 30/32 (20060101);