Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)

Trigrams or digrams (Class 382/230)

Checking spelling for recognition (Class 382/231)

Character display device and character display method

Patent number: 6927774

Abstract: A character display device and method therefor are adapted to obtain a proximal reference point of each character comprising a character series and calculate display coordinates of each character from said proximal reference point and the display angle and display reference position of the character series.

Type: Grant

Filed: December 8, 2000

Date of Patent: August 9, 2005

Assignee: Mitsubishi Denki Kabushiki Kaisha

Inventor: Fumiko Yano
Image interpretation method and apparatus

Patent number: 6922489

Abstract: A method of interpreting an image using a statistical or probabilistic interpretation model is disclosed. The image has associated therewith contextual information. The method comprises the following steps: providing the contextual information associated with the image for analysis; analyzing the additional contextual information to identify predetermined features relating to the image; and biasing the statistical or probabilistic interpretation model in accordance with the identified features.

Type: Grant

Filed: October 29, 1998

Date of Patent: July 26, 2005

Assignees: Canon Kabushiki Kaisha, Canon Information Systems Research Australia Pty. Ltd.

Inventors: Alison Joan Lennon, Delphine Anh Dao Le
Handwriting recognition by word separation into silhouette bar codes and other feature extraction

Patent number: 6917708

Abstract: A method of automatically recognizing text. The text is divided into whole words, each of which is recognized. Each whole word is characterized according to its silhouette. The silhouette is characterized by features in the silhouette such as upwardly extending “polls” and downwardly extending “holes”. The silhouette may also be characterized by its first syllable blends. Numbers are assigned to each of the different characteristics, and numbers may also be assigned based on analysis of a database of different kinds of cursive words. Recognition may be carried out automatically by a recognizing system which recognizes in this way.

Type: Grant

Filed: January 19, 2001

Date of Patent: July 12, 2005

Assignee: California Institute of Technology

Inventors: Rodney M. Goodman, Donal J. Woods, Patricia A. Keaton, Joseph Chen
Detecting and utilizing add-on information from a scanned document image

Patent number: 6909805

Abstract: A scanned document image, including add-on information such as handwritten annotations in addition to printed text lines, is processed by a handwriting detection method. First, at least one projection histogram is generated from the scanned document image. A regular pattern that correlates to the printed text lines is determined from the projection histogram. Second, connected component analysis is applied to the scanned document image to generate at least one merged text line. Each merged text line relates to at least one of the handwritten annotation and the printed text line. By comparing the merged text lines to the regular pattern of the projection histograms, the printed text lines are discriminated from the handwritten annotations.

Type: Grant

Filed: January 31, 2001

Date of Patent: June 21, 2005

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Yue Ma, Jinhong Katherine Guo
Technique to identify interesting print articles for later retrieval and use of the electronic version of the articles

Patent number: 6904171

Abstract: The present invention provides a method and system for efficient storage and retrieval of information. The method includes the steps of: scanning/selecting/capturing a selected portion of text of the information wherein the selected portion of text scanned is typically a close-to-unique identifier of the text from which the portion was excerpted and serves as a key when the information is accessed electronically; and placing the key in an electronically available index/directory to facilitate retrieval of the information. The method may further include retrieving and storing the information associated with the key and using it to index, organize, and make available for search and retrieval the full information originally viewed by the user.

Type: Grant

Filed: December 15, 2000

Date of Patent: June 7, 2005

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Pieter J. van Zee
Method and apparatus for statistical text filtering

Patent number: 6879722

Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character-sequence probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.

Type: Grant

Filed: June 29, 2001

Date of Patent: April 12, 2005

Assignee: International Business Machines Corporation

Inventor: Hubert Crepy
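
The divide / score / compare / reject steps in the abstract of patent 6879722 above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes character trigrams as the "character sequences", uses the average log-probability as the regularity value VR, and the names `train_trigram_model`, `regularity`, `filter_corpus` and the `floor` smoothing constant are hypothetical.

```python
import math
from collections import Counter


def train_trigram_model(text):
    """Estimate character-trigram probabilities from reference text (assumed model)."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def regularity(portion, model, floor=1e-6):
    """Average log-probability of the portion's trigrams, standing in for VR.

    Trigrams never seen in the reference text get the small `floor` probability.
    """
    grams = [portion[i:i + 3] for i in range(len(portion) - 2)]
    if not grams:
        return math.log(floor)
    return sum(math.log(model.get(g, floor)) for g in grams) / len(grams)


def filter_corpus(portions, model, threshold):
    """Keep only the portions whose regularity reaches the threshold (VT)."""
    return [p for p in portions if regularity(p, model) >= threshold]
```

With a model trained on ordinary English text, a portion of ordinary prose scores close to the typical trigram log-probability, while a run of symbols or binary debris falls toward `log(floor)` and is rejected. The threshold has to be chosen empirically for the language and model in use.
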
Efficient method and system for determining parameters in computerized recognition

Patent number: 6879718

Abstract: In computerized recognition having multiple experts, a method and system are described that obtain an optimum value for an expert tuning parameter in a single pass over sample tuning data. Each tuning sample is applied to two experts, resulting in scores from which ranges of parameters that correct recognition errors without changing correct results for that sample are determined. To determine the range data for a given sample, the experts return scores for each prototype in a database, the scores separated into matching and non-matching scores. The matching and non-matching scores from each expert are compared, providing upper and lower bounds defining ranges. Maxima and minima histograms track upper and lower bound range data, respectively. An analysis of the histograms based on the full set of tuning samples provides the optimum value. For tuning multiple parameters, each parameter may be optimized by this method in isolation, and then iterated.

Type: Grant

Filed: November 6, 2001

Date of Patent: April 12, 2005

Assignee: Microsoft Corp.

Inventor: Gregory N. Hullender
Character recognition method and computer-readable storage medium

Patent number: 6876765

Abstract: A character recognition method carries out a character recognition using a cross section sequence graph which describes features of a character image. The character recognition method includes the steps of (a) extracting the cross section sequence graph from a character string image, (b) analyzing a singular region of the cross section sequence graph and generating a virtual boundary point sequence in the singular region based on an analyzed result, (c) generating character candidates by combining structural elements of the cross section sequence graph and recognizing one character by supplying the virtual boundary point sequence with respect to the generated character candidates if necessary, and (d) recognizing a character string based on an adjacency relationship of the character candidates which are recognized as one character in the step (c).

Type: Grant

Filed: March 29, 2001

Date of Patent: April 5, 2005

Assignee: Ricoh Company, Ltd.

Inventor: Toshihiro Suzuki
Method and apparatus for compressing data string

Patent number: 6876774

Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.

Type: Grant

Filed: August 29, 2002

Date of Patent: April 5, 2005

Assignee: Fujitsu Limited

Inventors: Noriko Satoh, Shigeru Yoshida
Method and system for mapping strings for comparison

Patent number: 6873986

Abstract: A method and system for mapping a number of characters in a string, wherein the string comprises a combination of characters representing indexed expressions and a combination of characters representing non-indexed expressions. One embodiment produces a weight array that can be utilized to compare a first and second string having indexed and non-indexed expressions. In one embodiment, a method generates a set of special weights for characters that represent indexed and non-indexed expressions. The method then associates a weight value of an indexed expression with the specific group of characters representing a specific non-indexed expression, and generates a weight array by retrieving a plurality of special weights associated with the specific group of characters representing the specific non-indexed expression and the associated weight value of the indexed expression.

Type: Grant

Filed: October 29, 2001

Date of Patent: March 29, 2005

Assignee: Microsoft Corporation

Inventors: John McConnell, Julie Bennett, Yung-Shin Lin
Word recognizing apparatus for dynamically generating feature amount of word and method thereof

Patent number: 6859556

Abstract: A word recognizing apparatus extracts the feature amount from a given image, and dynamically composes the feature amount of a candidate word to be recognized which is registered in a word list, using feature amounts of characters registered in an individual character dictionary. Then, the apparatus collates the composed feature amount of the word with the feature amount extracted from the image, calculates the degree of similarity between the two feature amounts, and outputs a recognition result.

Type: Grant

Filed: May 11, 1999

Date of Patent: February 22, 2005

Assignee: Fujitsu Limited

Inventors: Hiroaki Takebe, Yoshinobu Hotta, Satoshi Naoi
Word recognition method and storage medium that stores word recognition program

Patent number: 6847734

Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, and a probability is obtained at which the characteristics obtained as the result of character recognition are generated, conditioned on the characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which the characteristics obtained as the result of character recognition are generated, and the division results obtained for the characters of each word contained in the word dictionary are multiplied over all the characters. The recognition results of the above words are obtained based on the multiplication results.

Type: Grant

Filed: January 26, 2001

Date of Patent: January 25, 2005

Assignee: Kabushiki Kaisha Toshiba

Inventor: Tomoyuki Hamamura
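
The scoring rule in the abstract of patent 6847734 above (a conditional probability divided by an unconditional one, multiplied over all characters of a dictionary word) can be illustrated with a small sketch. It is an interpretation under stated assumptions: each "characteristic" is reduced to a single recognized symbol, the probability tables are toy values supplied by the caller, and the names `word_score`, `recognize` and `unseen` are hypothetical.

```python
def word_score(word, observations, likelihood, evidence, unseen=0.01):
    """Product over positions of P(observation | character) / P(observation).

    likelihood maps (observation, character) pairs to P(observation | character);
    evidence maps an observation to P(observation). Pairs missing from
    likelihood fall back to the small `unseen` probability (an assumption).
    """
    score = 1.0
    for ch, obs in zip(word, observations):
        score *= likelihood.get((obs, ch), unseen) / evidence[obs]
    return score


def recognize(observations, dictionary, likelihood, evidence):
    """Return the dictionary word with the highest score, or None."""
    # Only words whose length matches the number of observed characters compete.
    candidates = [w for w in dictionary if len(w) == len(observations)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda w: word_score(w, observations, likelihood, evidence),
    )
```

Dividing by the unconditional probability means a character that was recognized exactly as a common confusion would predict still contributes a ratio above one, so a dictionary word can win even when the raw recognizer output is not itself a word.
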
E-mail terminal automatically converting character string of reception e-mail, and e-mail system

Patent number: 6839877

Abstract: An electronic mail terminal includes a display section, a conversion dictionary which stores sets of a character string and a pictograph, a receiving section and a control section. The receiving section receives an electronic mail including a sentence as a conversion object sentence in a reception mode. The control section automatically refers to the character string-pictograph conversion dictionary based on each of character strings of the conversion object sentence in the reception mode to retrieve a specific pictograph corresponding to the character string, when the pictograph corresponding to the character string is registered in the character string-pictograph conversion dictionary. Then, the control section converts the character string into the specific pictograph to produce a pictograph mixed sentence, and controls the display section to display the pictograph mixed sentence.

Type: Grant

Filed: December 1, 2000

Date of Patent: January 4, 2005

Assignee: NEC Corporation

Inventor: Shinichiro Iwata
System and method for object-oriented graphically integrated command shell

Publication number: 20040264782

Abstract: A system and method for providing an object-oriented graphical integrated command shell (ICS) integrates the command shell into a graphical user interface (GUI) environment in order to provide a single graphical user interface, so that the user does not need to work in different environments for different tasks. To accomplish the integration, the ICS provides interpretation of output responses that occur as a result of processing textual commands entered by a user. An output response from the command shell is typically one or more lines of text from an output stream such as standard error or standard output. The output response is interpreted by the ICS to determine a meaning. Interpretation may be by pattern matching with regular expressions. If interpreted lines of command output (e.g. indicating a file or folder) map to some other object model (e.g. a file subsystem) in the UI, appropriate object model objects are created. Mapping output produces integration between different subsystems (i.e.

Type: Application

Filed: November 6, 2003

Publication date: December 30, 2004

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: David McKnight, Jeffrey Turnham
Apparatus for rough classification of words, method for rough classification of words, and record medium recording a control program thereof

Patent number: 6834121

Abstract: To provide an apparatus for rough classification of words that allows features of words to be stored in a vocabulary storage division to be generated from character codes of the words so that the words can be efficiently selected. A candidate character selection division 1 detects areas likely to be characters from a word image, and a character recognition division 2 recognizes candidate characters generated in candidate character selection division 1 and converts them into character codes. A number-of-characters estimation division 3 estimates the number of characters of the entire word image and the number of characters of the areas between candidate characters, a word description division 4 generates word description equivalent to a state transition graph from the recognition results of candidate characters and the estimated number of characters of candidate character separations.

Type: Grant

Filed: December 20, 2000

Date of Patent: December 21, 2004

Assignee: NEC Corporation

Inventors: Didier Guillevic, Keiji Yamada
Data processing apparatus and method

Publication number: 20040223647

Abstract: A data processing apparatus for inputting data by writing characters on a touch sensitive display screen. The data processing apparatus comprises a character recognition processor operable to generate an estimate of a character handwritten by a user on the touch sensitive screen. The data processing apparatus includes a processing unit operable to receive the estimated character, and a graphics display device operable to receive the estimated character from the processing unit. The graphics display device is operable to display the estimated characters within a text input window of the display screen. The processing unit is operable in combination with the graphics display driver to display the estimated character on the display screen, substantially at a position proximate to the location where the user has written the character.

Type: Application

Filed: May 7, 2004

Publication date: November 11, 2004

Applicant: Orange SA

Inventors: Alan Blount, Todd Pinkerton
Implicit page breaks for digitally represented handwriting

Patent number: 6816615

Abstract: A logical separation between pages, such as an implicit page break, is introduced to separate text entered during one handwriting session from text entered during another handwriting session. If the user leaves more than a threshold amount of blank space at the bottom of the page immediately preceding the new page, then an implicit page break may be inserted at the beginning of the new page. The amount of blank space left at the end of the preceding page may be combined with other criteria to determine whether to insert an implicit page break. The amount of time elapsed since ink has been captured on the previous page is another factor that may be used by itself or combined with other factors to determine whether to insert an implicit page break into the new page. A change in context, such as a different date or different recognized subject matter labels, is also a factor that may be considered in determining whether to insert an implicit page break.

Type: Grant

Filed: February 28, 2001

Date of Patent: November 9, 2004

Assignee: Microsoft Corporation

Inventors: Charlton E. Lui, Anthony S. Smith, Dan W. Altman, Cynthia C. Tee, Evan M. Feldman
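
The decision factors listed in the abstract of patent 6816615 above (blank space left on the preceding page, time since ink was last captured, and a change in context) can be combined in many ways. The sketch below shows one simple combination in which any single factor is enough; the `PageState` fields, the function name and both threshold defaults are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    blank_fraction: float     # share of the preceding page left blank at the bottom
    seconds_since_ink: float  # time since ink was last captured on the preceding page
    context_changed: bool     # e.g. a different date or recognized subject label


def should_insert_page_break(state, blank_threshold=0.25, idle_threshold=3600.0):
    """Decide whether the new page should start with an implicit page break.

    Assumed policy: any one factor exceeding its threshold triggers the break.
    """
    return (
        state.blank_fraction > blank_threshold
        or state.seconds_since_ink > idle_threshold
        or state.context_changed
    )
```

A weighted score over the three factors would be an equally plausible reading of "combined with other criteria"; the any-of rule is used here only because it is the shortest form to state.
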
Systems and methods for providing content filtering of a print job

Publication number: 20040207878

Abstract: Systems and methods for using a print subsystem to implement an analysis of the content of a print job prior to despooling the print job to a printing device, and selectively rendering, providing a modified rendering or terminating the print job. A computer device is connected to a printing device to selectively render a print job and includes a print subsystem, such as spooler and optionally a printer driver and a print processor. A further implementation includes a print server having a print subsystem. Print data corresponding to a print job is provided from the print subsystem input processing to a content filtering process to analyze the content thereof prior to despooling the print job to the printing device. The analysis determines if some or all of the content should be rejected, removed, replaced or require acknowledgement.

Type: Application

Filed: April 21, 2003

Publication date: October 21, 2004

Inventor: Andrew Rodney Ferlitsch
Method and device for recognising a phonetic sound sequence or character sequence

Publication number: 20040199389

Abstract: The invention relates to a method for recognizing a phonetic sound sequence or a character sequence, e.g.

Type: Application

Filed: February 12, 2004

Publication date: October 7, 2004

Inventor: Hans Geiger
Text input system for ideographic and nonideographic languages

Patent number: 6801659

Abstract: Beginning with the first letter or stroke, this invention uses the relative frequency of the sequential groups of letters or strokes from which individual words or characters are gradually built in order to provide a better way of computer indexing languages for easier and more efficient access to both the frequently used words or characters and the less-frequently used. This makes possible a system of text input that is both more efficient and more intuitive than utilizing just word or character frequency, an input approach which eliminates typing transpositions, reduces word-spelling errors or character-stroke-order uncertainty, and provides an alternative to a standard keyboard which is especially helpful with wireless phones and hand-held computers, and similar devices lacking standard keyboards. This invention can make words and characters quite accessible in an intuitive way without requiring any direct input of words or letters, strokes or characters.

Type: Grant

Filed: June 4, 2001

Date of Patent: October 5, 2004

Assignee: ZI Technology Corporation Ltd.

Inventor: Robert B. O'Dell
Method and system for maintaining alternates in association with recognized words

Patent number: 6801660

Abstract: In a computing device that receives handwritten data, a method and system that maintains an association between alternates for a given ink word, regardless of the handwritten or text state of the word, and regardless of the position of the word as it may be edited in a document. Handwritten data is maintained in an ink word data structure, and once the word is recognized and an alternate is selected for it, the first character of the word remains as an ink word (in a text buffer) pointing to the data structure, with a flag set in the data structure indicating that the word is now recognized as text. In this state, the first character is displayed to the user as a recognized text letter instead of as the handwritten word. The other characters that make up the recognized word are inserted as text into the text buffer. Any alternates returned by the recognizer are thus stored with the ink word data structure displayed as this first character of a recognized word, which also maintains the ink data, e.g.

Type: Grant

Filed: August 22, 2000

Date of Patent: October 5, 2004

Assignee: Microsoft Corporation

Inventors: Peter H. Williamson, Dan W. Altman, Charlton E. Lui
Apparatus and method of program classification based on syntax of transcript information

Patent number: 6798912

Abstract: A method of program classification based on syntax of transcript information includes receiving transcript information associated with the program wherein the transcript information has a plurality of sentences, determining characteristics of at least one of the plurality of sentences of the transcript information to identify at least the type and subject of the sentence, comparing the characteristics of the at least one of the plurality of sentences with a list of sentence characteristics having associated therewith a plurality of program types, and based on the comparing step, selecting a classification of program which is most closely associated with the characteristics of the at least one of the plurality of sentences.

Type: Grant

Filed: December 18, 2000

Date of Patent: September 28, 2004

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Kavitha Devara
Image processing device and image processing program

Patent number: 6798913

Abstract: An image processing device and a computer program product capable of accurately determining a user-desired region even when a region has been only roughly marked by a user, wherein a specific region within an image to be processed is detected; the image to be processed is allocated into a plurality of blocks; text included in the image to be processed is recognized; the presence or absence of relevance between a first block which is partially included in the specific region and a second block which is entirely included in the specific region among the allocated blocks is determined based on a result of text recognition; and it is determined whether or not an image of the first block should be treated as an image belonging to the specific region in accordance with a result of determination as to the relevance.

Type: Grant

Filed: March 16, 2001

Date of Patent: September 28, 2004

Assignee: Minolta Co., Ltd.

Inventor: Hideyuki Toriyama
Pattern string matching apparatus and pattern string matching method

Publication number: 20040184663

Abstract: This invention is to compare each character of a first character string with each character of a second character string, vote for a matrix having two sides corresponding to the characters of the first character string and the characters of the second character string and calculate values of the voting result for respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process which is noise-resistant and takes the character arrangement into consideration can be attained.

Type: Application

Filed: March 30, 2004

Publication date: September 23, 2004

Applicant: Kabushiki Kaisha Toshiba

Inventor: Takuma Akagi
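
The voting scheme in the abstract of publication 20040184663 above can be sketched as follows: every pair of equal characters casts a vote in a matrix cell, and votes are totalled along each oblique line (each diagonal) of the matrix. This is a minimal reading of the abstract; the names `diagonal_votes` and `match_score`, and the normalization by the length of the first string, are assumptions.

```python
def diagonal_votes(first, second):
    """Total the votes on each diagonal of the character-comparison matrix.

    Cell (i, j) receives a vote when first[i] == second[j]; cells on the same
    diagonal share the offset j - i, so votes are accumulated per offset
    without materializing the full matrix.
    """
    votes = {}
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            if a == b:
                votes[j - i] = votes.get(j - i, 0) + 1
    return votes


def match_score(first, second):
    """Return (score, offset) for the best-supported diagonal."""
    votes = diagonal_votes(first, second)
    if not votes:
        return 0.0, None
    offset, best = max(votes.items(), key=lambda item: item[1])
    return best / len(first), offset
```

For example, `match_score("TOKYO", "XXTOKY0XX")` returns `(0.8, 2)`: four of the five characters line up at offset 2 even though the final letter was misread as a zero, which is the kind of noise tolerance the abstract describes.
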
System and method for meta-pattern discovery

Patent number: 6785663

Abstract: Periodic patterns in time series data can be hierarchical in nature, where a higher level pattern may comprise repetitions of lower level patterns. In the presence of noises, these repetitions of lower level patterns may not be perfect. A novel model, namely a meta-pattern, is provided in accordance with the present invention to capture these higher level patterns. The meta-pattern can not only provide a more compact representation of patterns but also capture the regularities of pattern evolutions, which may not be expressed by previous models due to the presence of noise. A method is provided to mine meta-patterns in an iterative manner by discovering meta-patterns and their supporting subsequences in the form of lists of segments of contiguous repetitions of a meta-pattern. The number of pattern repetitions in each said segment is at least a predefined threshold min_rep and the distance between any two adjacent segments is at most a predefined threshold max_dis.

Type: Grant

Filed: December 28, 2000

Date of Patent: August 31, 2004

Assignee: International Business Machines Corporation

Inventors: Wei Wang, Jiong Yang, Philip Shi-Lung Yu
Method and system for searching for words in ink word documents

Patent number: 6785417

Abstract: In a computing device that receives handwritten data, a method and system for finding matches for recognized handwritten words, by comparing a given search word (and possibly its alternates) with the words in a document, including recognized ink words and any possible alternates for those recognized words as returned by a recognizer. One described test looks for an exact match between an entered search word (and possibly its alternates) and the recognized words and their alternates stored in a handwritten document. Other tests are possible because of the use of alternates, which also may be returned with a probability ranking. For example, one scheme looks for a percentage of matching characters, with a user-determined threshold percentage. Other variations include giving different weight to certain characters, and/or factoring in the relative number of syllables and/or the relative lengths of the words.

Type: Grant

Filed: August 22, 2000

Date of Patent: August 31, 2004

Inventors: Peter H. Williamson, Charlton E. Lui
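
The search tests described in the abstract of patent 6785417 above (an exact match against recognized words and their alternates, or a percentage of matching characters against a user-set threshold) can be sketched as below. The position-by-position comparison, the data layout of `document`, and the names `char_match_ratio` and `find_matches` are illustrative assumptions.

```python
def char_match_ratio(a, b):
    """Fraction of positions at which two words agree, relative to the longer word."""
    if not a and not b:
        return 1.0
    same = sum(x == y for x, y in zip(a.lower(), b.lower()))
    return same / max(len(a), len(b))


def find_matches(search_terms, document, threshold=1.0):
    """Return the indices of document words matched by any search term.

    search_terms: the search word plus any of its alternates.
    document: list of (recognized_word, alternates) pairs.
    threshold: 1.0 demands an exact match; lower values accept partial matches.
    """
    hits = []
    for index, (word, alternates) in enumerate(document):
        candidates = [word, *alternates]
        if any(
            char_match_ratio(term, candidate) >= threshold
            for term in search_terms
            for candidate in candidates
        ):
            hits.append(index)
    return hits
```

Searching the alternates is what lets a query for "dog" find an ink word the recognizer reported as "clog", provided "dog" was among the alternates it returned.
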
Learning-based automatic commercial content detection

Publication number: 20040161154

Abstract: Systems and methods for learning-based automatic commercial content detection are described. In one aspect, program data is divided into multiple segments. The segments are analyzed to determine visual, audio, and context-based feature sets that differentiate commercial content from non-commercial content. The context-based features are a function of single-side left and/or right neighborhoods of segments of the multiple segments.

Type: Application

Filed: February 18, 2003

Publication date: August 19, 2004

Inventors: Xian-Sheng Hua, Lie Lu, Mingjing Li, Hong-Jiang Zhang
Data sheet identification device

Patent number: 6778712

Abstract: A data sheet identification device of the invention includes: a character/graphics extracting section, an identical shape deciding section, a graphics collating section, an identification code/data sheet ID identifying section for collating characters that have been decided to have the same shape with an identification code/data sheet ID database in which a plurality of characters showing features of a plurality of data sheets respectively have been registered, and an identifying section for uniquely identifying the data sheet based on a result of the collation by the graphics collating section and a result of the collation by the identification code/data sheet ID identifying section.

Type: Grant

Filed: August 29, 2000

Date of Patent: August 17, 2004

Assignee: Fujitsu Limited

Inventors: Maki Yabuki, Shinichi Eguchi, Kouichi Kanamoto, Katsutoshi Kobara, Koichi Chiba, Toshiyuki Waida, Kazunori Yamamoto, Yutaka Katsumata
Method and system for extending ink word data structures while maintaining version compatibility

Patent number: 6771817

Abstract: In a computing device that receives handwritten data, a method and data structure that enables extended data to be added to an existing ink word data structure without compromising backwards-compatibility. A flag in the header data structure indicates to new ink processing programs the presence or absence of the extended data, and the size information maintained in the header is adjusted to ensure that earlier versions of ink programs do not lose the extended data. The extended data is then added by including it in a copy of the existing ink word data structure, along with a tail structure that includes information describing the extended data and the tail structure to the new ink code, e.g., version and offset information. The tail structure can be used to locate a list of alternate word choices for an ink word that are maintained within the extended data.

Type: Grant

Filed: August 22, 2000

Date of Patent: August 3, 2004

Assignee: Microsoft Corporation

Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
Method and system for interactive ground-truthing of document images

Patent number: 6768816

Abstract: A method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image operates by segmenting a document image into a set of image objects, and linking the image objects with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets. The image objects are grouped according to characteristics suggesting that the image objects may have common ground-truthed metadata. By grouping the image objects into subsets, the image objects may be indexed to facilitate the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in a table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.

Type: Grant

Filed: June 13, 2002

Date of Patent: July 27, 2004

Assignee: Convey Corporation

Inventors: Floyd Steven Hall, Jr., Cameron Telfer Howie
Text selection from images of documents using auto-completion

Patent number: 6766069

Abstract: A user-interface for selecting text from images of documents using auto-completion is described. The auto-completion process may be used to complete words (or text sequences), phrases, sentences, paragraphs, or other groupings of words. In response to user input, the OCR results for one or more images of documents are searched. The user input may include typing in a partial word (or the initial characters in a text sequence) via an input device or alternatively, annotations made by a user on a hardcopy document prior to scanning the document. One or more word matches are presented to the user for acceptance until the user accepts a word match or until all word matches have been presented to the user. Once a user accepts a word match, the word match is copied into an electronic document such as a word processing document, spreadsheet document, or other electronic document created by an application program.

Type: Grant

Filed: December 21, 1999

Date of Patent: July 20, 2004

Assignee: Xerox Corporation

Inventors: Christopher R. Dance, William M. Newman, Alex S. Taylor, Stuart A. Taylor
Grammar-determined handwriting recognition

Publication number: 20040126017

Abstract: A system (10) for recognizing handwriting includes an input/output device (12) and a second computer (24 or 28). The system (10) converts handwritten symbols to text by using a grammar (50) that is comprised of the text (60) that is expected to be entered into a text display/text input area (17) of an input/output device (12). The grammar (50) and handwriting-to-text conversion can be performed in either the input/output device (12) or a remote computer (24, 28).

Type: Application

Filed: December 30, 2002

Publication date: July 1, 2004

Inventors: Giovanni Seni, Fabio Valente, Guo Jin
System and method for recognizing word patterns based on a virtual keyboard layout

Publication number: 20040120583

Abstract: A system augments stylus keyboarding with shorthand gesturing. The system defines a shorthand symbol for each word according to its movement pattern on an optimized stylus keyboard. The system recognizes word patterns by identifying an input as a stroke, and then matching the stroke to a stored list of word patterns. The system then generates and displays the matched word to the user.

Type: Application

Filed: December 20, 2002

Publication date: June 24, 2004

Applicant: International Business Machines Corporation

Inventor: Shumin Zhai
Systems and methods for rendering image-based data

Patent number: 6754391

Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
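
The line-fitting rule described in this abstract can be illustrated with a minimal sketch. It assumes each word block is reduced to a width in pixels and that inter-word spacing is ignored; the function name `layout_word_blocks` and its parameters are hypothetical and do not come from the patent.

```python
def layout_word_blocks(word_widths, line_width):
    """Assign word blocks to lines; a block joins the active line only if it fits whole."""
    lines = [[]]   # each line holds the indices of the word blocks registered with it
    used = 0       # width already consumed on the active line
    for index, width in enumerate(word_widths):
        # Increment the active line when the whole block would not fit on it.
        # A block wider than an empty line is still placed, to avoid looping forever.
        if lines[-1] and used + width > line_width:
            lines.append([])
            used = 0
        lines[-1].append(index)
        used += width
    return lines
```

Calling `layout_word_blocks([3, 4, 2, 5], 7)` places the first two blocks on one line and the last two on the next.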

Type: Grant

Filed: June 25, 2002

Date of Patent: June 22, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Frank P Carau, Sr.
Method and system of matching ink processor and recognizer word breaks

Patent number: 6754386

Abstract: In a computing device that receives handwritten data, a method and system that corrects for parser segmentation errors by sending an entire line of ink to a recognizer, and then comparing, on a word-by-word basis, the initial segmentation guesses of the parser with the more-thoroughly recognized segmentation results of the handwriting recognition engine. In the correction process, the ink words are efficiently adjusted with relatively little data manipulation. As the recognizer is fed a series of strokes on a line, the recognizer returns segmentation information. For ink word breaks that are the same for any given set of data, the existing ink word is unchanged. For ink words that are recognized differently relative to their initial segmentation, one or more new ink words are created and the handwriting (including stroke) data of the parser's ink word is manipulated to create a new ink processor word (or words) to match the recognizer output.

Type: Grant

Filed: August 22, 2000

Date of Patent: June 22, 2004

Assignee: Microsoft Corporation

Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
Image retrieval system

Patent number: 6754675

Abstract: An image retrieval system contains a database with a large number of images. The system retrieves images from the database that are similar to a query image entered by the user. The images in the database are grouped in clusters according to a similarity criterion so that mutually similar images reside in the same cluster. Each cluster has a cluster center which is representative of the images in it. A first step of the search for similar images selects the clusters that may contain images similar to the query image, by comparing the query image with the cluster centers of all clusters. A second step of the search compares the images in the selected clusters with the query image in order to determine their similarity with the query image.
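
The two-step search in this abstract can be sketched as follows. The sketch assumes images are already represented as numeric feature vectors and that similarity is plain Euclidean distance; both are illustrative assumptions, as are the names `retrieve`, `num_clusters`, and `num_results`.

```python
import math

def retrieve(query, clusters, num_clusters=1, num_results=3):
    """Two-step search over clusters given as (center, members) pairs of feature vectors."""
    # Step 1: compare the query with every cluster center and keep the closest clusters.
    ranked = sorted(clusters, key=lambda cluster: math.dist(query, cluster[0]))
    candidates = [member for _, members in ranked[:num_clusters] for member in members]
    # Step 2: compare the query only with the images inside the selected clusters.
    return sorted(candidates, key=lambda member: math.dist(query, member))[:num_results]
```

The point of the design is that step 2 touches only the members of the selected clusters, so most of the database is never compared with the query.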

Type: Grant

Filed: July 16, 2001

Date of Patent: June 22, 2004

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Mohammed S. Abdel-Mottaleb, Santhana Krishnamachari
Statistical representation and coding of light field data

Publication number: 20040114807

Abstract: A method of representing light field data by capturing a set of images of at least one object in a passive manner at a virtual surface where a center of projection of an acquisition device that captures the set of images lies and generating a representation of the captured set of images using a statistical analysis transformation based on a parameterization that involves the virtual surface.

Type: Application

Filed: December 13, 2002

Publication date: June 17, 2004

Inventors: Dan Lelescu, Frank Jan Bossen
User interface for initiating the export of an optimized scanned document using drag and drop

Patent number: 6751780

Abstract: A user interface method for launching an optimized final scan of a selected region of interest selected from a preview scan of a document. A user may drag the selected region of interest, presented in a preview scan of a document in a scanner window, and drop it on a software application, the desktop, or a writeable folder, which launches an optimized final scan of the selected region of interest. The image data resulting from the optimized final scan automatically resides in the software application, the desktop, or the writeable folder. In selecting a region of interest from the preview scan, scanner software parameters are updated with information about the region of interest which optimize the final scan. The image data from the optimized final scan is then formatted in the format requested and delivered to the software application, the desktop, or the writeable folder.

Type: Grant

Filed: October 1, 1998

Date of Patent: June 15, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Theodore W. Neff, Jeffrey P. Lee, Patricia D. Lopez
Method and apparatus for formatting OCR text

Patent number: 6741745

Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).
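
A rough sketch of the word-width matching idea follows. It assumes the rendered width of each word in each candidate typeface at a reference size is already available as input, and it replaces the cluster analysis named in the abstract with a much simpler stand-in: the typeface whose scaling factors have the smallest relative spread wins. The name `best_typeface` and the input layout are hypothetical.

```python
from statistics import mean, pstdev

def best_typeface(scanned_widths, rendered_widths_by_typeface):
    """Pick the typeface whose per-word scaling factors agree most closely."""
    best = None
    for typeface, rendered in rendered_widths_by_typeface.items():
        # One scaling factor per word: scanned width over rendered width.
        factors = [scanned / drawn for scanned, drawn in zip(scanned_widths, rendered)]
        # Stand-in for cluster analysis: relative spread of the factors.
        spread = pstdev(factors) / mean(factors)
        if best is None or spread < best[1]:
            best = (typeface, spread, mean(factors))
    # Return the typeface and its typical scaling factor (a proxy for font size).
    return best[0], best[2]
```

A tight group of scaling factors means one font size explains every word width, which is the signal the abstract treats as a good typeface fit.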

Type: Grant

Filed: December 18, 2000

Date of Patent: May 25, 2004

Assignee: Xerox Corporation

Inventors: Christopher R. Dance, Mauritius Seeger
Compiliable language for extracting objects from an image using a primitive image map

Patent number: 6741744

Abstract: The invention features a method wherein a recognition environment utilizes pseudo-English as a programming language to extract simple and complex objects with image-and/or map-data as inputs. Based on this human/computer interface in which pseudo-English is a programming language, the object-recognition system has three major logic modules: (1) an input data module; (2) an information-processing module, coupled with the above-noted human computer interface (HCI) module; and (3) an output module that has a feedback mechanism back to the main information-processing and the input-data module. A physical phenomenon (i.e., one that is visible, audible, tactile, etc.) is analyzed by the information-processing module to determine whether it is susceptible to description or articulation. If not, the phenomenon is matched or compared, via the output module, to a known articulatable, physical-phenomenon model and recognizable features are extracted.

Type: Grant

Filed: April 17, 1999

Date of Patent: May 25, 2004

Inventor: Shin-yi Hsu
Pattern string matching apparatus and pattern string matching method

Patent number: 6738515

Abstract: This invention is to compare each character of a first character string with each character of a second character string, vote for a matrix having two sides corresponding to the characters of the first character string and the characters of the second character string and calculate values of the voting result for respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process which is noise-resistant and takes the character arrangement into consideration can be attained.
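
The voting scheme in this abstract can be sketched in a few lines. The sketch assumes one vote per exactly matching character pair and takes the strongest diagonal as the matching result; the patent may weight votes or decide differently, and the names `vote_matrix` and `best_diagonal` are hypothetical.

```python
def vote_matrix(first, second):
    """Rows follow the first string, columns the second; 1 marks equal characters."""
    return [[1 if a == b else 0 for b in second] for a in first]

def best_diagonal(first, second):
    """Sum the votes along each oblique line and return (offset, votes) of the best one."""
    totals = {}
    for i, row in enumerate(vote_matrix(first, second)):
        for j, vote in enumerate(row):
            # Cells with the same j - i lie on the same diagonal of the matrix.
            totals[j - i] = totals.get(j - i, 0) + vote
    if not totals:
        return 0, 0
    offset = max(totals, key=lambda key: totals[key])
    return offset, totals[offset]
```

Because matching characters in the same relative order pile up on one diagonal, a single misrecognized character lowers the count by one instead of breaking the match, which is the noise resistance the abstract refers to.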

Type: Grant

Filed: July 27, 2000

Date of Patent: May 18, 2004

Assignee: Kabushiki Kaisha Toshiba

Inventor: Takuma Akagi
Lattice and method for identifying and normalizing orthographic variations in Japanese text

Patent number: 6731802

Abstract: A lattice data structure suitable for storage on a computer-readable medium is provided which represents a plurality of orthographic forms of a Japanese lexical entry. The lattice includes a plurality of data fields each adapted to hold data representing a word element of the entry. Each data field includes a first subfield containing data representing a primary form of the corresponding word element and a second field containing data representing an alternate form of the corresponding word element. Also provided is a method of normalizing Japanese lexical entries to produce a normalized form that includes the primary form of each word-element representation of the lattice and does not include the alternate forms. Also provided are methods of segmenting text using the disclosed lattice.

Type: Grant

Filed: May 2, 2000

Date of Patent: May 4, 2004

Assignee: Microsoft Corporation

Inventors: Gary Kacmarcik, Christopher J. Brockett
Systems and methods for processing text-based electronic documents

Publication number: 20040037470

Abstract: Systems and methods for processing text-based electronic documents are provided. Briefly described, one embodiment of a method for processing a text-based electronic document comprises the steps of: comparing at least one word in a text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule; for each of the at least one word that does not conform to the predefined rule, fragmenting the at least one word into word fragments; combining at least two consecutive word fragments; and comparing the combination of the word fragments to the native language dictionary.
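
The fragment-and-recombine steps can be sketched as below. The abstract does not say how a word is fragmented, so this sketch assumes fragments are separated by non-alphabetic characters (for example a stray hyphen left by OCR) and that the predefined rule is simple dictionary membership. The name `repair_word` is hypothetical.

```python
import re

def repair_word(word, dictionary):
    """Return the word if it is in the dictionary, else a recombined form, else None."""
    if word.lower() in dictionary:
        return word
    # Assumption: split the non-conforming word at any run of non-letters.
    fragments = [piece for piece in re.split(r"[^A-Za-z]+", word) if piece]
    # Combine two or more consecutive fragments and test each combination.
    for start in range(len(fragments) - 1):
        for end in range(start + 2, len(fragments) + 1):
            candidate = "".join(fragments[start:end])
            if candidate.lower() in dictionary:
                return candidate
    return None
```

For instance, `repair_word("docu-ment", {"document"})` yields `"document"`.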

Type: Application

Filed: August 23, 2002

Publication date: February 26, 2004

Inventor: Steven J. Simske
Document image decoding systems and methods using modified stack algorithm

Publication number: 20040028278

Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.

Type: Application

Filed: August 9, 2002

Publication date: February 12, 2004

Applicant: XEROX CORPORATION

Inventors: Daniel H. Greene, Justin K. Romberg, Tze-Lei Poo, Ashok C. Popat
Document image decoding systems and methods using modified stack algorithm

Publication number: 20040028279

Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.

Type: Application

Filed: August 9, 2002

Publication date: February 12, 2004

Applicant: XEROX CORPORATION

Inventors: Daniel H. Greene, Justin K. Romberg, Ashok C. Popat
Document image decoding systems and methods using modified stack algorithm

Publication number: 20040028280

Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.

Type: Application

Filed: August 9, 2002

Publication date: February 12, 2004

Applicant: XEROX CORPORATION

Inventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat
Parameterized word segmentation of unsegmented text

Patent number: 6678409

Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.

Type: Grant

Filed: January 14, 2000

Date of Patent: January 13, 2004

Assignee: Microsoft Corporation

Inventors: Andi Wu, Zixin Jiang
Document image decoding using an integrated stochastic language model

Patent number: 6678415

Abstract: A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network selectively expands the network to accommodate the language model only for certain ones of the paths in the network, effectively managing the memory storage requirements and computational complexities of integrating the language model efficiently into the network. The language model generates probability distributions indicating the probability of a certain character occurring in a string, given one or more previous characters in the string. Selectively expanding the image network is achieved by initially using upper bounds on the language model probabilities on the branches of an unexpanded image network. A best path search operation is then performed to determine an estimated best path through the image network using these upper bound scores.

Type: Grant

Filed: May 12, 2000

Date of Patent: January 13, 2004

Assignee: Xerox Corporation

Inventors: Ashok C. Popat, Dan S. Bloomberg, Daniel H. Greene
Method, system, and program for determining boundaries in a string using a dictionary

Patent number: 6671856

Abstract: Disclosed is a system, method, and program for determining boundaries in a string of characters using a dictionary, wherein the substrings in the dictionary may comprise words. A determination is made of all possible initial substrings of the string in the dictionary. One initial substring is selected such that all the characters following the initial substring can be divided into at least one substring in the dictionary. The boundaries follow each of the initial substring and the at least one substring that includes all the characters following the initial substring.
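
The selection rule in this abstract, choosing an initial substring only if everything after it can also be divided into dictionary entries, can be sketched with a memoized recursive search. Trying the longest initial substring first is an assumption of this sketch, not something the abstract states, and the name `find_boundaries` is hypothetical.

```python
from functools import lru_cache

def find_boundaries(text, dictionary):
    """Return boundary positions after each dictionary substring, or None if impossible."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return ()
        # Consider every initial substring found in the dictionary, longest first.
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                rest = split_from(end)
                # Accept it only if the remaining characters can be fully divided.
                if rest is not None:
                    return (end,) + rest
        return None

    return split_from(0)
```

With the dictionary `{"the", "them", "men"}` and the text `"themen"`, the longer prefix `"them"` is rejected because `"en"` cannot be divided, so the result is the boundaries after `"the"` and `"men"`.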

Type: Grant

Filed: September 1, 1999

Date of Patent: December 30, 2003

Assignee: International Business Machines Corporation

Inventor: Richard Theodore Gillam
Character matching process for text converted from images

Patent number: 6668085

Abstract: An improved method of deriving the correct text from text with errors converted from a character recognition device includes the need for significantly less human intervention for correction of the converted text.

Type: Grant

Filed: August 1, 2000

Date of Patent: December 23, 2003

Assignee: Xerox Corporation

Inventor: William D. Evans
