Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
  • Patent number: 6927774
    Abstract: A character display device and method therefor are adapted to obtain a proximal reference point of each character comprising a character series and calculate display coordinates of each character from said proximal reference point and the display angle and display reference position of the character series.
    Type: Grant
    Filed: December 8, 2000
    Date of Patent: August 9, 2005
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Fumiko Yano
  • Patent number: 6922489
    Abstract: A method of interpreting an image using a statistical or probabilistic interpretation model is disclosed. The image has associated therewith contextual information. The method comprises the following steps: providing the contextual information associated with the image for analysis; analyzing the additional contextual information to identify predetermined features relating to the image; and biasing the statistical or probabilistic interpretation model in accordance with the identified features.
    Type: Grant
    Filed: October 29, 1998
    Date of Patent: July 26, 2005
    Assignees: Canon Kabushiki Kaisha, Canon Information Systems Research Australia Pty. Ltd.
    Inventors: Alison Joan Lennon, Delphine Anh Dao Le
  • Patent number: 6917708
    Abstract: A method of automatically recognizing text. The text is divided into whole words, each of which is recognized. Each whole word is characterized according to its silhouette. The silhouette is characterized by features such as upwardly extending “polls” and downwardly extending “holes”. The silhouette may also be characterized by its first syllable blends. Numbers are assigned to each of the different characteristics, and numbers may also be assigned based on analysis of a database of different kinds of cursive words. Recognition may be carried out automatically by a recognizing system that recognizes in this way.
    Type: Grant
    Filed: January 19, 2001
    Date of Patent: July 12, 2005
    Assignee: California Institute of Technology
    Inventors: Rodney M. Goodman, Donal J. Woods, Patricia A. Keaton, Joseph Chen
  • Patent number: 6909805
    Abstract: A scanned document image, including add-on information such as handwritten annotations in addition to printed text lines, is processed by a handwriting detection method. First, at least one projection histogram is generated from the scanned document image. A regular pattern that correlates to the printed text lines is determined from the projection histogram. Second, connected component analysis is applied to the scanned document image to generate at least one merged text line. Each merged text line relates to at least one of the handwritten annotation and the printed text line. By comparing the merged text lines to the regular pattern of the projection histograms, the printed text lines are discriminated from the handwritten annotations.
    Type: Grant
    Filed: January 31, 2001
    Date of Patent: June 21, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yue Ma, Jinhong Katherine Guo
  • Patent number: 6904171
    Abstract: The present invention provides a method and system for efficient storage and retrieval of information. The method includes the steps of: scanning/selecting/capturing a selected portion of text of the information, wherein the selected portion of text is typically a close-to-unique identifier of the text from which the portion was excerpted and serves as a key when the information is accessed electronically; and placing the key in an electronically available index/directory to facilitate retrieval of the information. The method may further include retrieving and storing the information associated with the key and using it to index, organize, and make available for search and retrieval the full information originally viewed by the user.
    Type: Grant
    Filed: December 15, 2000
    Date of Patent: June 7, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Pieter J. van Zee
  • Patent number: 6879722
    Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character sequences probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: April 12, 2005
    Assignee: International Business Machines Corporation
    Inventor: Hubert Crepy
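    The dividing, determining, comparing and rejecting steps in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes a character-bigram model with add-one smoothing as the "character sequence probabilities", and it uses the mean bigram log-probability as the regularity value. The names train_bigram_model, regularity_value and filter_corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(reference_text):
    """Estimate smoothed character-bigram log-probabilities from reference text."""
    pair_counts = Counter(zip(reference_text, reference_text[1:]))
    first_counts = Counter(reference_text[:-1])
    alphabet_size = len(set(reference_text))

    def log_prob(a, b):
        # Add-one smoothing (an assumption): unseen pairs get a small,
        # nonzero probability instead of zero.
        return math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + alphabet_size))

    return log_prob


def regularity_value(portion, log_prob):
    """Mean bigram log-probability of a portion; higher means more language-like."""
    if len(portion) < 2:
        return float("-inf")
    total = sum(log_prob(a, b) for a, b in zip(portion, portion[1:]))
    return total / (len(portion) - 1)


def filter_corpus(portions, log_prob, threshold):
    """Keep only portions whose regularity value reaches the threshold."""
    return [p for p in portions if regularity_value(p, log_prob) >= threshold]
```

    The threshold would need to be tuned per language and per model; the bigram order and the smoothing scheme are choices made for this sketch only.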
  • Patent number: 6879718
    Abstract: In computerized recognition having multiple experts, a method and system is described that obtains an optimum value for an expert tuning parameter in a single pass over sample tuning data. Each tuning sample is applied to two experts, resulting in scores from which ranges of parameters that correct recognition errors without changing correct results for that sample are determined. To determine the range data for a given sample, the experts return scores for each prototype in a database, the scores separated into matching and non-matching scores. The matching and non-matching scores from each expert are compared, providing upper and lower bounds defining ranges. Maxima and minima histograms track upper and lower bound range data, respectively. An analysis of the histograms based on the full set of tuning samples provides the optimum value. For tuning multiple parameters, each parameter may be optimized by this method in isolation, and then iterated.
    Type: Grant
    Filed: November 6, 2001
    Date of Patent: April 12, 2005
    Assignee: Microsoft Corp.
    Inventor: Gregory N. Hullender
  • Patent number: 6876765
    Abstract: A character recognition method carries out a character recognition using a cross section sequence graph which describes features of a character image. The character recognition method includes the steps of (a) extracting the cross section sequence graph from a character string image, (b) analyzing a singular region of the cross section sequence graph and generating a virtual boundary point sequence in the singular region based on an analyzed result, (c) generating character candidates by combining structural elements of the cross section sequence graph and recognizing one character by supplying the virtual boundary point sequence with respect to the generated character candidates if necessary, and (d) recognizing a character string based on an adjacency relationship of the character candidates which are recognized as one character in the step (c).
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: April 5, 2005
    Assignee: Ricoh Company, Ltd.
    Inventor: Toshihiro Suzuki
  • Patent number: 6876774
    Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
    Type: Grant
    Filed: August 29, 2002
    Date of Patent: April 5, 2005
    Assignee: Fujitsu Limited
    Inventors: Noriko Satoh, Shigeru Yoshida
  • Patent number: 6873986
    Abstract: A method and system for mapping a number of characters in a string, wherein the string comprises a combination of characters representing indexed expressions and a combination of characters representing non-indexed expressions. One embodiment produces a weight array that can be utilized to compare a first and second string having indexed and non-indexed expressions. In one embodiment, a method generates a set of special weights for characters that represent indexed and non-indexed expressions. The method then associates a weight value of an indexed expression with the specific group of characters representing a specific non-indexed expression, and generates a weight array by retrieving a plurality of special weights associated with the specific group of characters representing the specific non-indexed expression and the associated weight value of the indexed expression.
    Type: Grant
    Filed: October 29, 2001
    Date of Patent: March 29, 2005
    Assignee: Microsoft Corporation
    Inventors: John McConnell, Julie Bennett, Yung-Shin Lin
  • Patent number: 6859556
    Abstract: A word recognizing apparatus extracts the feature amount from a given image, and dynamically composes the feature amount of a candidate word to be recognized which is registered in a word list, using feature amounts of characters registered in an individual character dictionary. Then, the apparatus collates the composed feature amount of the word with the feature amount extracted from the image, calculates the degree of similarity between the two feature amounts, and outputs a recognition result.
    Type: Grant
    Filed: May 11, 1999
    Date of Patent: February 22, 2005
    Assignee: Fujitsu Limited
    Inventors: Hiroaki Takebe, Yoshinobu Hotta, Satoshi Naoi
  • Patent number: 6847734
    Abstract: In word recognition using character recognition results, recognition processing is performed for an input character string that corresponds to a word to be recognized, and a probability is obtained at which the characteristics resulting from character recognition are generated, conditioned on the characters of words contained in a word dictionary that stores candidate words in advance. The probability thus obtained is divided by the probability at which the characteristics resulting from character recognition are generated, and the division results obtained for the characters of each word in the word dictionary are multiplied together over all the characters. The recognition results for the words are obtained based on the multiplication results.
    Type: Grant
    Filed: January 26, 2001
    Date of Patent: January 25, 2005
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Tomoyuki Hamamura
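    Read literally, the abstract above scores each dictionary word as the product, over character positions i, of P(r_i | c_i) / P(r_i), where r_i is the recognition result at position i and c_i is the word's character there. A minimal sketch under that reading follows. The data layout (one dict of conditional probabilities per position, one marginal per position), the restriction to same-length words, and the names word_score and recognize are all assumptions made for illustration.

```python
import math


def word_score(word, cond_probs, marginal_probs):
    """Product over positions of P(r_i | c_i) / P(r_i).

    cond_probs[i] maps a character c to P(r_i | c); marginal_probs[i] is P(r_i).
    Characters missing from a table are treated as probability 0 (an assumption).
    """
    return math.prod(
        cond_probs[i].get(ch, 0.0) / marginal_probs[i]
        for i, ch in enumerate(word)
    )


def recognize(dictionary, cond_probs, marginal_probs):
    """Return the best-scoring dictionary word of matching length, or None."""
    candidates = [w for w in dictionary if len(w) == len(cond_probs)]
    return max(
        candidates,
        key=lambda w: word_score(w, cond_probs, marginal_probs),
        default=None,
    )
```

    For long words, summing logarithms of the ratios instead of multiplying them would avoid floating-point underflow.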
  • Patent number: 6839877
    Abstract: An electronic mail terminal includes a display section, a conversion dictionary which stores sets of a character string and a pictograph, a receiving section and a control section. The receiving section receives an electronic mail including a sentence as a conversion object sentence in a reception mode. The control section automatically refers to the character string-pictograph conversion dictionary based on each of the character strings of the conversion object sentence in the reception mode to retrieve a specific pictograph corresponding to the character string, when the pictograph corresponding to the character string is registered in the character string-pictograph conversion dictionary. Then, the control section converts the character string into the specific pictograph to produce a pictograph mixed sentence, and controls the display section to display the pictograph mixed sentence.
    Type: Grant
    Filed: December 1, 2000
    Date of Patent: January 4, 2005
    Assignee: NEC Corporation
    Inventor: Shinichiro Iwata
  • Publication number: 20040264782
    Abstract: A system and method for providing an object-oriented graphical integrated command shell (ICS) integrates the command shell into a graphical user interface (GUI) environment in order to provide a single graphical user interface, so that the user does not need to work in different environments for different tasks. To accomplish the integration, the ICS interprets output responses that occur as a result of processing textual commands entered by a user. An output response from the command shell is typically one or more lines of text from an output stream such as standard error or standard output. The output response is interpreted by the ICS to determine a meaning. Interpretation may be by pattern matching with regular expressions. If interpreted lines of command output (e.g. indicating a file or folder) map to some other object model (e.g. a file subsystem) in the UI, appropriate object model objects are created. Mapping output produces integration between different subsystems (i.e.
    Type: Application
    Filed: November 6, 2003
    Publication date: December 30, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David McKnight, Jeffrey Turnham
  • Patent number: 6834121
    Abstract: An apparatus for rough classification of words allows the features of words stored in a vocabulary storage division to be generated from the character codes of the words, so that the words can be selected efficiently. A candidate character selection division 1 detects areas likely to be characters from a word image, and a character recognition division 2 recognizes the candidate characters generated in candidate character selection division 1 and converts them into character codes. A number-of-characters estimation division 3 estimates the number of characters of the entire word image and the number of characters of the areas between candidate characters, and a word description division 4 generates a word description equivalent to a state transition graph from the recognition results of the candidate characters and the estimated number of characters of the candidate character separations.
    Type: Grant
    Filed: December 20, 2000
    Date of Patent: December 21, 2004
    Assignee: NEC Corporation
    Inventors: Didier Guillevic, Keiji Yamada
  • Publication number: 20040223647
    Abstract: A data processing apparatus for inputting data by writing characters on a touch sensitive display screen. The data processing apparatus comprises a character recognition processor operable to generate an estimate of a character handwritten by a user on the touch sensitive screen. The data processing apparatus includes a processing unit operable to receive the estimated character, and a graphics display device operable to receive the estimated character from the processing unit. The graphics display device is operable to display the estimated characters within a text input window of the display screen. The processing unit is operable in combination with the graphics display driver to display the estimated character on the display screen, substantially at a position proximate to the location where the user has written the character.
    Type: Application
    Filed: May 7, 2004
    Publication date: November 11, 2004
    Applicant: Orange SA
    Inventors: Alan Blount, Todd Pinkerton
  • Patent number: 6816615
    Abstract: A logical separation between pages, such as an implicit page break, is introduced to separate text entered during one handwriting session from text entered during another handwriting session. If the user leaves more than a threshold amount of blank space at the bottom of the page immediately preceding the new page, then an implicit page break may be inserted at the beginning of the new page. The amount of blank space left at the end of the preceding page may be combined with other criteria to determine whether to insert an implicit page break. The amount of time elapsed since ink has been captured on the previous page is another factor that may be used by itself or combined with other factors to determine whether to insert an implicit page break into the new page. A change in context, such as a different date or different recognized subject matter labels, is also a factor that may be considered in determining whether to insert an implicit page break.
    Type: Grant
    Filed: February 28, 2001
    Date of Patent: November 9, 2004
    Assignee: Microsoft Corporation
    Inventors: Charlton E. Lui, Anthony S. Smith, Dan W. Altman, Cynthia C. Tee, Evan M. Feldman
  • Publication number: 20040207878
    Abstract: Systems and methods for using a print subsystem to implement an analysis of the content of a print job prior to despooling the print job to a printing device, and selectively rendering, providing a modified rendering or terminating the print job. A computer device is connected to a printing device to selectively render a print job and includes a print subsystem, such as a spooler and optionally a printer driver and a print processor. A further implementation includes a print server having a print subsystem. Print data corresponding to a print job is provided from the print subsystem input processing to a content filtering process to analyze the content thereof prior to despooling the print job to the printing device. The analysis determines if some or all of the content should be rejected, removed, replaced or require acknowledgement.
    Type: Application
    Filed: April 21, 2003
    Publication date: October 21, 2004
    Inventor: Andrew Rodney Ferlitsch
  • Publication number: 20040199389
    Abstract: The invention relates to a method for recognizing a phonetic sound sequence or a character sequence, e.g.
    Type: Application
    Filed: February 12, 2004
    Publication date: October 7, 2004
    Inventor: Hans Geiger
  • Patent number: 6801659
    Abstract: Beginning with the first letter or stroke, this invention uses the relative frequency of the sequential groups of letters or strokes from which individual words or characters are gradually built in order to provide a better way of computer indexing languages for easier and more efficient access to both the frequently used words or characters and the less-frequently used. This makes possible a system of text input that is both more efficient and more intuitive than utilizing just word or character frequency, an input approach which eliminates typing transpositions, reduces word-spelling errors or character-stroke-order uncertainty, and provides an alternative to a standard keyboard which is especially helpful with wireless phones and hand-held computers, and similar devices lacking standard keyboards. This invention can make words and characters quite accessible in an intuitive way without requiring any direct input of words or letters, strokes or characters.
    Type: Grant
    Filed: June 4, 2001
    Date of Patent: October 5, 2004
    Assignee: ZI Technology Corporation Ltd.
    Inventor: Robert B. O'Dell
  • Patent number: 6801660
    Abstract: In a computing device that receives handwritten data, a method and system that maintains an association between alternates for a given ink word, regardless of the handwritten or text state of the word, and regardless of the position of the word as it may be edited in a document. Handwritten data is maintained in an ink word data structure, and once the word is recognized and an alternate is selected for it, the first character of the word remains as an ink word (in a text buffer) pointing to the data structure, with a flag set in the data structure indicating that the word is now recognized as text. In this state, the first character is displayed to the user as a recognized text letter instead of as the handwritten word. The other characters that make up the recognized word are inserted as text into the text buffer. Any alternates returned by the recognizer are thus stored with the ink word data structure displayed as this first character of a recognized word, which also maintains the ink data, e.g.
    Type: Grant
    Filed: August 22, 2000
    Date of Patent: October 5, 2004
    Assignee: Microsoft Corporation
    Inventors: Peter H. Williamson, Dan W. Altman, Charlton E. Lui
  • Patent number: 6798912
    Abstract: A method of program classification based on syntax of transcript information includes receiving transcript information associated with the program wherein the transcript information has a plurality of sentences, determining characteristics of at least one of the plurality of sentences of the transcript information to identify at least the type and subject of the sentence, comparing the characteristics of the at least one of the plurality of sentences with a list of sentence characteristics having associated therewith a plurality of program types, and based on the comparing step, selecting a classification of program which is most closely associated with the characteristics of the at least one of the plurality of sentences.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: September 28, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Kavitha Devara
  • Patent number: 6798913
    Abstract: An image processing device and a computer program product capable of accurately determining a user-desired region even when a region has been only roughly marked by a user, wherein a specific region within an image to be processed is detected; the image to be processed is allocated into a plurality of blocks; text included in the image to be processed is recognized; the presence or absence of relevance between a first block which is partially included in the specific region and a second block which is entirely included in the specific region, among the allocated blocks, is determined based on a result of the text recognition; and it is determined whether or not an image of the first block should be treated as an image belonging to the specific region in accordance with a result of the determination as to the relevance.
    Type: Grant
    Filed: March 16, 2001
    Date of Patent: September 28, 2004
    Assignee: Minolta Co., Ltd.
    Inventor: Hideyuki Toriyama
  • Publication number: 20040184663
    Abstract: This invention is to compare each character of a first character string with each character of a second character string, vote for a matrix having two sides corresponding to the characters of the first character string and the characters of the second character string and calculate values of the voting result for respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process which is noise-resistant and takes the character arrangement into consideration can be attained.
    Type: Application
    Filed: March 30, 2004
    Publication date: September 23, 2004
    Applicant: Kabushiki Kaisha Toshiba
    Inventor: Takuma Akagi
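    The voting scheme in the abstract above can be illustrated with a short sketch. It assumes that a vote is cast wherever two characters are equal and that "components arranged in an oblique direction" means the diagonals of the comparison matrix, so each diagonal corresponds to one relative offset between the strings. The function name diagonal_vote_score is hypothetical, and the matrix is stored implicitly as per-diagonal totals.

```python
from collections import Counter


def diagonal_vote_score(first, second):
    """Return (votes, offset) for the best-supported diagonal.

    A vote is cast at cell (i, j) whenever first[i] == second[j]. Cells on the
    same diagonal share the offset j - i, so votes are tallied per offset.
    Returns (0, None) when the strings share no characters.
    """
    votes = Counter()
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            if a == b:
                votes[j - i] += 1
    if not votes:
        return 0, None
    offset, count = max(votes.items(), key=lambda item: item[1])
    return count, offset
```

    Because a misrecognized character only removes one vote from its diagonal, the best diagonal still stands out under moderate noise, and stray matches in the wrong order land on other diagonals. When two diagonals tie, this sketch returns whichever was encountered first.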
  • Patent number: 6785663
    Abstract: Periodic patterns in time series data can be hierarchical in nature, where a higher level pattern may comprise repetitions of lower level patterns. In the presence of noise, these repetitions of lower level patterns may not be perfect. A novel model, namely a meta-pattern, is provided in accordance with the present invention to capture these higher level patterns. The meta-pattern can not only provide a more compact representation of patterns but also capture the regularities of pattern evolutions, which may not be expressed by previous models due to the presence of noise. A method is provided to mine meta-patterns in an iterative manner by discovering meta-patterns and their supporting subsequences in the form of lists of segments of contiguous repetitions of a meta-pattern. The number of pattern repetitions in each said segment is at least a predefined threshold min_rep and the distance between any two adjacent segments is at most a predefined threshold max_dis.
    Type: Grant
    Filed: December 28, 2000
    Date of Patent: August 31, 2004
    Assignee: International Business Machines Corporation
    Inventors: Wei Wang, Jiong Yang, Philip Shi-Lung Yu
  • Patent number: 6785417
    Abstract: In a computing device that receives handwritten data, a method and system for finding matches for recognized handwritten words, by comparing a given search word (and possibly its alternates) with the words in a document, including recognized ink words and any possible alternates for those recognized words as returned by a recognizer. One described test looks for an exact match between an entered search word (and possibly its alternates) and the recognized words and their alternates stored in a handwritten document. Other tests are possible because of the use of alternates, which also may be returned with a probability ranking. For example, one scheme looks for a percentage of matching characters, with a user-determined threshold percentage. Other variations include giving different weight to certain characters, and/or factoring in the relative number of syllables and/or the relative lengths of the words.
    Type: Grant
    Filed: August 22, 2000
    Date of Patent: August 31, 2004
    Inventors: Peter H. Williamson, Charlton E. Lui
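    The percentage-of-matching-characters test with alternates, mentioned in the abstract above, might look like the following sketch. It is an assumption-laden illustration: characters are compared position by position, case is ignored, each document word is represented as a list holding the top recognition result followed by its alternates, and the names match_percentage and find_matches are hypothetical.

```python
def match_percentage(a, b):
    """Percentage of positions at which a and b agree, relative to the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / longest


def find_matches(search_terms, document_words, threshold):
    """Return indices of document words that match any search term.

    search_terms: the search word plus any alternates for it.
    document_words: one list per ink word, holding the recognized text and
        its alternates as returned by a recognizer.
    threshold: user-chosen minimum match percentage (100 means exact match).
    """
    hits = []
    for index, candidates in enumerate(document_words):
        best = max(
            (match_percentage(s.lower(), c.lower())
             for s in search_terms for c in candidates),
            default=0.0,
        )
        if best >= threshold:
            hits.append(index)
    return hits
```

    Searching a document whose first ink word was recognized as "clog" with alternates "dog" and "dug" still finds "dog", because the alternates take part in the comparison.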
  • Publication number: 20040161154
    Abstract: Systems and methods for learning-based automatic commercial content detection are described. In one aspect, program data is divided into multiple segments. The segments are analyzed to determine visual, audio, and context-based feature sets that differentiate commercial content from non-commercial content. The context-based features are a function of single-side left and/or right neighborhoods of segments of the multiple segments.
    Type: Application
    Filed: February 18, 2003
    Publication date: August 19, 2004
    Inventors: Xian-Sheng Hua, Lie Lu, Mingjing Li, Hong-Jiang Zhang
  • Patent number: 6778712
    Abstract: A data sheet identification device of the invention includes: a character/graphics extracting section, an identical shape deciding section, a graphics collating section, an identification code/data sheet ID identifying section for collating characters that have been decided to have the same shape with an identification code/data sheet ID database in which a plurality of characters showing features of a plurality of data sheets respectively have been registered, and an identifying section for uniquely identifying the data sheet based on a result of the collation by the graphics collating section and a result of the collation by the identification code/data sheet ID identifying section.
    Type: Grant
    Filed: August 29, 2000
    Date of Patent: August 17, 2004
    Assignee: Fujitsu Limited
    Inventors: Maki Yabuki, Shinichi Eguchi, Kouichi Kanamoto, Katsutoshi Kobara, Koichi Chiba, Toshiyuki Waida, Kazunori Yamamoto, Yutaka Katsumata
  • Patent number: 6771817
    Abstract: In a computing device that receives handwritten data, a method and data structure that enables extended data to be added to an existing ink word data structure without compromising backwards-compatibility. A flag in the header data structure indicates to new ink processing programs the presence or absence of the extended data, and the size information maintained in the header is adjusted to ensure that earlier versions of ink programs do not lose the extended data. The extended data is then added by including it in a copy of the existing ink word data structure, along with a tail structure that includes information describing the extended data and the tail structure to the new ink code, e.g., version and offset information. The tail structure can be used to locate a list of alternate word choices for an ink word that are maintained within the extended data.
    Type: Grant
    Filed: August 22, 2000
    Date of Patent: August 3, 2004
    Assignee: Microsoft Corporation
    Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
  • Patent number: 6768816
    Abstract: A method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image operates by segmenting a document image into a set of image objects, and linking the image objects with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets. The image objects are grouped according to characteristics suggesting that the image objects may have common ground-truthed metadata. By grouping the image objects into subsets, the image objects may be indexed to facilitate the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in a table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.
    Type: Grant
    Filed: June 13, 2002
    Date of Patent: July 27, 2004
    Assignee: Convey Corporation
    Inventors: Floyd Steven Hall, Jr., Cameron Telfer Howie
  • Patent number: 6766069
    Abstract: A user-interface for selecting text from images of documents using auto-completion is described. The auto-completion process may be used to complete words (or text sequences), phrases, sentences, paragraphs, or other groupings of words. In response to user input, the OCR results for one or more images of documents are searched. The user input may include typing in a partial word (or the initial characters in a text sequence) via an input device or alternatively, annotations made by a user on a hardcopy document prior to scanning the document. One or more word matches are presented to the user for acceptance until the user accepts a word match or until all word matches have been presented to the user. Once a user accepts a word match, the word match is copied into an electronic document such as a word processing document, spreadsheet document, or other electronic document created by an application program.
    Type: Grant
    Filed: December 21, 1999
    Date of Patent: July 20, 2004
    Assignee: Xerox Corporation
    Inventors: Christopher R. Dance, William M. Newman, Alex S. Taylor, Stuart A. Taylor
  • Publication number: 20040126017
    Abstract: A system (10) for recognizing handwriting includes an input/output device (12) and a second computer (24 or 28). The system (10) converts handwritten symbols to text by using a grammar (50) that is comprised of the text (60) that is expected to be entered into a text display/text input area (17) of an input/output device (12). The grammar (50) and handwriting-to-text conversion can be performed in either the input/output device (12) or a remote computer (24, 28).
    Type: Application
    Filed: December 30, 2002
    Publication date: July 1, 2004
    Inventors: Giovanni Seni, Fabio Valente, Guo Jin
  • Publication number: 20040120583
    Abstract: A system augments stylus keyboarding with shorthand gesturing. The system defines a shorthand symbol for each word according to its movement pattern on an optimized stylus keyboard. The system recognizes word patterns by identifying an input as a stroke, and then matching the stroke to a stored list of word patterns. The system then generates and displays the matched word to the user.
    Type: Application
    Filed: December 20, 2002
    Publication date: June 24, 2004
    Applicant: International Business Machines Corporation
    Inventor: Shumin Zhai
  • Patent number: 6754391
    Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
    Type: Grant
    Filed: June 25, 2002
    Date of Patent: June 22, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Frank P. Carau, Sr.
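    The line-filling rule in this abstract (check whether the active line can hold the whole word block before registering it, otherwise advance to the next line) resembles a greedy layout. The Python sketch below assumes each word block is represented only by its width; `layout_word_blocks` and its parameters are hypothetical names.

```python
from typing import List


def layout_word_blocks(widths: List[int], line_width: int) -> List[List[int]]:
    """Assign word-block indices to lines without splitting any block.

    A block is registered with the active line only if the whole block
    fits; otherwise the active line is incremented first. A block wider
    than `line_width` is placed alone on its own line (a choice made by
    this sketch, not stated in the abstract).
    """
    lines: List[List[int]] = [[]]
    used = 0  # width already consumed on the active line
    for index, width in enumerate(widths):
        if lines[-1] and used + width > line_width:
            lines.append([])  # increment the active line
            used = 0
        lines[-1].append(index)
        used += width
    return lines


layout = layout_word_blocks([30, 40, 50, 20, 90], line_width=100)
```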
  • Patent number: 6754386
    Abstract: In a computing device that receives handwritten data, a method and system that corrects for parser segmentation errors by sending an entire line of ink to a recognizer, and then comparing, on a word-by-word basis, the initial segmentation guesses of the parser with the more-thoroughly recognized segmentation results of the handwriting recognition engine. In the correction process, the ink words are efficiently adjusted with relatively little data manipulation. As the recognizer is fed a series of strokes on a line, the recognizer returns segmentation information. For ink word breaks that are the same for any given set of data, the existing ink word is unchanged. For ink words that are recognized differently relative to their initial segmentation, one or more new ink words are created and the handwriting (including stroke) data of the parser's ink word is manipulated to create a new ink processor word (or words) to match the recognizer output.
    Type: Grant
    Filed: August 22, 2000
    Date of Patent: June 22, 2004
    Assignee: Microsoft Corporation
    Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
  • Patent number: 6754675
    Abstract: An image retrieval system contains a database with a large number of images. The system retrieves images from the database that are similar to a query image entered by the user. The images in the database are grouped in clusters according to a similarity criterion so that mutually similar images reside in the same cluster. Each cluster has a cluster center which is representative of the images in it. A first step of the search for similar images selects the clusters that may contain images similar to the query image, by comparing the query image with the cluster centers of all clusters. A second step of the search compares the images in the selected clusters with the query image in order to determine their similarity to the query image.
    Type: Grant
    Filed: July 16, 2001
    Date of Patent: June 22, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Mohammed S. Abdel-Mottaleb, Santhana Krishnamachari
  • Publication number: 20040114807
    Abstract: A method of representing light field data by capturing a set of images of at least one object in a passive manner at a virtual surface where a center of projection of an acquisition device that captures the set of images lies and generating a representation of the captured set of images using a statistical analysis transformation based on a parameterization that involves the virtual surface.
    Type: Application
    Filed: December 13, 2002
    Publication date: June 17, 2004
    Inventors: Dan Lelescu, Frank Jan Bossen
  • Patent number: 6751780
    Abstract: A user interface method for launching an optimized final scan of a selected region of interest selected from a preview scan of a document. A user may drag the selected region of interest, presented in a preview scan of a document in a scanner window, and drop it on a software application, the desktop, or a writeable folder, which launches an optimized final scan of the selected region of interest. The image data resulting from the optimized final scan automatically resides in the software application, the desktop, or the writeable folder. In selecting a region of interest from the preview scan, scanner software parameters are updated with information about the region of interest which optimize the final scan. The image data from the optimized final scan is then formatted in the format requested and delivered to the software application, the desktop, or the writeable folder.
    Type: Grant
    Filed: October 1, 1998
    Date of Patent: June 15, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Theodore W. Neff, Jeffrey P. Lee, Patricia D. Lopez
  • Patent number: 6741745
    Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: May 25, 2004
    Assignee: Xerox Corporation
    Inventors: Christopher R. Dance, Mauritius Seeger
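    The per-word scaling factors in this abstract can be illustrated with a small Python sketch. It assumes each candidate typeface can report the rendered width of a word at unit font size, and it substitutes a simple relative-spread measure for the cluster analysis the abstract mentions; `best_typeface` and its arguments are hypothetical.

```python
from statistics import median, pstdev
from typing import Dict, Tuple


def best_typeface(image_widths: Dict[str, float],
                  unit_widths: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Pick the typeface whose per-word scaling factors agree most closely.

    image_widths: word -> width measured in the scanned image.
    unit_widths:  typeface -> (word -> width when rendered at font size 1).
    Returns (typeface, estimated font size), where the font size is the
    median scaling factor for that typeface.
    """
    best = None
    for face, widths in unit_widths.items():
        # One scaling factor per word: image width / rendered width.
        factors = [image_widths[w] / widths[w] for w in image_widths]
        centre = median(factors)
        spread = pstdev(factors) / centre  # stand-in for cluster tightness
        if best is None or spread < best[2]:
            best = (face, centre, spread)
    return best[0], best[1]


image_widths = {"cat": 36.0, "window": 84.0, "ill": 18.0}
unit_widths = {
    "serif": {"cat": 3.0, "window": 7.0, "ill": 1.5},
    "mono": {"cat": 3.0, "window": 6.0, "ill": 3.0},
}
choice = best_typeface(image_widths, unit_widths)
```

    A typeface that fits well yields nearly the same factor for every word, which is why a tight group of factors indicates both the typeface and a constant font size.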
  • Patent number: 6741744
    Abstract: The invention features a method wherein a recognition environment utilizes pseudo-English as a programming language to extract simple and complex objects with image and/or map data as inputs. Based on this human/computer interface in which pseudo-English is a programming language, the object-recognition system has three major logic modules: (1) an input data module; (2) an information-processing module, coupled with the above-noted human/computer interface (HCI) module; and (3) an output module that has a feedback mechanism back to the main information-processing and the input-data module. A physical phenomenon (i.e., one that is visible, audible, tactile, etc.) is analyzed by the information-processing module to determine whether it is susceptible to description or articulation. If not, the phenomenon is matched or compared, via the output module, to a known articulatable, physical-phenomenon model and recognizable features are extracted.
    Type: Grant
    Filed: April 17, 1999
    Date of Patent: May 25, 2004
    Inventor: Shin-yi Hsu
  • Patent number: 6738515
    Abstract: This invention compares each character of a first character string with each character of a second character string, casts votes into a matrix whose two sides correspond to the characters of the first character string and the characters of the second character string, and calculates values of the voting result for the respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process which is noise-resistant and takes the character arrangement into consideration can be attained.
    Type: Grant
    Filed: July 27, 2000
    Date of Patent: May 18, 2004
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takuma Akagi
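    The voting scheme in this abstract can be sketched in Python. The sketch assumes a vote is cast whenever two characters are equal and that the "oblique direction" corresponds to the diagonals of the comparison matrix; it keeps one counter per diagonal instead of storing the full matrix. The name `diagonal_vote` is hypothetical.

```python
from collections import Counter
from typing import Tuple


def diagonal_vote(first: str, second: str) -> Tuple[int, int]:
    """Return (best score, offset) for the strongest diagonal.

    Cell (i, j) of the comparison matrix receives a vote when
    first[i] == second[j]. All cells with the same j - i lie on one
    diagonal, so votes are summed per offset j - i.
    """
    votes = Counter()
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            if a == b:
                votes[j - i] += 1
    if not votes:
        return 0, 0  # no characters in common
    offset, score = max(votes.items(), key=lambda kv: kv[1])
    return score, offset


# The second string has two leading noise characters and a misread final letter.
score, offset = diagonal_vote("TOKYO", "XXTOKY0")
```

    Because an isolated misread character removes only one vote from the diagonal, the alignment survives noise, and characters that match in the wrong order land on other diagonals and do not inflate the score.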
  • Patent number: 6731802
    Abstract: A lattice data structure suitable for storage on a computer-readable medium is provided which represents a plurality of orthographic forms of a Japanese lexical entry. The lattice includes a plurality of data fields each adapted to hold data representing a word element of the entry. Each data field includes a first subfield containing data representing a primary form of the corresponding word element and a second field containing data representing an alternate form of the corresponding word element. Also provided is a method of normalizing Japanese lexical entries to produce a normalized form that includes the primary form of each word-element representation of the lattice and does not include the alternate forms. Also provided are methods of segmenting text using the disclosed lattice.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: May 4, 2004
    Assignee: Microsoft Corporation
    Inventors: Gary Kacmarcik, Christopher J. Brockett
  • Publication number: 20040037470
    Abstract: Systems and methods for processing text-based electronic documents are provided. Briefly described, one embodiment of a method for processing a text-based electronic document comprises the steps of: comparing at least one word in a text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule; for each of the at least one word that does not conform to the predefined rule, fragmenting the at least one word into word fragments; combining at least two consecutive word fragments; and comparing the combination of the word fragments to the native language dictionary.
    Type: Application
    Filed: August 23, 2002
    Publication date: February 26, 2004
    Inventor: Steven J. Simske
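    The fragment-and-recombine steps in this abstract can be sketched as below. The abstract does not say how a word is fragmented or what the predefined rule is, so the sketch assumes the rule is plain dictionary membership and that fragments are fixed-size character chunks; `dictionary_hits` and `fragment_size` are hypothetical.

```python
from typing import List, Set


def dictionary_hits(word: str, dictionary: Set[str],
                    fragment_size: int = 1) -> List[str]:
    """Look for dictionary words hidden inside a non-dictionary word.

    1. If the word itself is in the dictionary, return it unchanged.
    2. Otherwise cut it into fragments of `fragment_size` characters.
    3. Join every run of at least two consecutive fragments and keep
       the combinations found in the dictionary.
    """
    if word in dictionary:
        return [word]
    fragments = [word[i:i + fragment_size]
                 for i in range(0, len(word), fragment_size)]
    hits = []
    for start in range(len(fragments)):
        for end in range(start + 2, len(fragments) + 1):
            candidate = "".join(fragments[start:end])
            if candidate in dictionary:
                hits.append(candidate)
    return hits


words = {"the", "quick", "fox"}
hits = dictionary_hits("thequick", words)
```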
  • Publication number: 20040028278
    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
    Type: Application
    Filed: August 9, 2002
    Publication date: February 12, 2004
    Applicant: XEROX CORPORATION
    Inventors: Daniel H. Greene, Justin K. Romberg, Tze-Lei Poo, Ashok C. Popat
  • Publication number: 20040028279
    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
    Type: Application
    Filed: August 9, 2002
    Publication date: February 12, 2004
    Applicant: XEROX CORPORATION
    Inventors: Daniel H. Greene, Justin K. Romberg, Ashok C. Popat
  • Publication number: 20040028280
    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
    Type: Application
    Filed: August 9, 2002
    Publication date: February 12, 2004
    Applicant: XEROX CORPORATION
    Inventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat
  • Patent number: 6678409
    Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.
    Type: Grant
    Filed: January 14, 2000
    Date of Patent: January 13, 2004
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, Zixin Jiang
  • Patent number: 6678415
    Abstract: A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network selectively expands the network to accommodate the language model only for certain ones of the paths in the network, effectively managing the memory storage requirements and computational complexities of integrating the language model efficiently into the network. The language model generates probability distributions indicating the probability of a certain character occurring in a string, given one or more previous characters in the string. Selectively expanding the image network is achieved by initially using upper bounds on the language model probabilities on the branches of an unexpanded image network. A best path search operation is then performed to determine an estimated best path through the image network using these upper bound scores.
    Type: Grant
    Filed: May 12, 2000
    Date of Patent: January 13, 2004
    Assignee: Xerox Corporation
    Inventors: Ashok C. Popat, Dan S. Bloomberg, Daniel H. Greene
  • Patent number: 6671856
    Abstract: Disclosed is a system, method, and program for determining boundaries in a string of characters using a dictionary, wherein the substrings in the dictionary may comprise words. A determination is made of all possible initial substrings of the string in the dictionary. One initial substring is selected such that all the characters following the initial substring can be divided into at least one substring in the dictionary. The boundaries follow each of the initial substring and the at least one substring that includes all the characters following the initial substring.
    Type: Grant
    Filed: September 1, 1999
    Date of Patent: December 30, 2003
    Assignee: International Business Machines Corporation
    Inventor: Richard Theodore Gillam
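    The selection rule in this abstract (choose an initial substring such that everything after it can also be divided into dictionary substrings) can be sketched as a backtracking search. Trying the longest initial substring first and memoizing failed positions are choices made by this sketch, not details from the abstract; `find_boundaries` is a hypothetical name.

```python
from functools import lru_cache
from typing import Iterable, List, Optional


def find_boundaries(text: str, dictionary: Iterable[str]) -> Optional[List[int]]:
    """Return the end positions of the dictionary substrings covering `text`.

    Returns None when the text cannot be divided entirely into
    dictionary entries.
    """
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def split_from(start: int):
        if start == len(text):
            return ()  # nothing left to divide
        # Try every initial substring found in the dictionary, longest first.
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                rest = split_from(end)
                if rest is not None:  # the remainder is divisible too
                    return (end,) + rest
        return None

    result = split_from(0)
    return None if result is None else list(result)


# "abc" is the longest initial match, but it leaves "d", which is not in
# the dictionary, so the search falls back to "ab" followed by "cd".
boundaries = find_boundaries("abcd", {"ab", "abc", "cd"})
```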
  • Patent number: 6668085
    Abstract: An improved method of deriving correct text from error-containing text produced by a character recognition device requires significantly less human intervention to correct the converted text.
    Type: Grant
    Filed: August 1, 2000
    Date of Patent: December 23, 2003
    Assignee: Xerox Corporation
    Inventor: William D. Evans