Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
-
Patent number: 6927774
Abstract: A character display device and method therefor are adapted to obtain a proximal reference point of each character comprising a character series and calculate display coordinates of each character from said proximal reference point and the display angle and display reference position of the character series.
Type: Grant
Filed: December 8, 2000
Date of Patent: August 9, 2005
Assignee: Mitsubishi Denki Kabushiki Kaisha
Inventor: Fumiko Yano
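As a rough illustration of the coordinate calculation summarized above, the sketch below places each character by rotating its offset along the baseline by the display angle and then adding the display reference position. It is not the patented method. The per-character advance widths, the use of the baseline offset as the "proximal reference point", and the counter-clockwise angle convention (mathematical y-up axes rather than screen y-down) are all assumptions.

```python
import math

def char_positions(ref_x, ref_y, angle_deg, advances):
    """Return (x, y) display coordinates for each character of a string.

    ref_x, ref_y: display reference position of the character series.
    angle_deg: display angle, counter-clockwise from the x-axis (assumption).
    advances: hypothetical per-character widths; a character's reference
    point is taken to be its distance along the baseline (assumption).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    coords = []
    offset = 0.0  # distance of the current character's reference point along the baseline
    for width in advances:
        # Rotate the baseline offset by the display angle, then translate it.
        coords.append((ref_x + offset * cos_t, ref_y + offset * sin_t))
        offset += width
    return coords
```

For example, `char_positions(10, 20, 0, [5, 5, 5])` lays three characters out horizontally starting at (10, 20).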
-
Patent number: 6922489
Abstract: A method of interpreting an image using a statistical or probabilistic interpretation model is disclosed. The image has associated therewith contextual information. The method comprises the following steps: providing the contextual information associated with the image for analysis; analyzing the additional contextual information to identify predetermined features relating to the image; and biasing the statistical or probabilistic interpretation model in accordance with the identified features.
Type: Grant
Filed: October 29, 1998
Date of Patent: July 26, 2005
Assignees: Canon Kabushiki Kaisha, Canon Information Systems Research Australia Pty. Ltd.
Inventors: Alison Joan Lennon, Delphine Anh Dao Le
-
Patent number: 6917708
Abstract: A method of automatically recognizing text. The text is divided into whole words, each of which is recognized. Each whole word is characterized according to its silhouette. The silhouette is characterized by features in the silhouette such as upwardly extending “polls” and downwardly extending “holes”. The silhouette may also be characterized by its first syllable blends. Numbers are assigned to each of the different characteristics, and numbers may also be assigned based on analysis of a database of different kinds of cursive words. Recognition may be carried out automatically by a recognition system that operates in this way.
Type: Grant
Filed: January 19, 2001
Date of Patent: July 12, 2005
Assignee: California Institute of Technology
Inventors: Rodney M. Goodman, Donal J. Woods, Patricia A. Keaton, Joseph Chen
-
Patent number: 6909805
Abstract: A scanned document image, including add-on information such as handwritten annotations in addition to printed text lines, is processed by a handwriting detection method. First, at least one projection histogram is generated from the scanned document image. A regular pattern that correlates to the printed text lines is determined from the projection histogram. Second, connected component analysis is applied to the scanned document image to generate at least one merged text line. Each merged text line relates to at least one of the handwritten annotation and the printed text line. By comparing the merged text lines to the regular pattern of the projection histograms, the printed text lines are discriminated from the handwritten annotations.
Type: Grant
Filed: January 31, 2001
Date of Patent: June 21, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Yue Ma, Jinhong Katherine Guo
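The projection-histogram half of the method above can be sketched as follows: sum the ink pixels in each row, split the profile into bands of non-empty rows, take the median band height as a stand-in for the "regular pattern" of printed lines, and flag bands that deviate from it. This is a simplified illustration under stated assumptions. The connected-component merging step is omitted, the image is assumed to be a binary list of rows (1 = ink), and the `tolerance` threshold is hypothetical.

```python
from statistics import median

def row_profile(image):
    """Horizontal projection histogram: number of ink pixels in each row."""
    return [sum(row) for row in image]

def line_bands(profile):
    """Split the profile into (start, end) runs of non-empty rows (end exclusive)."""
    bands, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands

def irregular_bands(image, tolerance=0.5):
    """Flag bands whose height deviates from the median band height.

    The median height approximates the regular printed-line pattern;
    tolerance is a hypothetical relative threshold.
    """
    bands = line_bands(row_profile(image))
    if not bands:
        return []
    typical = median(end - start for start, end in bands)
    return [b for b in bands if abs((b[1] - b[0]) - typical) > tolerance * typical]
```

A band flagged this way is only a handwriting candidate. The abstract describes confirming it by comparing merged text lines against the histogram pattern.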
-
Patent number: 6904171
Abstract: The present invention provides a method and system for efficient storage and retrieval of information. The method includes the steps of: scanning/selecting/capturing a selected portion of text of the information wherein the selected portion of text scanned is typically a close-to-unique identifier of the text from which the portion was excerpted and serves as a key when the information is accessed electronically; and placing the key in an electronically available index/directory to facilitate retrieval of the information. The method may further include retrieving and storing the information associated with the key and using it to index, organize, and make available for search and retrieval the full information originally viewed by the user.
Type: Grant
Filed: December 15, 2000
Date of Patent: June 7, 2005
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: Pieter J. van Zee
-
Patent number: 6879722
Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character sequences probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
Type: Grant
Filed: June 29, 2001
Date of Patent: April 12, 2005
Assignee: International Business Machines Corporation
Inventor: Hubert Crepy
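A minimal sketch of the regularity-value filter described above, assuming that the "character sequences probabilities" are character-bigram probabilities estimated from sample text and that VR is the mean log-probability per bigram. Both choices, and the floor probability given to unseen bigrams, are assumptions for illustration and are not taken from the patent.

```python
import math
from collections import Counter

def train_bigrams(text):
    """Estimate character-bigram probabilities P(b | a) from sample text."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

def regularity(portion, probs, floor=1e-6):
    """Regularity value VR: mean log-probability of the portion's bigrams.

    Unseen bigrams receive a hypothetical floor probability.
    """
    pairs = list(zip(portion, portion[1:]))
    if not pairs:
        return math.log(floor)
    return sum(math.log(probs.get(p, floor)) for p in pairs) / len(pairs)

def filter_corpus(portions, probs, threshold):
    """Keep portions whose VR reaches the threshold VT and reject the rest."""
    return [p for p in portions if regularity(p, probs) >= threshold]
```

A portion made of symbols never seen in the training text scores about log(1e-6), roughly -13.8, so a threshold such as -5.0 rejects it while ordinary sentences pass.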
-
Patent number: 6879718
Abstract: In computerized recognition having multiple experts, a method and system are described that obtain an optimum value for an expert tuning parameter in a single pass over sample tuning data. Each tuning sample is applied to two experts, resulting in scores from which ranges of parameters that correct recognition errors without changing correct results for that sample are determined. To determine the range data for a given sample, the experts return scores for each prototype in a database, the scores separated into matching and non-matching scores. The matching and non-matching scores from each expert are compared, providing upper and lower bounds defining ranges. Maxima and minima histograms track upper and lower bound range data, respectively. An analysis of the histograms based on the full set of tuning samples provides the optimum value. For tuning multiple parameters, each parameter may be optimized by this method in isolation, and then iterated.
Type: Grant
Filed: November 6, 2001
Date of Patent: April 12, 2005
Assignee: Microsoft Corp.
Inventor: Gregory N. Hullender
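One way to picture the final analysis step above: if each tuning sample yields a closed range of parameter values for which it is recognized correctly, the optimum is a value covered by the most ranges. The sketch below finds it with a single sweep over sorted lower and upper bounds. This sweep is a stand-in for the patent's maxima/minima histograms, not a reproduction of them, and it assumes the per-sample ranges have already been computed.

```python
def best_parameter(intervals):
    """Return (value, count): a parameter value covered by the most intervals.

    Each (lo, hi) pair is a hypothetical closed range of parameter values for
    which one tuning sample is recognized correctly.
    """
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # range opens; 0 sorts before 1 at equal values
        events.append((hi, 1))  # range closes
    events.sort()
    best_value, best_count, active = None, 0, 0
    for value, kind in events:
        if kind == 0:
            active += 1
            if active > best_count:
                best_value, best_count = value, active
        else:
            active -= 1
    return best_value, best_count
```

Sorting openings before closings at the same value makes ranges that merely touch count as overlapping, which matches closed intervals.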
-
Patent number: 6876765
Abstract: A character recognition method carries out a character recognition using a cross section sequence graph which describes features of a character image. The character recognition method includes the steps of (a) extracting the cross section sequence graph from a character string image, (b) analyzing a singular region of the cross section sequence graph and generating a virtual boundary point sequence in the singular region based on an analyzed result, (c) generating character candidates by combining structural elements of the cross section sequence graph and recognizing one character by supplying the virtual boundary point sequence with respect to the generated character candidates if necessary, and (d) recognizing a character string based on an adjacency relationship of the character candidates which are recognized as one character in the step (c).
Type: Grant
Filed: March 29, 2001
Date of Patent: April 5, 2005
Assignee: Ricoh Company, Ltd.
Inventor: Toshihiro Suzuki
-
Patent number: 6876774
Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
Type: Grant
Filed: August 29, 2002
Date of Patent: April 5, 2005
Assignee: Fujitsu Limited
Inventors: Noriko Satoh, Shigeru Yoshida
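The two-codeword lookup above can be sketched in a few lines. Assume 3-byte strings whose 24 bits are split into two 12-bit codewords. Each codeword addresses its own array, and each array stores the latest position at which a string with that codeword was seen. The string length, the even bit split, and "latest position" as the stored information are assumptions made for this illustration.

```python
def find_repeats(data: bytes, k: int = 3):
    """Return (pos, earlier_pos) pairs where the k bytes at pos occurred before.

    The 8*k bits of each k-byte string are split into two codewords that serve
    as array addresses; each array records the most recent position seen.
    The two codewords together carry every bit of the string, so equal lookup
    results mean the identical string occurred at that earlier position.
    Because entries are overwritten, some older repeats may be missed.
    """
    bits = 8 * k
    half = bits // 2
    mask = (1 << half) - 1
    table_hi = [-1] * (1 << (bits - half))  # addressed by the first codeword
    table_lo = [-1] * (1 << half)           # addressed by the second codeword
    matches = []
    for pos in range(len(data) - k + 1):
        value = int.from_bytes(data[pos:pos + k], "big")
        c1, c2 = value >> half, value & mask
        p1, p2 = table_hi[c1], table_lo[c2]
        if p1 != -1 and p1 == p2:
            matches.append((pos, p1))
        table_hi[c1] = pos
        table_lo[c2] = pos
    return matches
```

With `find_repeats(b"abcXabc")`, the second `abc` at position 4 is reported as a repeat of position 0.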
-
Patent number: 6873986
Abstract: A method and system for mapping a number of characters in a string, wherein the string comprises a combination of characters representing indexed expressions and a combination of characters representing non-indexed expressions. One embodiment produces a weight array that can be utilized to compare a first and second string having indexed and non-indexed expressions. In one embodiment, a method generates a set of special weights for characters that represent indexed and non-indexed expressions. The method then associates a weight value of an indexed expression with the specific group of characters representing a specific non-indexed expression, and generates a weight array by retrieving a plurality of special weights associated with the specific group of characters representing the specific non-indexed expression and the associated weight value of the indexed expression.
Type: Grant
Filed: October 29, 2001
Date of Patent: March 29, 2005
Assignee: Microsoft Corporation
Inventors: John McConnell, Julie Bennett, Yung-Shin Lin
-
Patent number: 6859556
Abstract: A word recognizing apparatus extracts the feature amount from a given image, and dynamically composes the feature amount of a candidate word to be recognized which is registered in a word list, using feature amounts of characters registered in an individual character dictionary. Then, the apparatus collates the composed feature amount of the word with the feature amount extracted from the image, calculates the degree of similarity between the two feature amounts, and outputs a recognition result.
Type: Grant
Filed: May 11, 1999
Date of Patent: February 22, 2005
Assignee: Fujitsu Limited
Inventors: Hiroaki Takebe, Yoshinobu Hotta, Satoshi Naoi
-
Patent number: 6847734
Abstract: In word recognition using the character recognition result, recognition processing is performed on an input character string that corresponds to a word to be recognized. For each word contained in a word dictionary that stores in advance candidates of words to be recognized, a probability is obtained at which the characteristics obtained as the result of character recognition are generated, conditioned on the characters of that word. The thus obtained probability is divided by the probability at which the characteristics obtained as the result of character recognition are generated, and the division results obtained for the characters of each word in the word dictionary are multiplied together over all the characters. The recognition results of the words are obtained based on the multiplication results.
Type: Grant
Filed: January 26, 2001
Date of Patent: January 25, 2005
Assignee: Kabushiki Kaisha Toshiba
Inventor: Tomoyuki Hamamura
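The scoring rule above multiplies, over the characters of a dictionary word, the ratio P(recognition result | true character) / P(recognition result). The sketch below applies that rule with a hypothetical confusion table. It assumes the recognizer returns one top candidate per character and that candidate words have the same length as the input. Both are simplifications for illustration.

```python
def word_score(recognized, word, p_cond, p_marginal):
    """Product over positions of P(result | true char) / P(result).

    recognized: per-character recognition results (top candidates).
    p_cond[(result, char)]: hypothetical table of P(result | char).
    p_marginal[result]: P(result) over all characters; must be non-zero.
    """
    if len(recognized) != len(word):
        return 0.0  # simplification: only equal-length words are compared
    score = 1.0
    for result, char in zip(recognized, word):
        score *= p_cond.get((result, char), 0.0) / p_marginal[result]
    return score

def recognize_word(recognized, dictionary, p_cond, p_marginal):
    """Return the dictionary word with the highest score."""
    return max(dictionary, key=lambda w: word_score(recognized, w, p_cond, p_marginal))
```

With a uniform toy model in which a character is read correctly with probability 0.7, the input "cot" scores 2.8 × 0.4 × 2.8 for "cat" but only 0.4 × 0.4 × 2.8 for "act", so "cat" is chosen.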
-
Patent number: 6839877
Abstract: An electronic mail terminal includes a display section, a conversion dictionary which stores sets of a character string and a pictograph, a receiving section and a control section. The receiving section receives an electronic mail including a sentence as a conversion object sentence in a reception mode. The control section automatically refers to the character string-pictograph conversion dictionary based on each of character strings of the conversion object sentence in the reception mode to retrieve a specific pictograph corresponding to the character string, when the pictograph corresponding to the character string is registered in the character string-pictograph conversion dictionary. Then, the control section converts the character string into the specific pictograph to produce a pictograph mixed sentence, and controls the display section to display the pictograph mixed sentence.
Type: Grant
Filed: December 1, 2000
Date of Patent: January 4, 2005
Assignee: NEC Corporation
Inventor: Shinichiro Iwata
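A small sketch of the dictionary-driven conversion described above. It scans the received sentence and replaces any registered character string with its pictograph. Trying longer dictionary entries first, so that "sunshine" wins over "sun", is an assumption made for this illustration. The abstract does not say how overlapping entries are resolved.

```python
def to_pictographs(sentence, dictionary):
    """Replace registered character strings with pictographs (longest match first)."""
    keys = sorted(dictionary, key=len, reverse=True)  # assumption: prefer longer entries
    out, i = [], 0
    while i < len(sentence):
        for key in keys:
            if key and sentence.startswith(key, i):
                out.append(dictionary[key])
                i += len(key)
                break
        else:
            # No registered string starts here; copy the character unchanged.
            out.append(sentence[i])
            i += 1
    return "".join(out)
```

For example, with `{"sun": "☀", "sunshine": "☼", "love": "♥"}`, the sentence "I love sunshine" becomes "I ♥ ☼".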
-
Publication number: 20040264782
Abstract: A system and method for providing object-oriented graphical integrated command shell (ICS) integrates the command shell into a graphical user interface (GUI) environment in order to provide a single graphical user interface, so that the user does not need to work in different environments for different tasks. To accomplish the integration, the ICS provides interpretation of output responses that occur as result of processing textual commands entered by a user. An output response from the command shell is typically one or more lines of text from an output stream such as standard error or standard output. The output response is interpreted by the ICS to determine a meaning. Interpretation may be by pattern matching with regular expressions. If interpreted lines of command output (e.g. indicating a file or folder) map to some other object model (e.g. a file subsystem) in the UI, appropriate object model objects are created. Mapping output produces integration between different subsystems (i.e.
Type: Application
Filed: November 6, 2003
Publication date: December 30, 2004
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David McKnight, Jeffrey Turnham
-
Patent number: 6834121
Abstract: To provide an apparatus for rough classification of words that allows features of words to be stored in a vocabulary storage division to be generated from character codes of the words so that the words can be efficiently selected. A candidate character selection division 1 detects areas likely to be characters from a word image, and a character recognition division 2 recognizes candidate characters generated in candidate character selection division 1 and converts them into character codes. A number-of-characters estimation division 3 estimates the number of characters of the entire word image and the number of characters of the areas between candidate characters, a word description division 4 generates word description equivalent to a state transition graph from the recognition results of candidate characters and the estimated number of characters of candidate character separations.
Type: Grant
Filed: December 20, 2000
Date of Patent: December 21, 2004
Assignee: NEC Corporation
Inventors: Didier Guillevic, Keiji Yamada
-
Publication number: 20040223647
Abstract: A data processing apparatus for inputting data by writing characters on a touch sensitive display screen. The data processing apparatus comprises a character recognition processor operable to generate an estimate of a character hand written by a user on the touch sensitive screen. The data processing apparatus includes a processing unit operable to receive the estimated character, and a graphics display device operable to receive the estimated character from the processing unit. The graphics display device is operable to display the estimated characters within a text input window of the display screen. The processing unit is operable in combination with the graphics display driver to display the estimated character on the display screen, substantially at a position proximate to a location to where the user has written the character.
Type: Application
Filed: May 7, 2004
Publication date: November 11, 2004
Applicant: Orange SA
Inventors: Alan Blount, Todd Pinkerton
-
Patent number: 6816615
Abstract: A logical separation between pages, such as an implicit page break, is introduced to separate text entered during one handwriting session from text entered during another handwriting session. If the user leaves more than a threshold amount of blank space at the bottom of the page immediately preceding the new page, then an implicit page break may be inserted at the beginning of the new page. The amount of blank space left at the end of the preceding page may be combined with other criteria to determine whether to insert an implicit page break. The amount of time elapsed since ink has been captured on the previous page is another factor that may be used by itself or combined with other factors to determine whether to insert an implicit page break into the new page. A change in context, such as a different date or different recognized subject matter labels, is also a factor that may be considered in determining whether to insert an implicit page break.
Type: Grant
Filed: February 28, 2001
Date of Patent: November 9, 2004
Assignee: Microsoft Corporation
Inventors: Charlton E. Lui, Anthony S. Smith, Dan W. Altman, Cynthia C. Tee, Evan M. Feldman
-
Publication number: 20040207878
Abstract: Systems and methods for using a print subsystem to implement an analysis of the content of a print job prior to despooling the print job to a printing device, and selectively rendering, providing a modified rendering or terminating the print job. A computer device is connected to a printing device to selectively render a print job and includes a print subsystem, such as a spooler and optionally a printer driver and a print processor. A further implementation includes a print server having a print subsystem. Print data corresponding to a print job is provided from the print subsystem input processing to a content filtering process to analyze the content thereof prior to despooling the print job to the printing device. The analysis determines if some or all of the content should be rejected, removed, replaced or require acknowledgement.
Type: Application
Filed: April 21, 2003
Publication date: October 21, 2004
Inventor: Andrew Rodney Ferlitsch
-
Publication number: 20040199389
Abstract: The invention relates to a method for recognizing a phonetic sound sequence or a character sequence, e.g.
Type: Application
Filed: February 12, 2004
Publication date: October 7, 2004
Inventor: Hans Geiger
-
Patent number: 6801659
Abstract: Beginning with the first letter or stroke, this invention uses the relative frequency of the sequential groups of letters or strokes from which individual words or characters are gradually built in order to provide a better way of computer indexing languages for easier and more efficient access to both the frequently used words or characters and the less-frequently used. This makes possible a system of text input that is both more efficient and more intuitive than utilizing just word or character frequency, an input approach which eliminates typing transpositions, reduces word-spelling errors or character-stroke-order uncertainty, and provides an alternative to a standard keyboard which is especially helpful with wireless phones and hand-held computers, and similar devices lacking standard keyboards. This invention can make words and characters quite accessible in an intuitive way without requiring any direct input of words or letters, strokes or characters.
Type: Grant
Filed: June 4, 2001
Date of Patent: October 5, 2004
Assignee: ZI Technology Corporation Ltd.
Inventor: Robert B. O'Dell
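To make "relative frequency of the sequential groups of letters" concrete, the sketch below adds each word's frequency to every prefix of that word. It then offers the next letters in order of how often the extended group occurs. The weighted word list is hypothetical, and the ranking rule (frequency, then alphabetical) is an assumption. This is an illustration of the idea, not ZI's implementation.

```python
from collections import Counter

def prefix_counts(word_freqs):
    """Total frequency of every prefix (letter group) across a weighted word list."""
    counts = Counter()
    for word, freq in word_freqs.items():
        for end in range(1, len(word) + 1):
            counts[word[:end]] += freq
    return counts

def next_letters(prefix, counts):
    """Letters that extend prefix, most frequent extended group first."""
    options = {key[-1]: n for key, n in counts.items()
               if len(key) == len(prefix) + 1 and key.startswith(prefix)}
    return sorted(options, key=lambda letter: (-options[letter], letter))
```

With frequencies `{"the": 50, "then": 10, "to": 30, "tea": 5}`, the group "th" totals 60, "to" totals 30, and "te" totals 5. After "t", the letters are therefore offered in the order h, o, e.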
-
Patent number: 6801660
Abstract: In a computing device that receives handwritten data, a method and system that maintains an association between alternates for a given ink word, regardless of the handwritten or text state of the word, and regardless of the position of the word as it may be edited in a document. Handwritten data is maintained in an ink word data structure, and once the word is recognized and an alternate is selected for it, the first character of the word remains as an ink word (in a text buffer) pointing to the data structure, with a flag set in the data structure indicating that the word is now recognized as text. In this state, the first character is displayed to the user as a recognized text letter instead of as the handwritten word. The other characters that make up the recognized word are inserted as text into the text buffer. Any alternates returned by the recognizer are thus stored with the ink word data structure displayed as this first character of a recognized word, which also maintains the ink data, e.g.
Type: Grant
Filed: August 22, 2000
Date of Patent: October 5, 2004
Assignee: Microsoft Corporation
Inventors: Peter H. Williamson, Dan W. Altman, Charlton E. Lui
-
Patent number: 6798912
Abstract: A method of program classification based on syntax of transcript information includes receiving transcript information associated with the program wherein the transcript information has a plurality of sentences, determining characteristics of at least one of the plurality of sentences of the transcript information to identify at least the type and subject of the sentence, comparing the characteristics of the at least one of the plurality of sentences with a list of sentence characteristics having associated therewith a plurality of program types, and based on the comparing step, selecting a classification of program which is most closely associated with the characteristics of the at least one of the plurality of sentences.
Type: Grant
Filed: December 18, 2000
Date of Patent: September 28, 2004
Assignee: Koninklijke Philips Electronics N.V.
Inventor: Kavitha Devara
-
Patent number: 6798913
Abstract: An image processing device and a computer program product capable of accurately determining a user-desired region even when a region has been only roughly marked by a user, wherein a specific region within an image to be processed is detected; the image to be processed is allocated into a plurality of blocks; text included in the image to be processed is recognized; it is determined, based on a result of text recognition, whether or not there is relevance between a first block which is partially included in the specific region and a second block which is entirely included in the specific region among the allocated blocks; and it is determined whether or not an image of the first block should be treated as an image belonging to the specific region in accordance with a result of determination as to the relevance.
Type: Grant
Filed: March 16, 2001
Date of Patent: September 28, 2004
Assignee: Minolta Co., Ltd.
Inventor: Hideyuki Toriyama
-
Publication number: 20040184663
Abstract: The invention compares each character of a first character string with each character of a second character string, votes in a matrix whose two sides correspond to the characters of the first character string and the characters of the second character string, and calculates values of the voting result for the respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process that is noise-resistant and takes the character arrangement into consideration can be attained.
Type: Application
Filed: March 30, 2004
Publication date: September 23, 2004
Applicant: Kabushiki Kaisha Toshiba
Inventor: Takuma Akagi
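The voting scheme above can be sketched as follows. A cell (i, j) receives a vote when character i of the first string equals character j of the second. Votes are totaled along each oblique line j - i = d, so each total corresponds to one alignment offset. Using 0/1 votes and normalizing the best diagonal total by the shorter string's length are assumptions made for this illustration.

```python
def diagonal_votes(a, b):
    """Vote totals per diagonal: cell (i, j) votes when a[i] == b[j].

    Diagonal d collects the cells with j - i == d, i.e. one alignment offset.
    """
    votes = {}
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            if ca == cb:
                votes[j - i] = votes.get(j - i, 0) + 1
    return votes

def match_score(a, b):
    """Best diagonal total divided by the shorter length (hypothetical normalization)."""
    if not a or not b:
        return 0.0
    votes = diagonal_votes(a, b)
    return max(votes.values(), default=0) / min(len(a), len(b))
```

A leading noise character does not hurt the score. "TOKYO" against "XTOKYO" collects all five votes on diagonal 1 and scores 1.0.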
-
Patent number: 6785663
Abstract: Periodic patterns in time series data can be hierarchical in nature, where a higher level pattern may comprise repetitions of lower level patterns. In the presence of noise, these repetitions of lower level patterns may not be perfect. A novel model, namely a meta-pattern, is provided in accordance with the present invention to capture these higher level patterns. The meta-pattern can not only provide a more compact representation of patterns but also capture the regularities of pattern evolutions, which may not be expressed by previous models due to the presence of noise. A method is provided to mine meta-patterns in an iterative manner by discovering meta-patterns and their supporting subsequences in the form of lists of segments of contiguous repetitions of a meta-pattern. The number of pattern repetitions in each segment is at least a predefined threshold min_rep and the distance between any two adjacent segments is at most a predefined threshold max_dis.
Type: Grant
Filed: December 28, 2000
Date of Patent: August 31, 2004
Assignee: International Business Machines Corporation
Inventors: Wei Wang, Jiong Yang, Philip Shi-Lung Yu
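The two thresholds in the abstract can be illustrated directly. The sketch finds segments of back-to-back repetitions of a known pattern that contain at least `min_rep` copies. It then groups consecutive segments whose gap is at most `max_dis`. The pattern is assumed to be given. The iterative discovery of meta-patterns is not shown, and the greedy left-to-right scan is a simplification.

```python
def repetition_segments(seq, pattern, min_rep):
    """Runs of back-to-back copies of pattern, as (start, end, reps), end exclusive.

    Greedy left-to-right scan; runs with fewer than min_rep copies are dropped.
    """
    seq, pattern = list(seq), list(pattern)
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    segments, i = [], 0
    while i + m <= len(seq):
        reps = 0
        while seq[i + reps * m:i + (reps + 1) * m] == pattern:
            reps += 1
        if reps >= min_rep:
            segments.append((i, i + reps * m, reps))
        i += reps * m if reps else 1
    return segments

def chain_segments(segments, max_dis):
    """Group consecutive segments whose gap (next start minus previous end) is <= max_dis."""
    chains = []
    for seg in segments:
        if chains and seg[0] - chains[-1][-1][1] <= max_dis:
            chains[-1].append(seg)
        else:
            chains.append([seg])
    return chains
```

In "ababab" + "xx" + "abab" + "yyyyyyyy" + "ababab", with min_rep=2 and max_dis=3, the first two segments are 2 symbols apart and form one chain. The last segment is 8 symbols away and starts a new chain.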
-
Patent number: 6785417
Abstract: In a computing device that receives handwritten data, a method and system for finding matches for recognized handwritten words, by comparing a given search word (and possibly its alternates) with the words in a document, including recognized ink words and any possible alternates for those recognized words as returned by a recognizer. One described test looks for an exact match between an entered search word (and possibly its alternates) and the recognized words and their alternates stored in a handwritten document. Other tests are possible because of the use of alternates, which also may be returned with a probability ranking. For example, one scheme looks for a percentage of matching characters, with a user-determined threshold percentage. Other variations include giving different weight to certain characters, and/or factoring in the relative number of syllables and/or the relative lengths of the words.
Type: Grant
Filed: August 22, 2000
Date of Patent: August 31, 2004
Inventors: Peter H. Williamson, Charlton E. Lui
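The "percentage of matching characters" test mentioned above might look like the sketch below. It counts position-by-position agreements over the longer word and reports an ink word if its recognized text or any of its alternates meets the user threshold. The position-wise comparison, the case folding, and the list-of-alternates layout are assumptions made for this illustration.

```python
def char_match_percent(a, b):
    """Percentage of positions (over the longer word) where the characters agree."""
    if not a and not b:
        return 100.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / max(len(a), len(b))

def find_matches(search_word, document_words, threshold=75.0):
    """Indices of ink words whose recognized text or any alternate matches.

    document_words: hypothetical layout, one list per ink word holding the
    recognized word followed by its alternates.
    """
    hits = []
    for index, alternates in enumerate(document_words):
        if any(char_match_percent(search_word.lower(), alt.lower()) >= threshold
               for alt in alternates):
            hits.append(index)
    return hits
```

A search for "lonch" finds the ink word recognized as "lunch", because 4 of 5 characters (80%) agree.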
-
Publication number: 20040161154
Abstract: Systems and methods for learning-based automatic commercial content detection are described. In one aspect, program data is divided into multiple segments. The segments are analyzed to determine visual, audio, and context-based feature sets that differentiate commercial content from non-commercial content. The context-based features are a function of single-side left and/or right neighborhoods of segments of the multiple segments.
Type: Application
Filed: February 18, 2003
Publication date: August 19, 2004
Inventors: Xian-Sheng Hua, Lie Lu, Mingjing Li, Hong-Jiang Zhang
-
Patent number: 6778712
Abstract: A data sheet identification device of the invention includes: a character/graphics extracting section, an identical shape deciding section, a graphics collating section, an identification code/data sheet ID identifying section for collating characters that have been decided to have the same shape with an identification code/data sheet ID database in which a plurality of characters showing features of a plurality of data sheets respectively have been registered, and an identifying section for uniquely identifying the data sheet based on a result of the collation by the graphics collating section and a result of the collation by the identification code/data sheet ID identifying section.
Type: Grant
Filed: August 29, 2000
Date of Patent: August 17, 2004
Assignee: Fujitsu Limited
Inventors: Maki Yabuki, Shinichi Eguchi, Kouichi Kanamoto, Katsutoshi Kobara, Koichi Chiba, Toshiyuki Waida, Kazunori Yamamoto, Yutaka Katsumata
-
Patent number: 6771817
Abstract: In a computing device that receives handwritten data, a method and data structure that enables extended data to be added to an existing ink word data structure without compromising backwards-compatibility. A flag in the header data structure indicates to new ink processing programs the presence or absence of the extended data, and the size information maintained in the header is adjusted to ensure that earlier versions of ink programs do not lose the extended data. The extended data is then added by including it in a copy of the existing ink word data structure, along with a tail structure that includes information describing the extended data and the tail structure to the new ink code, e.g., version and offset information. The tail structure can be used to locate a list of alternate word choices for an ink word that are maintained within the extended data.
Type: Grant
Filed: August 22, 2000
Date of Patent: August 3, 2004
Assignee: Microsoft Corporation
Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
-
Patent number: 6768816
Abstract: A method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image operates by segmenting a document image into a set of image objects, and linking the image objects with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets. The image objects are grouped according to characteristics suggesting that the image objects may have common ground-truthed metadata. By grouping the image objects into subsets, the image objects may be indexed to facilitate the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in a table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.
Type: Grant
Filed: June 13, 2002
Date of Patent: July 27, 2004
Assignee: Convey Corporation
Inventors: Floyd Steven Hall, Jr., Cameron Telfer Howie
-
Patent number: 6766069
Abstract: A user-interface for selecting text from images of documents using auto-completion is described. The auto-completion process may be used to complete words (or text sequences), phrases, sentences, paragraphs, or other groupings of words. In response to user input, the OCR results for one or more images of documents are searched. The user input may include typing in a partial word (or the initial characters in a text sequence) via an input device or alternatively, annotations made by a user on a hardcopy document prior to scanning the document. One or more word matches are presented to the user for acceptance until the user accepts a word match or until all word matches have been presented to the user. Once a user accepts a word match, the word match is copied into an electronic document such as a word processing document, spreadsheet document, or other electronic document created by an application program.
Type: Grant
Filed: December 21, 1999
Date of Patent: July 20, 2004
Assignee: Xerox Corporation
Inventors: Christopher R. Dance, William M. Newman, Alex S. Taylor, Stuart A. Taylor
-
Publication number: 20040126017
Abstract: A system (10) for recognizing handwriting includes an input/output device (12) and a second computer (24 or 28). The system (10) converts handwritten symbols to text by using a grammar (50) that is comprised of the text (60) that is expected to be entered into a text display/text input area (17) of an input/output device (12). The grammar (50) and handwriting-to-text conversion can be performed in either the input/output device (12) or a remote computer (24, 28).
Type: Application
Filed: December 30, 2002
Publication date: July 1, 2004
Inventors: Giovanni Seni, Fabio Valente, Guo Jin
-
Publication number: 20040120583
Abstract: A system augments stylus keyboarding with shorthand gesturing. The system defines a shorthand symbol for each word according to its movement pattern on an optimized stylus keyboard. The system recognizes word patterns by identifying an input as a stroke, and then matching the stroke to a stored list of word patterns. The system then generates and displays the matched word to the user.
Type: Application
Filed: December 20, 2002
Publication date: June 24, 2004
Applicant: International Business Machines Corporation
Inventor: Shumin Zhai
-
Patent number: 6754391
Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
Type: Grant
Filed: June 25, 2002
Date of Patent: June 22, 2004
Assignee: Hewlett-Packard Development Company, LP.
Inventor: Frank P Carau, Sr.
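The line-fitting rule above works like this: check whether the active line can hold an entire word block before registering it, and otherwise advance to a new line. That is greedy word wrapping, sketched below. Widths are measured in characters here as a stand-in for rendered pixel widths. Giving an over-long word a line of its own is an assumption the abstract does not cover.

```python
def layout_words(text, line_width):
    """Greedy line filling: start a new line when the next word block will not fit.

    Widths are counted in characters (a renderer would use pixel widths).
    A word longer than line_width is placed on a line of its own.
    """
    lines, active = [], ""
    for word in text.split():
        candidate = f"{active} {word}" if active else word
        if len(candidate) <= line_width or not active:
            active = candidate  # the word block fits: register it with the active line
        else:
            lines.append(active)  # it does not fit: close the line and increment
            active = word
    if active:
        lines.append(active)
    return lines
```

For example, `layout_words("the quick brown fox jumps", 10)` yields `["the quick", "brown fox", "jumps"]`.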
-
Patent number: 6754386
Abstract: In a computing device that receives handwritten data, a method and system corrects for parser segmentation errors by sending an entire line of ink to a recognizer, and then comparing, on a word-by-word basis, the initial segmentation guesses of the parser with the more thoroughly recognized segmentation results of the handwriting recognition engine. In the correction process, the ink words are efficiently adjusted with relatively little data manipulation. As the recognizer is fed a series of strokes on a line, the recognizer returns segmentation information. For ink word breaks that are the same for any given set of data, the existing ink word is unchanged. For ink words that are recognized differently relative to their initial segmentation, one or more new ink words are created and the handwriting (including stroke) data of the parser's ink word is manipulated to create a new ink processor word (or words) to match the recognizer output.
Type: Grant
Filed: August 22, 2000
Date of Patent: June 22, 2004
Assignee: Microsoft Corporation
Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
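The word-by-word comparison can be pictured as follows, assuming each ink word is an ordered list of strokes and the recognizer reports its segmentation as `(start, end)` stroke-index spans over the whole line. Ink words whose span matches a recognizer span are reused untouched; any other span gets a newly built ink word from the line's stroke data. The `InkWord` class, the span format, and `reconcile` are hypothetical illustrations, not the filing's data structures.

```python
from dataclasses import dataclass


@dataclass
class InkWord:
    strokes: list          # stroke data belonging to this word (hypothetical format)
    created: bool = False  # True if built to match the recognizer's segmentation


def reconcile(parser_words, recognizer_spans):
    """parser_words: InkWords in line order; recognizer_spans: (start, end) stroke
    indices per recognized word, end exclusive. Returns the corrected ink words."""
    # Span of each parser word over the line's flattened stroke sequence.
    by_span, pos = {}, 0
    for word in parser_words:
        by_span[(pos, pos + len(word.strokes))] = word
        pos += len(word.strokes)
    all_strokes = [s for word in parser_words for s in word.strokes]

    result = []
    for start, end in recognizer_spans:
        if (start, end) in by_span:
            result.append(by_span[(start, end)])   # same break: keep the existing word
        else:
            result.append(InkWord(all_strokes[start:end], created=True))
    return result
```

For parser words `["s0","s1"] | ["s2"] | ["s3","s4"]` and recognizer spans `(0, 2), (2, 4), (4, 5)`, the first word is kept as the same object, while `["s2","s3"]` and `["s4"]` are created to match the recognizer output.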
-
Patent number: 6754675
Abstract: An image retrieval system contains a database with a large number of images. The system retrieves images from the database that are similar to a query image entered by the user. The images in the database are grouped into clusters according to a similarity criterion so that mutually similar images reside in the same cluster. Each cluster has a cluster center that is representative of the images in it. The first step of the search for similar images selects the clusters that may contain images similar to the query image by comparing the query image with the cluster centers of all clusters. The second step of the search compares the images in the selected clusters with the query image in order to determine their similarity to the query image.
Type: Grant
Filed: July 16, 2001
Date of Patent: June 22, 2004
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Mohammed S. Abdel-Mottaleb, Santhana Krishnamachari
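The two-step search can be sketched as follows, assuming each image is represented by a numeric feature vector, similarity is Euclidean distance, and "clusters that may contain similar images" means the `n_clusters` clusters whose centers are nearest the query. All three are assumptions; the abstract leaves the features, metric, and selection criterion open.

```python
import math  # math.dist requires Python 3.8+


def two_stage_search(query, clusters, n_clusters=1, k=3):
    """clusters: list of (center_vector, [(image_id, feature_vector), ...]).
    Returns the ids of the k images closest to the query."""
    # Step 1: compare the query with every cluster center; keep the nearest clusters.
    selected = sorted(clusters, key=lambda c: math.dist(query, c[0]))[:n_clusters]
    # Step 2: compare the query only with the images in the selected clusters.
    candidates = [img for _, members in selected for img in members]
    candidates.sort(key=lambda img: math.dist(query, img[1]))
    return [image_id for image_id, _ in candidates[:k]]
```

With clusters centered at `(0, 0)` (images `a`, `b`) and `(10, 10)` (images `c` at `(10, 11)`, `d` at `(9, 10)`), the query `(9.5, 10)` selects only the second cluster and returns `["d", "c"]`. The saving comes from step 2 never touching images in distant clusters.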
-
Publication number: 20040114807
Abstract: A method of representing light field data by capturing a set of images of at least one object in a passive manner at a virtual surface where a center of projection of an acquisition device that captures the set of images lies and generating a representation of the captured set of images using a statistical analysis transformation based on a parameterization that involves the virtual surface.
Type: Application
Filed: December 13, 2002
Publication date: June 17, 2004
Inventors: Dan Lelescu, Frank Jan Bossen
-
Patent number: 6751780
Abstract: A user interface method for launching an optimized final scan of a selected region of interest selected from a preview scan of a document. A user may drag the selected region of interest, presented in a preview scan of a document in a scanner window, and drop it on a software application, the desktop, or a writeable folder, which launches an optimized final scan of the selected region of interest. The image data resulting from the optimized final scan automatically resides in the software application, the desktop, or the writeable folder. In selecting a region of interest from the preview scan, scanner software parameters are updated with information about the region of interest which optimize the final scan. The image data from the optimized final scan is then formatted in the format requested and delivered to the software application, the desktop, or the writeable folder.
Type: Grant
Filed: October 1, 1998
Date of Patent: June 15, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Theodore W. Neff, Jeffrey P. Lee, Patricia D. Lopez
-
Patent number: 6741745
Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).
Type: Grant
Filed: December 18, 2000
Date of Patent: May 25, 2004
Assignee: Xerox Corporation
Inventors: Christopher R. Dance, Mauritius Seeger
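The per-word scaling factor is simply the word's width in the scan divided by its rendered width at unit size in a candidate typeface; a typeface that fits well yields nearly identical factors for every word. A minimal sketch, assuming two made-up typefaces with hypothetical per-character advance widths, and substituting a simple relative-spread measure (standard deviation over mean) for the filing's cluster analysis:

```python
import statistics
import string

# Hypothetical advance widths (in ems at size 1) for two made-up typefaces.
TYPEFACES = {
    "Mono": {ch: 0.6 for ch in string.ascii_lowercase},
    "Prop": {**{ch: 0.5 for ch in string.ascii_lowercase},
             "i": 0.25, "l": 0.25, "m": 0.8, "w": 0.75},
}


def rendered_width(word, widths):
    """Width of a word rendered at size 1 in the given typeface."""
    return sum(widths[ch] for ch in word)


def fit_typeface(observed):
    """observed: list of (ocr_word, width_in_scanned_image).
    Returns (typeface_name, estimated_font_size)."""
    best = None
    for name, widths in TYPEFACES.items():
        # One scaling factor per word for this candidate typeface.
        scales = [w / rendered_width(word, widths) for word, w in observed]
        # Relative spread: small when all factors cluster around one font size.
        spread = statistics.pstdev(scales) / statistics.mean(scales)
        if best is None or spread < best[0]:
            best = (spread, name, statistics.median(scales))
    return best[1], best[2]
```

If the scanned widths of `mill`, `wax`, and `code` were produced by `Prop` at size 12, every `Prop` factor is 12 while the `Mono` factors scatter, so the sketch returns `Prop` with a size of about 12.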
-
Patent number: 6741744
Abstract: The invention features a method wherein a recognition environment utilizes pseudo-English as a programming language to extract simple and complex objects with image and/or map data as inputs. Based on this human/computer interface in which pseudo-English is a programming language, the object-recognition system has three major logic modules: (1) an input data module; (2) an information-processing module, coupled with the above-noted human computer interface (HCI) module; and (3) an output module that has a feedback mechanism back to the main information-processing and the input-data module. A physical phenomenon (i.e., one that is visible, audible, tactile, etc.) is analyzed by the information-processing module to determine whether it is susceptible to description or articulation. If not, the phenomenon is matched or compared, via the output module, to a known articulatable, physical-phenomenon model and recognizable features are extracted.
Type: Grant
Filed: April 17, 1999
Date of Patent: May 25, 2004
Inventor: Shin-yi Hsu
-
Patent number: 6738515
Abstract: This invention compares each character of a first character string with each character of a second character string, casts votes into a matrix whose two sides correspond to the characters of the first and second character strings, and calculates values of the voting result for the respective components arranged in an oblique (diagonal) direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process that is noise-resistant and takes the character arrangement into consideration can be attained.
Type: Grant
Filed: July 27, 2000
Date of Patent: May 18, 2004
Assignee: Kabushiki Kaisha Toshiba
Inventor: Takuma Akagi
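The voting matrix can be sketched by recording a vote at cell `(i, j)` whenever `a[i] == b[j]` and summing the votes along each diagonal, identified by the offset `j - i`. Characters that line up in the same order accumulate on one diagonal, which is why the score respects character arrangement, while a single misread character only removes one vote. Taking the best diagonal and normalizing it by the longer string's length is an assumption; the abstract does not say how the matching result is derived from the diagonal values.

```python
def diagonal_vote_score(a, b):
    """Vote matrix[i][j] = 1 when a[i] == b[j]; sum votes per diagonal (offset j - i)
    and return the best diagonal total divided by the longer string's length."""
    votes = {}  # diagonal offset -> number of votes on that diagonal
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            if ca == cb:
                votes[j - i] = votes.get(j - i, 0) + 1
    best = max(votes.values(), default=0)
    return best / max(len(a), len(b), 1)
```

For `"TOKYO"` against an OCR result `"TOKY0"` (zero instead of the letter O), four votes fall on the main diagonal and the score is 0.8; a shifted string such as `"XTOKYO"` still scores 5/6 because its votes fall on the diagonal with offset -1.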
-
Patent number: 6731802
Abstract: A lattice data structure suitable for storage on a computer-readable medium is provided which represents a plurality of orthographic forms of a Japanese lexical entry. The lattice includes a plurality of data fields each adapted to hold data representing a word element of the entry. Each data field includes a first subfield containing data representing a primary form of the corresponding word element and a second subfield containing data representing an alternate form of the corresponding word element. Also provided is a method of normalizing Japanese lexical entries to produce a normalized form that includes the primary form of each word-element representation of the lattice and does not include the alternate forms. Also provided are methods of segmenting text using the disclosed lattice.
Type: Grant
Filed: May 2, 2000
Date of Patent: May 4, 2004
Assignee: Microsoft Corporation
Inventors: Gary Kacmarcik, Christopher J. Brockett
-
Publication number: 20040037470
Abstract: Systems and methods for processing text-based electronic documents are provided. Briefly described, one embodiment of a method for processing a text-based electronic document comprises the steps of: comparing at least one word in a text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule; for each of the at least one word that does not conform to the predefined rule, fragmenting the at least one word into word fragments; combining at least two consecutive word fragments; and comparing the combination of the word fragments to the native language dictionary.
Type: Application
Filed: August 23, 2002
Publication date: February 26, 2004
Inventor: Steven J. Simske
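The fragment-and-recombine steps can be sketched for a common OCR failure, a word split into pieces such as `docu` / `ment`. This sketch assumes the "predefined rule" is plain dictionary membership, treats each non-conforming token (with any trailing hyphen stripped) as a fragment, and tries joining it with up to `max_join - 1` following tokens, longest combination first. The function name, `max_join`, and the hyphen handling are assumptions.

```python
def repair(tokens, dictionary, max_join=3):
    """Join consecutive fragments of non-dictionary words when the combination
    is a dictionary word; otherwise leave the tokens unchanged."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].lower() in dictionary:        # conforms to the rule: keep it
            out.append(tokens[i])
            i += 1
            continue
        # Non-conforming: try combining 2..max_join consecutive fragments, longest first.
        for j in range(min(len(tokens), i + max_join), i + 1, -1):
            combo = "".join(t.strip("-") for t in tokens[i:j])
            if combo.lower() in dictionary:
                out.append(combo)
                i = j
                break
        else:                                      # no combination matched
            out.append(tokens[i])
            i += 1
    return out
```

With the dictionary `{"the", "document", "is", "here"}`, the tokens `["the", "docu", "ment", "is", "here"]` become `["the", "document", "is", "here"]`, while an unrecognizable token such as `"xyz"` is passed through.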
-
Publication number: 20040028278
Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
Type: Application
Filed: August 9, 2002
Publication date: February 12, 2004
Applicant: XEROX CORPORATION
Inventors: Daniel H. Greene, Justin K. Romberg, Tze-Lei Poo, Ashok C. Popat
-
Publication number: 20040028279
Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
Type: Application
Filed: August 9, 2002
Publication date: February 12, 2004
Applicant: XEROX CORPORATION
Inventors: Daniel H. Greene, Justin K. Romberg, Ashok C. Popat
-
Publication number: 20040028280
Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.
Type: Application
Filed: August 9, 2002
Publication date: February 12, 2004
Applicant: XEROX CORPORATION
Inventors: Daniel H. Greene, Tze-Lei Poo, Ashok C. Popat
-
Patent number: 6678409
Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.
Type: Grant
Filed: January 14, 2000
Date of Patent: January 13, 2004
Assignee: Microsoft Corporation
Inventors: Andi Wu, Zixin Jiang
-
Patent number: 6678415
Abstract: A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network selectively expands the network to accommodate the language model only for certain ones of the paths in the network, effectively managing the memory storage requirements and computational complexities of integrating the language model efficiently into the network. The language model generates probability distributions indicating the probability of a certain character occurring in a string, given one or more previous characters in the string. Selectively expanding the image network is achieved by initially using upper bounds on the language model probabilities on the branches of an unexpanded image network. A best path search operation is then performed to determine an estimated best path through the image network using these upper bound scores.
Type: Grant
Filed: May 12, 2000
Date of Patent: January 13, 2004
Assignee: Xerox Corporation
Inventors: Ashok C. Popat, Dan S. Bloomberg, Daniel H. Greene
-
Patent number: 6671856
Abstract: Disclosed is a system, method, and program for determining boundaries in a string of characters using a dictionary, wherein the substrings in the dictionary may comprise words. A determination is made of all possible initial substrings of the string in the dictionary. One initial substring is selected such that all the characters following the initial substring can be divided into at least one substring in the dictionary. The boundaries follow each of the initial substring and the at least one substring that includes all the characters following the initial substring.
Type: Grant
Filed: September 1, 1999
Date of Patent: December 30, 2003
Assignee: International Business Machines Corporation
Inventor: Richard Theodore Gillam
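The selection rule (pick an initial dictionary substring only if everything after it can itself be divided into dictionary substrings) is a recursive word-break search. A minimal sketch, assuming candidates are tried longest first and results are memoized per start offset; both choices, and the function name, are assumptions rather than the filing's stated procedure. Boundaries are returned as the character offsets that follow each selected substring.

```python
from functools import lru_cache


def find_boundaries(text, dictionary):
    """Return offsets following each dictionary substring in a full division of
    text, or None if the text cannot be divided."""
    max_len = max(map(len, dictionary), default=0)

    @lru_cache(maxsize=None)
    def segment(start):
        """Words covering text[start:], or None if no division exists."""
        if start == len(text):
            return []
        # Every dictionary word that is an initial substring of the remainder.
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                rest = segment(end)          # can the remainder be divided?
                if rest is not None:
                    return [text[start:end]] + rest
        return None

    words = segment(0)
    if words is None:
        return None
    boundaries, pos = [], 0
    for word in words:
        pos += len(word)
        boundaries.append(pos)
    return boundaries
```

For `"abcd"` with the dictionary `{"abc", "ab", "cd"}`, the initial substring `"abc"` is rejected because `"d"` cannot be divided, so `"ab"` is selected and the boundaries are `[2, 4]`.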
-
Patent number: 6668085
Abstract: An improved method of deriving the correct text from error-containing text converted by a character recognition device requires significantly less human intervention to correct the converted text.
Type: Grant
Filed: August 1, 2000
Date of Patent: December 23, 2003
Assignee: Xerox Corporation
Inventor: William D. Evans