Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
-
Patent number: 7532758
Abstract: A method and apparatus for generating a template for use in handwriting recognition are provided. In the method and apparatus, text is obtained; character strings in the text are identified, each character string being formed from a sequence of one or more characters and each character having a respective type; a sequence of character types is determined for each character string; and a template is defined for each character type sequence.
Type: Grant
Filed: April 14, 2008
Date of Patent: May 12, 2009
Assignee: Silverbrook Research Pty Ltd
Inventor: Jonathon Leigh Napper
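The template construction described above can be sketched as mapping each character to a coarse type and keying templates by the resulting type sequence. A minimal sketch, assuming an illustrative three-type alphabet (alpha/digit/punctuation); the names and type set are not from the patent:

```python
def char_type(c):
    # Coarse character types; this type set is illustrative only.
    if c.isalpha():
        return "A"  # alphabetic
    if c.isdigit():
        return "D"  # digit
    return "P"      # punctuation/other

def build_templates(text):
    # Identify character strings in the text, map each to its sequence
    # of character types, and define one template per distinct sequence.
    templates = {}
    for string in text.split():
        seq = "".join(char_type(c) for c in string)
        templates.setdefault(seq, []).append(string)
    return templates
```

A recognizer could then constrain its hypotheses to strings matching one of the known type sequences.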
-
Patent number: 7522771
Abstract: Methods, systems, and computer-readable media for ascertaining neighborhood information in a dynamically changing environment, such as an electronic ink environment, may include: (a) receiving data representing plural electronic ink strokes; (b) defining a first vertex associated with a first ink stroke; and (c) determining neighboring vertices to the first vertex, wherein the neighboring vertices are associated with ink stroke(s) other than the first ink stroke. Additional systems, methods, and computer-readable media may include: (a) receiving data representing plural electronic ink strokes; (b) defining plural vertices associated with the ink strokes; (c) receiving input indicating a selection of an ink component; and (d) determining at least one neighboring component by determining which ink component(s) located outside of the selection include one or more ink strokes having vertices that neighbor vertices included in the selection.
Type: Grant
Filed: March 17, 2005
Date of Patent: April 21, 2009
Assignee: Microsoft Corporation
Inventors: Herry Sutanto, Ming Ye, Sashi Raghupathy
-
Publication number: 20090092323
Abstract: A system and method for character error correction is provided, useful for a user of mobile appliances to produce written text with reduced errors. The system includes an interface, a word prediction engine, a statistical engine, an editing distance calculator, and a selector. A string of characters, known as the inputted word, may be entered into the mobile device via the interface. The word prediction engine may then generate word candidates similar to the inputted word using fuzzy logic and user preferences generated from past user behavior. The statistical engine may then generate variable error costs determined by the probability of erroneously inputting any given character. The editing distance calculator may then determine the editing distance between the inputted word and each of the word candidates by grid comparison using the variable error costs. The selector may choose one or more preferred candidates from the word candidates using the editing distances.
Type: Application
Filed: October 4, 2007
Publication date: April 9, 2009
Inventors: Weigen Qiu, Samuel Yin Lun Pun
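The "grid comparison using the variable error costs" is a weighted Levenshtein-style dynamic program in which the substitution cost varies per character pair. A minimal sketch, assuming a caller-supplied `sub_cost` function stands in for the statistical engine (names are illustrative, not from the publication):

```python
def weighted_edit_distance(inp, cand, sub_cost):
    # Dynamic-programming grid; insert/delete cost 1.0, substitution
    # cost supplied per character pair by sub_cost(a, b).
    m, n = len(inp), len(cand)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + 1.0
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + 1.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if inp[i - 1] == cand[j - 1] else sub_cost(inp[i - 1], cand[j - 1])
            d[i][j] = min(d[i - 1][j] + 1.0,      # delete
                          d[i][j - 1] + 1.0,      # insert
                          d[i - 1][j - 1] + cost)  # substitute/match
    return d[m][n]
```

A selector would then rank word candidates by this distance, so likely typos (e.g. adjacent keys) cost less than improbable ones.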
-
Patent number: 7515751
Abstract: In a computing device, a method and system search for matching ink words or phrases by comparing a given search term of at least one word (and possibly alternates) with the words in a document, including recognized ink words and any possible alternates for those recognized words as returned by a recognizer. Various matching tests are possible because of the use of alternates, which also may have corresponding probability rankings that may influence the search. Searching may occur in actively edited ink documents, or the recognition results may be saved as search file data that can be searched independently of recognition.
Type: Grant
Filed: September 11, 2006
Date of Patent: April 7, 2009
Assignee: Microsoft Corporation
Inventors: Charlton E. Lui, Gregory H. Manto, Vikram Madan, Ryan E. Cukierman, Jon E. Clark
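The core matching idea above is that an ink word contributes all of its recognizer alternates to the search, not just the top result. A minimal sketch, assuming an ink word is represented as a list of alternates and a typed word as a plain string (this representation is an assumption for illustration):

```python
def search_ink(document, query):
    # document: mixed list; a recognized ink word is a list of recognizer
    # alternates (best first), a typed text word is a plain string.
    # Returns indices of words where any alternate matches the query.
    hits = []
    for i, word in enumerate(document):
        alternates = word if isinstance(word, list) else [word]
        if any(a.lower() == query.lower() for a in alternates):
            hits.append(i)
    return hits
```

Probability rankings on alternates could further weight or order these hits.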
-
Patent number: 7512275
Abstract: When first and second images are input, a partial image feature calculating unit calculates feature values of partial images of the two images. A maximum matching score position searching unit searches for the position in the second image that attains the highest matching score with each of the partial images of the first image. A movement-vector-based similarity score calculating unit calculates the similarity between the first and second images, using information related to those partial images whose movement vectors have direction and length within a prescribed range; each movement vector represents the positional relation between a reference position of a partial image in the first image and the position of the maximum matching score found for that partial image by the searching unit. The images to be collated may belong to the same category, classified based on the feature values.
Type: Grant
Filed: October 19, 2004
Date of Patent: March 31, 2009
Assignee: Sharp Kabushiki Kaisha
Inventors: Manabu Yumoto, Yasufumi Itoh, Takashi Horiyama, Manabu Onozaki, Toshiya Okamoto
-
Publication number: 20090074306
Abstract: Word correlations are estimated using a content-based method, which uses visual features of image representations of the words. The image representations of the subject words may be generated by retrieving images from data sources (such as the Internet) using image search with the subject words as query words. One aspect of the techniques is based on calculating the visual distance or visual similarity between the sets of retrieved images corresponding to each query word. The other is based on calculating the visual consistency among the set of retrieved images corresponding to a conjunctive query word. The combination of the content-based method and a text-based method may produce an even better result.
Type: Application
Filed: December 13, 2007
Publication date: March 19, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Jing Liu, Bin Wang, Zhiwei Li, Mingjing Li, Wei-Ying Ma
-
Patent number: 7505180
Abstract: A method for performing optical character recognition (OCR) on an image of a document including text includes embedding a physical manifestation of digital information associated with the text on the document. When the document is scanned with a scanning device, the digital information and a digital text file are produced. The digital text file is proofed using the digital information.
Type: Grant
Filed: November 15, 2005
Date of Patent: March 17, 2009
Assignee: Xerox Corporation
Inventors: Dennis C. DeYoung, Devin J. Rosenbauer
-
Patent number: 7499588
Abstract: A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input, which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.
Type: Grant
Filed: May 20, 2004
Date of Patent: March 3, 2009
Assignee: Microsoft Corporation
Inventors: Charles E. Jacobs, James R. Rinker, Patrice Y. Simard, Paul A. Viola
-
Patent number: 7496233
Abstract: A user defines a job flow of desired service cooperation according to a GUI screen displayed on a client terminal, where parallel processing of plural parallel-executable jobs can be set. According to the thus-defined job flow, an instruction data generation server generates instruction data defining the content of processes, a storage location of a document as a subject, and other items. When the user selects a desired one of the instruction data, the selected instruction data is sent to a cooperative processing server.
Type: Grant
Filed: September 15, 2003
Date of Patent: February 24, 2009
Assignee: Fuji Xerox Co., Ltd.
Inventors: Kazuko Kirihara, Yuji Hikawa, Yukio Tajima, Akihiro Enomoto, Hidekazu Ozawa
-
Publication number: 20090034851
Abstract: Systems and methods for classifying content as adult content and, if desired, blocking content so classified from presentation to a user are provided. Received content is analyzed using a sequential series of classification techniques, each successive technique being implemented only if the previous technique did not result in classification of the content as adult content. In this way, adult content may be identified across a variety of different media types (e.g., text, images, video, etc.), and yet processing power may be conserved if one or more techniques requiring less power suffice to determine that the received content is, in fact, adult content. Content classification may be performed in-band (that is, in substantially real-time, such that content may be identified and/or blocked at the time results of a user query are returned) or out-of-band (that is, prospectively as new content is received, but not in association with a user query).
Type: Application
Filed: August 3, 2007
Publication date: February 5, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Xiadong Fan, Richard Qian
-
Patent number: 7487461
Abstract: A command pattern recognition system based on a virtual keyboard layout combines pattern recognition with a virtual, graphical, or on-screen keyboard to provide a command control method with relative ease of use. The system allows the user to conveniently issue commands on pen-based computing or communication devices. The system supports a very large set of commands, including practically all commands needed for any application. By utilizing shortcut definitions, it can work with any existing software without any modification. In addition, the system utilizes various techniques to achieve reliable recognition of a very large gesture vocabulary. Further, the system provides feedback and display methods to help the user effectively use and learn command gestures for commands.
Type: Grant
Filed: May 4, 2005
Date of Patent: February 3, 2009
Assignee: International Business Machines Corporation
Inventors: Shumin Zhai, Per-Ola Kristensson
-
Publication number: 20090028446
Abstract: An image of a character string composed of M characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, the N (N > 1, integer) character images with the highest degrees of similarity are selected as candidate characters from a character image feature dictionary, which stores the image features of character images on a per-character basis, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix is subjected to a lexical analysis according to a language model, whereby a second index matrix holding a character string that makes sense is prepared. Statistics are taken over the language model, and the lexical analysis is then performed.
Type: Application
Filed: January 10, 2008
Publication date: January 29, 2009
Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
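The rescoring of an M×N candidate matrix with a language model can be sketched as a beam search: each row holds the N visually ranked candidates for one character position, and a bigram score rewards linguistically plausible transitions. A sketch under assumed scoring (rank penalty plus bigram bonus); the exact statistics used are not specified in the publication:

```python
def best_string(candidates, bigram_score, beam_width=5):
    # candidates: M x N matrix; row i holds the candidate characters for
    # position i in descending order of visual similarity.
    # bigram_score(a, b): language-model bonus for character pair (a, b).
    beams = [("", 0.0)]
    for row in candidates:
        new_beams = []
        for prefix, score in beams:
            for rank, ch in enumerate(row):
                s = score - rank  # penalize lower-ranked (less similar) candidates
                if prefix:
                    s += bigram_score(prefix[-1], ch)
                new_beams.append((prefix + ch, s))
        new_beams.sort(key=lambda t: -t[1])
        beams = new_beams[:beam_width]
    return beams[0][0]
```

With a model that favors common pairs, a lower-ranked visual candidate can win when it yields a string that "makes sense".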
-
Publication number: 20090016617
Abstract: A mobile apparatus for receiving an electronic message that comprises a text message from a sender. The mobile device comprises a contact records repository that stores a number of digital images, which are associated with a respective number of user identifiers. The mobile device further comprises a text analysis module that identifies predefined expressions in the text message, an image-editing module that matches one of the user identifiers with the sender and edits the associated digital image according to the identified predefined expression, and an output module for outputting the edited digital image.
Type: Application
Filed: July 13, 2007
Publication date: January 15, 2009
Applicant: Samsung Electronics Co., Ltd.
Inventors: Orna Bregman-Amitai, Nili Karmon
-
Publication number: 20090005078
Abstract: A portable communication apparatus is provided which comprises an image-capturing means for capturing an image; a character recognition means for recognizing characters which appear in that captured image; a location means for identifying the current location of the portable communication apparatus; and a data retrieval means for accessing one or more databases in order to retrieve data based on the recognized characters and on the current location of the portable communication apparatus.
Type: Application
Filed: June 24, 2008
Publication date: January 1, 2009
Applicant: xSights Media Ltd.
Inventor: Eran DARIEL
-
Publication number: 20080317359
Abstract: A printer 1 has transportation paths for conveying media in two directions, that of a first transportation path P1 and that of a second transportation path P2 (or third transportation path P3) perpendicular to the first transportation path P1. With this printer 1, a single compact unit can be used for media processing by reading and printing the media, as well as for printing receipts and validation printing.
Type: Application
Filed: August 28, 2008
Publication date: December 25, 2008
Inventors: TOSHIYUKI SASAKI, Masashi Fujikawa, Kunio Omura
-
Patent number: 7457464
Abstract: A digital image is composed at a digital transmitter device from a hardcopy source. The digital image includes an optically scanned image. Indicia is detected on the hardcopy image. A substitute is made for the indicia in the composed digital image. A modified rendering of the digital image is output.
Type: Grant
Filed: August 29, 2003
Date of Patent: November 25, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Chad A. Stevens, Robert Sesek, Travis J. Parry
-
Patent number: 7454063
Abstract: The present invention is a method of optical character recognition. First, text is received. Next, all words in the text are identified and associated with the appropriate line in the document. The directional derivative of the pixellation density function defining the text is then taken, and the highest-value points for each word are identified from this equation. These highest-value points are used to calculate a baseline for each word. A median anticipated baseline is also calculated and used to verify each baseline, which is corrected as necessary. Each word is then parsed into feature regions, and the features are identified through a series of complex analyses. After identifying the main features, outlying ornaments are identified and associated with appropriate features. The results are then compared to a database to identify the features and then displayed.
Type: Grant
Filed: September 22, 2005
Date of Patent: November 18, 2008
Assignee: The United States of America as represented by the Director, National Security Agency
Inventors: Kyle E Kneisl, Jesse Otero
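The baseline step above can be approximated with a discrete version of the density derivative: build a row-density profile per word, take the row-to-row difference, and treat the sharpest drop as the baseline, then verify each word's baseline against the median across words. This is a simplified reading, not the patented computation; the bitmap format and tolerance are assumptions:

```python
def word_baseline(bitmap):
    # bitmap: list of rows (top to bottom) of 0/1 ink pixels for one word.
    # Row-density profile, then a discrete derivative; the sharpest drop
    # in density is taken as the baseline row.
    density = [sum(row) for row in bitmap]
    drops = [density[r] - density[r + 1] for r in range(len(density) - 1)]
    return max(range(len(drops)), key=lambda r: drops[r])

def verified_baselines(words, tolerance=1):
    # Verify each baseline against the median anticipated baseline,
    # correcting outliers (tolerance of 1 row is an assumption).
    raw = [word_baseline(w) for w in words]
    med = sorted(raw)[len(raw) // 2]
    return [b if abs(b - med) <= tolerance else med for b in raw]
```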
-
Publication number: 20080273802
Abstract: A form processing program which is capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, and a character recognizer recognizes characters within the readout region. A form logical definition database stores form logical definitions defining strings as keywords according to logical structures which are common to forms of the same type. A possible string extractor extracts, as possible strings, combinations of recognized characters each of which satisfies defined relationships of a string. A linking unit links the possible strings according to positional relationships, and determines a combination of possible strings as keywords.
Type: Application
Filed: July 8, 2008
Publication date: November 6, 2008
Applicant: FUJITSU LIMITED
Inventors: Hiroaki Takebe, Katsuhito Fujimoto
-
Patent number: 7446817
Abstract: A method and apparatus for detecting text associated with video are provided. The method of detecting the text of the video includes reading a t-th frame (where t is a positive integer) among frames forming the video as a current frame, determining whether there is a text area detected from a previous frame, which is a (t−N)-th (where N is a positive integer) frame among the frames forming the video, in the current frame, and upon determining that there is no text area detected from the previous frame in the current frame, detecting the text area in the entire current frame. Upon determining that there is the text area detected from the previous frame in the current frame, the text area is detected from a remaining area obtained by excluding from the current frame an area corresponding to the text area detected from the previous frame. Whether there is a text area in a next frame, which is a (t+N)-th frame among the frames forming the video, is verified.
Type: Grant
Filed: February 14, 2005
Date of Patent: November 4, 2008
Assignee: Samsung Electronics Co., Ltd.
Inventors: Cheolkon Jung, Jiyeun Kim, Youngsu Moon
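The frame-skipping control flow described above can be sketched as: process every N-th frame, run full-frame detection when nothing was carried over, and otherwise detect only outside the previously found areas. A sketch assuming a caller-supplied `detect(frame, exclude)` helper stands in for the actual text detector:

```python
def detect_text_regions(frames, detect, n=2):
    # frames: sequence of frame objects; detect(frame, exclude) returns
    # the text regions found outside `exclude` (assumed helper).
    # Every n-th frame is processed; regions carried over from the
    # previously processed frame are excluded from re-detection.
    results = {}
    previous = []
    for t in range(0, len(frames), n):
        found = detect(frames[t], exclude=previous)
        results[t] = previous + found if previous else found
        previous = results[t]
    return results
```

A fuller implementation would also verify that carried-over regions still contain text before reusing them.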
-
Patent number: 7443316
Abstract: A method (300) for entering a character into an electronic device (100) is provided. The method (300) includes displaying (301) input character keys (204) on a touch sensitive region (202) of a display screen (105) of the device (100), the keys identifying an associated character. Next, a display step (309) shows at least one entered character in a display region (201) of the screen, the entered character having been selected by actuation of one of the character keys (204). Next, a group of potential subsequent characters that follow the entered character is predicted (311, 317). A second set of input character keys (205) identifying the potential subsequent characters is displayed (327). The second set of keys (205) are grouped together (323) such that their relative screen locations with respect to each other are different to that of corresponding keys in the first set of keys (204).
Type: Grant
Filed: September 1, 2005
Date of Patent: October 28, 2008
Assignee: Motorola, Inc.
Inventor: Swee Ho Lim
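The prediction step can be sketched as ranking, over a lexicon, the characters that follow the entered prefix; the top-ranked characters would populate the second key set. The patent does not specify the predictor, so a lexicon-frequency predictor here is an assumption:

```python
def predict_next_keys(prefix, lexicon, k=4):
    # Rank potential subsequent characters by how often each follows the
    # entered prefix among lexicon words (ties broken alphabetically).
    counts = {}
    for word in lexicon:
        if word.startswith(prefix) and len(word) > len(prefix):
            c = word[len(prefix)]
            counts[c] = counts.get(c, 0) + 1
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:k]  # characters for the second key set
```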
-
Patent number: 7444021
Abstract: The present invention provides a method of identifying a string formed from a number of hand-written characters, such as hand-written words. In order to achieve this, the method operates to determine character probabilities for each character in the string, as well as to determine the probability of the string corresponding to a predetermined form of template. In this regard, each template represents a respective combination of character types. The template and character probabilities are then combined to determine string probabilities, with the character string being identified in accordance with the determined string probabilities.
Type: Grant
Filed: October 15, 2002
Date of Patent: October 28, 2008
Assignee: Silverbrook Research Pty Ltd
Inventor: Jonathon Leigh Napper
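The combination step can be sketched as a log-space product: each candidate string scores the sum of its per-character log-probabilities plus the log-prior of the template matching its character-type sequence. A simplified reading with exhaustive enumeration (fine for short strings); the scoring form and names are assumptions:

```python
import math

def identify_string(char_probs, templates, char_type):
    # char_probs: per position, dict of candidate character -> probability.
    # templates: dict mapping a character-type sequence to its prior.
    # char_type: maps a character to its type symbol.
    strings, scores = [""], [0.0]
    for probs in char_probs:
        # Extend every partial string by every candidate (same iteration
        # order for both comprehensions, so they stay aligned).
        strings, scores = (
            [s + c for s in strings for c in probs],
            [sc + math.log(probs[c]) for sc in scores for c in probs],
        )
    best, best_score = None, -math.inf
    for s, sc in zip(strings, scores):
        seq = "".join(char_type(c) for c in s)
        total = sc + math.log(templates.get(seq, 1e-9))
        if total > best_score:
            best, best_score = s, total
    return best
```

The second test below shows the template prior overriding the raw character probabilities, which is the point of the combination.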
-
Patent number: 7437001
Abstract: A method for recognition of a handwritten pattern comprises the steps of forming (4) a representation of the handwritten pattern, forming (6) at least two subconfigurations by dividing the representation of the handwritten pattern, and processing the subconfigurations. The step of processing comprises the steps of comparing (8) each subconfiguration with reference configurations, selecting (10) at least one subconfiguration candidate for each subconfiguration among the reference configurations based on said step of comparing, and determining (12) at least one candidate pattern consisting of one selected subconfiguration candidate for each subconfiguration. The method further comprises the steps of comparing (14) the representation of the handwritten pattern to the candidate pattern, and computing (16) a cost function in order to find a closest matching candidate pattern.
Type: Grant
Filed: June 5, 2007
Date of Patent: October 14, 2008
Assignee: ZI Decuma AB
Inventors: Jonas Morwing, Gunnar Sparr
-
Publication number: 20080240582
Abstract: A method and an apparatus for character string recognition may be provided that enables prevention of a decrease in recognition accuracy for a character string even when distortion of an image appears in a direction perpendicular to a medium transfer direction.
Type: Application
Filed: March 31, 2008
Publication date: October 2, 2008
Applicant: NIDEC SANKYO CORPORATION
Inventor: Hiroshi NAKAMURA
-
Publication number: 20080212882
Abstract: The present invention relates to a method and system providing a pattern-classifier encoded dictionary for use in language processing systems implemented in computer systems. The pattern encoded dictionary according to the present invention may be utilized in Optical Character Recognition (OCR) or Automatic Speech Recognition (ASR) systems to retrieve reliably identified words, used in an adaptive manner or as a tool to configure said OCR or ASR system.
Type: Application
Filed: June 14, 2005
Publication date: September 4, 2008
Applicant: Lumex AS
Inventors: Hans Christian Meyer, Mats Stefan Carlin, Knut Tharald Fosseide
-
Patent number: 7420701
Abstract: Systems and methods for accurately recognizing a language format of an input imaging data stream when no explicit language switch is present. A sniffer process is initiated when an imaging device receives an input imaging data stream. The sniffer process analyzes an initial sample of the input stream to determine the language format by enumerating through a set of language recognizers that are implemented as callback functions. The enumeration uses a dynamic heuristic approach to selecting the order in which to try the language recognizers. Each language recognizer has a sample size associated with it. For each language recognizer enumerated, the sniffer process pre-reads the associated sample size and invokes the associated callback function with the byte sample. The enumeration continues until a language recognizer acknowledges recognition of the language format or the set of language recognizers is exhausted.
Type: Grant
Filed: June 10, 2004
Date of Patent: September 2, 2008
Assignee: Sharp Laboratories of America, Inc.
Inventor: Andrew Rodney Ferlitsch
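The sniffer loop can be sketched directly from the description: each recognizer declares a sample size and a callback; the sniffer pre-reads that many bytes and tries the callback until one acknowledges. A minimal sketch (the recognizer signatures shown are illustrative; the dynamic reordering heuristic is represented simply by list order):

```python
def sniff_language(stream, recognizers):
    # recognizers: list of (name, sample_size, callback) tuples, tried in
    # order; callback(sample) -> bool. A dynamic heuristic would reorder
    # this list between jobs; plain list order stands in for it here.
    for name, sample_size, callback in recognizers:
        sample = stream[:sample_size]  # pre-read; stream is not consumed
        if callback(sample):
            return name
    return None  # set of recognizers exhausted
```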
-
Publication number: 20080208576
Abstract: Character information recognition means (101) extracts, through a character recognition process, character information from a selection button included in an index image. Based on text data having been outputted from the character information recognition means (101), index dictionary creation means (102) creates an index dictionary usable for a speech recognition process performed by speech recognition means (104). The speech recognition means (104) performs the speech recognition process by using speech data (D1) retrieved through an ADC (7) and the index dictionary stored in storage means (107). Based on a result of the speech recognition process performed by the speech recognition means (104), reproduction control means (105) performs reproduction control of a chapter. Thus, a desired button can be selected by speech, from chapter selection buttons displayed on a chapter selection image of a DVD video.
Type: Application
Filed: November 4, 2005
Publication date: August 28, 2008
Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
Inventors: Atsushi Iisaka, Atsushi Yamashita, Takuya Hirai
-
Patent number: 7418442
Abstract: Providing the ability to search a document for content recorded as both ink characters and text characters. A character from a search query word is retrieved. A program retrieves the character in the electronic document. The program determines if the character in the electronic document is an ink or text character. For text characters, the character in the document content is compared to a character in the search query word to determine if the characters match. For ink characters, an ink alternate term is obtained. A character in the ink alternate is compared to the character of the search query word to determine if the characters match. Once all characters in the ink alternate word are compared, another ink alternate word is retrieved and compared to the search query word.
Type: Grant
Filed: September 30, 2003
Date of Patent: August 26, 2008
Assignee: Microsoft Corporation
Inventor: Nathaniel Marvin Myhre
-
Patent number: 7415137
Abstract: A method of processing an image includes steps of identifying one candidate for a human face region within an image; calculating a probability that the candidate for human face region represents a human face; and saving the probability as attached information to the image. The method of processing an image can also include steps of identifying one candidate for human face region within an image; calculating a probability that the candidate for human face region represents a human face; judging whether or not the candidate for human face region represents a human face by comparing the probability with a threshold; and saving a result of the step of judging as attached information to the image. According to these methods, results of identifying candidates for human face regions will be saved to the image, and further processes to be conducted on the image can be facilitated.
Type: Grant
Filed: November 20, 2003
Date of Patent: August 19, 2008
Assignee: Canon Kabushiki Kaisha
Inventors: Xinwu Chen, Xin Ji, Libing Wang, Yoshihiro Ishida
-
Publication number: 20080193021
Abstract: A method and apparatus for generating a template for use in handwriting recognition are provided. In the method and apparatus, text is obtained; character strings in the text are identified, each character string being formed from a sequence of one or more characters and each character having a respective type; a sequence of character types is determined for each character string; and a template is defined for each character type sequence.
Type: Application
Filed: April 14, 2008
Publication date: August 14, 2008
Inventor: Jonathon Leigh Napper
-
Patent number: 7406201
Abstract: A method for encoding characters includes identifying one or more sequences of the character codes that are likely to be generated due to a segmentation error in application of a pattern recognition process, and associating a respective extension character code with each of the sequences. The area of an image containing characters is divided into segments, such that each segment contains approximately one character. The pattern recognition process is applied to each of the segments in order to generate an input string of character codes. At least one of the identified sequences of the character codes in the input string is replaced with the respective extension character code so as to generate a modified string. The output string is determined by comparing the modified string to a directory of known strings.
Type: Grant
Filed: December 4, 2003
Date of Patent: July 29, 2008
Assignee: International Business Machines Corporation
Inventors: Andre Heilper, Eugene Walach
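The extension-code idea can be sketched by normalizing both the input string and the dictionary entries: an error-prone sequence (e.g. "rn", which over-segmentation can produce from "m") and the character it is confusable with map to the same extension code, so they collide during lookup. A sketch under those assumptions; `"\ue000"` is an arbitrary private-use codepoint standing in for the extension character code:

```python
def normalize(s, extensions):
    # extensions: maps an error-prone code sequence to its extension code.
    # Replacements run in dict insertion order ("rn" before "m" below).
    for seq, code in extensions.items():
        s = s.replace(seq, code)
    return s

def match(input_string, dictionary, extensions):
    # Compare the modified input against dictionary entries normalized
    # the same way, so "rn" and "m" collide onto one code.
    wanted = normalize(input_string, extensions)
    for word in dictionary:
        if normalize(word, extensions) == wanted:
            return word
    return None
```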
-
Patent number: 7403656
Abstract: A character recognition method that is robust under an unknown illumination condition is provided. An apparatus for realizing such robust character recognition includes plural different binarization units, means for synthesizing character sub-image candidates that have been obtained from the binarization units, and means for analyzing character sub-image candidates and for recognizing an image as a character string consisting of character sub-image candidates.
Type: Grant
Filed: February 4, 2005
Date of Patent: July 22, 2008
Assignee: Hitachi, Ltd.
Inventor: Masashi Koga
-
Publication number: 20080170786
Abstract: A technique that can contribute to a reduction in an operation burden in managing a processing result of semantic determination processing applied to objects included in an image is provided. An object included in an image of image data is extracted. A semantic of the object in a layout of the image data is determined. When it is determined that plural objects have an identical semantic, a display unit is caused to notify information concerning the plural objects, which are determined as having the semantic, in association with information concerning the semantic.
Type: Application
Filed: December 28, 2007
Publication date: July 17, 2008
Applicants: KABUSHIKI KAISHA TOSHIBA, TOSHIBA TEC KABUSHIKI KAISHA
Inventors: Hajime Tomizawa, Akihiko Fujiwara
-
Publication number: 20080170075
Abstract: A display controller includes a character display unit for displaying character information on a display unit; a keyword detecting unit for detecting a predetermined keyword from the character information displayed by the character display unit; an image information detecting unit for detecting image information including additional information corresponding to the keyword detected by the keyword detecting unit, from image information including predetermined additional information and stored in a storing unit; and a thumbnail image displaying unit for displaying on the display unit a thumbnail image(s) of the image information detected by the image information detecting unit.
Type: Application
Filed: January 15, 2008
Publication date: July 17, 2008
Applicant: SONY ERICSSON MOBILE COMMUNICATIONS JAPAN, INC.
Inventors: Seiji MURAMATSU, Yoshimitsu Funabashi, Mayu Irimajiri, Atsushi Imai, Keiko Hiraoka, Takamoto Tsuda, Takeshi Matsuzawa, Takeshi Tanigawa, Tomoharu Okamoto, Akihiko Adachi, Tatsuhiko Nishimura
-
Publication number: 20080166057
Abstract: A video structuring device includes: character string extraction means for determining whether or not a character string is present in a frame image, and if it determines that a character string is present, generating character string position information for the character string present in a character string present frame image in which the character string is present, and outputting the character string position information, frame identifying information for identifying the character string present frame image, and the character string present frame image; video information storage means for storing frame identifying information, character string present frame image and character string position information in an index file all associated with one another; and structure information presentation means for associating character string display in the form of an image which is produced by cutting an area where the character string is present based on the character string present frame image and character string
Type: Application
Filed: October 24, 2006
Publication date: July 10, 2008
Inventor: Noboru Nakajima
-
Publication number: 20080159635
Abstract: A system for enabling user interaction with computer software which includes a computer system which transfers print data to a printer. The printer is responsive to the print data to print a form by printing information indicative of a text field coincident with coded data indicative of the text field, so that when a sensing device is moved relative to the text field the sensing device can sense the coded data and generate the indicating data indicative of its movement. The computer system uses the indicating data to determine the relative movement and then perform an action associated with the text field based on the movement. The computer system further determines the information, an identity indicative of the text field, and a layout defining an arrangement for coded data indicative of the identity and information, and generates the print data to be indicative of the identity, layout and information.
Type: Application
Filed: March 17, 2008
Publication date: July 3, 2008
Inventors: Paul Lapstun, Kia Silverbrook
-
Patent number: 7391419
Abstract: An information distribution system configured to deliver various types of content provided by an information distributor to information receivers through a network, transmitting the content to be distributed converted to colors, color values, or color digital values. By converting the content to colors, color values, or color digital values, it is possible to reduce the amount of information transmitted. Due to this, it becomes possible to shorten the time required for distribution of content and to improve practicality. Further, it becomes possible to reduce the distribution costs.
Type: Grant
Filed: May 22, 2002
Date of Patent: June 24, 2008
Assignee: Tani Electronics Corporation
Inventor: Okie Tani
-
Patent number: 7391527
Abstract: A method directed to using a multifunction printer to identify pages of a printed document that contain a specified text string. The method comprises electronically converting, with the multifunction printer, a plurality of pages of the printed document into a plurality of electronic text pages corresponding to the printed pages. The multifunction printer electronically searches the electronic text pages to identify which of them include the specified text string, and then communicates the identified electronic text pages to the user.
Type: Grant
Filed: April 29, 2003
Date of Patent: June 24, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Cory Irwin, Carl Price
-
Publication number: 20080137971
Abstract: A method and system for character recognition. In one embodiment, matched sequences rather than character shape may be used to determine a computer-legible result.
Type: Application
Filed: April 1, 2005
Publication date: June 12, 2008
Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
-
Publication number: 20080131006
Abstract: A pure adversarial optical character recognition (OCR) approach to identifying text content in images. An image and a search term are input to a pure adversarial OCR module, which searches the image for the presence of the search term. The image may be extracted from an email by an email processing engine. The OCR module may split the image into several character blocks, each of which has a reasonable probability of containing a character (e.g., an ASCII character). The OCR module may form a sequence of blocks that represents a candidate match to the search term and calculate the similarity of the candidate sequence to the search term. The OCR module may be configured to output whether or not the search term is found in the image and, if applicable, the location of the search term in the image.
Type: Application
Filed: August 16, 2007
Publication date: June 5, 2008
Inventor: Jonathan James Oliver
-
Publication number: 20080131005
Abstract: An adversarial approach to detecting inappropriate text content in images. An expression is selected from a listing of expressions, which may include words, phrases, or other textual content indicative of a particular type of message. Using the selected expression as a reference, the image is searched for a section that could be similar to the selected expression; the similarity may be in terms of shape. The section is scored against the selected expression to determine how well the expression matches the section, and the score is used to determine whether or not the selected expression is present in the image.
Type: Application
Filed: May 16, 2007
Publication date: June 5, 2008
Inventor: Jonathan James Oliver
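The two adversarial-OCR entries above both score a candidate character sequence from an image against a search expression. One standard way to quantify such similarity over recognized text is a normalized edit distance; the sketch below is purely illustrative (the patents describe scoring shape similarity directly, and the sliding-window rule and function names here are assumptions):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match_score(expression: str, recognized_text: str) -> float:
    # Slide a window of the expression's length over the recognized text
    # and return the best similarity score in [0, 1].
    n = len(expression)
    best = 0.0
    for start in range(max(1, len(recognized_text) - n + 1)):
        window = recognized_text[start:start + n]
        score = 1.0 - edit_distance(expression, window) / max(n, len(window))
        best = max(best, score)
    return best
```

A score near 1.0 suggests the expression is present; a threshold on this score would play the role of the patents' presence decision.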
-
Patent number: 7379603
Abstract: Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function builds a rollup matrix (126) that embodies information about the sibling entities and the sibling ranking information, and provides a method for reading out the ranked parent candidates from the rollup matrix in order of their parent confidences (141). Parent confidences are based on the sibling ranking information, either alone or in combination with n-gram dictionary ranking or other ranking information.
Type: Grant
Filed: April 8, 2003
Date of Patent: May 27, 2008
Assignee: RAF Technology, Inc.
Inventors: David Justin Ross, Stephen E. M. Billester, Brent R. Smith
-
Patent number: 7379596
Abstract: An improved system and method for personalizing recognition of an input method. A trainable handwriting recognizer may be personalized using ink written by the user and text authored by the user. The system includes a personalization service engine and a framework with interfaces for collecting, storing, and accessing user ink and authored information for training recognizers. The trainers of the system may include a text trainer for augmenting a recognizer's dictionary using text content and a shape trainer for tuning generic recognizer components using ink data supplied by the user. The trainers may load multiple trainer clients, each capable of training one or more specific recognizers. Furthermore, a framework is provided for supporting pluggable trainers, so that any trainable recognizer may be dynamically personalized using the harvested information authored by the user and the ink written by the user.
Type: Grant
Filed: October 24, 2003
Date of Patent: May 27, 2008
Assignee: Microsoft Corporation
Inventors: Patrick Haluptzok, Ross Nathaniel Luengen, Benoit J. Jurion, Michael Revow, Richard Kane Sailor
-
Publication number: 20080118162
Abstract: A mobile communications device with an integrated camera is directed toward text. A video stream is analyzed in real time to detect one or more words in a specified region of the video frames and to indicate the detected words on a display. Users can select a word in the video stream and subsequently move or extend the initial selection, making it possible to select multiple words. A subregion of the video frame comprising the detected word(s) is pre-processed and compressed before being sent to a remote optical character recognition (OCR) function, which may be integrated into an online service such as an online search service.
Type: Application
Filed: November 20, 2006
Publication date: May 22, 2008
Applicant: Microsoft Corporation
Inventor: Frank Siegemund
-
Publication number: 20080115070
Abstract: Text analysis methods, text analysis apparatuses, and articles of manufacture are described. In one aspect, a text analysis method includes accessing information indicative of the data content of a collection of text comprising a plurality of different topics, analyzing that information using a computing device, and, using the results of the analysis, identifying the presence of a new topic in the collection of text.
Type: Application
Filed: November 10, 2006
Publication date: May 15, 2008
Inventors: Paul D. Whitney, Alan R. Willse, Charles A. Lopresti, Amanda M. White
-
Patent number: 7369704
Abstract: In a circumstance where an image processing apparatus is connected to, and capable of communicating with, a plurality of processing servers each performing a specific data processing service, the kind of processing to be performed on document image data read from a document by image reading means is determined in accordance with the document image data. The address of a processing server capable of performing the determined processing is then searched for. At least a part of the document image data, or a character-string image extracted therefrom, is supplied to that address and the data processing service is requested. The result of the data processing service is obtained from this address and outputted.
Type: Grant
Filed: May 17, 2005
Date of Patent: May 6, 2008
Assignee: Sharp Kabushiki Kaisha
Inventor: Tomoyuki Honma
-
Patent number: 7362902
Abstract: Character data for a plurality of characters on which character recognition is being performed is received for processing. The character data includes character assignments and character locations. A reference location is defined in relation to the location of one of the characters, and the character assignments are resolved into one or more groupings according to the distance of each character from the reference location.
Type: Grant
Filed: May 28, 2004
Date of Patent: April 22, 2008
Assignee: Affiliated Computer Services, Inc.
Inventors: Billy S. Baker, Gary S. Smith
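The grouping idea above, resolving recognized character assignments into groups by distance from a reference location, can be sketched as follows. The gap-threshold rule, the one-dimensional layout, and all names here are illustrative assumptions, not the claimed method:

```python
from dataclasses import dataclass

@dataclass
class Char:
    assignment: str   # the recognized character
    x: float          # horizontal location on the page

def group_by_distance(chars, gap_threshold=2.0):
    """Split character assignments into groupings: a character whose
    distance from its predecessor exceeds the gap threshold starts a
    new group (a hypothetical field-splitting rule)."""
    ordered = sorted(chars, key=lambda c: c.x)
    # The first character's location serves as the initial reference.
    groups, current = [], [ordered[0].assignment]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.x - prev.x > gap_threshold:
            groups.append("".join(current))
            current = []
        current.append(cur.assignment)
    groups.append("".join(current))
    return groups
```

With a suitable threshold this separates, for example, distinct fields on a scanned form into distinct strings.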
-
Patent number: 7356188
Abstract: Described herein is a technology for recognizing the content of text documents. The technology determines one or more hash values for the content of a text document; alternatively, it may generate a "sifted text" version of a document. In one implementation described herein, document recognition is used to determine whether the content of one document is copied (i.e., plagiarized) from another document, by comparing the hash values of the documents (or alternatively their sifted text). In another implementation described herein, document recognition is used to categorize the content of a document so that it may be grouped with other documents in the same category. This abstract itself is not intended to limit the scope of this patent; the scope of the invention is pointed out in the appended claims.
Type: Grant
Filed: April 24, 2001
Date of Patent: April 8, 2008
Assignee: Microsoft Corporation
Inventors: Ramarathnam Venkatesan, Michael T. Malkin
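The comparison described above can be illustrated with a generic shingled-hashing scheme. The "sifted text" normalization below is a guess at the concept (the abstract does not define it), and the shingle size and functions are assumptions, not the patented technique:

```python
import hashlib
import re

def sift(text: str) -> str:
    # A hypothetical "sifted text" normalization: lowercase the text and
    # keep only letters, so superficial edits do not change the content.
    return re.sub(r"[^a-z]", "", text.lower())

def content_hashes(text: str, shingle: int = 8) -> set:
    # Hash overlapping substrings (shingles) of the sifted text, so that
    # partially copied passages still share hash values.
    sifted = sift(text)
    return {hashlib.sha256(sifted[i:i + shingle].encode()).hexdigest()
            for i in range(len(sifted) - shingle + 1)}

def overlap(a: str, b: str) -> float:
    # Fraction of shared shingle hashes, relative to the smaller set.
    ha, hb = content_hashes(a), content_hashes(b)
    return len(ha & hb) / max(1, min(len(ha), len(hb)))
```

A high overlap score between two documents suggests copied content; a categorizer could likewise compare a document's hash set against per-category reference sets.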
-
Publication number: 20080037879
Abstract: An expansion of the construction and organization of the electronic literary macramé (ELM), the knowledge transfer tool (KTT), or any document of similar type, enriching the connections and associations available to their readers. It provides manual author- or editor-defined links and directives for hypertext handling and navigation, easy-to-use indexing capabilities, and structuring and presentation of information in a visually organized form such as a table, list, matrix, tree, pyramid, or other two-dimensional arrangement, with all features integrated into an unobtrusive, enriched referencing mechanism to assist authors, editors, and readers of an ELM, KTT, or other electronic document of similar type.
Type: Application
Filed: July 25, 2007
Publication date: February 14, 2008
Inventor: Dana W. Paxson
-
Publication number: 20080025618
Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates the emission probability of a word candidate for each logical element, and a relation digitizing unit calculates the transition probability that a relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of the probability of the word candidates appearing in the respective logical elements, and a determining unit determines each element and its word candidate as an element and its character string in the form document, based on the evaluation value.
Type: Application
Filed: November 15, 2006
Publication date: January 31, 2008
Inventors: Akihiro Minagawa, Hiroaki Takebe, Katsuhito Fujimoto
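The emission/transition formulation above resembles a hidden-Markov-style labeling of extracted words with logical form elements. A generic Viterbi decoder over such tables is sketched below; the table layouts, the log-domain scoring, and all names are assumptions for illustration, not the patent's algorithm:

```python
import math

def viterbi_elements(words, elements, emission, transition):
    """Assign a logical element to each extracted word so that the product
    of emission probabilities emission[(word, element)] and transition
    probabilities transition[(prev_element, element)] is maximized."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialization: score each element for the first word.
    scores = {e: logp(emission.get((words[0], e), 0.0)) for e in elements}
    back = []
    for w in words[1:]:
        prev, scores, ptr = scores, {}, {}
        for e in elements:
            # Best predecessor element for label e at this position.
            best_e, best_s = None, float("-inf")
            for pe in elements:
                s = prev[pe] + logp(transition.get((pe, e), 0.0))
                if s > best_s:
                    best_e, best_s = pe, s
            scores[e] = best_s + logp(emission.get((w, e), 0.0))
            ptr[e] = best_e
        back.append(ptr)
    # Backtrack from the best final element.
    last = max(scores, key=scores.get)
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))
```

The evaluation value in the abstract would correspond to the accumulated score; the determining unit's output corresponds to the backtracked (element, word) sequence.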
-
Patent number: 7317465
Abstract: A method of displaying an image may include receiving image data for the image and defining first and second sub-frames of the image. The first and second sub-frames may have corresponding pluralities of image elements, with each image element of the second sub-frame spatially offset by an offset distance from the corresponding image element of the first sub-frame. The first sub-frame may be displayed in a first position and the second sub-frame in a second position, with each displayed image element of the second sub-frame spatially offset by substantially the offset distance from the corresponding displayed image element of the first sub-frame.
Type: Grant
Filed: January 27, 2004
Date of Patent: January 8, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Will Allen, Edward B. Anderson