Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
  • Patent number: 8131087
    Abstract: A form processing program capable of automatically extracting keywords. When the image of a scanned form is entered, a layout recognizer extracts a readout region of the form image, and a character recognizer recognizes characters within the readout region. A form logical definition database stores form logical definitions that define strings as keywords according to logical structures common to forms of the same type. A possible string extractor extracts, as possible strings, combinations of recognized characters that each satisfy the defined relationships of a string. A linking unit links the possible strings according to positional relationships and determines a combination of possible strings as the keywords.
    Type: Grant
    Filed: July 8, 2008
    Date of Patent: March 6, 2012
    Assignee: Fujitsu Limited
    Inventors: Hiroaki Takebe, Katsuhito Fujimoto
  • Patent number: 8120619
    Abstract: The present invention is intended to speed up rendering performed in the course of displaying a document that contains both icons and characters. A character display device that displays such a document includes: a searching unit that searches for the position in the document where an icon appears; a counter unit that counts the number of characters appearing successively before the position found by the searching unit; and a rendering unit that renders that counted number of successive characters after designating the attributes they share, and then renders the icon after designating the attributes of the icon.
    Type: Grant
    Filed: August 23, 2006
    Date of Patent: February 21, 2012
    Assignee: Fujitsu Limited
    Inventor: Makoto Sugimoto
  • Patent number: 8121412
    Abstract: A number of regions and partitions may be created based on input handwritten atoms and a grammar parsing framework. Productions for tabular structures may be added to the grammar parsing framework to produce an extended grammar parsing framework. Each of the regions may be searched for a tabular structure. Upon finding a tabular structure, a type of tabular structure may be determined. Configuration partitions may be created, based on the added productions, and added to the created partitions. A set of configuration regions may be created based on the configuration partitions and added to the created regions. The productions for tabular structures and productions of the grammar parsing framework may be applied, as rewriting rules, to the atoms to produce possible recognition results. A best recognition result may be determined and displayed. A mechanism for correcting misrecognition errors, which may occur while recognizing tabular structures, may be provided.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: February 21, 2012
    Assignee: Microsoft Corporation
    Inventors: Goran Predovic, Bodin Dresevic
  • Patent number: 8107670
    Abstract: A computer system scans image files for pornographic image content by pre-filtering image files to detect the presence, in copyright data fields, of stored items of copyright information deemed to indicate that the image file is either acceptable or unacceptable. On detecting such items of copyright information, a signal is output indicating that the image file does or does not contain pornographic image content, without the need to analyse the image content of the image file.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: January 31, 2012
    Assignee: Symantec Corporation
    Inventor: Mark Songhurst
  • Publication number: 20120020578
    Abstract: Establishments are identified in geo-tagged images. According to one aspect, text regions are located in a geo-tagged image and text strings in the text regions are recognized using Optical Character Recognition (OCR) techniques. Text phrases are extracted from information associated with establishments known to be near the geographic location specified in the geo-tag of the image. The text strings recognized in the image are compared with the phrases for the establishments for approximate matches, and an establishment is selected as the establishment in the image based on the approximate matches. According to another aspect, text strings recognized in a collection of geo-tagged images are compared with phrases for establishments in the geographic area identified by the geo-tags to generate scores for image-establishment pairs. Establishments in each image of the collection, as well as representative images showing each establishment, are identified using the scores.
    Type: Application
    Filed: September 27, 2011
    Publication date: January 26, 2012
    Applicant: GOOGLE INC.
    Inventors: Tal Yadid, Yuval Netzer, Shlomo Urbach, Andrea Frome, Noam Ben-Haim
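    As an illustration of the approximate-matching step described in the abstract above, the sketch below compares OCR'd strings against phrases associated with nearby establishments. The use of difflib.SequenceMatcher, the 0.8 threshold, and the sample data are assumptions for illustration, not details taken from the publication.
    ```python
    from difflib import SequenceMatcher

    def best_establishment(ocr_strings, establishments, threshold=0.8):
        """Pick the establishment whose known phrases best match the OCR'd text.

        ocr_strings    -- text strings recognized in a geo-tagged image
        establishments -- dict mapping establishment name -> list of phrases
                          (names, slogans) known to appear near the geo-tag
        """
        best_name, best_score = None, 0.0
        for name, phrases in establishments.items():
            for phrase in phrases:
                for text in ocr_strings:
                    score = SequenceMatcher(None, text.lower(), phrase.lower()).ratio()
                    if score > best_score:
                        best_name, best_score = name, score
        return (best_name, best_score) if best_score >= threshold else (None, best_score)

    ocr = ["JOES COFFEE", "OPEN 7AM"]
    nearby = {"Joe's Coffee": ["Joe's Coffee", "Joes Coffee House"],
              "City Books":   ["City Books & Records"]}
    print(best_establishment(ocr, nearby))   # ("Joe's Coffee", ~0.96)
    ```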
  • Publication number: 20120020577
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for communicating information about transcription progress from a unified messaging (UM) server to a UM client. In one embodiment, the transcription progress describes speech to text transcription of speech messages such as voicemail. The UM server authenticates and establishes a session with a UM client, then receives a get message list request from a UM client as of a first time, responds to the get message list request with a view of a state of messages and available transcriptions for transcribable messages in a list of messages associated with the get message list call at the first time, and, at a second time subsequent to the first time, transmits to the UM client a notification that provides an indication of progress for at least one transcription not yet complete in the list of messages. The messages can include video.
    Type: Application
    Filed: July 22, 2010
    Publication date: January 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Mehrad YASREBI, James JACKSON, John E. LEMAY
  • Patent number: 8103110
    Abstract: A method for classifying text comprises receiving data containing text and parsing a plurality of tokens out of the text. A plurality of metatokens are generated for each token, wherein the metatokens comprise strings of text and groupings of strings of text. The method further comprises calculating a probability that the data falls into a certain category, using the tokens and metatokens. The probability is compared to a threshold value and the data is classified into the certain category if the probability is greater than the threshold value.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: January 24, 2012
    Inventor: Stefan A. Berteau
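    The abstract above describes scoring tokens and metatokens against a probability threshold. The sketch below is a minimal naive-Bayes-style rendering of that flow; the adjacent-word-pair metatokens, the add-one smoothing, and the logistic squashing are illustrative assumptions, not the patented method.
    ```python
    import math
    import re
    from collections import Counter

    def tokens_and_metatokens(text):
        """Tokens plus metatokens; here metatokens are adjacent-token pairs."""
        toks = re.findall(r"[A-Za-z0-9']+", text.lower())
        metas = [f"{a} {b}" for a, b in zip(toks, toks[1:])]
        return toks + metas

    class CategoryClassifier:
        """Classify text into a category when its probability exceeds a threshold."""

        def __init__(self, threshold=0.5):
            self.threshold = threshold
            self.in_counts = Counter()
            self.out_counts = Counter()
            self.in_docs = 0
            self.out_docs = 0

        def train(self, text, in_category):
            feats = tokens_and_metatokens(text)
            if in_category:
                self.in_counts.update(feats)
                self.in_docs += 1
            else:
                self.out_counts.update(feats)
                self.out_docs += 1

        def probability(self, text):
            # Naive-Bayes-style log-odds with add-one smoothing, squashed to (0, 1).
            log_odds = math.log((self.in_docs + 1) / (self.out_docs + 1))
            in_total = sum(self.in_counts.values()) + 1
            out_total = sum(self.out_counts.values()) + 1
            for feat in tokens_and_metatokens(text):
                log_odds += math.log((self.in_counts[feat] + 1) / in_total)
                log_odds -= math.log((self.out_counts[feat] + 1) / out_total)
            return 1.0 / (1.0 + math.exp(-log_odds))

        def classify(self, text):
            return self.probability(text) > self.threshold

    clf = CategoryClassifier()
    clf.train("win a free prize now", True)
    clf.train("meeting notes attached", False)
    print(clf.classify("win a free prize"))        # True
    print(clf.classify("notes from the meeting"))  # False
    ```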
  • Patent number: 8102284
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound word text input. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device.
    Type: Grant
    Filed: July 22, 2009
    Date of Patent: January 24, 2012
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
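    A minimal sketch of frequency-ranked disambiguation on a reduced keyboard, in the spirit of the abstract above. The two-letters-per-key layout, the tiny word list, and the frequencies are made-up assumptions; the device's additional, non-frequency-based variant logic is not modeled.
    ```python
    # Map each letter to a reduced-keyboard key (two letters per key, illustrative).
    KEY_OF = {}
    for key, letters in enumerate(["qw", "er", "ty", "ui", "op",
                                   "as", "df", "gh", "jk", "l",
                                   "zx", "cv", "bn", "m"]):
        for ch in letters:
            KEY_OF[ch] = key

    LEXICON = {"are": 4000, "see": 1500, "ser": 5}   # word -> usage frequency

    def key_sequence(word):
        return tuple(KEY_OF[ch] for ch in word)

    def disambiguate(keys):
        """Return candidate words for a key sequence, most frequent (default) first."""
        matches = [w for w in LEXICON if key_sequence(w) == tuple(keys)]
        return sorted(matches, key=LEXICON.get, reverse=True)

    # "are", "see" and "ser" share the same keys on this layout.
    print(disambiguate(key_sequence("are")))   # ['are', 'see', 'ser'] -- 'are' is the default
    ```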
  • Publication number: 20120014613
    Abstract: A method, apparatus, and system, for scanning a first portion of a data to generate a second portion of data is provided. A control parameter relating to a level of detail associated with filtering a first portion of data is received. The filtering of the first portion of data is performed based upon the control parameter. The filtering of the first portion of data includes a rule-based filtering, a context-based filtering, a statistical-based filtering, or a semantic-based filtering. Performing the filtering provides for a reduction of a portion of the first portion of data. A second portion of data that is smaller than the first portion of data is provided based upon the filtering of the first portion of data.
    Type: Application
    Filed: September 25, 2011
    Publication date: January 19, 2012
    Inventors: Devang K. Naik, Kim E. A. Silverman
  • Patent number: 8098939
    Abstract: An adversarial approach in detecting inappropriate text content in images. An expression from a listing of expressions may be selected. The listing of expressions may include words, phrases, or other textual content indicative of a particular type of message. Using the selected expression as a reference, the image is searched for a section that could be similar to the selected expression. The similarity between the selected expression and the section of the image may be in terms of shape. The section may be scored against the selected expression to determine how well the selected expression matches the section. The score may be used to determine whether or not the selected expression is present in the image.
    Type: Grant
    Filed: May 16, 2007
    Date of Patent: January 17, 2012
    Assignee: Trend Micro Incorporated
    Inventor: Jonathan James Oliver
  • Publication number: 20120002889
    Abstract: A method for producing a slide show video from a collection of hardcopy media. The method includes digitizing the media, detecting handwritten information, and estimating the age of the media; determining an order of presentation for the slide show video based on the detected handwritten information and estimated ages; and producing a slide show video from the hardcopy media using the determined order of presentation.
    Type: Application
    Filed: June 30, 2010
    Publication date: January 5, 2012
    Inventors: Andrew C. Blose, Andrew C. Gallagher, Joseph A. Manico, Charles L. Parker
  • Patent number: 8090202
    Abstract: A document processing apparatus includes a region extracting unit that extracts a plurality of regions in a document image, a recognition unit that recognizes a character string, a conversion unit that converts the recognized character string, a setting unit that sets first boundary lines that surround the document image and at least one second boundary line in a space between adjacent regions of the plurality of regions, an enlargement/reduction unit that moves in parallel at least one of the first and second boundary lines under the constraint that it does not intersect any of the plurality of regions, and enlarges or reduces at least one of the regions in accordance with the parallel movement so long as each region does not extend beyond its cell; and an insertion unit that inserts the converted character string into each of the regions.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: January 3, 2012
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Yuya Konno
  • Publication number: 20110321103
    Abstract: Systems and methods are operable to identify videos of interest using information acquired by a portable electronic device. An exemplary embodiment receives the acquired information pertaining to a video of interest, wherein the acquired information was acquired by the portable electronic device; determines an identity of the video of interest based upon the acquired information; and communicates the video of interest to a media device.
    Type: Application
    Filed: June 23, 2010
    Publication date: December 29, 2011
    Applicant: ECHOSTAR BROADCASTING CORPORATION
    Inventor: Kevin Yao
  • Patent number: 8081188
    Abstract: A determining unit determines a vector conversion method for a character image based on the character image and model information of a terminal device that receives the character image. A processing unit performs a vector conversion on the character image by the vector conversion method determined by the determining unit.
    Type: Grant
    Filed: December 17, 2007
    Date of Patent: December 20, 2011
    Assignee: Ricoh Company, Limited
    Inventor: Yuka Kihara
  • Patent number: 8077983
    Abstract: A system and method for character error correction is provided, useful for a user of mobile appliances to produce written text with reduced errors. The system includes an interface, a word prediction engine, a statistical engine, an editing distance calculator, and a selector. A string of characters, known as the inputted word, may be entered into the mobile device via the interface. The word prediction engine may then generate word candidates similar to the inputted word using fuzzy logic and user preferences generated from past user behavior. The statistical engine may then generate variable error costs determined by the probability of erroneously inputting any given character. The editing distance calculator may then determine the editing distance between the inputted word and each of the word candidates by grid comparison using the variable error costs. The selector may choose one or more preferred candidates from the word candidates using the editing distances.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: December 13, 2011
    Assignee: Zi Corporation of Canada, Inc.
    Inventors: Weigen Qiu, Samuel Yin Lun Pun
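    The grid comparison with variable error costs described above corresponds to a weighted edit distance. The sketch below is one way to implement it; the keyboard-adjacency cost function and the candidate list are illustrative assumptions rather than the patent's statistical engine.
    ```python
    def weighted_edit_distance(source, target, sub_cost, ins_del_cost=1.0):
        """Grid (dynamic-programming) edit distance with variable substitution costs.

        sub_cost(a, b) returns the cost of confusing character a with b; cheap
        substitutions model frequent input errors (e.g. adjacent keys).
        """
        m, n = len(source), len(target)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * ins_del_cost
        for j in range(1, n + 1):
            d[0][j] = j * ins_del_cost
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0.0 if source[i-1] == target[j-1] else sub_cost(source[i-1], target[j-1])
                d[i][j] = min(d[i-1][j] + ins_del_cost,      # deletion
                              d[i][j-1] + ins_del_cost,      # insertion
                              d[i-1][j-1] + sub)             # substitution
        return d[m][n]

    # Illustrative cost: adjacent keys on one keyboard row are easier to confuse.
    ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
    def keyboard_cost(a, b):
        for row in ROWS:
            if a in row and b in row and abs(row.index(a) - row.index(b)) == 1:
                return 0.4
        return 1.0

    candidates = ["hello", "hollow", "jello"]
    typed = "hwllo"
    print(sorted(candidates, key=lambda w: weighted_edit_distance(typed, w, keyboard_cost)))
    # ['hello', 'jello', 'hollow'] -- lowest editing distance first
    ```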
  • Patent number: 8077975
    Abstract: Described is a bimodal data input technology by which handwriting recognition results are combined with speech recognition results to improve overall recognition accuracy. Handwriting data and speech data corresponding to mathematical symbols are received and processed (including being recognized) into respective graphs. A fusion mechanism uses the speech graph to enhance the handwriting graph, e.g., to better distinguish between similar handwritten symbols that are often misrecognized. The graphs include nodes representing symbols, and arcs between the nodes representing probability scores. When arcs in the first and second graphs are determined to match one another, such as aligned in time and associated with corresponding symbols, the probability score in the second graph for that arc is used to adjust the matching probability score in the first graph. Normalization and smoothing may be performed to correspond the graphs to one another and to control the influence of one graph on the other.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Lei Ma, Yu Shi, Frank Kao-ping Soong
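    A much-simplified sketch of the fusion idea in the abstract above: handwriting hypotheses whose symbol and time span agree with a speech hypothesis have their score adjusted. Flat hypothesis lists stand in for the patent's graphs, and a fixed interpolation weight replaces its normalization and smoothing; all names and numbers here are assumptions.
    ```python
    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        symbol: str      # recognized symbol, e.g. "x", "2", "+"
        start: float     # start time in seconds
        end: float       # end time in seconds
        score: float     # recognizer score for this arc

    def overlaps(a, b):
        return min(a.end, b.end) > max(a.start, b.start)

    def fuse(handwriting, speech, weight=0.5):
        """Blend matching speech scores into handwriting hypotheses.

        An arc matches when the symbols agree and the time spans overlap; the
        interpolation weight is an assumed stand-in for the patent's
        normalization and smoothing steps.
        """
        fused = []
        for h in handwriting:
            matches = [s.score for s in speech if s.symbol == h.symbol and overlaps(h, s)]
            if matches:
                new_score = (1 - weight) * h.score + weight * max(matches)
            else:
                new_score = h.score
            fused.append(Hypothesis(h.symbol, h.start, h.end, new_score))
        return fused

    # "x" versus the multiplication sign is a typical handwriting confusion that
    # the spoken "x" can resolve.
    hw = [Hypothesis("x", 0.0, 0.4, 0.45), Hypothesis("×", 0.0, 0.4, 0.55)]
    sp = [Hypothesis("x", 0.1, 0.5, 0.90)]
    for h in sorted(fuse(hw, sp), key=lambda h: -h.score):
        print(h.symbol, round(h.score, 3))   # x 0.675, then × 0.55
    ```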
  • Patent number: 8077984
    Abstract: A computer implemented method and an apparatus for comparing spans of text are disclosed. The method includes computing a similarity measure between a first sequence of symbols representing a first text span and a second sequence of symbols representing a second text span as a function of the occurrences of optionally noncontiguous subsequences of symbols shared by the two sequences of symbols. Each of the symbols comprises at least one consecutive word and is defined according to a set of linguistic factors. Pairs of symbols in the first and second sequences that form a shared subsequence of symbols are each matched according to at least one of the factors.
    Type: Grant
    Filed: January 4, 2008
    Date of Patent: December 13, 2011
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Pierre Mahé
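    The similarity measure described above is in the family of gap-weighted subsequence kernels. The sketch below computes such a kernel over word sequences, with plain word equality standing in for the patent's set of linguistic factors; the subsequence length and decay parameter are illustrative.
    ```python
    def subsequence_kernel(s, t, n=2, lam=0.5, match=lambda a, b: a == b):
        """Gap-weighted count of (possibly noncontiguous) shared subsequences of
        length n between two symbol sequences s and t (Lodhi-style DP).

        match(a, b) decides when two symbols count as matching; equality here
        replaces matching on a set of linguistic factors.
        """
        m, l = len(s), len(t)
        # kp[k][i][j]: auxiliary term for length-k subsequences over s[:i], t[:j].
        kp = [[[0.0] * (l + 1) for _ in range(m + 1)] for _ in range(n)]
        for i in range(m + 1):
            for j in range(l + 1):
                kp[0][i][j] = 1.0
        for k in range(1, n):
            for i in range(1, m + 1):
                for j in range(1, l + 1):
                    kp[k][i][j] = (lam * kp[k][i - 1][j] + lam * kp[k][i][j - 1]
                                   - lam * lam * kp[k][i - 1][j - 1])
                    if match(s[i - 1], t[j - 1]):
                        kp[k][i][j] += lam * lam * kp[k - 1][i - 1][j - 1]
        result = 0.0
        for i in range(1, m + 1):
            for j in range(1, l + 1):
                if match(s[i - 1], t[j - 1]):
                    result += lam * lam * kp[n - 1][i - 1][j - 1]
        return result

    a = "the cat sat on the mat".split()
    b = "a cat sat on a mat".split()
    c = "stock prices fell sharply".split()
    print(subsequence_kernel(a, b) > subsequence_kernel(a, c))   # True: a and b share word pairs
    ```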
  • Patent number: 8068982
    Abstract: An on-vehicle navigation system reliably performs guidance of a necessary exiting authorized section while avoiding the guidance of a needless exiting authorized section on a plural-lane road having a normal lane and a special lane where advancing from the normal lane in a predetermined advancing authorized section and exiting to the normal lane in a predetermined exiting authorized section are authorized. After discriminating whether the vehicle is driving in the special lane of the plural-lane road based on whether a characteristic object has been detected by a characteristic object detecting section, the guidance of an exiting authorized section is avoided in the case where the vehicle has deviated from a recommended route including a special lane, and the guidance of an exiting authorized section is performed in the case where the vehicle has deviated from a recommended route not including a special lane and is driving in a special lane.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: November 29, 2011
    Assignee: Alpine Electronics, Inc.
    Inventor: Takayuki Takada
  • Patent number: 8064703
    Abstract: A method of extracting data from a document image includes selecting a document classification for the document image that includes text. The classification is selected from a plurality of predetermined document classifications based on recognized text. The method also includes selecting rules from a database of rules based on the document classification. The rules define data elements to be populated based on recognized document text. The method also includes selecting target data elements from a database of data elements based on the selected document classification and the selected rules. The method also includes recognizing selected portions of the document image. The selected portions are determined by the selected rules. Recognizing selected portions of the document image generates one or more character strings based on recognized text. The method further includes comparing a specific character string to a target data element, thereby producing a match measure based on the comparison.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: November 22, 2011
    Assignee: Data Trace Information Services, LLC
    Inventors: Daniel A. Newcomer, Jon Scott Seely, Dennis Lee Branham, Paul Kosan
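    A minimal sketch of rule-driven extraction with a match measure, loosely following the abstract above. The document class, rule patterns, target data elements, and the SequenceMatcher-based measure are all hypothetical stand-ins, not the product's actual rule database.
    ```python
    import re
    from difflib import SequenceMatcher

    # Illustrative rule set: per document class, which labelled field to read and
    # which target data element to compare against (names are made up).
    RULES = {
        "deed_of_trust": [
            {"pattern": r"Borrower:\s*(.+)", "target_element": "borrower_name"},
            {"pattern": r"Parcel No\.?\s*([\w-]+)", "target_element": "parcel_id"},
        ],
    }

    TARGET_DATA = {"borrower_name": "Jane Q. Public", "parcel_id": "042-117-33"}

    def extract_and_match(doc_class, recognized_text):
        """Apply the class's rules to recognized text and score each extraction
        against its target data element (0..1 match measure)."""
        results = []
        for rule in RULES.get(doc_class, []):
            m = re.search(rule["pattern"], recognized_text)
            if not m:
                continue
            extracted = m.group(1).strip()
            target = TARGET_DATA[rule["target_element"]]
            measure = SequenceMatcher(None, extracted.lower(), target.lower()).ratio()
            results.append((rule["target_element"], extracted, round(measure, 2)))
        return results

    text = "Borrower: JANE Q PUBLIC\nParcel No. 042-117-33\n..."
    print(extract_and_match("deed_of_trust", text))
    ```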
  • Patent number: 8059896
    Abstract: A character recognition processing system includes a character recognition confidence evaluating unit that evaluates whether the confidence of character recognition for each of a plurality of areas is low or high, a character area classification unit that classifies a first area evaluated as low by the character recognition confidence evaluating unit into a plurality of components, a character separation unit that separates the components classified by the character area classification unit into a character component and non-character components according to information relating to a second area evaluated as high by the character recognition confidence evaluating unit, and a first character recognition unit that performs character recognition processing for the character component separated by the character separation unit.
    Type: Grant
    Filed: February 23, 2007
    Date of Patent: November 15, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Etsuko Ito
  • Patent number: 8059868
    Abstract: A license plate recognition apparatus includes a detection unit configured to detect a plurality of quadrangles as license plate region candidates from input images, a character recognition unit configured to execute character recognition on a character region included in each detected license plate region candidate, and an output unit configured to select a license plate region candidate to be output from among the plurality of license plate region candidates detected by the detection unit, based on the character recognition results and the quadrangle information of the respective candidates, and to output information relating to the selected candidate.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: November 15, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideaki Matsumoto, Ichiro Umeda
  • Patent number: 8045808
    Abstract: A pure adversarial optical character recognition (OCR) approach in identifying text content in images. An image and a search term are input to a pure adversarial OCR module, which searches the image for presence of the search term. The image may be extracted from an email by an email processing engine. The OCR module may split the image into several character blocks, each of which has a reasonable probability of containing a character (e.g., an ASCII character). The OCR module may form a sequence of blocks that represents a candidate match to the search term and calculate the similarity of the candidate sequence to the search term. The OCR module may be configured to output whether or not the search term is found in the image and, if applicable, the location of the search term in the image.
    Type: Grant
    Filed: August 16, 2007
    Date of Patent: October 25, 2011
    Assignee: Trend Micro Incorporated
    Inventor: Jonathan James Oliver
  • Publication number: 20110255795
    Abstract: An apparatus and a method for character string recognition for correctly recognizing a character string placed on a medium, even in a recognition process system in which a plurality of formats are handled. An image processing area is set on a medium. The image processing area is divided in a placement direction of character strings so as to make up a plurality of segments. An image data projection in a direction of character strings is calculated for each segment. The number of character string lines for each segment is calculated according to the image data projection. The number of character string lines is determined for the image processing area as a whole, according to the number of character string lines for each segment, and it is judged whether or not the character strings are predetermined character strings.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 20, 2011
    Inventor: Hiroshi NAKAMURA
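    A toy version of the segment-wise projection analysis described above: the image is split into vertical segments, a row-wise ink projection is computed per segment, and runs of inked rows are counted as character-string lines. The binary-list image format and the thresholds are assumptions for illustration.
    ```python
    def count_lines_per_segment(image, num_segments=4, min_ink=1):
        """Count text lines in each vertical slice (segment) of a binary image.

        image is a list of rows of 0/1 pixels (1 = ink).  For each segment the
        row-wise ink projection is taken, and each maximal run of rows whose
        projection reaches min_ink counts as one character-string line.
        """
        height, width = len(image), len(image[0])
        seg_width = width // num_segments
        counts = []
        for s in range(num_segments):
            x0 = s * seg_width
            x1 = (s + 1) * seg_width if s < num_segments - 1 else width
            projection = [sum(row[x0:x1]) for row in image]
            lines, in_run = 0, False
            for value in projection:
                if value >= min_ink and not in_run:
                    lines, in_run = lines + 1, True
                elif value < min_ink:
                    in_run = False
            counts.append(lines)
        return counts

    # Two text lines in the left half, one in the right half of a toy 8x8 image.
    img = [
        [1,1,1,1, 0,0,0,0],
        [0,0,0,0, 0,0,0,0],
        [1,1,1,1, 1,1,1,1],
        [0,0,0,0, 1,1,1,1],
        [0,0,0,0, 0,0,0,0],
        [0,0,0,0, 0,0,0,0],
        [0,0,0,0, 0,0,0,0],
        [0,0,0,0, 0,0,0,0],
    ]
    print(count_lines_per_segment(img, num_segments=2))   # [2, 1]
    ```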
  • Patent number: 8041127
    Abstract: One embodiment of the present invention provides a system that obscures critical information communicated over a network. During operation, the system receives a set of data and produces a file which represents a character in the data with at least one image, thereby avoiding representing the data in plain text and reducing the risk of scraping. The system then communicates the file to a client, thereby allowing the client to present the data using the embedded images.
    Type: Grant
    Filed: November 30, 2006
    Date of Patent: October 18, 2011
    Assignee: Intuit Inc.
    Inventor: James E. Whitelaw
  • Patent number: 8041126
    Abstract: A method, apparatus, and system, for scanning a first portion of a data to generate a second portion of data is provided. A control parameter relating to a level of detail associated with filtering a first portion of data is received. The filtering of the first portion of data is performed based upon the control parameter. The filtering of the first portion of data includes a rule-based filtering, a context-based filtering, a statistical-based filtering, or a semantic-based filtering. Performing the filtering provides for a reduction of a portion of the first portion of data. A second portion of data that is smaller than the first portion of data is provided based upon the filtering of the first portion of data.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: October 18, 2011
    Assignee: Apple Inc.
    Inventors: Devang K. Naik, Kim E. A. Silverman
  • Patent number: 8027832
    Abstract: A system and methods of language identification of natural language text are presented. The system includes stored expected character counts and variances for a list of characters found in a natural language. Expected character counts and variances are stored for multiple languages to be considered during language identification. At run-time, one or more languages are identified for a text sample based on comparing actual and expected character counts. The present methods can be combined with upstream analyzing of Unicode ranges for characters in the text sample to limit the number of languages considered. Further, n-gram methods can be used in downstream processing to select the most probable language from among the languages identified by the present system and methods.
    Type: Grant
    Filed: February 11, 2005
    Date of Patent: September 27, 2011
    Assignee: Microsoft Corporation
    Inventors: William D. Ramsey, Patricia M. Schmid, Kevin R. Powell
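    A small sketch of language identification by comparing actual and expected character counts, as described above. The per-language expected frequencies and variances are rough illustrative numbers, and the variance-normalized squared-deviation score is one plausible comparison, not the patent's stored statistics or scoring.
    ```python
    # Expected relative character frequencies (per 100 characters) and variances
    # for a few letters; the numbers are rough illustrations only.
    PROFILES = {
        "english": {"e": (12.7, 4.0), "t": (9.1, 3.0), "z": (0.1, 0.1), "ñ": (0.0, 0.05)},
        "spanish": {"e": (13.7, 4.0), "t": (4.6, 2.0), "z": (0.5, 0.3), "ñ": (0.3, 0.2)},
    }

    def language_scores(text):
        """Score each language by squared deviation between actual and expected
        character counts, normalized by the stored variance (smaller = closer)."""
        text = text.lower()
        total = sum(ch.isalpha() for ch in text) or 1
        scores = {}
        for lang, profile in PROFILES.items():
            score = 0.0
            for ch, (expected, variance) in profile.items():
                actual = 100.0 * text.count(ch) / total
                score += (actual - expected) ** 2 / variance
            scores[lang] = score
        return sorted(scores.items(), key=lambda kv: kv[1])

    print(language_scores("El año pasado llovió mucho en la sierra"))   # spanish ranked first
    print(language_scores("The weather these days tends to be better")) # english ranked first
    ```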
  • Patent number: 8023745
    Abstract: Methods, systems, and computer-readable media for ascertaining neighborhood information in a dynamically changing environment, such as an electronic ink environment may include: (a) receiving data representing plural electronic ink strokes; (b) defining a first vertex associated with a first ink stroke; and (c) determining neighboring vertices to the first vertex, wherein the neighboring vertices are associated with ink stroke(s) other than the first ink stroke. Additional systems, methods, and computer-readable media may include: (a) receiving data representing plural electronic ink strokes; (b) defining plural vertices associated with the ink strokes; (c) receiving input indicating a selection of an ink component; and (d) determining at least one neighboring component by determining which ink component(s) located outside of the selection include one or more ink strokes having vertices that neighbor vertices included in the selection.
    Type: Grant
    Filed: April 10, 2009
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Herry Sutanto, Ming Ye, Sashi Raghupathy
  • Patent number: 8023741
    Abstract: Aspects of the present invention are related to systems and methods for determining the location of numerals in an electronic document image.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: September 20, 2011
    Assignee: Sharp Laboratories of America, Inc.
    Inventors: Ahmet Mufit Ferman, Richard John Campbell
  • Publication number: 20110222769
    Abstract: Page segmentation in an optical character recognition process is performed to detect textual objects and/or image objects. Textual objects in an input gray scale image are detected by selecting candidates for native lines which are sets of horizontally neighboring connected components (i.e., subsets of image pixels where each pixel from the set is connected with all remaining pixels from the set) having similar vertical statistics defined by values of baseline (the line upon which most text characters “sit”) and mean line (the line under which most of the characters “hang”). Binary classification is performed on the native line candidates to classify them as textual or non-textual through examination of any embedded regularity. Image objects are indirectly detected by detecting the image's background using the detected text to define the background. Once the background is detected, what remains (i.e., the non-background) is an image object.
    Type: Application
    Filed: March 10, 2010
    Publication date: September 15, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Sasa Galic, Bogdan Radakovic, Nikola Todic
  • Publication number: 20110222788
    Abstract: An information processing device includes a recognition section for recognizing a feature keyword representing a feature of at least part of text content, an additional information acquisition section for acquiring additional information related to the text content from an outside of the text content in response to the recognized feature keyword, and a control section for controlling the additional information acquired by the additional information acquisition section to be output along with the part of the text content.
    Type: Application
    Filed: February 9, 2011
    Publication date: September 15, 2011
    Applicant: Sony Corporation
    Inventors: Motoki TSUNOKAWA, Masaaki Hoshino, Kenichiro Kobayashi
  • Publication number: 20110222789
    Abstract: Reduction of a processing load and shortening of a processing time are realized when performing character string sensing processing on an image. A character string sensing device senses a character string including at least one character from an image. The character string sensing device includes a character information storage unit in which an evaluation value, expressing the difficulty of falsely sensing a character, is stored for each character. The character string sensing device also includes a search sequence determining unit that determines a search sequence for the characters based on the evaluation value of each character included in a keyword input to the character string sensing device as the character string to be sensed; these evaluation values are stored in the character information storage unit. A character search unit searches for each character included in the keyword according to the determined search sequence.
    Type: Application
    Filed: February 24, 2011
    Publication date: September 15, 2011
    Applicant: OMRON CORPORATION
    Inventor: Tomoyoshi Aizawa
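    A minimal sketch of ordering a keyword's characters by a per-character evaluation value so that the hardest-to-misdetect character is searched first, as in the abstract above. The evaluation values and the set-membership "search" are placeholders for the device's actual character search.
    ```python
    # Evaluation values expressing how unlikely a character is to be falsely
    # sensed (higher = more distinctive); the numbers are illustrative only.
    EVALUATION = {"鷹": 0.95, "雲": 0.90, "木": 0.40, "口": 0.20, "一": 0.10}

    def search_sequence(keyword):
        """Order the keyword's characters so the most distinctive one is searched first."""
        return sorted(set(keyword), key=lambda ch: EVALUATION.get(ch, 0.5), reverse=True)

    def contains_keyword(detected_characters, keyword):
        """detected_characters: characters already sensed somewhere in the image.
        Searching distinctive characters first lets non-matching images be
        rejected after very few lookups."""
        for ch in search_sequence(keyword):
            if ch not in detected_characters:
                return False          # early exit: a hard-to-misdetect character is missing
        return True

    print(search_sequence("一雲口"))                     # ['雲', '口', '一'] -- most distinctive first
    print(contains_keyword({"一", "口", "木"}, "一木"))   # True
    print(contains_keyword({"一", "口"}, "鷹一"))         # False after the first lookup
    ```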
  • Patent number: 8020119
    Abstract: A parsing system provides a parsed document to a user application labeling the document with indication symbols according to a scheme associated with the parsing results. Users are enabled to insert correction indicators such as handwritten gestures, icon selections, menu item selections, and the like in conjunction with the indication symbols. The document is re-analyzed performing the requested corrections such as line or block separations, line, block, word connections, etc. The operations provide support for the engine stack of the parsing system while accommodating independent user interfaces employed by the users. Insertion of correction indicators and subsequent re-analysis for correction may be performed upon user signal, in an iterative manner, or continuously.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: September 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Sashi Raghupathy, Ming Ye, Victoria H. Chou
  • Patent number: 8019158
    Abstract: A method for altering a recognition error correction data structure, the method includes: altering at least one key out of a set of semantically similar keys in response to text appearance probabilities of keys of the set of semantically similar keys to provide an at least one altered key; and replacing the at least one key by the at least one altered key.
    Type: Grant
    Filed: January 2, 2008
    Date of Patent: September 13, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ella Barkan, Tal Drory, André Heilper
  • Publication number: 20110213613
    Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise the probabilities of occurrence of a portion of a sound based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
    Type: Application
    Filed: May 24, 2010
    Publication date: September 1, 2011
    Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
  • Patent number: 8005300
    Abstract: An image search system includes a first calculation section that calculates a first similarity score of each registered image with respect to an input image on the basis of image features of the registered and the input image, a second calculation section that calculates a second similarity score of each registered image with respect to the input image on the basis of text features of the registered and the input image, a candidate extraction section that extracts one or more candidate images on the basis of the first and the second similarity scores of each registered image, a third calculation section that calculates a third similarity score of each candidate image on the basis of projection waveforms of the input image and the candidate image, and a search section that determines one or more registered images similar to the input image on the basis of the third similarity score.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: August 23, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Takahiro Koyama, Shigehisa Kawabe
  • Patent number: 8000504
    Abstract: Systems and methods for classifying content as adult content and, if desired, blocking content so classified from presentation to a user are provided. Received content is analyzed using a sequential series of classification techniques, each successive technique being implemented only if the previous technique did not result in classification of the content as adult content. In this way, adult content may be identified across a variety of different media types (e.g., text, images, video, etc.) and yet processing power may be conserved if one or more techniques requiring less power are sufficient to determine that the received content is, in fact, adult content. Content classification may be performed in-band (that is, in substantially real-time such that content may be identified and/or blocked at the time results of a user query are returned) or out-of-band (that is, prospectively as new content is received but not in association with a user query).
    Type: Grant
    Filed: August 3, 2007
    Date of Patent: August 16, 2011
    Assignee: Microsoft Corporation
    Inventors: Xiaodong Fan, Richard Qian
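    The sequential, power-saving cascade described above can be sketched as a list of stages tried in order of cost, each allowed to decline a decision. The three stage functions and their thresholds below are placeholders, not the classification techniques used in the patent.
    ```python
    from typing import Optional

    # Each stage returns True (adult), False (not adult), or None (cannot
    # decide, fall through to the next, more expensive technique).
    def text_keyword_stage(content) -> Optional[bool]:
        flagged = {"explicit", "xxx"}
        words = set(content.get("text", "").lower().split())
        return True if words & flagged else None

    def image_skin_tone_stage(content) -> Optional[bool]:
        ratio = content.get("skin_pixel_ratio")
        if ratio is None:
            return None
        return True if ratio > 0.6 else (False if ratio < 0.1 else None)

    def video_frame_stage(content) -> Optional[bool]:
        return content.get("video_frames_flagged", None)

    CASCADE = [
        text_keyword_stage,          # cheapest first
        image_skin_tone_stage,
        video_frame_stage,           # most expensive last
    ]

    def classify_adult(content: dict) -> bool:
        """Run stages in order; stop as soon as one yields a decision."""
        for stage in CASCADE:
            decision = stage(content)
            if decision is not None:
                return decision
        return False   # no stage classified it as adult content

    print(classify_adult({"text": "family picnic photos", "skin_pixel_ratio": 0.05}))  # False
    print(classify_adult({"text": "xxx site"}))                                        # True
    ```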
  • Patent number: 8000956
    Abstract: A computer implemented system and method for processing text are disclosed. Partially processed text, in which named entities have been extracted by a standard named entity system, is processed to identify attributive relations between a named entity or proper noun and a corresponding attribute. A concept for the attribute is identified and, in the case of a named entity, compared with the named entity's context, enabling a confirmation or conflict between the two to be determined. In the case of a proper name, the attribute's context can be associated with the proper name, allowing the proper name to be recognized as a new named entity.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 16, 2011
    Assignee: Xerox Corporation
    Inventors: Caroline Brun, Caroline Hagege
  • Patent number: 7996227
    Abstract: There is disclosed a system and method for interpreting and describing graphic images. In an embodiment, the method of inserting a description of an image into an audio recording includes: interpreting an image and producing a word description of the image including at least one image keyword; parsing an audio recording into a plurality of audio clips, and producing a transcription of each audio clip, each audio clip transcription including at least one audio keyword; calculating a similarity distance between the at least one image keyword and the at least one audio keyword of each audio clip; and selecting the audio clip transcription having a shortest similarity distance to the at least one image keyword as a location to insert the word description of the image. The word description of the image can then be appended to the selected audio clip to produce an augmented audio recording including the interpreted word description of the image.
    Type: Grant
    Filed: October 3, 2007
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Peter C. Boyle, Yu Zhang
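    A small sketch of choosing the insertion point by keyword similarity, as described above. Jaccard distance over keyword sets stands in for the patent's similarity distance, and the sample keywords are invented.
    ```python
    def jaccard_distance(a, b):
        """1 - |A ∩ B| / |A ∪ B| over two keyword sets (a stand-in for the
        similarity distance in the abstract)."""
        a, b = set(a), set(b)
        if not (a | b):
            return 1.0
        return 1.0 - len(a & b) / len(a | b)

    def best_insertion_point(image_keywords, clip_transcripts):
        """Return the index of the audio clip whose transcript keywords are
        closest to the image keywords; the image description is appended there."""
        distances = [jaccard_distance(image_keywords, kws) for kws in clip_transcripts]
        return min(range(len(distances)), key=distances.__getitem__)

    image_kw = {"bar", "chart", "quarterly", "revenue"}
    clips = [
        {"welcome", "agenda", "today"},
        {"revenue", "quarterly", "growth", "chart"},
        {"questions", "thanks"},
    ]
    print(best_insertion_point(image_kw, clips))   # 1: the middle clip is the closest match
    ```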
  • Patent number: 7990561
    Abstract: The present invention decides whether OCR processing is necessary for a printing job by using the difference between text data extracted by performing OCR processing on an image generated from a previously processed printing job and text data extracted from the text drawing commands of that previous printing job. If OCR processing is decided to be unnecessary, the text data extracted from the text drawing commands of the printing job is registered in a database for retrieving image data. If OCR processing is decided to be necessary, both the text data extracted by performing OCR processing on the image data generated from the drawing commands of the printing job and the text data extracted from the text drawing commands of the printing job are registered in the database for retrieving image data.
    Type: Grant
    Filed: July 14, 2008
    Date of Patent: August 2, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kouya Okabe
  • Patent number: 7986843
    Abstract: A computer-implemented method of managing information is disclosed. The method can include receiving a message from a mobile device configured to connect to a mobile device network (the message including a digital image taken by the mobile device and including information corresponding to words), determining the words from the digital image information using optical character recognition, indexing the digital image based on the words, and storing the digital image for later retrieval of the digital image based on one or more received search terms.
    Type: Grant
    Filed: November 29, 2006
    Date of Patent: July 26, 2011
    Assignee: Google Inc.
    Inventors: Krishnendu Chaudhury, Ashutosh Garg, Prasenjit Phukan, Arvind Saraf
  • Publication number: 20110170144
    Abstract: Proposed is the use of a document widget for representing a property of a document. The document widget comprises: a human-readable portion for interpretation by a user; and a machine-readable portion representing the document property. By comprising information about a property of a document, a document widget may be processed in accordance with an optical recognition process so as to identify the document widget and enable extraction of the document property.
    Type: Application
    Filed: April 13, 2010
    Publication date: July 14, 2011
    Inventors: Yogesh SANKARASUBRAMANIAM, Krusheel MUNNANGI, Serene BANERJEE, Anjaneyulu Seetha Rama KUCHIBHOTLA, Abhishek CHAKRABORTY, Nagabhushana Ayyanahal MATAD
  • Publication number: 20110170788
    Abstract: According to various embodiments of the invention, methods are provided for capturing various data fields from mobile and scanned images of business cards. Most embodiments are provided for capturing Personal and Company name fields, which are difficult to identify using conventional OCR and data capture techniques. In addition, some embodiments of the invention involve methods for capturing an email, URL or telephone number from an image of a business card.
    Type: Application
    Filed: January 12, 2010
    Publication date: July 14, 2011
    Inventor: GRIGORI NEPOMNIACHTCHI
  • Publication number: 20110158548
    Abstract: A word recognition method in which as a result of a recognition process performed on an image of a character string, one or more character candidates are obtained for each of characters forming the character string, according to which a word corresponding to the character string is recognized using a word database having registered therein a plurality of words includes setting a predetermined number of words included in the word database, as initial word candidates, performing a process in which the characters forming the recognition target character string are set as processing targets, one character by one character, and every time a processing target character is set, word candidates present at a time of the setting are narrowed down to words in which character candidates obtained for the processing target character are arranged at a same location as a location where the processing target character is arranged in the recognition target character string, and identifying, when a narrowing-down process perfor
    Type: Application
    Filed: October 29, 2010
    Publication date: June 30, 2011
    Applicant: OMRON CORPORATION
    Inventor: Tomoyoshi Aizawa
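    A minimal sketch of the candidate-narrowing loop described above: starting from registered words of the right length, each character position keeps only the words whose character at that position appears among that position's recognition candidates. The word list and the confusion sets are illustrative.
    ```python
    def recognize_word(char_candidates, word_database):
        """Narrow a word list using per-position character candidates.

        char_candidates: for each character position, the list of candidate
        characters produced by recognition (best guesses first).
        word_database:   the registered words to match against.
        """
        candidates = [w for w in word_database if len(w) == len(char_candidates)]
        for position, options in enumerate(char_candidates):
            candidates = [w for w in candidates if w[position] in options]
            if len(candidates) <= 1:
                break                       # narrowed down early
        return candidates

    words = ["CLIP", "CHIP", "SHIP", "CHOP", "CLAP"]
    # OCR confusions per position: 'C'/'G', 'H'/'N', 'I'/'L', 'P'/'F'
    per_position = [["C", "G"], ["H", "N"], ["I", "L"], ["P", "F"]]
    print(recognize_word(per_position, words))   # ['CHIP']
    ```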
  • Patent number: 7970213
    Abstract: Various embodiments of the invention describe a method, system and computer-readable storage medium containing instructions for improving the recognition of text present in an image. The image is processed by applying different operators to the image to obtain multiple processed versions of the image. Thereafter, characters and location information of the characters from each of the multiple processed versions of the image are obtained. The location information includes the pixel coordinates of the characters in the text. The text present in the image is edited, based on the relative location of the characters, to improve the recognition of the text in the image.
    Type: Grant
    Filed: May 21, 2007
    Date of Patent: June 28, 2011
    Assignee: A9.Com, Inc.
    Inventors: Mark A. Ruzon, Supratik Bhattacharya
  • Patent number: 7965886
    Abstract: The present invention provides a computer implemented process for detecting multi-view, multi-pose objects. The process comprises training a classifier for each intra-class exemplar, training a strong classifier, and combining the individual exemplar-based classifiers with a single objective function. This function is optimized using two nested AdaBoost loops. The outer loop selects discriminative candidate exemplars. The inner loop selects the discriminative candidate features on the selected exemplars to compute all weak classifiers for a specific position, such as a view/pose. All the computed weak classifiers are then automatically combined into a final (strong) classifier, which is used to detect the object.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: June 21, 2011
    Assignee: SRI International
    Inventors: Feng Han, Ying Shan, Harpreet Singh Sawhney, Rakesh Kumar
  • Patent number: 7966352
    Abstract: A system and process for harvesting context information from selected content is described. One may use a stylus to indicate what content is to be captured. The context information that may be associated with selected content may include URLs, file names, folder names, text from the content, and ink.
    Type: Grant
    Filed: January 26, 2004
    Date of Patent: June 21, 2011
    Assignee: Microsoft Corporation
    Inventors: Vikram Madan, Issa Khoury, Gerhard Schobbe, Guy Barker, Judy Tandog
  • Patent number: 7966552
    Abstract: A method consistent with certain embodiments of identifying a functional command set for an access device that accesses television programming provided by a service provider at a control device involves transmitting a first command, from a first command set of a group of possible command sets for the access device, to the access device. The first command is one that is expected to cause the access device to generate a text-containing video frame. Text is extracted from that video frame, and a determination is made as to whether the extracted text corresponds to the first command set. The first command set is identified as the functional command set for the access device in response to determining that the extracted text corresponds to the first command set. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.
    Type: Grant
    Filed: February 14, 2007
    Date of Patent: June 21, 2011
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventor: Brant L. Candelore
  • Patent number: 7958164
    Abstract: A system that provides a visual mechanism (e.g., user interface (UI)) by which a user can design a regular expression is provided. The graphical interactive mechanism enables a user to develop regular expressions without an understanding of the intricacies of the regular expression syntax. The UI can provide an interactive mechanism by which a user can graphically annotate (e.g., color, highlight) a regular expression thus, mapping the expression to a particular tabulated output. The novel UI can provide a particular kind of dialog layout with several controls and dynamically linked views, e.g., a data view, a regular expression view and a column view which can facilitate definition of the regular expression as well as creation of mappings to output columns (e.g., annotations).
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: June 7, 2011
    Assignee: Microsoft Corporation
    Inventors: Sergei Ivanov, J. Kirk Haselden
  • Patent number: 7953602
    Abstract: Character information recognition means extracts, through a character recognition process, character information from a selection button included in an index image. Based on text data having been outputted from the character information recognition means, index dictionary creation means creates an index dictionary usable for a speech recognition process performed by speech recognition means. The speech recognition means performs the speech recognition process by using speech data retrieved through an ADC and the index dictionary stored in storage means. Based on a result of the speech recognition process performed by the speech recognition means, reproduction control means performs reproduction control of a chapter. Thus, a desired button can be selected by speech, from chapter selection buttons displayed on a chapter selection image of a DVD video.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: May 31, 2011
    Assignee: Panasonic Corporation
    Inventors: Atsushi Iisaka, Atsushi Yamashita, Takuya Hirai
  • Patent number: 7953295
    Abstract: Methods, systems, and apparatus including computer program products for enhancing text in images are provided. In one implementation, a computer-implemented method is provided. The method includes receiving a plurality of images each image including a corresponding version of an identified candidate text region and aligning each candidate text region from the plurality of images to a high resolution grid. The method also includes compositing the aligned candidate text regions to create a single superresolution image and performing character recognition on the superresolution image to identify text.
    Type: Grant
    Filed: June 29, 2006
    Date of Patent: May 31, 2011
    Assignee: Google Inc.
    Inventors: Luc Vincent, Adrian Ulges
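    A toy sketch of the compositing step described above: already-aligned low-resolution versions of a text region are placed on a finer grid and averaged, after which character recognition would run on the composite. Nearest-neighbour upsampling and plain averaging are simplifications; the alignment and OCR steps themselves are not shown.
    ```python
    def upsample(region, factor):
        """Nearest-neighbour upsampling of a grayscale region (list of rows)
        onto a grid `factor` times finer in each direction."""
        return [[region[y // factor][x // factor]
                 for x in range(len(region[0]) * factor)]
                for y in range(len(region) * factor)]

    def composite_superresolution(regions, factor=2):
        """Average several already-aligned versions of a candidate text region
        on a common high-resolution grid; averaging suppresses per-frame noise."""
        grids = [upsample(r, factor) for r in regions]
        height, width = len(grids[0]), len(grids[0][0])
        return [[sum(g[y][x] for g in grids) / len(grids) for x in range(width)]
                for y in range(height)]

    # Three noisy 2x3 crops of the same text region (grayscale values 0..255).
    crops = [
        [[200,  30, 210], [ 40, 220,  50]],
        [[190,  20, 200], [ 60, 200,  40]],
        [[210,  40, 220], [ 50, 210,  30]],
    ]
    sr = composite_superresolution(crops, factor=2)
    print(len(sr), "x", len(sr[0]))        # 4 x 6 high-resolution grid
    print([round(v) for v in sr[0]])       # first row of the composite
    # Character recognition would then run on `sr` rather than on any single crop.
    ```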