Patents by Inventor Jianjun Dou

Jianjun Dou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Character image extracting apparatus and character image extracting method

Patent number: 8750616

Abstract: In an extracting step, the extracting portion obtains a linked component composed of a plurality of mutually linking pixels from a character string region composed of a plurality of characters, and extracts section elements from the character string region, the section elements each being surrounded by a circumscribing figure circumscribing to the linked component. In the first altering step, the first altering portion combines section elements at least having a mutually overlapping part among the extracted section elements so as to prepare a new section element. In the first selecting step, the first selecting portion determines a reference size in advance and selects section elements having a size greater than the reference size, from among the section elements altered in the first altering step.

Type: Grant

Filed: December 21, 2007

Date of Patent: June 10, 2014

Assignee: Sharp Kabushiki Kaisha

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
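
The extracting, first altering, and first selecting steps in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a binary pixel grid, 4-connectivity, axis-aligned bounding rectangles as the circumscribing figures, and "size" measured as the longer side of a rectangle. The function names and the `ref_size` parameter are hypothetical.

```python
from collections import deque


def connected_boxes(grid):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected foreground pixels."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            x0 = x1 = x
            y0 = y1 = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:  # breadth-first walk over one linked component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def overlaps(a, b):
    """True if two inclusive rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes):
    """Repeatedly combine overlapping boxes until none overlap."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


def select_large(boxes, ref_size):
    """Keep boxes whose longer side exceeds ref_size (assumed size measure)."""
    return [b for b in boxes
            if max(b[2] - b[0] + 1, b[3] - b[1] + 1) > ref_size]


# Toy region: an L-shaped stroke, a detached dot inside its box, and a noise pixel.
grid = [
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
elements = select_large(merge_overlapping(connected_boxes(grid)), ref_size=1)
```

In this toy region the detached dot is absorbed into the box of the L-shaped stroke, and the isolated noise pixel is dropped by the size filter.
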
Image document processing device, image document processing method, program, and storage medium

Patent number: 8295600

Abstract: An image document processing device extracts a character sequence image having M number of characters in an image document, divides the image into individual character images, extracts features of the individual character images, and based on the features, selects N (N is an integer more than 1) character images in the order of degree of matching from a font-feature dictionary for storing features of all character images according to fonts, and generates an M×N index matrix for the extracted character sequence. In searching, the device searches an index-information storage section with respect to each search character included in a search keyword in an input search expression, and extracts an image document including an index matrix including the search keyword. This provides an image document processing device and an image document processing method each allowing indexing not requiring user's operation and each allowing highly precise searching without OCR recognition.

Type: Grant

Filed: December 7, 2007

Date of Patent: October 23, 2012

Assignee: Sharp Kabushiki Kaisha

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
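
The M×N index matrix and keyword search described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented method: character features are plain numeric tuples, matching is ranked by squared Euclidean distance, and a keyword matches when each of its characters appears among the candidates of consecutive rows. The names `build_index_matrix` and `matrix_contains` and the toy dictionary are hypothetical.

```python
def build_index_matrix(char_features, dictionary, n):
    """For each of the M character features, list the N closest dictionary characters."""
    matrix = []
    for feat in char_features:
        ranked = sorted(
            dictionary,
            key=lambda ch: sum((a - b) ** 2 for a, b in zip(feat, dictionary[ch])),
        )
        matrix.append(ranked[:n])
    return matrix


def matrix_contains(matrix, keyword):
    """True if the keyword can be read along consecutive rows of the matrix."""
    m, k = len(matrix), len(keyword)
    if k == 0:
        return False
    return any(
        all(keyword[i] in matrix[p + i] for i in range(k))
        for p in range(m - k + 1)
    )


# Toy feature dictionary (assumed values) and a noisy three-character image.
dictionary = {
    "c": (0.0, 0.0), "e": (0.1, 0.0),
    "a": (1.0, 1.0), "o": (0.9, 1.0),
    "t": (5.0, 5.0), "f": (5.0, 4.5),
}
features = [(0.06, 0.0), (0.96, 1.0), (5.0, 4.95)]
matrix = build_index_matrix(features, dictionary, n=2)
found = matrix_contains(matrix, "cat")
```

Here the best candidate for the first character is the wrong one ("e"), yet a search for "cat" still succeeds because "c" is kept as the second candidate. This is why keeping N candidates per character can tolerate errors that a single recognition result would not.
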
Image document processing device, image document processing method, program, and storage medium

Patent number: 8290269

Abstract: A headline-region initial processing section clips a headline-region image in an image document, divides the image into individual character images, and extracts features of the individual character images. Based on the features, a candidate-character-sequence generating section selects N (N is an integer more than 1) character images as candidate characters in the order of degree of matching from a font-feature dictionary for storing features of individual character images, and generates M×N index matrix where M is the number of characters in an extracted character sequence. Based on the index matrix, a document-name generating section generates a meaningful document name according to the image document. An image-document-DB management section manages accumulated image documents using the document name.

Type: Grant

Filed: December 10, 2007

Date of Patent: October 16, 2012

Assignee: Sharp Kabushiki Kaisha

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

Search and retrieval of documents indexed by optical character recognition

Patent number: 8208765

Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters, from a character image feature dictionary which stores the image features of character image in units of character, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix, is subjected to a lexical analysis according to a language model, and whereby a second index matrix having a character string which makes sense is prepared. In the language model, statistics are taken and then, the lexical analysis is performed.

Type: Grant

Filed: January 10, 2008

Date of Patent: June 26, 2012

Assignee: Sharp Kabushiki Kaisha

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
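
The step from the first index matrix to the second one can be sketched as below. The abstract does not specify the language model, so this example assumes a character bigram model with counts taken from a small word list, add-one smoothing on unnormalized counts, and a Viterbi search over the candidates. All function names and the corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(words):
    """Count character bigrams, with '^' marking the start of a word."""
    counts = Counter()
    for word in words:
        padded = "^" + word
        counts.update(zip(padded, padded[1:]))
    return counts


def adjust_matrix(matrix, bigrams):
    """Move the most plausible candidate of each row into the first column."""
    if not matrix:
        return []

    def score(prev, ch):
        # Add-one smoothing so unseen bigrams are penalised, not forbidden.
        return math.log(bigrams[(prev, ch)] + 1)

    # Viterbi: best[ch] = (score, path) of the best string ending in ch.
    best = {ch: (score("^", ch), [ch]) for ch in matrix[0]}
    for row in matrix[1:]:
        step = {}
        for ch in row:
            s, path = max(
                ((ps + score(p, ch), ppath) for p, (ps, ppath) in best.items()),
                key=lambda t: t[0],
            )
            step[ch] = (s, path + [ch])
        best = step
    _, chosen = max(best.values(), key=lambda t: t[0])
    return [[c] + [x for x in row if x != c] for row, c in zip(matrix, chosen)]


bigrams = train_bigrams(["cat", "cat", "cat", "eat", "cot"])
first = [["e", "c"], ["a", "o"], ["t", "f"]]
second = adjust_matrix(first, bigrams)
first_column = "".join(row[0] for row in second)
```

With these toy counts the first column changes from "eat" to "cat", while the remaining candidates stay available in the later columns.
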
Document image processing apparatus

Patent number: 8160402

Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided character by character, and image features of each character image are extracted. On the basis of the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters from a character image feature dictionary which stores the image features of character image in units of character, and the first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting the first column of the first index matrix, is subjected to a lexical analysis according to a predetermined language model, whereby a second index matrix adjusted into a character string which makes sense is prepared to be utilized for searching.

Type: Grant

Filed: January 10, 2008

Date of Patent: April 17, 2012

Assignee: Sharp Kabushiki Kaisha

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

CHARACTER IMAGE EXTRACTING APPARATUS AND CHARACTER IMAGE EXTRACTING METHOD

Publication number: 20090028435

Abstract: In an extracting step, the extracting portion obtains a linked component composed of a plurality of mutually linking pixels from a character string region composed of a plurality of characters, and extracts section elements from the character string region, the section elements each being surrounded by a circumscribing figure circumscribing to the linked component. In the first altering step, the first altering portion combines section elements at least having a mutually overlapping part among the extracted section elements so as to prepare a new section element. In the first selecting step, the first selecting portion determines a reference size in advance and selects section elements having a size greater than the reference size, from among the section elements altered in the first altering step.

Type: Application

Filed: December 21, 2007

Publication date: January 29, 2009

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

DOCUMENT IMAGE PROCESSING APPARATUS, DOCUMENT IMAGE PROCESSING METHOD, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED

Publication number: 20090028446

Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters, from a character image feature dictionary which stores the image features of character image in units of character, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix, is subjected to a lexical analysis according to a language model, and whereby a second index matrix having a character string which makes sense is prepared. In the language model, statistics are taken and then, the lexical analysis is performed.

Type: Application

Filed: January 10, 2008

Publication date: January 29, 2009

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

CHARACTER IMAGE FEATURE DICTIONARY PREPARATION APPARATUS, DOCUMENT IMAGE PROCESSING APPARATUS HAVING THE SAME, CHARACTER IMAGE FEATURE DICTIONARY PREPARATION PROGRAM, RECORDING MEDIUM ON WHICH CHARACTER IMAGE FEATURE DICTIONARY PREPARATION PROGRAM IS RECORDED, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED

Publication number: 20090028445

Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided character by character, and image features of each character image are extracted. On the basis of the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters from a character image feature dictionary which stores the image features of character image in units of character, and the first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting the first column of the first index matrix, is subjected to a lexical analysis according to a predetermined language model, whereby a second index matrix adjusted into a character string which makes sense is prepared to be utilized for searching.

Type: Application

Filed: January 10, 2008

Publication date: January 29, 2009

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

DOCUMENT IMAGE PROCESSING APPARATUS AND DOCUMENT IMAGE PROCESSING METHOD

Publication number: 20090030882

Abstract: There is provided a document image processing apparatus which can reduce troubles to find a desired heading from a document image. A heading region extracting portion searches an index information DB and extracts a heading region containing a search keyword. An order setting portion automatically sets in line with a predetermined rule an order of the heading regions extracted by the heading region extracting portion. On a displaying portion is displayed a document image on which the heading regions extracted by the heading region extracting portion are highlighted in accordance with the order set by the order setting portion. A display order of search results may be set by determining importance of the extracted heading regions based on the number of the search keyword and features of character images in the heading regions.

Type: Application

Filed: January 10, 2008

Publication date: January 29, 2009

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia
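
The ordering of extracted heading regions described in the abstract above can be sketched as a simple sort. The abstract only says that importance depends on the number of keyword occurrences and on features of the character images, so this example assumes one possible rule: more keyword occurrences rank first, with character height as a tie-breaker. The `HeadingRegion` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeadingRegion:
    text: str          # recognised heading text (assumed to be available)
    char_height: int   # stand-in for a character image feature


def order_regions(regions, keyword):
    """Keep regions containing the keyword, most important first."""
    hits = [r for r in regions if keyword in r.text]
    return sorted(
        hits,
        key=lambda r: (r.text.count(keyword), r.char_height),
        reverse=True,
    )


regions = [
    HeadingRegion("image search", 12),
    HeadingRegion("search by search key", 10),
    HeadingRegion("index", 20),
    HeadingRegion("search", 18),
]
ordered = [r.text for r in order_regions(regions, "search")]
```

A display component could then highlight the regions in this order. Any other predetermined rule can be substituted by changing the sort key.
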
Image document processing device, image document processing method, program, and storage medium

Publication number: 20080181505

Abstract: A headline-region initial processing section clips a headline-region image in an image document, divides the image into individual character images, and extracts features of the individual character images. Based on the features, a candidate-character-sequence generating section selects N (N is an integer more than 1) character images as candidate characters in the order of degree of matching from a font-feature dictionary for storing features of individual character images, and generates M×N index matrix where M is the number of characters in an extracted character sequence. Based on the index matrix, a document-name generating section generates a meaningful document name according to the image document. An image-document-DB management section manages accumulated image documents using the document name.

Type: Application

Filed: December 10, 2007

Publication date: July 31, 2008

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia

IMAGE DOCUMENT PROCESSING DEVICE, IMAGE DOCUMENT PROCESSING METHOD, PROGRAM, AND STORAGE MEDIUM

Publication number: 20080170810

Abstract: An image document processing device extracts a character sequence image having M number of characters in an image document, divides the image into individual character images, extracts features of the individual character images, and based on the features, selects N (N is an integer more than 1) character images in the order of degree of matching from a font-feature dictionary for storing features of all character images according to fonts, and generates an M×N index matrix for the extracted character sequence. In searching, the device searches an index-information storage section with respect to each search character included in a search keyword in an input search expression, and extracts an image document including an index matrix including the search keyword. This provides an image document processing device and an image document processing method each allowing indexing not requiring user's operation and each allowing highly precise searching without OCR recognition.

Type: Application

Filed: December 7, 2007

Publication date: July 17, 2008

Inventors: Bo Wu, Jianjun Dou, Ning Le, Yadong Wu, Jing Jia