Abstract: A method of training a system to extract information from documents comprises feeding a digital form of training documents to an OCR module, which identifies multiple logical blocks in the documents and the text present in the logical blocks. One or more tags for the whole of the document, the logical blocks and word tokens on the document are received by a tagging module. A text input comprising the text identified in the document and the tags for the whole of the document are received by a machine learning module. A first image of the document with the layout of one or more of the identified blocks superimposed, and the tags of the logical blocks in the document are received by the machine learning module, wherein the received text input, first image and tags for the logical blocks correspond to a plurality of the training documents.
Abstract: In general, techniques are described for performing a visual search in a network. A client device comprising an interface, a feature extraction unit and a feature compression unit may implement various aspects of the techniques. The feature extraction unit extracts feature descriptors from an image. The feature compression unit quantizes the image feature descriptors at a first quantization level to generate first query data. The interface transmits the first query data to the visual search device via the network. The feature compression unit determines second query data that augments the first query data such that when the first query data is updated with the second query data the updated first query data is representative of the image feature descriptors quantized at a second quantization level. The interface transmits the second query data to the visual search device via the network to successively refine the first query data.
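The successive refinement described in the abstract above can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer whose first-level cells are exactly twice as wide as the second-level cells, so the second query data reduces to one bit per descriptor component. The function names (`fine_indices`, `first_query`, `second_query`, `refine`) and the step size are hypothetical; the abstract does not specify the quantizer.

```python
import math

def fine_indices(descriptor, fine_step):
    """Quantize each component at the finer (second) quantization level."""
    return [math.floor(v / fine_step) for v in descriptor]

def first_query(descriptor, fine_step):
    """Coarse (first-level) indices; each coarse cell spans two fine cells."""
    return [q // 2 for q in fine_indices(descriptor, fine_step)]

def second_query(descriptor, fine_step):
    """One refinement bit per component: which half of the coarse cell."""
    return [q % 2 for q in fine_indices(descriptor, fine_step)]

def refine(first, second):
    """Server side: update the first query data with the second query data."""
    return [2 * c + r for c, r in zip(first, second)]

# Hypothetical descriptor and step size, for illustration only.
descriptor = [0.1, 0.9, -0.3, 0.55]
coarse = first_query(descriptor, 0.25)   # sent first
bits = second_query(descriptor, 0.25)    # sent later to refine the query
assert refine(coarse, bits) == fine_indices(descriptor, 0.25)
```

In this sketch the client never resends the full descriptor: the refinement bits alone are enough for the server to reach the second quantization level.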
Abstract: A document retrieving apparatus can retrieve a target document and output the retrieved target documents according to ranking when a retrieval keyword or retrieval expression is input. However, it requires a skilful technique to narrow a retrieval range since an appropriate retrieval keyword or retrieval expression needs to be created. A document retrieving apparatus of the present invention reads out and compiles a document list included in a designated area when a user designates an area of a document to be read on a two-dimensional map. When the user designates an area of a document to be read on the two-dimensional map, the document retrieving apparatus of the present invention combines query vectors of a plurality of documents included in a designated area and extracts documents based on a combined query vector.
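As a rough illustration of the area-based retrieval in the abstract above, the sketch below selects the documents whose map positions fall inside a designated rectangle, combines their vectors, and ranks all documents against the combined query vector. Averaging as the combination rule, cosine similarity as the ranking score, the rectangular area, and every name in the code are assumptions; the abstract does not state how vectors are combined or compared.

```python
import math

def in_area(pos, area):
    """True if a map position lies inside the rectangle (x_min, y_min, x_max, y_max)."""
    x, y = pos
    x_min, y_min, x_max, y_max = area
    return x_min <= x <= x_max and y_min <= y <= y_max

def combine(vectors):
    """Combine query vectors by component-wise averaging (an assumed rule)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(docs, area):
    """Rank every document against the combined vector of the designated area."""
    selected = [vec for pos, vec in docs.values() if in_area(pos, area)]
    if not selected:
        return []
    query = combine(selected)
    scored = [(name, cosine(query, vec)) for name, (pos, vec) in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical documents: name -> (map position, term vector).
docs = {
    "a": ((0.1, 0.1), [1, 0, 0]),
    "b": ((0.2, 0.3), [0, 1, 0]),
    "c": ((0.9, 0.9), [0, 0, 1]),
    "d": ((0.8, 0.1), [1, 1, 0]),
}
ranking = retrieve(docs, area=(0.0, 0.0, 0.5, 0.5))
```

Here documents "a" and "b" lie in the designated area, and document "d", which lies outside it but shares terms with both, ranks highest. The user therefore narrows the search without writing a keyword or retrieval expression.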
Abstract: A computer process and tool (I_Sys), or information system, are described which permit electronically archiving information related to archaeological discoveries, in order to allow interested parties to easily consult such information and to allow safer, efficient and systematic preservation of such information. In particular, an efficient archaeological information system is described for the analysis, reconstruction, archiving and knowledge of landscapes, structures, and objects which are representations of antiquity.
Abstract: One embodiment of the present invention provides a system for facilitating social networking based on fashion-related information. During operation, the system receives fashion-related information from a user. Next, the system extracts the user's fashion preferences from the received information and compares the user's fashion preference with other users' fashion preferences. Finally, the system groups users based on similarity of their fashion preferences.
Type: Application
Filed: July 2, 2008
Publication date: January 7, 2010
Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
Inventors: Wei Zhang, Takashi Matsumoto, Maurice K. Chu, James M.A. Begole
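The compare-and-group step in the fashion-preference abstract above might look like the following minimal sketch. It assumes preferences have already been extracted as sets of tags, uses Jaccard similarity as the comparison, and groups greedily against the first member of each group. All of these choices, along with the names and the 0.5 threshold, are illustrative assumptions rather than the method claimed in the application.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_users(preferences, threshold=0.5):
    """Greedy grouping: a user joins the first group whose seed is similar enough."""
    groups = []
    for user, tags in preferences.items():
        for group in groups:
            seed = group[0]  # compare against the group's first member only
            if jaccard(tags, preferences[seed]) >= threshold:
                group.append(user)
                break
        else:
            groups.append([user])  # no similar group found, start a new one
    return groups

# Hypothetical users and extracted preference tags.
preferences = {
    "ann": {"denim", "boots", "vintage"},
    "bo": {"denim", "boots"},
    "cy": {"suits", "loafers"},
    "di": {"suits", "loafers", "ties"},
}
groups = group_users(preferences)
```

Comparing against a single seed keeps the sketch short. A real system would more likely use a clustering method that considers every member of a group.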
Abstract: A multi-dimensional database and indexes and operations on the multi-dimensional database are described which include video search applications or other similar sequence or structure searches. Traversal indexes utilize highly discriminative information about images and video sequences or about object shapes. Global and local signatures around keypoints are used for compact and robust retrieval and discriminative information content of images or video sequences of interest. For other objects or structures, relevant signatures of the pattern or structure are used for traversal indexes. Traversal indexes are stored in leaf nodes along with distance measures and occurrence of similar images in the database. During a sequence query, correlation scores are calculated for single frames, frame sequences, and video clips, or for other objects or structures.
Type: Application
Filed: June 18, 2008
Publication date: December 18, 2008
Applicant: Zeitera, LLC
Inventors: Jose Pio Pereira, Mihailo M. Stojancic, Shashank Merchant
Abstract: A system and method of detecting unidentified broadcast electronic media content using a self-similarity technique is presented. The process and system catalogue repeated instances of content that have not been positively identified, but are sufficiently similar as to infer repetitive broadcasts. These catalogued instances may be further processed on the basis of different broadcast channels, sources, geographic locations of broadcasts or format to further assist the identification thereof.