Using Extracted Text (epo) Patents (Class 707/E17.022)
  • Patent number: 12072837
    Abstract: An integrated digital-analog archiving system can automatically initiate a migration process to move electronic documents to a media library. For each electronic document, the system may retrieve the electronic document from a digital data storage medium, extract metadata from the electronic document, determine size, orientation, and format of the electronic document, generate indicators for indicating the start and end of the electronic document to be stored on an analog data storage medium, generate an analog document identifier for identifying the electronic document on the analog data storage medium, generate a scaled image of the electronic document based on the size, orientation, and format of the electronic document, generate a text string based at least in part on the extracted metadata, and render the indicators, the analog document identifier, the scaled image of the electronic document, and the text string on the analog data storage medium.
    Type: Grant
    Filed: February 7, 2023
    Date of Patent: August 27, 2024
    Assignee: OPEN TEXT SA ULC
    Inventor: Matthias Specht
  • Patent number: 11989922
    Abstract: A system includes a computing platform having processing hardware, and a memory storing software code. The processing hardware is configured to execute the software code to receive an image having a plurality of image regions, determine a boundary of each of the image regions to identify a plurality of bounded image regions, and identify, within each of the bounded image regions, one or more image sub-regions to identify a plurality of image sub-regions. The processing hardware is further configured to execute the software code to identify, within each of the bounded image regions, one or more first features, respectively, identify, within each of the image sub-regions, one or more second features, respectively, and provide an annotated image by annotating each of the bounded image regions using the respective first features and annotating each of the image sub-regions using the respective second features.
    Type: Grant
    Filed: February 18, 2022
    Date of Patent: May 21, 2024
    Assignee: Disney Enterprises, Inc.
    Inventors: Miquel Angel Farre Guiu, Monica Alfaro Vendrell, Pablo Pernias, Francesc Josep Guitart Bravo, Marc Junyent Martin, Albert Aparicio Isarn, Anthony M. Accardo, Steven S. Shapiro
  • Patent number: 11910060
    Abstract: This relates to using a computer simulation to test another computer program in real time or in simulated real time that is sped up. The disclosed method and system synchronize information input into the simulation so that the program under test operates independently. The method and system operate a protocol that connects one running computer process, a trading computer program, with another running process, a computer program that executes a market simulation, in order to optimize the quality and speed of the simulation and of the testing of the external computer program.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: February 20, 2024
    Assignee: Caspian Hill Group, LLC
    Inventors: Amy Bolivar, Steven Lubin, Audrey Faust
  • Patent number: 11847142
    Abstract: There is provided a system configured to appropriately determine a topic count in accordance with LDA to estimate latent meanings of a document. For a plurality of documents d, a perplexity PPL of each document d is evaluated in accordance with a document generation probability in which the document d is generated when topic counts N for defining a topic model based on the LDA as a document generation model are hypothetically specified as different values and word groups are specified by different random numbers. The topic model is defined by a reference topic count N0 determined by combining a first topic count N1 (the number of topics indicating a highest cumulative frequency at which the perplexity PPL first indicates a minimum value) and a second topic count N2 (the number of topics indicating a highest cumulative frequency at which the perplexity PPL indicates a smallest value).
    Type: Grant
    Filed: February 22, 2021
    Date of Patent: December 19, 2023
    Assignee: HONDA MOTOR CO., LTD.
    Inventor: Takamasa Suzuki
  • Patent number: 11842035
    Abstract: In example embodiments, techniques are provided for efficiently labeling, reviewing and correcting predictions for P&IDs in image-only formats. To label text boxes in the P&ID, the labeling application executes an OCR algorithm to predict a bounding box around, and machine-readable text within, each text box, and displays these predictions in its user interface. The labeling application provides functionality to receive a user confirmation or correction for each predicted bounding box and predicted machine-readable text. To label symbols in the P&ID, the labeling application receives user input to draw bounding boxes around symbols and assign symbols to classes of equipment. Where there are multiple occurrences of specific symbols, the labeling application provides functionality to duplicate and automatically detect and assign bounding boxes and classes.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: December 12, 2023
    Assignee: Bentley Systems, Incorporated
    Inventors: Karl-Alexandre Jahjah, Marc-André Gardner
  • Patent number: 11816983
    Abstract: The present invention is directed to a helmet wearing determination system including an imaging means that is installed in a predetermined position and images a two-wheel vehicle that travels on a road; and a helmet wearing determination means that processes an image imaged by the imaging means, estimates a rider head region corresponding to a head of a person who rides on the two-wheel vehicle that travels on the road, compares image characteristics of the rider head region with image characteristics of the head at a time when a helmet is worn and/or at a time when a helmet is not worn, and determines whether or not the rider wears the helmet.
    Type: Grant
    Filed: April 9, 2021
    Date of Patent: November 14, 2023
    Assignee: NEC CORPORATION
    Inventor: Katsuhiko Takahashi
  • Patent number: 11804053
    Abstract: An image recognition method and a terminal, where the method includes obtaining, by the terminal, an image file comprising a target object, recognizing, by the terminal, the target object based on an image recognition model in the terminal to obtain object category information of the target object, and storing, by the terminal, the object category information as first label information of the target object. Hence, image recognition efficiency of the terminal can be improved, and privacy of a terminal user can be effectively protected.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: October 31, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Changzhu Li, Xiyong Wang
  • Patent number: 11800036
    Abstract: Examples disclosed herein relate to identifying a plurality of content areas of a document to be scanned, classifying each of the plurality of content areas into a content type, determining a minimum scanning resolution to maintain readability for each of the plurality of content areas according to the classified content type, and performing a scan of the document to a digital file, wherein each of the plurality of content areas is scanned at least at the determined minimum scanning resolution to maintain readability of the respective content area.
    Type: Grant
    Filed: January 23, 2020
    Date of Patent: October 24, 2023
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Todd J Harris, Peter Bauer, Litao Hu, Jan Allebach, Zhenhua Hu
  • Patent number: 11783605
    Abstract: Certain aspects of the present disclosure provide techniques for training and using machine learning models to extract key-value sets from a document. An example method generally includes identifying regions of a document including key-value sets corresponding to inputs to a data processing application based on a first machine learning model and an electronic version of the document. One or more keys and one or more values are identified in the document based on a second machine learning model. One or more key-value sets are generated based on matching keys of the one or more keys and values of the one or more values in the region of the document. The one or more key-value sets are provided to a data processing application for processing.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: October 10, 2023
    Assignee: INTUIT, INC.
    Inventors: Amogha Sekhar, Eric Vanoeveren, Deepankar Mohapatra, Tharathorn Rimchala, Priyadarshini Rajendran
  • Patent number: 11775759
    Abstract: Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications using generalized vocabulary tokens. In some embodiments, an ML system determines a set of tokens for non-textual content in a plurality of documents. The ML system generates a fixed-length vocabulary that includes the set of tokens for the non-textual content. The ML system further generates, for each respective document in a training dataset of documents, a respective feature vector based at least in part on which tokens in the fixed-length vocabulary occur in the respective document. The ML system trains an ML model based at least in part on the respective feature vector for each respective document in the training dataset.
    Type: Grant
    Filed: August 15, 2022
    Date of Patent: October 3, 2023
    Assignee: Oracle International Corporation
    Inventor: Sudhakar Kalluri
  • Patent number: 11768888
    Abstract: Disclosed are systems and methods for autonomously extracting attributes from domains of a vertical. The disclosed implementations train a deep neural network (“DNN”) based on one or more domains of a vertical using labeled embedding vectors generated for nodes of those one or more domains. The trained DNN may then be used to autonomously label nodes of other domains within the same vertical such that attributes corresponding to those labels can be extracted.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: September 26, 2023
    Assignee: Pinterest, Inc.
    Inventors: Jinfeng Zhuang, Zhengda Zhao, Vijai Mohan
  • Patent number: 11741380
    Abstract: Embodiments generate machine learning predictions for database migrations. For example, a trained machine learning model that has been trained using training data can be stored, where the training data includes migration information for database migrations and migration methods for the database migrations, and the training data migration information includes a source database type and a target database infrastructure. Migration information can be received for a candidate database migration that includes a source database type and a target database infrastructure. Using the trained machine learning model, migration methods based on the migration information for the candidate database migration can be predicted.
    Type: Grant
    Filed: January 31, 2020
    Date of Patent: August 29, 2023
    Assignee: Oracle International Corporation
    Inventors: Malay K. Khawas, Saumika Sarangi, Sudipto Basu, Ranajoy Bose, Padma Priya Rajan Natarajan, Bogapurapu L. K. Rao, Parul Yamini
  • Patent number: 11664012
    Abstract: In one embodiment, an electronic device includes an input device configured to provide an input stream, a first processing device, and a second processing device. The first processing device is configured to use a keyword-detection model to determine if the input stream comprises a keyword, wake up the second processing device in response to determining that a segment of the input stream comprises the keyword, and modify the keyword-detection model in response to a training input received from the second processing device. The second processing device is configured to use a first neural network to determine whether the segment of the input stream comprises the keyword and provide the training input to the first processing device in response to determining that the segment of the input stream does not comprise the keyword.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: May 30, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Young Mo Kang, Sungrack Yun, Kyu Woong Hwang, Hye Jin Jang, Byeonggeun Kim
  • Patent number: 11651448
    Abstract: A disclosed computer-implemented method may include receiving a request to generate a dating profile for a user of a community-based dating service of a social networking system based on information associated with the user and maintained by the social networking system. The method may also include accessing information associated with the user and maintained by the social networking system. The method may additionally include selecting, from the information associated with the user and maintained by the social networking system (1) a set of contextual information associated with the user, and (2) a set of media items associated with the user. The method may further include generating the dating profile for the user by arranging the set of contextual information and the set of media items within a dating interface of the social networking system. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: November 21, 2019
    Date of Patent: May 16, 2023
    Assignee: Meta Platforms, Inc.
    Inventor: Jordan Springstroh
  • Patent number: 11645826
    Abstract: The present disclosure relates to generating computer searchable text from digital images that depict documents utilizing an orientation neural network and/or text prediction neural network. For example, one or more embodiments detect digital images that depict documents, identify the orientation of the depicted documents, and generate computer searchable text from the depicted documents in the detected digital images. In particular, one or more embodiments train an orientation neural network to identify the orientation of a depicted document in a digital image. Additionally, one or more embodiments train a text prediction neural network to analyze a depicted document in a digital image to generate computer searchable text from the depicted document.
    Type: Grant
    Filed: September 14, 2020
    Date of Patent: May 9, 2023
    Assignee: Dropbox, Inc.
    Inventors: David J. Kriegman, Peter N. Belhumeur, Bradley Neuberg, Leonard Fink
  • Patent number: 11645600
    Abstract: Embodiments relate to a system, program product, and method for managing apparel to facilitate compliance through a cognitive system, i.e., using an artificial intelligence (AI) platform to dynamically analyze the apparel donned by individuals to determine compliance with established apparel compliance practices and provide suggestions for overcoming non-compliance. The determinations of non-compliance are accompanied with respective risk factors. The system, program product, and method disclosed herein facilitate leveraging written requirements processed by natural language processing (NLP) for the donning of apparel that includes proper clothing articles and accessories, as well as associated requirements of clothing articles and accessories that are not appropriate for the respective conditions.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: May 9, 2023
    Assignee: International Business Machines Corporation
    Inventors: Stan Kevin Daley, Michael Bender
  • Patent number: 11647261
    Abstract: A metadata server that includes circuitry is provided. The circuitry receives a first segment from a plurality of segments of first media content and determines context information associated with the first segment based on a characteristic of at least one frame of a plurality of frames included in the first segment. The circuitry generates first metadata associated with the first segment based on the context information. The first metadata includes timing information corresponding to the determined context information to control a first set of electrical devices. The circuitry further transmits the received first segment and the generated first metadata to a media device associated with the first set of electrical devices.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: May 9, 2023
    Assignee: SONY CORPORATION
    Inventors: Jaison Joseph, Anil Sasidharan
  • Patent number: 11507770
    Abstract: Described is a system and method that provides a data protection risk assessment for the overall functioning of a backup and recovery system. Accordingly, the system may provide a single overall risk assessment score that provides an operator with an "at-a-glance" overview of the entire system. Moreover, the system may account for changes that occur over time by leveraging statistical methods to automatically generate assessment scores for various components (e.g., application, server, network, load, etc.). In order to determine a risk assessment score, the system may utilize a predictive model based on historical data. Accordingly, residual values for newly observed data may be determined using the predictive model, and the system may identify potentially anomalous or high-risk indicators.
    Type: Grant
    Filed: May 1, 2020
    Date of Patent: November 22, 2022
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Qiang Chen, Jing Yu, Pengfei Wu, Naveen Rastogi
  • Patent number: 11481554
    Abstract: Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications using generalized vocabulary tokens. In some embodiments, an ML system determines a set of tokens for non-textual content in a plurality of documents. The ML system generates a fixed-length vocabulary that includes the set of tokens for the non-textual content. The ML system further generates, for each respective document in a training dataset of documents, a respective feature vector based at least in part on which tokens in the fixed-length vocabulary occur in the respective document. The ML system trains an ML model based at least in part on the respective feature vector for each respective document in the training dataset.
    Type: Grant
    Filed: November 8, 2019
    Date of Patent: October 25, 2022
    Assignee: Oracle International Corporation
    Inventor: Sudhakar Kalluri
  • Patent number: 11423052
    Abstract: User information categorization using consent-based class rules is described. Consent is received from a user regarding at least one functional area where user information is shareable. Based on the consent, at least one data class that is permitted to be shared is determined. A user information designation is associated with the at least one data class, and class rules are applied to user information associated with the user information designation based on the association between the user information designation and the at least one data class.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: August 23, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sushain Pandit, Martin Oberhofer, Steven Lockwood
  • Patent number: 8639714
    Abstract: A variety of computer-based services that permit users to edit, compose, upload, or otherwise generate content also provide for the integration of sponsored media into presentations along with user-generated content. An exemplary service generates text based on user input, provides tags based on the text to a sponsored media repository, receives a sponsored media data structure in return, and formats sponsored media from the data structure for display to the user.
    Type: Grant
    Filed: August 29, 2007
    Date of Patent: January 28, 2014
    Assignee: Yahoo! Inc.
    Inventor: Roelof van Zwol
  • Patent number: 8639707
    Abstract: Retrieval is completed in a short time for presenting a retrieval result of a document file, which satisfies a retrieval condition, to a user having the authority to perform predetermined processing.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: January 28, 2014
    Assignee: International Business Machines Corporation
    Inventors: Masaki Komedani, Hirofumi Nishikawa, Fumihiko Terui
  • Patent number: 8626704
    Abstract: A map update data supply device and method includes an update map database of per-section versions of an update data file, and a request update data extraction unit for extracting a request update section and an update data file. A safeguard update data extraction unit extracts a safeguard update section to safeguard a road network connection between adjacent sections. An integrated data generation unit integrates all versions of the update data file for each extracted request update section and generates a request update integrated data file. The integrated data generation unit also integrates, per safeguard update section, versions of the update data file up to the update safeguard version for each extracted safeguard update section, and generates a safeguard update integrated data file. An integrated data supply unit supplies the generated request update integrated data file and the safeguard update integrated data file to a navigation device.
    Type: Grant
    Filed: January 13, 2011
    Date of Patent: January 7, 2014
    Assignee: Aisin AW Co., Ltd.
    Inventor: Kimiyoshi Sawai
  • Publication number: 20130311489
    Abstract: A method for automatically extracting names that is implemented by a computer having a computer memory includes the steps of storing a list of first names in the computer memory; receiving a document in the computer memory, where at least some of the characters of the document are represented in a machine readable format; identifying a grouping of words in the document as a name candidate based on capitalization of a leading character of at least two of the words; selecting a subject word of the name candidate; comparing the subject word to the list of first names; and determining that the name candidate includes a personal name if the subject word is present in the list of first names, using the computer.
    Type: Application
    Filed: September 30, 2011
    Publication date: November 21, 2013
    Applicant: GOOGLE INC.
    Inventor: Alex Kerschhofer
  • Publication number: 20130144907
    Abstract: The present discussion relates to patient image data workflows. One example can temporarily serially arrange a set of semantic labeling modules in a patient image data workflow pipeline responsive to receiving an event trigger. The example can also remove the set of modules from the patient image data workflow pipeline responsive to receiving an event completion trigger.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Steven J. White, Sayan D. Pathak, Bryan Dove, Duncan P. Robertson, Khan M. Siddiqui, Prabhu KrishnaMoorthy
  • Publication number: 20130080475
    Abstract: A system for generating statistics relating to recorded employee behavior, the system including: a first database of tasks performed by employees, the first database being stored on a computer-readable storage medium; a second database of actions taken by the employees while performing the tasks, the second database being stored on a computer-readable storage medium; and a software program, stored on a computer-readable storage medium, configured to extract information from the databases regarding the tasks performed by the employees as well as the actions performed by the employees while carrying out the tasks. The software program then calculates performance statistics relating to success or failure regarding a particular task. The software program furthermore sorts the employees into subgroups based on their status in the company and then calculates performance statistics for the subgroup to compare against individual performance within the subgroup.
    Type: Application
    Filed: September 25, 2011
    Publication date: March 28, 2013
    Inventor: Jonathon Gillen
  • Publication number: 20130073514
    Abstract: This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.
    Type: Application
    Filed: September 20, 2011
    Publication date: March 21, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Rui Cai, Lei Zhang, Qiang Hao
  • Publication number: 20130024476
    Abstract: A computer-implemented method and system provide for automatic selection and extraction of metadata and media content from projects in a craft tool. Automated identification, classification and management of such metadata and content is provided using techniques such as pattern recognition for audio and visual content. The automatic tracking and centralised storage of metadata and content for compliance purposes can be facilitated, and can enable querying of organised metadata stored in a central database. In an example, metadata and media content are extracted automatically from a project in a craft tool at a client system and are forwarded to a host system, which creates a cue sheet whose timings for media files are derived from the timing metadata in the project file.
    Type: Application
    Filed: October 7, 2010
    Publication date: January 24, 2013
    Inventors: Charles Hodgkinson, Kirk Zavieh
  • Publication number: 20130013553
    Abstract: Some embodiments provide a verification system for automated verification of entities. The verification system verifies entities automatically using a two-part verification campaign. One part verifies that the entity is the true owner of the entity account to be verified. This verification step involves (1) the entity receiving a verification code at the entity account and returning the verification code to the verification system, (2) the entity associating an account that it has registered at a service provider with an account that the verification system has registered at the service provider, or (3) both. Another part verifies that the entity can respond to communications that are sent to methods of contact that have been previously verified as belonging to the entity. The verification system submits a first communication with a code using a verified method of contact. The verification system then monitors for a second communication to be returned with the code.
    Type: Application
    Filed: November 7, 2011
    Publication date: January 10, 2013
    Inventors: Aaron B. Stibel, Peter Delgrosso, Jeffrey M. Stibel, Shailen Mistry, Bryan Mierke, Paul Servino, Charles Chi Thoi Le, David Lo, David Allen Lyon
  • Patent number: 8346620
    Abstract: A system for interactive paper is described. Data fragments are captured at locations in a rendered document. A digital version of the document is optionally located. Markup data applied to the capture creates a rich set of interactions for the user. New models for publishing documents and new document-related services are described.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: January 1, 2013
    Assignee: Google Inc.
    Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
  • Publication number: 20120303661
    Abstract: Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.
    Type: Application
    Filed: May 27, 2011
    Publication date: November 29, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sebastian Johannes Blohm, Vivian Yaw-Wen Chu, Ching-Tien Ho, Yunyao Li, Huaiyu Zhu
  • Publication number: 20120264480
    Abstract: Generally described, the present disclosure relates to an electronic device having limited memory. More specifically, the disclosure relates to intelligent data sharing for advanced features on mobile platforms. In one illustrative embodiment, a mobile device provides a platform having native services that use shared data. The data can be received from a central server. In turn, the data can be separated on the mobile device into categories. For a number of contacts, these categories can include, but are not limited to, usage, total count, grouping, location and organization. After the data is placed within the categories, the data can be shared between the services for applications. These applications can include, but are not limited to, voice dialing, Bluetooth™ dialing, searching and dialing. The data can be prioritized depending on the categories. Through prioritization, data can be removed when memory is low and new data is received.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 18, 2012
    Inventors: Suriyaprakash Soundrapandian, James Dean Midtun
  • Publication number: 20120239668
    Abstract: Various embodiments of systems and methods for extraction and grouping of feature words are described herein. Feature words are obtained from a first corpus of text bodies comprising a plurality of reviews. A second corpus is created using a combination of the obtained feature words, verbs and adjectives from the first corpus. The second corpus comprises filtered reviews and each of the filtered reviews pertains to a review. Topics are preliminarily assigned for words in the filtered reviews of the second corpus. For each of the feature words in the second corpus, a topic count is determined for every preliminarily assigned topic. After determining the topic count, one or more of the topics are finally assigned to the feature words based on a topic count value. At least one topic is presented as a group of the feature words for which the at least one topic is assigned based on the topic count value.
    Type: Application
    Filed: March 17, 2011
    Publication date: September 20, 2012
    Inventors: Chiranjib Bhattacharyya, Himabindu Lakkaraju, Kaushik Nath, Sunil Arvindam
  • Patent number: 8261200
    Abstract: An interactive system increases retrieval performance for images depicting text by allowing users to provide relevance feedback on words contained in the images. The system includes a user interface through which the user queries the system with query terms for images contained in the system. Word image suggestions are displayed to the user through the user interface, where each word image suggestion contains text, as recognized from the word image by the system, that is the same as or slightly different from the particular query terms. The user can include word image suggestions to increase the system's recall of images for the one or more query terms, and can exclude word image suggestions to increase the precision of image retrieval results for particular query terms.
    Type: Grant
    Filed: April 26, 2007
    Date of Patent: September 4, 2012
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Laurent Denoue, John E. Adcock, David M. Hilbert, Daniel Billsus
  • Publication number: 20120203764
    Abstract: A method of identifying one or more particular images from an image collection includes indexing the image collection to provide image descriptors for each image in the image collection such that each image is described by one or more of the image descriptors; receiving a query from a user specifying at least one keyword for an image search; and using the keyword(s) to search a second collection of tagged images to identify co-occurrence keywords. The method further includes using the identified co-occurrence keywords to provide an expanded list of keywords; using the expanded list of keywords to search the image descriptors to identify a set of candidate images satisfying the keywords; grouping the set of candidate images according to at least one of the image descriptors, and selecting one or more representative images from each grouping; and displaying the representative images to the user.
    Type: Application
    Filed: February 4, 2011
    Publication date: August 9, 2012
    Inventors: Mark D. Wood, Alexander C. Loui
  • Publication number: 20120150792
    Abstract: The present disclosure involves systems, software, and computer implemented methods for providing a data extraction framework for extracting data and metadata from an application to provide additional functionality for the extracted data and metadata. One process includes operations for identifying a first application for data extraction and determining a set of data suitable for extraction from the first application using a software development kit associated with the first application. The set of data is stored in a repository without storing visualization components of the first application in the repository. The set of data is sent to a second application for further processing of the set of data. The second application is configured to bind different visualization components to the set of data for display of data elements in the set of data to a user.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: SAP PORTALS ISRAEL LTD.
    Inventors: Ohad Yassin, Pavel Kravets, Nisim Hafzadi, Ram Alon
  • Publication number: 20120136812
    Abstract: One embodiment of the present invention provides a system for optimizing and customizing document-similarity calculation. During operation, the system presents a collection of similar documents to a user, collects feedback on the similarity of the documents from the user, generates generic rules for calculating document similarity, and filters documents with customized similarity calculation based on the feedback provided by the user.
    Type: Application
    Filed: November 29, 2010
    Publication date: May 31, 2012
    Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
    Inventor: Oliver Brdiczka
  • Publication number: 20120089643
    Abstract: A computer implemented method and system provide for automatic selection and extraction of metadata and media content from projects in a craft tool. Automated identification, classification and management of such metadata and content is provided using techniques such as pattern recognition for audio and visual content. The automatic tracking and centralised storage of metadata and content for compliance purposes can be facilitated, enabling querying of organised metadata stored in a central database. In an example, metadata and media content are extracted automatically from a project in a craft tool at a client system and are forwarded to a host system, which creates a cue sheet whose media file timings are derived from timing metadata in the project file.
    Type: Application
    Filed: October 7, 2010
    Publication date: April 12, 2012
    Inventors: Charles Hodgkinson, Kirk Zavieh
  • Publication number: 20120089642
    Abstract: The system and methods described herein provide results previewing for an interactive text mining system in order to feed back partial query results to users before all results that are responsive to a query have been found. These partial results allow the user to see the progress of their text mining query much sooner.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 12, 2012
    Inventors: David R. Milward, Roger W. Hale, Malcolm R. Parsons, Sylvia F. Knight, Christopher I. Sullivan, Jason Trenouth, James R. Thomas
  • Publication number: 20120047167
    Abstract: A portable terminal includes a word extracting unit that extracts a word contained in data of a Web page being viewed; a Web search request unit that transmits a search request to a search site with the word extracted by the word extracting unit as a search word and that receives a list of Web pages that contain the search word from the search site as a search result; and a display unit that displays the search result received by the Web search request unit.
    Type: Application
    Filed: November 2, 2011
    Publication date: February 23, 2012
    Applicant: FUJITSU TOSHIBA MOBILE COMMUNICATIONS LIMITED
    Inventors: Masaki SAKAI, Natsuko OUCHI
  • Publication number: 20120047172
    Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.
    Type: Application
    Filed: August 22, 2011
    Publication date: February 23, 2012
    Applicant: Google Inc.
    Inventors: Jay M. Ponte, Jakob Uszkoreit, Ashok C. Popat, Moshe Dubiner
  • Publication number: 20120047176
    Abstract: A system and methodology for real-time content aggregation and syndication is described. In one embodiment, for example, a method is described for assisting a user with extracting items relevant to search queries from documents including items of various types, the method comprises steps of: receiving a search query specifying a search phrase and a particular item type; identifying documents matching the search phrase; for each matching document, determining whether the document includes an item having the particular item type; and extracting items having the particular item type from the matching documents for display to the user. The solution enables a user to aggregate and syndicate content without a professional content manager or complicated content management software tools.
    Type: Application
    Filed: November 2, 2011
    Publication date: February 23, 2012
    Applicant: SYBASE, INC.
    Inventor: Michael Timmons
  • Publication number: 20120036144
    Abstract: According to one embodiment, an information recommendation device includes the following units. The input unit is configured to input a first document and a second document which has been browsed before the first document. The subject-keyword extraction unit is configured to extract first and second subject keywords from the first and second documents, respectively. The interest-keyword extraction unit is configured to extract first interest keywords from the first and second subject keywords, and to extract second interest keywords based on information specifying the first and second documents, the first interest keywords, and the first and second subject keywords. The second interest keywords are estimated to be keywords in which the user is next interested. The acquiring unit is configured to acquire, based on the second interest keywords, recommendation information on third documents which are candidates to be browsed after the first document. The presentation unit presents the recommendation information.
    Type: Application
    Filed: August 25, 2011
    Publication date: February 9, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masayuki Okamoto, Nayuko Watanabe, Masaaki Kikuchi, Takayuki Iida, Mika Fukui
  • Publication number: 20110307497
    Abstract: “Synthewiser”™ is a search method and system that synthesizes a single non-template, text-based document that is organized by topic and integrates and consolidates information from multiple sources. This is accomplished by: having a user provide a search phrase; creating seed phrases; identifying seed locations in multiple sources; creating expanded text segments; grouping expanded text segments; consolidating content; and synthesizing a single document. Synthewiser has advantages over today's dominant search engine. Its results are organized by topic and are integrated across multiple sources.
    Type: Application
    Filed: June 14, 2010
    Publication date: December 15, 2011
    Inventor: Robert A. Connor
  • Publication number: 20110302179
    Abstract: Described is using context information obtained from entity mentions in likely relevant documents to extract entity mentions from documents that are ambiguous with respect to their relevance to a domain. A list of entities is input into an entity extraction mechanism, which processes a large collection of documents to determine data (counts) corresponding to frequency of entity mentions. Infrequently mentioned entities are specific entities, while frequently mentioned entities are non-specific (generic or ambiguous) entities. The context surrounding mentions of the specific entities is processed to obtain interesting context terms (words, phrases or both) for the domain. The interesting context terms are then compared against the contexts of non-specific entity mentions to determine whether each non-specific entity mention is relevant to the domain. A result set containing only the relevant documents or the collection of relevant mentions is output.
    Type: Application
    Filed: June 7, 2010
    Publication date: December 8, 2011
    Applicant: Microsoft Corporation
    Inventor: Sanjay Agrawal
  • Publication number: 20110295893
    Abstract: A method of searching an expected image in an electronic apparatus comprises the steps of inputting a hand drawing of the expected image into the electronic apparatus; determining whether or not a text description for partially characterizing the expected image is inputted; identifying and searching the expected image in the electronic apparatus according to the hand drawing if the text description is not inputted, or selecting a text label from the text description and interpreting the selected text label by the electronic apparatus if the text description is inputted; and searching a database in the electronic apparatus according to the text label, and fetching the expected image from the database if the value of the image item matches the text label. The hand drawing and/or text label inputted from a mobile phone screen are provided for arranging and searching pictures or images in the database efficiently.
    Type: Application
    Filed: April 21, 2011
    Publication date: December 1, 2011
    Applicants: INVENTEC APPLIANCES (SHANGHAI) CO. LTD., INVENTEC APPLIANCES (NANCHANG) CO. LTD., INVENTEC APPLIANCES CORP.
    Inventor: PENG-FEI WU
  • Publication number: 20110295775
    Abstract: Techniques for identifying near-duplicates of a media object and associating metadata of the near-duplicates with the media object are described herein. One or more devices implementing the techniques are configured to identify the near duplicates based at least on similarity attributes included in the media object. Metadata is then extracted from the near-duplicates and is associated with the media object as descriptors of the media object to enable discovery of the media object based on the descriptors.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Xin-Jing Wang, Lei Zhang, Ming Liu, Yi Li, Wei-Ying Ma
  • Publication number: 20110270819
    Abstract: Query classification techniques attempt to classify user search queries in order to better understand user search intent. Understanding a user's search intent allows search engines to provide relevant content tailored to the user's interest. Unfortunately, current classification techniques do not take into account contextual information. Accordingly, as provided herein, a target query may be classified based upon contextual information. In particular, features may be extracted from contextual information and/or other sources. For example, features may be extracted from the target query, related queries, and/or invoked search results of the related queries. In this way, the target query may be classified based upon other queries performed by the user and/or search results of the queries the user found interesting. In addition, a CRF model may be utilized in classifying the target query by providing generalized parameters learned from labeled query sessions.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Dou Shen, Daxin Jiang, Jian-Tao Sun
  • Publication number: 20110264675
    Abstract: A searching apparatus includes a memory unit which stores transposed indexes representing appearing positions of all n-grams in plural pieces of document data subjected to searching and appearing frequencies, an n-gram extracting unit that extracts all n-grams extractable from a searching character string, a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and derives an n-gram with the smallest appearing frequency among all of the extracted n-grams, a searching n-gram selecting unit that selects, from all extracted n-grams, a plurality of searching n-grams which form the searching character string and include the n-gram with the smallest appearing frequency, and a document specifying unit that specifies, based on the plurality of selected searching n-grams and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
    Type: Application
    Filed: April 26, 2011
    Publication date: October 27, 2011
    Applicant: CASIO COMPUTER CO., LTD.
    Inventor: Katsuhiko SATOH
  • Publication number: 20110246027
    Abstract: An image processing system inputs a captured image of a scene viewed from a vehicle in a predetermined road section and an image-capturing position at which the image is captured. The system uses a given position in the predetermined road section as a specific position, and sets a target vehicle movement amount at the specific position, for passing through the predetermined road section. The system generates reference image data from the captured image obtained at the specific position. The system generates reference data that is used when scenic image recognition is performed, by associating the reference image data with the specific position and the target vehicle movement amount at the specific position, and generates a reference data database that is a database of the reference data.
    Type: Application
    Filed: January 25, 2011
    Publication date: October 6, 2011
    Applicant: AISIN AW CO., LTD.
    Inventor: Takayuki MIYAJIMA