Using Extracted Text (epo) Patents (Class 707/E17.022)

Machine learning systems with memory based parameter adaptation for learning fast and slower

Patent number: 12242947

Abstract: There is described herein a computer-implemented method of processing an input data item. The method comprises processing the input data item using a parametric model to generate output data, wherein the parametric model comprises a first sub-model and a second sub-model. The processing comprises processing, by the first sub-model, the input data to generate a query data item, retrieving, from a memory storing data point-value pairs, at least one data point-value pair based upon the query data item and modifying weights of the second sub-model based upon the retrieved at least one data point-value pair. The output data is then generated based upon the modified second sub-model.

Type: Grant

Filed: October 29, 2018

Date of Patent: March 4, 2025

Assignee: DeepMind Technologies Limited

Inventors: Pablo Sprechmann, Siddhant Jayakumar, Jack William Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Oriol Vinyals, Razvan Pascanu, Charles Blundell
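
The retrieve-then-adapt flow described in the abstract of patent 12242947 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the linear `embed` and `predict` sub-models, the Euclidean nearest-neighbour lookup, the squared-error gradient steps, and every name and hyperparameter below are assumptions.

```python
import math

def embed(x):
    """First sub-model (hypothetical): turn a raw input into a query vector."""
    return [x, 1.0]

def predict(weights, query):
    """Second sub-model (hypothetical): a linear read-out of the query."""
    return sum(w * q for w, q in zip(weights, query))

def retrieve(memory, query, k):
    """Return the k stored (key, value) pairs whose keys lie closest to the query."""
    return sorted(memory, key=lambda pair: math.dist(pair[0], query))[:k]

def adapt(weights, neighbours, lr=0.1, steps=20):
    """Copy the weights and take a few gradient steps on the retrieved pairs."""
    local = list(weights)
    for _ in range(steps):
        for key, value in neighbours:
            error = predict(local, key) - value
            local = [w - lr * error * k for w, k in zip(local, key)]
    return local

def process(x, weights, memory, k=2):
    """Query the memory, adapt a temporary copy of the weights, then predict."""
    query = embed(x)
    neighbours = retrieve(memory, query, k)
    return predict(adapt(weights, neighbours), query)

memory = [([1.0, 1.0], 2.0), ([1.1, 1.0], 2.2), ([5.0, 1.0], 10.0)]
base_weights = [0.0, 0.0]
adapted_output = process(1.0, base_weights, memory)
```

The base weights are left untouched; only the temporary copy is modified for this one input, which is one plausible reading of "modifying weights of the second sub-model based upon the retrieved at least one data point-value pair".
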
System and method for algorithmic editing of video content

Patent number: 12231747

Abstract: A computer implemented method for algorithmically editing digital video content is disclosed. A video file containing source video is processed to extract metadata. Label taxonomies are applied to extracted metadata. The labelled metadata is processed to identify higher-level labels. Identified higher-level labels are stored as additional metadata associated with the video file. A clip generating algorithm applies the stored metadata for selectively editing the source video to generate a plurality of different candidate video clips. Responsive to determining a clip presentation trigger on a viewer device, a clip selection algorithm is implemented that applies engagement data and metadata for the candidate video clips to select one of the stored candidate video clips. The engagement data is representative of one or more engagement metrics recorded for at least one of the stored candidate video clips. The selected video clip is presented to one or more viewers via corresponding viewer devices.

Type: Grant

Filed: June 23, 2023

Date of Patent: February 18, 2025

Assignee: Playable Pty Ltd

Inventors: Robert Andrew Hitching, Ashley John Wing, Phillip John Wing
Online software platform (OSP) querying client data about relationship instances for application of permission digital rules in addition to resource digital rules for the relationship instances

Patent number: 12197616

Abstract: Systems and methods electronically determine whether a dataset is permitted or excluded based on permission digital rules. Primary entities often are required, or choose to, exclude proposed relationship instances with secondary entities. The systems and methods described herein allow permission digital rules to be defined and applied to datasets obtained from secondary entities relating to a proposed relationship instance with the primary entity, and permit or exclude a resource from being produced for the dataset based on the permission digital rules.

Type: Grant

Filed: June 15, 2023

Date of Patent: January 14, 2025

Assignee: Avalara, Inc.

Inventors: Mark Janzen, Gregory T. Kavounas, Charles M. Morrisette, Rohit Ghule
Systems and methods for capturing user consumption of information

Patent number: 12198293

Abstract: A client device assists in identifying user consumption of information. The client device comprises a hardware processor; a screen; memory storing computer instructions that when executed perform capturing a series of screen image snapshots being presented on the screen; reducing resolution of each screen image snapshot in the series of screen image snapshots; capturing metadata associated with each screen image snapshot in the series of screen image snapshots, the metadata at least including a timestamp; identifying a duplicate in the series of screen image snapshots; discarding the duplicate from the series of screen image snapshots; and uploading the series of captured screen image snapshots to a processing server for processing.

Type: Grant

Filed: June 27, 2023

Date of Patent: January 14, 2025

Assignee: MetaConsumer, Inc.

Inventors: Nathaniel D'Amico, Chandrasekhar Vijay Ramaseshan
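
The client-side steps listed in the abstract of patent 12198293 (reduce resolution, attach a timestamp, discard duplicates, upload the rest) can be sketched as below. The stride-based `downsample`, the exact-match SHA-256 `fingerprint`, and the grayscale list-of-lists image format are all assumptions; the abstract does not say how resolution is reduced or how duplicates are identified.

```python
import hashlib

def downsample(image, factor):
    """Reduce resolution by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]

def fingerprint(image):
    """Hash the pixel values (0-255) so identical snapshots compare equal."""
    data = bytes(pixel for row in image for pixel in row)
    return hashlib.sha256(data).hexdigest()

def prepare_upload(snapshots, factor=2):
    """Downsample each (timestamp, image) pair and drop exact duplicates."""
    seen = set()
    batch = []
    for timestamp, image in snapshots:
        small = downsample(image, factor)
        digest = fingerprint(small)
        if digest in seen:
            continue  # duplicate of an earlier snapshot: discard it
        seen.add(digest)
        batch.append({"timestamp": timestamp, "image": small})
    return batch

frame_a = [[0, 0, 9, 9], [0, 0, 9, 9], [5, 5, 7, 7], [5, 5, 7, 7]]
frame_b = [[1, 1, 1, 1] for _ in range(4)]
batch = prepare_upload([(1, frame_a), (2, frame_a), (3, frame_b)])
```

In this sketch `batch` is what would be sent to the processing server; the upload call itself is left out.
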
Multimodal entity identification

Patent number: 12164603

Abstract: A machine learning based system can identify an entity as the likely subject of a multimodal message (e.g., a social media post having a short text phrase overlaid on an image) by creating embeddings for an image of the multimodal message and one or more string embeddings from text of the multimodal message. The embeddings can be weighted to maximize information gain, then recombined and compared against a result embedding database to identify an entity as the subject of the multimodal message.

Type: Grant

Filed: September 15, 2022

Date of Patent: December 10, 2024

Assignee: Snap Inc.

Inventors: Vitor Rocha de Carvalho, Leonardo Ribas Machado das Neves, Seungwhan Moon
Event archiving, systems and methods

Patent number: 12164467

Abstract: Method of retrieving event information is presented. Memento objects can be recognized by an archive engine. Based on the recognition, the archive engine obtains information related to the memento object, possibly one or more recognizable features, and uses the information to search for events associated with a timeline that have corresponding tags. The archive engine can then return the event information as a result set to a user.

Type: Grant

Filed: June 17, 2021

Date of Patent: December 10, 2024

Assignee: NANT HOLDINGS IP, LLC

Inventor: Patrick Soon-Shiong
Helmet wearing determination method, helmet wearing determination system, helmet wearing determination apparatus, and program

Patent number: 12165508

Abstract: The present invention is directed to a helmet wearing determination system including an imaging means that is installed in a predetermined position and images a two-wheel vehicle that travels on a road; and a helmet wearing determination means that processes an image imaged by the imaging means, estimates a rider head region corresponding to a head of a person who rides on the two-wheel vehicle that travels on the road, compares image characteristics of the rider head region with image characteristics according to the head at a time when a helmet is worn and/or at a time when a helmet is not worn, and determines whether or not the rider wears the helmet.

Type: Grant

Filed: October 5, 2023

Date of Patent: December 10, 2024

Assignee: NEC CORPORATION

Inventor: Katsuhiko Takahashi
Integrated digital-analog archiving systems and methods for document preservation

Patent number: 12072837

Abstract: An integrated digital-analog archiving system can automatically initiate a migration process to move electronic documents to a media library. For each electronic document, the system may retrieve the electronic document from a digital data storage medium, extract metadata from the electronic document, determine size, orientation, and format of the electronic document, generate indicators for indicating the start and end of the electronic document to be stored on an analog data storage medium, generate an analog document identifier for identifying the electronic document on the analog data storage medium, generate a scaled image of the electronic document based on the size, orientation, and format of the electronic document, generate a text string based at least in part on the extracted metadata, and render the indicators, the analog document identifier, the scaled image of the electronic document, and the text string on the analog data storage medium.

Type: Grant

Filed: February 7, 2023

Date of Patent: August 27, 2024

Assignee: OPEN TEXT SA ULC

Inventor: Matthias Specht
Automated image analysis and indexing

Patent number: 11989922

Abstract: A system includes a computing platform having processing hardware, and a memory storing software code. The processing hardware is configured to execute the software code to receive an image having a plurality of image regions, determine a boundary of each of the image regions to identify a plurality of bounded image regions, and identify, within each of the bounded image regions, one or more image sub-regions to identify a plurality of image sub-regions. The processing hardware is further configured to execute the software code to identify, within each of the bounded image regions, one or more first features, respectively, identify, within each of the image sub-regions, one or more second features, respectively, and provide an annotated image by annotating each of the bounded image regions using the respective first features and annotating each of the image sub-regions using the respective second features.

Type: Grant

Filed: February 18, 2022

Date of Patent: May 21, 2024

Assignee: Disney Enterprises, Inc.

Inventors: Miquel Angel Farre Guiu, Monica Alfaro Vendrell, Pablo Pernias, Francesc Josep Guitart Bravo, Marc Junyent Martin, Albert Aparicio Isarn, Anthony M. Accardo, Steven S. Shapiro
System and method for automatic detection of periods of heightened audience interest in broadcast electronic media

Patent number: 11910060

Abstract: This relates to using a computer simulation to test another computer program in real time or simulated real time that is sped up. The disclosed method and system synchronizes information input into the simulation so that the program under test operates in an independent way. The method and system operates a protocol to connect one running computer process, a trading computer program, with another running process, a computer program that executes a market simulation in order to optimize the quality and speed of the simulation and testing of the external computer program.

Type: Grant

Filed: May 14, 2021

Date of Patent: February 20, 2024

Assignee: Caspian Hill Group, LLC

Inventors: Amy Bolivar, Steven Lubin, Audrey Faust
Document analysis system

Patent number: 11847142

Abstract: There is provided a system configured to appropriately determine a topic count in accordance with LDA to estimate latent meanings of a document. For a plurality of documents d, a perplexity PPL of each document d is evaluated in accordance with a document generation probability in which the document d is generated when topic counts N for defining a topic model based on the LDA as a document generation model are hypothetically specified as different values and word groups are specified by different random numbers. The topic model is defined by a reference topic count No determined by combining a first topic count N1 (the number of topics indicating a highest cumulative frequency at which the perplexity PPL first indicates a minimum value) and a second topic count N2 (the number of topics indicating a highest cumulative frequency at which the perplexity PPL indicates a smallest value).

Type: Grant

Filed: February 22, 2021

Date of Patent: December 19, 2023

Assignee: HONDA MOTOR CO., LTD.

Inventor: Takamasa Suzuki
Techniques for labeling, reviewing and correcting label predictions for P&IDs

Patent number: 11842035

Abstract: In example embodiments, techniques are provided for efficiently labeling, reviewing and correcting predictions for P&IDs in image-only formats. To label text boxes in the P&ID, the labeling application executes an OCR algorithm to predict a bounding box around, and machine-readable text within, each text box, and displays these predictions in its user interface. The labeling application provides functionality to receive a user confirmation or correction for each predicted bounding box and predicted machine-readable text. To label symbols in the P&ID, the labeling application receives user input to draw bounding boxes around symbols and assign symbols to classes of equipment. Where there are multiple occurrences of specific symbols, the labeling application provides functionality to duplicate and automatically detect and assign bounding boxes and classes.

Type: Grant

Filed: December 21, 2020

Date of Patent: December 12, 2023

Assignee: Bentley Systems, Incorporated

Inventors: Karl-Alexandre Jahjah, Marc-André Gardner
Helmet wearing determination method, helmet wearing determination system, helmet wearing determination apparatus, and program

Patent number: 11816983

Abstract: The present invention is directed to a helmet wearing determination system including an imaging means that is installed in a predetermined position and images a two-wheel vehicle that travels on a road; and a helmet wearing determination means that processes an image imaged by the imaging means, estimates a rider head region corresponding to a head of a person who rides on the two-wheel vehicle that travels on the road, compares image characteristics of the rider head region with image characteristics according to the head at a time when a helmet is worn and/or at a time when a helmet is not worn, and determines whether or not the rider wears the helmet.

Type: Grant

Filed: April 9, 2021

Date of Patent: November 14, 2023

Assignee: NEC CORPORATION

Inventor: Katsuhiko Takahashi
Image recognition method and terminal

Patent number: 11804053

Abstract: An image recognition method and a terminal, where the method includes obtaining, by the terminal, an image file comprising a target object, recognizing, by the terminal, the target object based on an image recognition model in the terminal to obtain object category information of the target object, and storing, by the terminal, the object category information as first label information of the target object. Hence, image recognition efficiency of the terminal can be improved, and privacy of a terminal user can be effectively protected.

Type: Grant

Filed: April 26, 2021

Date of Patent: October 31, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Changzhu Li, Xiyong Wang
Determining minimum scanning resolution

Patent number: 11800036

Abstract: Examples disclosed herein relate to identifying a plurality of content areas of a document to be scanned, classifying each of the plurality of content areas into a content type, determining a minimum scanning resolution to maintain readability for each of the plurality of content areas according to the classified content type, and performing a scan of the document to a digital file, wherein each of the plurality of content areas is scanned at least at the determined minimum scanning resolution to maintain readability of the respective content area.

Type: Grant

Filed: January 23, 2020

Date of Patent: October 24, 2023

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Todd J Harris, Peter Bauer, Litao Hu, Jan Allebach, Zhenhua Hu
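
The classify-then-pick-a-resolution idea in the abstract of patent 11800036 can be illustrated with a small lookup. The dpi values, the content-type names, the stand-in `classify` function, and the choice of a single page-wide resolution equal to the largest per-area minimum are all assumptions; the abstract gives no concrete numbers and does not say whether areas are scanned separately.

```python
# Hypothetical minimum resolutions (dpi) per content type.
MIN_DPI = {"text": 300, "line_art": 600, "photo": 150}

def classify(area):
    """Stand-in classifier: each area already carries its content type."""
    return area["kind"]

def plan_scan(areas, min_dpi=MIN_DPI):
    """Return the minimum dpi per area and one dpi that satisfies every area."""
    per_area = {area["name"]: min_dpi[classify(area)] for area in areas}
    page_dpi = max(per_area.values())
    return per_area, page_dpi

areas = [{"name": "body", "kind": "text"}, {"name": "figure", "kind": "photo"}]
per_area, page_dpi = plan_scan(areas)
```

Scanning the whole page at `page_dpi` guarantees that each area is scanned at least at its own minimum, at the cost of a larger file than a per-area scan would produce.
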
Generalizable key-value set extraction from documents using machine learning models

Patent number: 11783605

Abstract: Certain aspects of the present disclosure provide techniques for training and using machine learning models to extract key-value sets from a document. An example method generally includes identifying regions of a document including key-value sets corresponding to inputs to a data processing application based on a first machine learning model and an electronic version of the document. One or more keys and one or more values are identified in the document based on a second machine learning model. One or more key-value sets are generated based on matching keys of the one or more keys and values of the one or more values in the region of the document. The one or more key-value sets are provided to a data processing application for processing.

Type: Grant

Filed: June 30, 2022

Date of Patent: October 10, 2023

Assignee: INTUIT, INC.

Inventors: Amogha Sekhar, Eric Vanoeveren, Deepankar Mohapatra, Tharathorn Rimchala, Priyadarshini Rajendran
Systems and methods for training and evaluating machine learning models using generalized vocabulary tokens for document processing

Patent number: 11775759

Abstract: Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications using generalized vocabulary tokens. In some embodiments, an ML system determines a set of tokens for non-textual content in a plurality of documents. The ML system generates a fixed-length vocabulary that includes the set of tokens for the non-textual content. The ML system further generates for each respective document in a training dataset of documents, a respective feature vector based at least in part on which tokens in the fixed-length vocabulary occur in the respective document. The ML system trains a ML model based at least in part on the respective feature vector for each respective document in the training dataset.

Type: Grant

Filed: August 15, 2022

Date of Patent: October 3, 2023

Assignee: Oracle International Corporation

Inventor: Sudhakar Kalluri
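
The fixed-length vocabulary and per-document feature vector described in the abstract of patent 11775759 can be sketched as a binary bag of tokens. The placeholder tokens (`<TABLE>`, `<NUM>`, `<IMAGE>`), the frequency-based vocabulary selection, and the 0/1 encoding are assumptions for illustration; the abstract only states that the vector depends on which vocabulary tokens occur in the document, and model training is omitted here.

```python
from collections import Counter

def build_vocabulary(documents, size):
    """Keep the `size` most frequent tokens; ties are broken alphabetically."""
    counts = Counter(token for doc in documents for token in doc)
    ranked = sorted(counts, key=lambda token: (-counts[token], token))
    return ranked[:size]

def feature_vector(doc, vocabulary):
    """One slot per vocabulary token: 1 if it occurs in the document, else 0."""
    present = set(doc)
    return [1 if token in present else 0 for token in vocabulary]

# Each document is represented by hypothetical tokens for its non-textual content.
documents = [
    ["<TABLE>", "<NUM>", "<NUM>"],
    ["<IMAGE>", "<NUM>"],
    ["<TABLE>"],
]
vocabulary = build_vocabulary(documents, size=2)
vectors = [feature_vector(doc, vocabulary) for doc in documents]
```

Because the vocabulary is fixed once built, every document maps to a vector of the same length, which is what lets a single model consume documents with very different content.
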
Attribute extraction

Patent number: 11768888

Abstract: Disclosed are systems and methods for autonomously extracting attributes from domains of a vertical. The disclosed implementations train a deep neural network (“DNN”) based on one or more domains of a vertical using labeled embedding vectors generated for nodes of those one or more domains. The trained DNN may then be used to autonomously label nodes of other domains within the same vertical such that attributes corresponding to those labels can be extracted.

Type: Grant

Filed: August 11, 2021

Date of Patent: September 26, 2023

Assignee: Pinterest, Inc.

Inventors: Jinfeng Zhuang, Zhengda Zhao, Vijai Mohan
Machine learning predictions for database migrations

Patent number: 11741380

Abstract: Embodiments generate machine learning predictions for database migrations. For example, a trained machine learning model that has been trained using training data can be stored, where the training data includes migration information for database migrations and migration methods for the database migrations, and the training data migration information includes a source database type and a target database infrastructure. Migration information can be received for a candidate database migration that includes a source database type and a target database infrastructure. Using the trained machine learning model, migration methods based on the migration information for the candidate database migration can be predicted.

Type: Grant

Filed: January 31, 2020

Date of Patent: August 29, 2023

Assignee: Oracle International Corporation

Inventors: Malay K. Khawas, Saumika Sarangi, Sudipto Basu, Ranajoy Bose, Padma Priya Rajan Natarajan, Bogapurapu L. K. Rao, Parul Yamini
On-device self training in a two-stage wakeup system comprising a system on chip which operates in a reduced-activity mode

Patent number: 11664012

Abstract: In one embodiment, an electronic device includes an input device configured to provide an input stream, a first processing device, and a second processing device. The first processing device is configured to use a keyword-detection model to determine if the input stream comprises a keyword, wake up the second processing device in response to determining that a segment of the input stream comprises the keyword, and modify the keyword-detection model in response to a training input received from the second processing device. The second processing device is configured to use a first neural network to determine whether the segment of the input stream comprises the keyword and provide the training input to the first processing device in response to determining that the segment of the input stream does not comprise the keyword.

Type: Grant

Filed: March 25, 2020

Date of Patent: May 30, 2023

Assignee: Qualcomm Incorporated

Inventors: Young Mo Kang, Sungrack Yun, Kyu Woong Hwang, Hye Jin Jang, Byeonggeun Kim
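
The two-stage wake-up loop in the abstract of patent 11664012 can be sketched as follows. A single score threshold stands in for the keyword-detection model, a flag on the segment stands in for the second stage's neural network, and the threshold-raising `train` rule is an invented placeholder for "modify the keyword-detection model"; none of these details come from the patent.

```python
class FirstStage:
    """Low-power detector: a score threshold stands in for the keyword model."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def detects(self, score):
        return score >= self.threshold

    def train(self, false_alarm_score, margin=0.05):
        # Raise the threshold just above the score that caused the false wake-up.
        self.threshold = max(self.threshold, false_alarm_score + margin)

def second_stage_confirms(segment):
    """Stand-in for the larger network running on the second processor."""
    return segment["is_keyword"]

def handle_segment(first, segment):
    """Run one input segment through both stages and report what happened."""
    if not first.detects(segment["score"]):
        return "asleep"        # second processor is never woken
    if second_stage_confirms(segment):
        return "awake"         # keyword confirmed
    first.train(segment["score"])  # training input sent back to stage one
    return "false_alarm"

detector = FirstStage(threshold=0.5)
noise = {"score": 0.6, "is_keyword": False}
first_result = handle_segment(detector, noise)
second_result = handle_segment(detector, noise)
third_result = handle_segment(detector, {"score": 0.9, "is_keyword": True})
```

The same noisy segment wakes the second processor only once: after the false alarm, the first stage has been adjusted and stays asleep on the repeat.
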
Systems and methods for generating a dating profile for a community-based dating service of a social networking system

Patent number: 11651448

Abstract: A disclosed computer-implemented method may include receiving a request to generate a dating profile for a user of a community-based dating service of a social networking system based on information associated with the user and maintained by the social networking system. The method may also include accessing information associated with the user and maintained by the social networking system. The method may additionally include selecting, from the information associated with the user and maintained by the social networking system (1) a set of contextual information associated with the user, and (2) a set of media items associated with the user. The method may further include generating the dating profile for the user by arranging the set of contextual information and the set of media items within a dating interface of the social networking system. Various other methods, systems, and computer-readable media are also disclosed.

Type: Grant

Filed: November 21, 2019

Date of Patent: May 16, 2023

Assignee: Meta Platforms, Inc.

Inventor: Jordan Springstroh
Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks

Patent number: 11645826

Abstract: The present disclosure relates to generating computer searchable text from digital images that depict documents utilizing an orientation neural network and/or text prediction neural network. For example, one or more embodiments detect digital images that depict documents, identify the orientation of the depicted documents, and generate computer searchable text from the depicted documents in the detected digital images. In particular, one or more embodiments train an orientation neural network to identify the orientation of a depicted document in a digital image. Additionally, one or more embodiments train a text prediction neural network to analyze a depicted document in a digital image to generate computer searchable text from the depicted document.

Type: Grant

Filed: September 14, 2020

Date of Patent: May 9, 2023

Assignee: Dropbox, Inc.

Inventors: David J. Kriegman, Peter N. Belhumeur, Bradley Neuberg, Leonard Fink
Electrical devices control based on media-content context

Patent number: 11647261

Abstract: A metadata server that includes circuitry is provided. The circuitry receives a first segment from a plurality of segments of first media content and determines context information associated with the first segment based on a characteristic of at least one frame of a plurality of frames included in the first segment. The circuitry generates first metadata associated with the first segment based on the context information. The first metadata includes timing information corresponding to the determined context information to control a first set of electrical devices. The circuitry further transmits the received first segment and the generated first metadata to a media device associated with the first set of electrical devices.

Type: Grant

Filed: November 22, 2019

Date of Patent: May 9, 2023

Assignee: SONY CORPORATION

Inventors: Jaison Joseph, Anil Sasidharan
Managing apparel to facilitate compliance

Patent number: 11645600

Abstract: Embodiments relate to a system, program product, and method for managing apparel to facilitate compliance through a cognitive system, i.e., using an artificial intelligence (AI) platform to dynamically analyze the apparel donned by individuals to determine compliance with established apparel compliance practices and provide suggestions for overcoming non-compliance. The determinations of non-compliance are accompanied with respective risk factors. The system, program product, and method disclosed herein facilitate leveraging written requirements processed by natural language processing (NLP) for the donning of apparel that includes proper clothing articles and accessories, as well as associated requirements of clothing articles and accessories that are not appropriate for the respective conditions.

Type: Grant

Filed: April 20, 2020

Date of Patent: May 9, 2023

Assignee: International Business Machines Corporation

Inventors: Stan Kevin Daley, Michael Bender
Precomputed similarity index of files in data protection systems with neural network

Patent number: 11507770

Abstract: Described is a system and method that provides a data protection risk assessment for the overall functioning of a backup and recovery system. Accordingly, the system may provide a single overall risk assessment score that provides an operator with an “at-a-glance” overview of the entire system. Moreover, the system may account for changes that occur over time based on leveraging statistical methods to automatically generate assessment scores for various components (e.g. application, server, network, load, etc.). In order to determine a risk assessment score, the system may utilize a predictive model based on historical data. Accordingly, residual values for newly observed data may be determined using the predictive model and the system may identify potentially anomalous or high risk indicators.

Type: Grant

Filed: May 1, 2020

Date of Patent: November 22, 2022

Assignee: EMC IP HOLDING COMPANY LLC

Inventors: Qiang Chen, Jing Yu, Pengfei Wu, Naveen Rastogi
Systems and methods for training and evaluating machine learning models using generalized vocabulary tokens for document processing

Patent number: 11481554

Abstract: Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications using generalized vocabulary tokens. In some embodiments, an ML system determines a set of tokens for non-textual content in a plurality of documents. The ML system generates a fixed-length vocabulary that includes the set of tokens for the non-textual content. The ML system further generates for each respective document in a training dataset of documents, a respective feature vector based at least in part on which tokens in the fixed-length vocabulary occur in the respective document. The ML system trains a ML model based at least in part on the respective feature vector for each respective document in the training dataset.

Type: Grant

Filed: November 8, 2019

Date of Patent: October 25, 2022

Assignee: Oracle International Corporation

Inventor: Sudhakar Kalluri
User information association with consent-based class rules

Patent number: 11423052

Abstract: User information categorization using consent-based class rules is described. Consent from a user is received regarding at least one functional area where user information is shareable. Based on the consent, at least one data class that is permitted to be shared is determined. A user information designation is associated with the at least one data class and class rules are applied to user information associated with the user information designation based on the association between the user information designation and the at least one data class.

Type: Grant

Filed: December 14, 2017

Date of Patent: August 23, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sushain Pandit, Martin Oberhofer, Steven Lockwood
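
The chain described in the abstract of patent 11423052 (consented functional areas, then permitted data classes, then designations, then class rules) can be sketched with two lookup tables. The area names, data classes, field designations, and the simple share-or-withhold rule are all hypothetical; the abstract does not enumerate any of them.

```python
# Hypothetical mapping from functional areas to the data classes they may use.
AREA_TO_CLASSES = {"marketing": {"contact"}, "analytics": {"usage"}}

# Hypothetical mapping from user information designations to data classes.
DESIGNATION_TO_CLASS = {"email": "contact", "page_views": "usage", "ssn": "identity"}

def permitted_classes(consented_areas):
    """Collect every data class unlocked by the areas the user consented to."""
    classes = set()
    for area in consented_areas:
        classes |= AREA_TO_CLASSES.get(area, set())
    return classes

def apply_class_rules(record, consented_areas):
    """Keep only the fields whose designation maps to a permitted data class."""
    allowed = permitted_classes(consented_areas)
    return {
        field: value
        for field, value in record.items()
        if DESIGNATION_TO_CLASS.get(field) in allowed
    }

record = {"email": "a@example.com", "page_views": 12, "ssn": "000-00-0000"}
shared_for_marketing = apply_class_rules(record, ["marketing"])
shared_for_both = apply_class_rules(record, ["marketing", "analytics"])
```

Fields with no known designation are withheld by default in this sketch, which is a conservative design choice rather than something the abstract specifies.
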
Retrieval device, retrieval system, retrieval method, and computer program for retrieving a document file stored in a storage device

Patent number: 8639707

Abstract: Retrieval is completed in a short time for presenting a retrieval result of a document file, which satisfies a retrieval condition, to a user having the authority to perform predetermined processing.

Type: Grant

Filed: December 16, 2010

Date of Patent: January 28, 2014

Assignee: International Business Machines Corporation

Inventors: Masaki Komedani, Hirofumi Nishikawa, Fumihiko Terui
Integrating sponsored media with user-generated content

Patent number: 8639714

Abstract: A variety of computer-based services that permit users to edit, compose, upload, or otherwise generate content also provide for the integration of sponsored media into presentations along with user-generated content. An exemplary service generates text based on user input, provides tags based on the text to a sponsored media repository, receives a sponsored media data structure in return, and formats sponsored media from the data structure for display to the user.

Type: Grant

Filed: August 29, 2007

Date of Patent: January 28, 2014

Assignee: Yahoo! Inc.

Inventor: Roelof van Zwol
Map update data supply device and method

Patent number: 8626704

Abstract: A map update data supply device and method includes an update map database of per section versions of an update data file, and a request update data extraction unit for extracting a request update section and an update data file. A safeguard update data extraction unit extracts a safeguard update section to safeguard a road network connection between adjacent sections. An integrated data generation unit integrates all versions of the update data file for each extracted request update section and generates a request update integrated data file. The integrated data generation unit integrates, per safeguard update section, versions of the update data file up to the update safeguard version for each extracted safeguard update section, and generates a safeguard update integrated data file. An integrated data supply unit supplies the generated request update integrated data file and the safeguard update integrated data file to a navigation device.

Type: Grant

Filed: January 13, 2011

Date of Patent: January 7, 2014

Assignee: Aisin Aw Co., Ltd.

Inventor: Kimiyoshi Sawai
Systems and Methods for Extracting Names From Documents

Publication number: 20130311489

Abstract: A method for automatically extracting names that is implemented by a computer having a computer memory includes the steps of storing a list of first names in the computer memory; receiving a document in the computer memory, where at least some of the characters of the document are represented in a machine readable format; identifying a grouping of words in the document as a name candidate based on capitalization of a leading character of at least two of the words; selecting a subject word of the name candidate; comparing the subject word to the list of first names; and determining that the name candidate includes a personal name if the subject word is present in the list of first names, using the computer.

Type: Application

Filed: September 30, 2011

Publication date: November 21, 2013

Applicant: GOOGLE INC.

Inventor: Alex Kerschhofer
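
The name-candidate test described in the abstract of publication 20130311489 can be sketched with a regular expression and a set lookup. The tiny `FIRST_NAMES` list, the regular expression, and the choice of the first word of each candidate as the "subject word" are assumptions; the abstract does not say which word is selected.

```python
import re

# Hypothetical stored list of first names (lower-cased for comparison).
FIRST_NAMES = {"alice", "maria", "robert"}

# Two or more consecutive words that each start with a capital letter.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def name_candidates(text):
    """Return groupings of capitalised words that might be personal names."""
    return CANDIDATE.findall(text)

def extract_names(text, first_names=FIRST_NAMES):
    """Keep candidates whose subject word (here: the first word) is a known first name."""
    names = []
    for candidate in name_candidates(text):
        subject_word = candidate.split()[0]
        if subject_word.lower() in first_names:
            names.append(candidate)
    return names

text = "The report said that Alice Johnson visited New York with Robert Smith."
candidates = name_candidates(text)
names = extract_names(text)
```

Here "New York" is picked up as a candidate by capitalization alone and then rejected because "New" is not in the first-name list, which is the filtering role the stored list plays in the abstract.
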
METADATA EXTRACTION PIPELINE

Publication number: 20130144907

Abstract: The present discussion relates to patient image data workflows. One example can temporarily serially arrange a set of semantic labeling modules in a patient image data workflow pipeline responsive to receiving an event trigger. The example can also remove the set of modules from the patient image data workflow pipeline responsive to receiving an event completion trigger.

Type: Application

Filed: December 6, 2011

Publication date: June 6, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Steven J. White, Sayan D. Pathak, Bryan Dove, Duncan P. Robertson, Khan M. Siddiqui, Prabhu KrishnaMoorthy
Employee Profiler and Database

Publication number: 20130080475

Abstract: A system for generating statistics relating to recorded employee behavior, the system including: a first database of tasks performed by employees, the first database being stored on a computer-readable storage medium; a second database of actions taken by the employees while performing the tasks, the second database being stored on a computer-readable storage medium; and a software program, stored on a computer-readable storage medium, configured to extract information from the databases regarding the tasks performed by the employees as well as the actions performed by the employees while carrying out the tasks. The software program then calculates performance statistics relating to success or failure regarding a particular task. The software program furthermore sorts the employees into subgroups based on their status in the company and then calculates performance statistics for the subgroup to compare against individual performance within the subgroup.

Type: Application

Filed: September 25, 2011

Publication date: March 28, 2013

Inventor: Jonathon Gillen
FLEXIBLE AND SCALABLE STRUCTURED WEB DATA EXTRACTION

Publication number: 20130073514

Abstract: This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.

Type: Application

Filed: September 20, 2011

Publication date: March 21, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Rui Cai, Lei Zhang, Qiang Hao
METADATA RECORD GENERATION

Publication number: 20130024476

Abstract: A computer implemented method and system provide for automatic selection and extraction of metadata and media content from projects in a craft tool. Automated identification, classification and management of such metadata and content is provided using techniques such as pattern recognition for audio and visual content. The automatic tracking and centralised storage of metadata and content for compliance purposes can be facilitated, and can enable querying of organised metadata stored in a central database. In an example, metadata and media content are extracted automatically from a project in a craft tool at a client system and are forwarded to a host system for the creation of a cue sheet including timings for media files from timing metadata in a project file to create the timings on the cue sheet.

Type: Application

Filed: October 7, 2010

Publication date: January 24, 2013

Inventors: Charles Hodgkinson, Kirk Zavieh
Automated Entity Verification

Publication number: 20130013553

Abstract: Some embodiments provide a verification system for automated verification of entities. The verification system automatedly verifies entities using a two part verification campaign. One part verifies that the entity is the true owner of the entity account to be verified. This verification step involves (1) the entity receiving a verification code at the entity account and returning the verification code to the verification system, (2) the entity associating an account that it has registered at a service provider to an account that the verification system has registered at the service provider, or (3) both. Another part verifies the entity can respond to communications that are sent to methods of contact that have been previously verified as belonging to the entity. The verification system submits a first communication with a code using a verified method of contact. The verification system then monitors for a second communication to be returned with the code.

Type: Application

Filed: November 7, 2011

Publication date: January 10, 2013

Inventors: Aaron B. Stibel, Peter Delgrosso, Jeffrey M. Stibel, Shailen Misltry, Bryan Mierke, Paul Servino, Charles Chi Thoi Le, David Lo, David Allen Lyon
Automatic modification of web pages

Patent number: 8346620

Abstract: A system for interactive paper is described. Data fragments are captured at locations in a rendered document. A digital version of the document is optionally located. Markup data applied to the capture creates a rich set of interactions for the user. New models for publishing documents and new document-related services are described.

Type: Grant

Filed: September 28, 2010

Date of Patent: January 1, 2013

Assignee: Google Inc.

Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY

Publication number: 20120303661

Abstract: Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.

Type: Application

Filed: May 27, 2011

Publication date: November 29, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sebastian Johannes Blohm, Vivian Yaw-Wen Chu, Ching-Tien Ho, Yunyao Li, Huaiyu Zhu
System and method of intelligent data sharing for advanced features on mobile platforms

Publication number: 20120264480

Abstract: Generally described, the present disclosure relates to an electronic device having limited memory. More specifically, the disclosure relates to intelligent data sharing for advanced features on mobile platforms. In one illustrative embodiment, a mobile device provides a platform having native services that use shared data. The data can be received from a central server. In turn, the data can be separated on the mobile device into categories. For a number of contacts, these categories can include, but are not limited to, usage, total count, grouping, location and organization. After the data is placed within the categories, the data can be shared between the services for applications. These applications can include, but are not limited to, voice dialing, Bluetooth™ dialing, searching and dialing. The data can be prioritized depending on the categories. Through prioritization, data can be removed when memory is low and new data is received.

Type: Application

Filed: April 18, 2011

Publication date: October 18, 2012

Inventors: Suriyaprakash Soundrapandian, James Dean Midtun
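
The category-based prioritization described in the abstract above, where lower-priority data is removed when memory is low and new data arrives, might look like the sketch below. The class name `PrioritizedContactStore`, the numeric priorities, and the rule that an incoming entry is rejected when it ranks below everything already stored are all assumptions for illustration; the abstract does not specify them.

```python
class PrioritizedContactStore:
    """Bounded store that evicts the lowest-priority entry when full."""

    def __init__(self, capacity, category_priority):
        self.capacity = capacity                    # maximum number of entries
        self.category_priority = category_priority  # category -> priority (higher is kept longer)
        self.entries = {}                           # key -> (category, value)

    def _priority(self, key):
        return self.category_priority[self.entries[key][0]]

    def add(self, key, category, value):
        """Store an entry; return False if it was rejected for lack of room."""
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._priority)
            # Assumption: new data only displaces data of equal or lower priority.
            if self.category_priority[category] < self._priority(victim):
                return False
            del self.entries[victim]
        self.entries[key] = (category, value)
        return True


store = PrioritizedContactStore(capacity=2, category_priority={"usage": 3, "location": 1})
store.add("alice", "usage", "555-0100")
store.add("bob", "location", "555-0101")
store.add("carol", "usage", "555-0102")   # store is full, so "bob" is evicted
print(sorted(store.entries))              # ['alice', 'carol']
```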
EXTRACTION AND GROUPING OF FEATURE WORDS

Publication number: 20120239668

Abstract: Various embodiments of systems and methods for extraction and grouping of feature words are described herein. Feature words are obtained from a first corpus of text bodies comprising a plurality of reviews. A second corpus is created using a combination of the obtained feature words, verbs and adjectives from the first corpus. The second corpus comprises filtered reviews and each of the filtered reviews pertains to a review. Topics are preliminarily assigned for words in the filtered reviews of the second corpus. For each of the feature words in the second corpus, a topic count is determined for every preliminarily assigned topic. After determining the topic count, one or more of the topics are finally assigned to the feature words based on a topic count value. At least one topic is presented as a group of the feature words for which the at least one topic is assigned based on the topic count value.

Type: Application

Filed: March 17, 2011

Publication date: September 20, 2012

Inventors: CHIRANJIB BHATTACHARYYA, Himabindu Lakkaraju, Kaushik Nath, Sunil Arvindam
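
The final topic assignment step in the abstract above (count preliminary topics per feature word, then assign topics based on the count) can be illustrated with a small sketch. It assumes the preliminary topic labels already exist, for example from a topic model, and it assumes that "based on a topic count value" means keeping the topic or topics with the highest count. The function names and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def assign_topics(tagged_reviews, feature_words):
    """tagged_reviews: list of reviews, each a list of (word, preliminary_topic) pairs."""
    counts = defaultdict(Counter)
    for review in tagged_reviews:
        for word, topic in review:
            if word in feature_words:
                counts[word][topic] += 1  # topic count per feature word
    final = {}
    for word, topic_counts in counts.items():
        best = max(topic_counts.values())
        # Assumption: keep every topic tied for the highest count.
        final[word] = sorted(t for t, c in topic_counts.items() if c == best)
    return final


def group_by_topic(final_assignment):
    """Present each topic as the group of feature words assigned to it."""
    groups = defaultdict(list)
    for word, topics in final_assignment.items():
        for topic in topics:
            groups[topic].append(word)
    return {topic: sorted(words) for topic, words in groups.items()}


reviews = [
    [("battery", 0), ("great", 0), ("screen", 1)],
    [("battery", 0), ("screen", 1), ("battery", 1)],
]
final = assign_topics(reviews, {"battery", "screen"})
print(group_by_topic(final))  # {0: ['battery'], 1: ['screen']}
```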
Increasing retrieval performance of images by providing relevance feedback on word images contained in the images

Patent number: 8261200

Abstract: An interactive system provides for increasing retrieval performance of images depicting text by allowing users to provide relevance feedback on words contained in the images. The system includes a user interface through which the user queries the system with query terms for images contained in the system. Word image suggestions are displayed to the user through the user interface, where each word image suggestion contains text, as recognized from the word image by the system, that is the same as or a slight variant of the particular query terms. Word image suggestions can be included in the system by the user to increase system recall of images for the one or more query terms and can be excluded from the system by the user to increase precision of image retrieval results for particular query terms.

Type: Grant

Filed: April 26, 2007

Date of Patent: September 4, 2012

Assignee: Fuji Xerox Co., Ltd.

Inventors: Laurent Denoue, John E. Adcock, David M. Hilbert, Daniel Billsus
IDENTIFYING PARTICULAR IMAGES FROM A COLLECTION

Publication number: 20120203764

Abstract: A method of identifying one or more particular images from an image collection includes indexing the image collection to provide image descriptors for each image in the image collection such that each image is described by one or more of the image descriptors; receiving a query from a user specifying at least one keyword for an image search; and using the keyword(s) to search a second collection of tagged images to identify co-occurrence keywords. The method further includes using the identified co-occurrence keywords to provide an expanded list of keywords; using the expanded list of keywords to search the image descriptors to identify a set of candidate images satisfying the keywords; grouping the set of candidate images according to at least one of the image descriptors, and selecting one or more representative images from each grouping; and displaying the representative images to the user.

Type: Application

Filed: February 4, 2011

Publication date: August 9, 2012

Inventors: Mark D. Wood, Alexander C. Loui
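
The keyword expansion step in the abstract above, which searches a second collection of tagged images for co-occurrence keywords, could be sketched as below. Ranking co-occurring tags by raw frequency and keeping the top few is an assumption made for this example, as are the function name `expand_keywords` and the `top_n` parameter.

```python
from collections import Counter


def expand_keywords(query_keywords, tagged_images, top_n=3):
    """Return the query keywords plus the tags that most often co-occur with them.

    tagged_images: iterable of tag lists, one list per image in the second collection.
    """
    query = set(query_keywords)
    co_counts = Counter()
    for tags in tagged_images:
        tag_set = set(tags)
        if query & tag_set:                     # image shares a tag with the query
            co_counts.update(tag_set - query)   # count the other tags on that image
    # Sort by count (descending), then alphabetically so the result is deterministic.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_keywords) + [tag for tag, _ in ranked[:top_n]]


tagged = [
    ["beach", "sand", "sea"],
    ["beach", "sea", "sunset"],
    ["city", "night"],
]
print(expand_keywords(["beach"], tagged, top_n=2))  # ['beach', 'sea', 'sand']
```

The expanded list would then be matched against the image descriptors of the user's own collection, a step this sketch leaves out.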
DATA EXTRACTION FRAMEWORK

Publication number: 20120150792

Abstract: The present disclosure involves systems, software, and computer implemented methods for providing a data extraction framework for extracting data and metadata from an application to provide additional functionality for the extracted data and metadata. One process includes operations for identifying a first application for data extraction and determining a set of data suitable for extraction from the first application using a software development kit associated with the first application. The set of data is stored in a repository without storing visualization components of the first application in the repository. The set of data is sent to a second application for further processing of the set of data. The second application is configured to bind different visualization components to the set of data for display of data elements in the set of data to a user.

Type: Application

Filed: December 9, 2010

Publication date: June 14, 2012

Applicant: SAP PORTALS ISRAEL LTD.

Inventors: Ohad Yassin, Pavel Kravets, Nisim Hafzadi, Ram Alon
METHOD AND SYSTEM FOR MACHINE-LEARNING BASED OPTIMIZATION AND CUSTOMIZATION OF DOCUMENT SIMILARITIES CALCULATION

Publication number: 20120136812

Abstract: One embodiment of the present invention provides a system for optimizing and customizing document-similarity calculation. During operation, the system presents a collection of similar documents to a user, collects feedback on the similarity of the documents from the user, generates generic rules for calculating document similarity, and filters documents with customized similarity calculation based on the feedback provided by the user.

Type: Application

Filed: November 29, 2010

Publication date: May 31, 2012

Applicant: PALO ALTO RESEARCH CENTER INCORPORATED

Inventor: Oliver Brdiczka
PROVIDING USERS WITH A PREVIEW OF TEXT MINING RESULTS FROM QUERIES OVER UNSTRUCTURED OR SEMI-STRUCTURED TEXT

Publication number: 20120089642

Abstract: The system and methods described herein provide results previewing for an interactive text mining system in order to feed back partial query results to users before all results that are responsive to a query have been found. These partial results allow the user to see the progress of their text mining query much sooner.

Type: Application

Filed: October 6, 2010

Publication date: April 12, 2012

Inventors: David R. Milward, Roger W. Hale, Malcolm R. Parsons, Sylvia F. Knight, Christopher I. Sullivan, Jason Trenouth, James R. Thomas
METADATA RECORD GENERATION

Publication number: 20120089643

Abstract: A computer implemented method and system provide for automatic selection and extraction of metadata and media content from projects in a craft tool. Automated identification, classification and management of such metadata and content is provided using techniques such as pattern recognition for audio and visual content. The automatic tracking and centralised storage of metadata and content for compliance purposes can be facilitated, and can enable querying of organised metadata stored in a central database. In an example, metadata and media content are extracted automatically from a project in a craft tool at a client system and are forwarded to a host system for the creation of a cue sheet including timings for media files from timing metadata in a project file to create the timings on the cue sheet.

Type: Application

Filed: October 7, 2010

Publication date: April 12, 2012

Inventors: Charles Hodgkinson, Kirk Zavieh
PORTABLE TERMINAL

Publication number: 20120047167

Abstract: A portable terminal includes a word extracting unit that extracts a word contained in data of a Web page being viewed; a Web search request unit that transmits a search request to a search site with the word extracted by the word extracting unit as a search word and that receives a list of Web pages that contain the search word from the search site as a search result; and a display unit that displays the search result received by the Web search request unit.

Type: Application

Filed: November 2, 2011

Publication date: February 23, 2012

Applicant: FUJITSU TOSHIBA MOBILE COMMUNICATIONS LIMITED

Inventors: Masaki SAKAI, Natsuko OUCHI
System and Method for Real-Time Content Aggregation and Syndication

Publication number: 20120047176

Abstract: A system and methodology for real-time content aggregation and syndication is described. In one embodiment, for example, a method is described for assisting a user with extracting items relevant to search queries from documents including items of various types, the method comprises steps of: receiving a search query specifying a search phrase and a particular item type; identifying documents matching the search phrase; for each matching document, determining whether the document includes an item having the particular item type; and extracting items having the particular item type from the matching documents for display to the user. The solution enables a user to aggregate and syndicate content without a professional content manager or complicated content management software tools.

Type: Application

Filed: November 2, 2011

Publication date: February 23, 2012

Applicant: SYBASE, INC.

Inventor: Michael Timmons
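
The method steps listed in the abstract above (match documents against a search phrase, then extract only the items of a requested type) can be sketched as follows. The `Document` and `Item` classes, the case-insensitive substring match, and the function name `extract_items` are hypothetical and stand in for whatever document model and matching the actual system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_type: str   # e.g. "table", "image", "headline" (illustrative types)
    content: str


@dataclass
class Document:
    text: str
    items: list = field(default_factory=list)


def extract_items(documents, search_phrase, item_type):
    """Collect items of the requested type from documents matching the phrase."""
    phrase = search_phrase.lower()
    results = []
    for doc in documents:
        # Assumption: a document matches if it contains the phrase, ignoring case.
        if phrase not in doc.text.lower():
            continue
        results.extend(item for item in doc.items if item.item_type == item_type)
    return results


docs = [
    Document("Quarterly sales report", [Item("table", "Q1 figures"), Item("image", "chart.png")]),
    Document("Holiday schedule", [Item("table", "dates")]),
]
for item in extract_items(docs, "sales report", "table"):
    print(item.content)  # Q1 figures
```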
PARALLEL DOCUMENT MINING

Publication number: 20120047172

Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.

Type: Application

Filed: August 22, 2011

Publication date: February 23, 2012

Applicant: Google Inc.

Inventors: Jay M. Ponte, Jakob Uszkoreit, Ashok C. Popat, Moshe Dubiner
INFORMATION AND RECOMMENDATION DEVICE, METHOD, AND PROGRAM

Publication number: 20120036144

Abstract: According to one embodiment, an information recommendation device includes the following units. The input unit is configured to input a first document and a second document which has been browsed before the first document. The subject-keyword extraction unit is configured to extract first and second subject keywords from the first and second documents, respectively. The interest-keyword extraction unit is configured to extract first interest keywords from the first and second subject keywords, and to extract second interest keywords based on information specifying the first and second documents, the first interest keywords, and the first and second subject keywords. The second interest keywords are estimated to be keywords in which the user is next interested. The acquiring unit is configured to acquire, based on the second interest keywords, recommendation information on third documents which are candidates to be browsed after the first document. The presentation unit presents the recommendation information.

Type: Application

Filed: August 25, 2011

Publication date: February 9, 2012

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Masayuki Okamoto, Nayuko Watanabe, Masaaki Kikuchi, Takayuki Iida, Mika Fukui
