Latent Semantic Index Or Analysis (LSI Or LSA) Patents (Class 707/739)

Identifying files associated with a workflow

Patent number: 8019765

Abstract: To determine files associated with one or more workflows, a trace of accesses of files in at least one server is received. The files are grouped into at least one set of files, where the files in the set are accessed together more than a predetermined number of times in the trace. Files associated with the particular workflow are identified based on the at least one set.

Type: Grant

Filed: October 29, 2008

Date of Patent: September 13, 2011

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Anna Povzner, Kimberly Keeton, Marcos K. Aguilera, Arif A. Merchant, Charles B. Morrey, III, Mustafa Uysal
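The grouping step described in the abstract above can be illustrated with a minimal sketch. It assumes the trace has already been split into sessions of files accessed together; the abstract does not say how that is done. The function name `group_coaccessed_files`, the `min_count` parameter, and the use of union-find to merge frequently co-accessed pairs are all illustrative choices, not the patented method.

```python
from collections import Counter
from itertools import combinations


def group_coaccessed_files(sessions, min_count):
    """Group files whose pairwise co-access count exceeds min_count.

    sessions: iterable of collections of file names accessed together
    (the session split is an assumption of this sketch).
    """
    # Count how often each unordered pair of files appears in the same session.
    pair_counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over files linked by a sufficiently frequent pair.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count > min_count:
            parent[find(a)] = find(b)

    # Collect the connected components as sorted lists of file names.
    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())
```

Files that never exceed the threshold with any partner are left out of the result, which matches the abstract's requirement that a set contain files accessed together more than a predetermined number of times.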
DETERMINATION OF PASSAGES AND FORMATION OF INDEXES BASED ON PARAGRAPHS

Publication number: 20110219003

Abstract: A method for retrieving information from a document includes a process of grouping paragraphs in the document to form passages, and forming indexes relating to a number of words in the passages. The number of paragraphs in a passage is determined based on the number of paragraphs considered optimum for a writer to cover a particular topic. Passages are formed by merging each N consecutive paragraphs in the document, where N is an integer greater than 1. Thus, individual passages may include paragraphs that are identical to other passages.

Type: Application

Filed: May 16, 2011

Publication date: September 8, 2011

Inventor: Jiandong BI
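One way to read the passage-forming step in the abstract above is as a sliding window over paragraphs. The sketch below assumes a stride of one paragraph, which would explain why passages share paragraphs, and a simple per-passage word-count index. The names `build_passages` and `index_passages` are hypothetical, and the abstract does not specify either the stride or the index layout.

```python
from collections import Counter


def build_passages(paragraphs, n):
    """Merge each run of n consecutive paragraphs into one passage.

    A stride of one paragraph is assumed, so adjacent passages overlap.
    """
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    if not paragraphs:
        return []
    if len(paragraphs) < n:
        # Too short for a full window: treat the whole document as one passage.
        return [" ".join(paragraphs)]
    return [" ".join(paragraphs[i:i + n]) for i in range(len(paragraphs) - n + 1)]


def index_passages(passages):
    """Return one word-count table per passage (a deliberately simple index)."""
    return [Counter(passage.lower().split()) for passage in passages]
```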
VOICE OPERATED, MATRIX-CONNECTED, ARTIFICIALLY INTELLIGENT ADDRESS BOOK SYSTEM

Publication number: 20110211677

Abstract: An online address book system having sufficient hardware and software to operate an address book user interface and to perform intelligent interpretations of voice and text inputs from users. The system includes at least one server software module that includes software to perform a plurality of functions. These include the ability to receive voice input data and separate user voice queries, wherein the software can arrange the data so as to create a data base that includes at least three access dimensions, including contact access, contact-relationship access and contact-time frame access, and so as to create a connectivity matrix based on a plurality of contact pair relationships applying connective recognition logic. The system provides a voice operated user interface that permits access to address book stored data based on user input selected from the group consisting of contact, a contact-relationship pair, a contact-time frame pair, and combinations thereof.

Type: Application

Filed: May 13, 2011

Publication date: September 1, 2011

Inventor: CHARLES M. BASNER
Systems and Methods for Finding Keyword Relationships Using Wisdoms from Multiple Sources

Publication number: 20110208708

Abstract: Systems and methods for finding related terms based on three different sources are disclosed. Generally, a first plurality of distances is determined based on one or more received terms and a first plurality of terms derived from an algorithmic search list. A second plurality of distances is determined based on the one or more received terms and a second plurality of terms derived from a sponsored search list. A third plurality of distances is determined based on the one or more received terms and a third plurality of terms derived from search logs. The first, second, and third pluralities of distances are combined to derive a fourth plurality of distances. Finally, a plurality of related terms related to the one or more received terms is generated based on the fourth plurality of distances.

Type: Application

Filed: February 25, 2010

Publication date: August 25, 2011

Applicant: Yahoo! Inc.

Inventors: Weiguo Liu, Qiong Zhang
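The abstract above does not say how the three sets of distances are combined. As a hedged illustration only, the sketch below uses a weighted average and assigns a default distance to any term missing from a source. The function names, the `weights` and `missing` parameters, and the averaging rule are assumptions of this sketch.

```python
def combine_distances(algorithmic, sponsored, logs,
                      weights=(1.0, 1.0, 1.0), missing=1.0):
    """Combine three term -> distance maps into one map.

    A weighted average is assumed; terms absent from a source get `missing`.
    """
    sources = (algorithmic, sponsored, logs)
    total = sum(weights)
    terms = set().union(*sources)  # every candidate term seen in any source
    return {
        term: sum(w * src.get(term, missing)
                  for w, src in zip(weights, sources)) / total
        for term in terms
    }


def related_terms(combined, k):
    """Return the k terms with the smallest combined distance."""
    ranked = sorted(combined.items(), key=lambda kv: (kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Ties are broken alphabetically so that the ranking is deterministic.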
Methods, systems, and products for classifying content segments

Patent number: 8005841

Abstract: Methods, systems, and products are disclosed for classifying content segments. A set of annotations is received that occur within a segment of time-varying content. Each annotation is scored to each node in an ontology. The segment is classified based on at least one of the scores.

Type: Grant

Filed: April 28, 2006

Date of Patent: August 23, 2011

Assignee: Qurio Holdings, Inc.

Inventors: Richard J. Walsh, Alfredo C. Issa
SYSTEM AND METHOD FOR DETERMINING THE PROVENANCE OF A DOCUMENT

Publication number: 20110202535

Abstract: A method of identifying a provenance of a document is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster based, at least in part, on metadata associated with each of the documents to identify a source document. The method may also include generating a query response that includes the source document.

Type: Application

Filed: February 13, 2010

Publication date: August 18, 2011

Inventors: Vinay Deolalikar, Hernan Laffitte
Method and apparatus for detecting web-based electronic mail in network traffic

Patent number: 7996406

Abstract: Method and apparatus for detecting web-based electronic mail in network traffic is described. In some examples, web pages are extracted from the network traffic. Fields in each page of a group of the web pages that share a document structure are identified. A statistical analysis of the fields of each page in the group of web pages is performed to identify any electronic mail (e-mail) fields. The group of web pages is indicated to include web-based e-mail messages if the fields of each page in the group of web pages include at least one e-mail field.

Type: Grant

Filed: September 30, 2008

Date of Patent: August 9, 2011

Assignee: Symantec Corporation

Inventors: Basant Rajan, Chirag Deepak Dalal, Navin Kabra
AUTOMATIC ORGANIZATION OF BROWSING HISTORIES

Publication number: 20110191344

Abstract: An automatic organization into topics for a browsing history. In one embodiment, a system identifies groups of browsing actions as related, and clusters the browsing history (e.g. a web browsing history) into sessions based on heuristics used to determine relationships. Latent semantic analysis can be used to determine the relationships which can be considered topics. User interfaces for displaying or otherwise presenting these sessions can include icons representative of topics, and these icons can have different sizes depending on a frequency of web page visits within a topic. The topics can be displayed in time ranges or in a cover flow view or both time ranges and cover flow view.

Type: Application

Filed: February 3, 2010

Publication date: August 4, 2011

Inventors: Jing Jin, Kevin Decker, Timothy Hatcher, Raymond Sepulveda, Michael Thole
DOCUMENT ANALYSIS SYSTEM

Publication number: 20110191345

Abstract: An information processing apparatus (5) is provided comprising: a lexicon generation module (22) operable to process a set of documents (1) to identify key words (2) present in the documents; a link generation module (24) operable to generate network data (3) linking documents which share the same or semantically related key words identified by the lexicon generation module; and a network analysis module (26) operable to associate documents with metric values based upon the patterns of connectivity of the network data generated by the link generation module. The metric values associated with documents in the set can be utilized to select documents or groups of associated documents for further processing or indexing.

Type: Application

Filed: January 28, 2011

Publication date: August 4, 2011

Applicant: E-THERAPEUTICS PLC

Inventor: Malcolm P. Young
EXPERT LIST RECOMMENDATION METHODS AND SYSTEMS

Publication number: 20110184926

Abstract: An expert list recommendation system is provided, including: a domain modeler for establishing an expert knowledge database according to a plurality of expert publications in different domains, receiving an inquired proposal, determining the academic field of the inquired proposal according to keywords of the inquired proposal and keyword sets of the expert publications in different domains stored in the expert knowledge database, and outputting a first domain expert list corresponding to the inquired proposal, wherein the first domain expert list comprises a first group of expert publications and a first group of expert names; and an expertise matcher for receiving the first domain expert list, comparing semantic relatedness between keywords of the inquired proposal and keywords corresponding to the first group of the expert publications of the first domain expert list to output a first expert list to a display device.

Type: Application

Filed: June 25, 2010

Publication date: July 28, 2011

Applicant: NATIONAL TAIWAN UNIVERSITY OF SCIENCE & TECHNOLOGY

Inventors: Hahn-Ming LEE, Jan-Ming HO, Jerome YEH, Kai-Hsiang YANG, Tai-Liang KUO, Chun-Han CHEN
Domain-specific sentiment classification

Patent number: 7987188

Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.

Type: Grant

Filed: August 23, 2007

Date of Patent: July 26, 2011

Assignee: Google Inc.

Inventors: Tyler J. Neylon, Kerry L. Hannan, Ryan T. McDonald, Michael Wells, Jeffrey C. Reynar
Methods and Apparatuses For Abstract Representation of Financial Documents

Publication number: 20110179036

Abstract: Systems and methods are provided for creating abstracted, normalized, reusable, and combinable representations of information contained in multiple documents and information of any supported format, and allowing for exporting of information in any other desired and supported format. Further, the system and methods provide for uploading documents based on a known template, where the data members can be automatically recognized and the document stored in normalized format without end-user or developer intervention. Normalization of data is achieved transparently on upload and denormalization performed transparently on download. Further, embodiments provide for the reuse and recombination of data members to create entirely new representations.

Type: Application

Filed: December 16, 2010

Publication date: July 21, 2011

Inventors: Jason Townes French, Auston John Stewart
CITATION NETWORK VIEWER AND METHOD

Publication number: 20110179035

Abstract: A visualization-based interactive legal research tool that generates from a multi-dimensional citation network a semantics-constrained citation sub-network that focuses on one individual issue in which a user is interested, and puts the sub-network on an interactive user interface (“UI”), which allows the researcher to browse, navigate, and jump over to start new sub-networks on different issues that are relevant to original issues.

Type: Application

Filed: June 1, 2010

Publication date: July 21, 2011

Applicant: LEXISNEXIS, A DIVISION OF REED ELSEVIER INC.

Inventors: Paul Zhang, Lavanya Koppaka
Domain specific local search

Patent number: 7984041

Abstract: Methods and apparatus provide for a local search indexer to allow for an optimized search within a web server that returns accurate search results while maintaining independent control as to defining search patterns, search prioritization, and updated content available for search. Specifically, the local search indexer organizes content according to a hierarchical directory structure at a web server. The hierarchical directory structure includes at least one directory level that provides at least one directory for storing the content. The local search indexer builds a search index associated with the directory and stores the search index at the web server. The search index is populated with indexed content based on an update of the content stored in the directory. The local search indexer employs a search engine, at the web server, to process search queries against the indexed content to provide a search result that includes the update of the content.

Type: Grant

Filed: July 9, 2007

Date of Patent: July 19, 2011

Assignee: Oracle America, Inc.

Inventor: Yogesh Y Patil
METHOD OF DETERMINING A RELIABILITY INDICATOR FOR SIGNATURES OBTAINED FROM CLINICAL DATA AND USE OF THE RELIABILITY INDICATOR FOR FAVORING ONE SIGNATURE OVER THE OTHER

Publication number: 20110173201

Abstract: This invention relates to a method and an apparatus for determining a reliability indicator for at least one set of signatures obtained from clinical data collected from a group of samples. The signatures are obtained by detecting characteristics in the clinical data from the group of samples, and each of the signatures generates a first set of stratification values that stratify the group of samples. At least one additional and parallel stratification source to the signatures obtained from the group of samples is provided, the at least one additional and parallel stratification source being independent of the signatures and generating a second set of stratification values. A comparison is done for each respective sample, where the first stratification values are compared with true reference stratification values, and where the second stratification values are compared with the true reference stratification values.

Type: Application

Filed: September 24, 2009

Publication date: July 14, 2011

Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Inventors: Angel Janevski, Nilanjana Banerjee, Yasser Alsafadi, Vinay Varadan
APPARATUS AND METHOD FOR AUTHORING DATA IN COMMUNICATION SYSTEM

Publication number: 20110173200

Abstract: An apparatus for authoring data in a communication system includes: an extraction unit configured to receive media corresponding to contents and extract contents information regarding the contents from the received media; a generation unit configured to generate a DMB ECG XML-based metadata comprising the extracted contents information; and a processing unit configured to visualize particulars of the DMB ECG XML-based metadata through a user interface and process the user interface so that the DMB ECG XML-based metadata is generated and edited on a template.

Type: Application

Filed: November 12, 2010

Publication date: July 14, 2011

Applicant: Electronics and Telecommunications Research Institute

Inventors: Seung-Jun YANG, Min-Sik Park, Han-Kyu Lee, Jin-Woo Hong
VISUAL AND MULTI-DIMENSIONAL SEARCH

Publication number: 20110167053

Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Application

Filed: March 15, 2011

Publication date: July 7, 2011

Applicant: Microsoft Corporation

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
Document categorisation system

Patent number: 7971150

Abstract: A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.

Type: Grant

Filed: September 25, 2001

Date of Patent: June 28, 2011

Assignee: Telstra New Wave Pty Ltd.

Inventors: Bhavani Raskutti, Adam Kowalczyk
Extraction of attributes and values from natural language documents

Patent number: 7970767

Abstract: One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.

Type: Grant

Filed: April 30, 2007

Date of Patent: June 28, 2011

Assignee: Accenture Global Services Limited

Inventors: Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
SEMANTIC RECONSTRUCTION

Publication number: 20110119272

Abstract: Determining a semantic relationship is disclosed. Source content is received. Cluster analysis is performed at least in part by using at least a portion of the source content. At least a portion of a result of the cluster analysis is used to determine the semantic relationship between two or more content elements comprising the source content.

Type: Application

Filed: January 25, 2011

Publication date: May 19, 2011

Applicant: APPLE INC.

Inventors: Philip Andrew Mansfield, Michael Robert Levy, Yuri Khramov, Darryl Will Fuller
Operation assisting apparatus and operation assisting method

Patent number: 7945864

Abstract: An operation assisting apparatus includes: an option-function distance storage unit that stores a semantic distance between each of the options displayed on a menu screen and each of functions positioned at an end in the hierarchical structure; an operation history storage unit that stores the operation history of the options sequentially selected by the user; an estimation unit that estimates, based on a semantic distance between a selection option selected by the user and each of the functions, and a semantic distance between an unselected selection option that has been selectable but not selected and each of the functions, a degree of probability that the function is the function desired by the user; and an operational assistance determination unit that determines, based on the result of the estimation, a detail of an output such that functions with higher probability will be presented with higher precedence in selectability.

Type: Grant

Filed: October 29, 2007

Date of Patent: May 17, 2011

Assignee: Panasonic Corporation

Inventors: Tsuyoshi Inoue, Makoto Nishizaki, Satoshi Matsuura
Synthesizing information-bearing content from multiple channels

Patent number: 7945564

Abstract: A computing system and method receive a query; separate a plurality of information sources into individual elements of content (EOC); tag each EOC with metadata that indicate source, date, and other relevant information; pattern match each EOC; calculate the respective distance function from every EOC to every other EOC; and output EOC to a set of virtual buffers (404) containing appropriately related EOC less than a given distance value. The method further creates virtual summary buffers (406); then concatenates the EOC in each virtual buffer (404); applies a comparative analysis filter (318) to remove redundant sub-elements; and presents the results as summary digests (408).

Type: Grant

Filed: August 14, 2008

Date of Patent: May 17, 2011

Assignee: International Business Machines Corporation

Inventors: Arnon Amir, Gal Ashour, Brian K. Blanchard, Matthew Denesuk, Reiner Kraft
Method for categorizing content published on internet

Patent number: 7945555

Abstract: The present invention provides a method and system for categorizing content published on the Internet. The method comprises gathering one or more feeds associated with the content. The method further comprises extracting contextual information from the one or more feeds. Thereafter, the content is categorized into one or more general web-based categories belonging to a set of general web-based categories. The categorizing step further comprises performing a semantic analysis of the contextual information that yields a keyword string. The content is classified into the one or more general web-based categories based on the keyword string. Finally, the set of general web-based categories is translated to a set of pre-defined categories, such that one or more general web-based categories are translated to a pre-defined category that is relevant to an end user.

Type: Grant

Filed: December 27, 2007

Date of Patent: May 17, 2011

Assignee: Yume, Inc.

Inventors: Ayyappan Sankaran, Jayant Kadambi, Matthew D Shaver
Dynamic corpus generation

Patent number: 7941418

Abstract: A computer-implemented method of generating a dynamic corpus includes generating web threads, based upon corresponding sets of words dequeued from a word queue, to obtain web thread resulting URLs. The web thread resulting URLs are enqueued in a URL queue. Multiple text extraction threads are generated, based upon documents downloaded using URLs dequeued from the URL queue, to obtain text files. New words are randomly obtained from the text files, and the randomly obtained words from the text files are enqueued in the word queue. This process is iteratively performed, resulting in a dynamic corpus.

Type: Grant

Filed: November 9, 2005

Date of Patent: May 10, 2011

Assignee: Microsoft Corporation

Inventor: Carlos Alejandro Arguelles
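The word-queue and URL-queue loop in the abstract above can be sketched in a simplified, single-threaded form. The patent describes web threads and text extraction threads; this sketch omits the threading. The `search` and `fetch_text` callables stand in for a search engine and a page downloader and are hypothetical, as are the function name `grow_corpus` and its parameters.

```python
import random
from collections import deque


def grow_corpus(seed_words, search, fetch_text, rounds,
                words_per_query=2, rng=None):
    """Iteratively grow a corpus from seed words (single-threaded sketch).

    search(words) -> list of URLs and fetch_text(url) -> str are supplied
    by the caller; both are placeholders for real web access.
    """
    rng = rng or random.Random(0)
    word_queue = deque(seed_words)
    url_queue = deque()
    seen_urls = set()
    corpus = []
    for _ in range(rounds):
        # Dequeue a small set of words and use it as a query.
        take = min(words_per_query, len(word_queue))
        query = [word_queue.popleft() for _ in range(take)]
        if not query:
            break
        url_queue.extend(search(query))
        # Download each new URL, keep its text, and enqueue random words from it.
        while url_queue:
            url = url_queue.popleft()
            if url in seen_urls:
                continue
            seen_urls.add(url)
            text = fetch_text(url)
            corpus.append(text)
            words = text.split()
            if words:
                sample_size = min(words_per_query, len(words))
                word_queue.extend(rng.sample(words, sample_size))
    return corpus
```

Passing the search and download steps in as functions keeps the loop testable without network access.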
SYSTEMS AND METHODS FOR INFORMATION INTEGRATION THROUGH CONTEXT-BASED ENTITY DISAMBIGUATION

Publication number: 20110106807

Abstract: Described within are systems and methods for disambiguating entities, by generating entity profiles and extracting information from multiple documents to generate a set of entity profiles, determining equivalence within the set of entity profiles using similarity matching algorithms, and integrating the information in the correlated entity profiles. Additionally, described within are systems and methods for representing entities in a document in a Resource Description Framework and leveraging the features to determine the similarity between a plurality of entities. An entity may include a person, place, location, or other entity type.

Type: Application

Filed: November 1, 2010

Publication date: May 5, 2011

Applicant: JANYA, INC

Inventors: Rohini K. Srihari, Harish Srinivasan, Richard Smith, John Chen
System and Method of Content Generation

Publication number: 20110093343

Abstract: Methods and systems are given for representing and generating contents from pre-existed and pre-built contents for a given content. Methods are given for transforming information representation from one medium, type, or language to another medium, type and language. Exemplary embodiment is given for transforming the semantics of a given text or spoken language to a visual representation or combination of them. The systems and methods generate new contents in general and multimedia contents in particular in response to or for representing an input composition utilizing pre-existed and pre-built contents of various types, languages, and forms. The associated client server systems over the communication network are also given for generating contents for the contents given by the clients.

Type: Application

Filed: October 20, 2010

Publication date: April 21, 2011

Inventor: Hamid Hatami-Hanza
SEMANTIC ANALYSIS OF DOCUMENTS TO RANK TERMS

Publication number: 20110082863

Abstract: A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.

Type: Application

Filed: December 15, 2010

Publication date: April 7, 2011

Applicant: ADOBE SYSTEMS INCORPORATED

Inventors: WALTER CHANG, NADIA GHAMRAWI
SYSTEMS AND METHODS FOR USING METADATA TO ENHANCE DATA IDENTIFICATION OPERATIONS

Publication number: 20110078146

Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.

Type: Application

Filed: September 20, 2010

Publication date: March 31, 2011

Applicant: CommVault Systems, Inc.

Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
SYSTEM AND METHOD FOR DATA CORRELATION AND MOBILE TERMINAL THEREFOR

Publication number: 20110077048

Abstract: The invention relates to a system for data correlation, having: a receiving device 1 having an image acquisition element 10 and a data set generator 12 for generating at least one object data set from at least one acquired first image, which represents a physical object, and an identification label, which uniquely determines an object-related acquisition procedure, and at least one information data set from at least one acquired second image, which represents coded information related to the physical object, and the identification label; a correlation device 2 for the extraction 20 of the coded information from the information data set, for the semantic analysis 22 of the extracted information, and for the generation of at least one combination data set from the results of the semantic analysis, the extracted information, and the at least one object data set with the same identification label as the extracted information data set; and a user device 3 for the storage and further use of the combination data

Type: Application

Filed: March 3, 2009

Publication date: March 31, 2011

Applicant: Linguatec Sprachtechnologien GmbH

Inventor: Reinhard Busch
Visual and multi-dimensional search

Patent number: 7917514

Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Grant

Filed: June 28, 2006

Date of Patent: March 29, 2011

Assignee: Microsoft Corporation

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
Expediting Reverse Geocoding With A Bounding Region

Publication number: 20110072020

Abstract: A method for reverse geocoding location information obtained by a wireless communications device comprises determining the location information for a location, communicating the location information to a reverse geocoding server that reverse-geocodes the location information to generate location description data for a bounding region that geographically surrounds the location, receiving the location description data from the reverse geocoding server for the bounding region containing the location, and caching the location description data for the bounding region in a memory cache on the device. When the current location remains within one or more bounding regions cached on the device, location description data is fetched from the cache, thus improving application responsiveness. Only when the current location is no longer within the bounding region(s) does the device communicate a new request to the reverse geocoding server.

Type: Application

Filed: September 18, 2009

Publication date: March 24, 2011

Applicant: RESEARCH IN MOTION LIMITED

Inventors: Ngoc Bich Ngo, Russell Norman Owen
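The caching behavior described in the abstract above can be sketched as follows. It assumes rectangular bounding regions given as (min_lat, min_lon, max_lat, max_lon), a shape the abstract does not specify. The class name `ReverseGeocodeCache` and the `fetch_region` callable standing in for the reverse geocoding server are hypothetical.

```python
class ReverseGeocodeCache:
    """Serve location descriptions from cached bounding regions when possible."""

    def __init__(self, fetch_region):
        # fetch_region(lat, lon) -> ((min_lat, min_lon, max_lat, max_lon), description)
        # It is a placeholder for the request to the reverse geocoding server.
        self._fetch_region = fetch_region
        self._regions = []

    def describe(self, lat, lon):
        # Reuse a cached description if the point lies inside a cached region.
        for (min_lat, min_lon, max_lat, max_lon), description in self._regions:
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                return description
        # Otherwise ask the server and cache the region it returns.
        box, description = self._fetch_region(lat, lon)
        self._regions.append((box, description))
        return description
```

The server is contacted only when the current location falls outside every cached region, which is the source of the responsiveness gain the abstract mentions.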
Semantic and Text Matching Techniques for Network Search

Publication number: 20110072021

Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.

Type: Application

Filed: September 21, 2009

Publication date: March 24, 2011

Applicant: YAHOO! INC.

Inventors: Yumao Lu, Lei Duan, Fan Li, Benoit Dumoulin, Xing Wei
Adaptive archive data management

Patent number: 7912816

Abstract: In one embodiment, input is received from a user defining a classification and an analytic for the classification. Multiple classifications and analytics may be defined by a user. A definition of relevance parameters is determined that characterize the classification and a set of analytics measures associated with the analytic. The definition may be for the classification. Unstructured data and structured data are analyzed based on the definition of the relevance parameters to determine relevant data in the unstructured data and the structured data. The relevant data is data that is determined to be relevant to the classification defined by the user. An index of the terms from the relevant data is determined. The index is useable by an analytics tool to provide results for queries of the unstructured data and structured data. The query may be used within the classification such that targeted results are provided using the index and the relevant data to the classification.

Type: Grant

Filed: April 18, 2008

Date of Patent: March 22, 2011

Assignee: Alumni Data Inc.

Inventors: Aloke Guha, Joan Wrabetz
QUERY TERM RELATIONSHIP CHARACTERIZATION FOR QUERY RESPONSE DETERMINATION

Publication number: 20110066618

Abstract: Methods, apparatuses, and systems are provided to determine a response to a user submitted query based, at least in part, on a relationship between and/or among a plurality of terms of the query.

Type: Application

Filed: September 14, 2009

Publication date: March 17, 2011

Applicant: Yahoo! Inc.

Inventors: Borkur Sigurbjornsson, Vanessa Murdock, Roelof van Zwol, Maarten Clements
AUTOMATICALLY FINDING CONTEXTUALLY RELATED ITEMS OF A TASK

Publication number: 20110066619

Abstract: Architecture for enabling a user to automatically recover documents and other information associated with work contexts and recover documents and other information artifacts associated with a specific project. The architecture enables monitoring and recording of activity information related to user interactions with information artifacts pertaining to a particular work context. The user can select a document having a portion of work content (e.g., a term or other type of reference item in a document) related to the work context. A lexical analysis is performed on the activity information and the reference item to identify lexical similarities. A list of candidate items (e.g., related documents) is inferred from the information artifacts based on the lexical similarities. The candidate items related to the work context are presented to the user, who can select specific items to reestablish the work context.

Type: Application

Filed: September 16, 2009

Publication date: March 17, 2011

Applicant: Microsoft Corporation

Inventors: George Perantatos, Kuldeep Karnawat, John S. Wana
Polyarchical data indexing and automatically generated hierarchical data indexing paths

Patent number: 7908253

Abstract: Data indexing using polyarchical indexing codes and automatically generated expansion paths. For a piece of data, an indexing code is received relating to a particular categorization or other indexing parameter. Based upon the indexing code, one or more expansion sets of codes are retrieved and applied to the piece of data. The expansion sets of codes may include indexing codes that relate to hierarchical levels of indexing. The expansion sets of codes may also include different expansion paths through the hierarchical levels of indexing. The polyarchical codes may include multiple cross-categorization of the data across the same or different levels of categories. They may also include multiple expansion paths in different directions across hierarchical levels of categories or indexing.
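
As a rough illustration of the expansion step in the abstract above, the sketch below looks up the expansion paths for a received indexing code and applies every code on those paths to the piece of data. The codes, the `EXPANSION_SETS` table, and the `index_codes` function are all hypothetical; the abstract does not say how expansion sets are stored.

```python
# Hypothetical expansion table: one received code maps to several expansion
# paths, each running through a different hierarchy (hence "polyarchical").
EXPANSION_SETS = {
    "crude-oil": [
        ["crude-oil", "energy", "commodities"],      # path through a commodity hierarchy
        ["crude-oil", "oil-and-gas", "industries"],  # path through an industry hierarchy
    ],
}


def index_codes(code, expansion_sets=EXPANSION_SETS):
    """Return the received code plus every code on its expansion paths, without duplicates."""
    codes = [code]
    for path in expansion_sets.get(code, []):
        for expanded in path:
            if expanded not in codes:
                codes.append(expanded)
    return codes


applied = index_codes("crude-oil")
unexpanded = index_codes("unknown-code")  # no expansion set: only the code itself
```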

Type: Grant

Filed: August 7, 2008

Date of Patent: March 15, 2011

Assignee: Factiva, Inc.

Inventors: Jonathan Guy Grenside Cooke, Andrew Richard Young
Information providing system and information providing method for providing advertisement information based on keywords associated with content

Patent number: 7908171

Abstract: The present invention provides an information providing system including an information registration unit capable of registering a front keyword for use in relation to content or content information to be provided to a user terminal, and back keywords set in relation to the front keyword; an advertisement registration unit capable of registering advertisement information for use in relation to the back keyword; and an information providing unit capable of providing the advertisement information to the user terminal. The advertisement registration unit is capable of selecting specific advertisement information through an auction transaction. The information providing unit is capable of displaying keyword buttons enabling keyword selection in a display screen at the user terminal.

Type: Grant

Filed: November 13, 2007

Date of Patent: March 15, 2011

Assignees: Sony Corporation, Plat-Ease Corporation

Inventors: Kazuhiro Fukuda, Tetsuo Maruyama, Tetsu Sumita
Semantic correlation for flow analysis in messaging systems

Patent number: 7904457

Abstract: Improved techniques for flow analysis in messaging systems are disclosed. For example, a method for finding correlations between messages of a system based on content includes the following steps. For one or more executions of the system, obtaining the messages of the system, wherein each message has a schema associated therewith. The messages are categorized into groups, wherein each group has a common schema. Pairs of messages from disparate groups are found wherein, for the messages of a pair, there is a feature in common in their contents.
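
The steps in the abstract above can be sketched as below. Two assumptions are made purely for illustration: a message is a flat dictionary whose schema is its set of field names, and a "feature in common" is a field value that appears in both messages. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def schema_of(message):
    """Assumed schema: the set of field names in the message."""
    return frozenset(message)


def group_by_schema(messages):
    """Categorize message indexes into groups that share a schema."""
    groups = defaultdict(list)
    for index, message in enumerate(messages):
        groups[schema_of(message)].append(index)
    return groups


def correlated_pairs(messages):
    """Find pairs of messages from different groups that share a field value."""
    groups = group_by_schema(messages)
    pairs = []
    for (_, left), (_, right) in combinations(groups.items(), 2):
        for i in left:
            for j in right:
                shared = set(messages[i].values()) & set(messages[j].values())
                if shared:
                    pairs.append((i, j, shared))
    return pairs


messages = [
    {"order_id": "A17", "item": "widget"},
    {"order_id": "B22", "item": "gear"},
    {"shipment": "S1", "ref": "A17"},
    {"invoice": "I9", "ref2": "B22", "total": 40},
]
pairs = correlated_pairs(messages)
```

Here the order messages are linked to the shipment and invoice messages through the shared values "A17" and "B22", even though the field names differ across schemas.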

Type: Grant

Filed: May 30, 2007

Date of Patent: March 8, 2011

Assignee: International Business Machines Corporation

Inventors: Wim De Pauw, Robert L. Hoch, Yi Huang
Method and apparatus for optimizing queries under parametric aggregation constraints

Patent number: 7904458

Abstract: The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.

Type: Grant

Filed: December 26, 2009

Date of Patent: March 8, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos
INFORMATION NERVOUS SYSTEM

Publication number: 20110047148

Abstract: A semantically integrated knowledge retrieval, management, delivery and presentation system.

Type: Application

Filed: March 26, 2010

Publication date: February 24, 2011

Inventor: Nosa Omoigui
Computer method and apparatus for parameterized semantic inquiry templates with type annotations

Patent number: 7885973

Abstract: A computer method and system generates inquiries. The method and system provide a plurality of templates. Each template outlines a respective inquiry and is associated with one or more semantic types or contexts. Each template has one or more parameters for defining a query instance of the respective inquiry. User input selects a template from the plurality and specifies values for the parameters of the user selected template. Using the user selected template and the user-specified parameter values, an instance of a query is produced. Each template is associated with semantic types during template construction. The semantic types may be based on classes in an ontology. Template construction may include templatizing prior existing or other queries to create respective templates. In application or use of a template, query generation may be during modeling of a certain domain, and the produced query is for information about the certain domain.

Type: Grant

Filed: February 22, 2008

Date of Patent: February 8, 2011

Assignee: International Business Machines Corporation

Inventors: Nishanth R. Sastry, Steven I. Ross, Daniel M. Gruen, Susanne C. Hupfer
MIXING KNOWLEDGE SOURCES FOR IMPROVED ENTITY EXTRACTION

Publication number: 20110022598

Abstract: The disclosed embodiments of computer systems and techniques utilize an ensemble semantics framework to combine knowledge acquisition systems that yield significantly higher quality resources than each system in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features from, for example, a webcrawl, query logs, and wisdom of the crowd sources. This results in improved query interpretation and greater relevancy in providing search results and advertising, for example.

Type: Application

Filed: July 24, 2009

Publication date: January 27, 2011

Applicant: YAHOO! INC.

Inventors: Marco Pennacchiotti, Patrick Pantel
Methods and apparatus for evaluating semantic proximity

Patent number: 7877349

Abstract: Methods and apparatus to evaluate the semantic proximity between reference free-form text entry and a candidate free-form text request.

Type: Grant

Filed: April 1, 2010

Date of Patent: January 25, 2011

Assignee: Microsoft Corporation

Inventors: Francois Huet, Gray Salmon Norton
Method and system for guided cluster based processing on prototypes

Patent number: 7877388

Abstract: A method (and system) for clustering a plurality of items. Each of the items includes information. The method includes inputting a plurality of items, which are provided into a clustering process. The method also inputs an initial organization structure into the clustering process. The initial organization structure includes one or more categories, at least one of the categories being associated with one of the items. The method processes the plurality of items based upon at least the initial organization structure and the information in each of the items; and determines a resulting organization structure based upon the processing. The resulting organization structure relates to the initial organization structure.
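
One way to picture clustering that is guided by an initial organization structure is sketched below. The abstract does not specify the clustering process, so this swaps in a simple stand-in: each unseeded item joins the seeded category whose words it overlaps most (Jaccard similarity), and items below a threshold are left uncategorized. The function names, the threshold, and the "uncategorized" bucket are assumptions.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def guided_cluster(items, initial, threshold=0.2):
    """items: id -> text; initial: category -> list of seed item ids."""
    result = {category: list(seeds) for category, seeds in initial.items()}
    seeded = {i for seeds in initial.values() for i in seeds}
    # A category's word profile is the union of the words in its seed items.
    profiles = {
        category: set().union(*(tokens(items[i]) for i in seeds))
        for category, seeds in initial.items()
    }
    for item_id, text in items.items():
        if item_id in seeded:
            continue
        words = tokens(text)
        best = max(profiles, key=lambda c: jaccard(words, profiles[c]))
        if jaccard(words, profiles[best]) >= threshold:
            result[best].append(item_id)
        else:
            result.setdefault("uncategorized", []).append(item_id)
    return result


items = {
    1: "stock market shares rally",
    2: "football match final score",
    3: "shares fall as market slides",
    4: "late goal wins football final",
    5: "recipe for apple pie",
}
initial = {"finance": [1], "sport": [2]}
clusters = guided_cluster(items, initial)
```

The resulting structure keeps the initial categories and their seed items, which is the sense in which it relates to the initial organization structure.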

Type: Grant

Filed: October 31, 2007

Date of Patent: January 25, 2011

Assignee: Stratify, Inc.

Inventors: John O. Lamping, Ramana Venkata, Shashidhar Thakur, Samdeer Siruguri
Semantic analysis documents to rank terms

Patent number: 7873640

Abstract: A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.

Type: Grant

Filed: March 27, 2007

Date of Patent: January 18, 2011

Assignee: Adobe Systems Incorporated

Inventors: Walter Chang, Nadia Ghamrawi
Rewriting node reference-based XQuery using SQL/XML

Patent number: 7870124

Abstract: Techniques for processing reference-based SQL/XML operators are provided. Instead of extracting copies of one or more nodes from XML data, a reference-based operator returns a reference to a node. Such a reference is used to determine, for example, whether the corresponding node comes logically before, after, or is the same as another node. An SQL/XML query that includes a reference-based operator may be the original query, or may be generated (e.g., rewritten) from a non-SQL/XML query, such as an XQuery query. One or more physical rewrites may be performed on the SQL/XML query, depending on how the XML data is stored and/or whether an XML index exists for the XML data.

Type: Grant

Filed: December 13, 2007

Date of Patent: January 11, 2011

Assignee: Oracle International Corporation

Inventors: Zhen Hua Liu, Hui Joe Chang, James W. Warner
DIAGNOSTIC REPORT SEARCH SUPPORTING APPARATUS AND DIAGNOSTIC REPORT SEARCHING APPARATUS

Publication number: 20110004595

Abstract: According to embodiments, a diagnostic report search supporting apparatus and a diagnostic report searching apparatus each have a report registering part, a structuring processing part, a related-term analyzing part, a counting part, and a keyword extracting part. The structuring processing part extracts terms from a sentence written in a diagnostic report, and classifies the terms into predetermined kinds. The related-term analyzing part generates combinations each composed of two or more terms based on the plurality of terms having been extracted. The counting part counts the number of occurrences of identical combinations among the plurality of combinations, and extracts combinations whose occurrence counts are a predetermined number or more. The keyword extracting part extracts a combination including a desired keyword, and extracts a term other than the desired keyword as a related keyword.
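
The counting and keyword-extraction steps in the abstract above can be sketched as follows. The sketch assumes the terms have already been extracted from each report (term extraction and classification into kinds are not shown) and limits combinations to pairs; `frequent_pairs` and `related_keywords` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(reports_terms, min_count):
    """Count term pairs across reports and keep those seen at least min_count times."""
    counts = Counter()
    for terms in reports_terms:
        # Sort so that the same two terms always form the same pair.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def related_keywords(pairs, keyword):
    """From pairs containing the keyword, return the other term of each pair."""
    related = set()
    for first, second in pairs:
        if first == keyword:
            related.add(second)
        elif second == keyword:
            related.add(first)
    return related


reports_terms = [
    ["lung", "nodule", "CT"],
    ["lung", "nodule"],
    ["liver", "cyst", "CT"],
    ["lung", "CT"],
]
pairs = frequent_pairs(reports_terms, min_count=2)
related = related_keywords(pairs, "lung")
```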

Type: Application

Filed: June 23, 2010

Publication date: January 6, 2011

Applicants: Kabushiki Kaisha Toshiba, TOSHIBA MEDICAL SYSTEMS CORPORATION

Inventors: Hiromasa YAMAGISHI, Hikaru Futami, Kenichi Niwa
Information managing system, information managing method, and information managing program for managing various items of information of objects to be retrieved

Patent number: 7860867

Abstract: An information managing system includes a parameter setting unit for setting a parameter representative of an attribute of a user and information to be retrieved, and an information relevance space generator for generating an information relevance space representative of information indicating a relevance between the user and the information to be retrieved, based on the parameter set by the parameter setting unit.

Type: Grant

Filed: December 20, 2006

Date of Patent: December 28, 2010

Assignee: NEC Corporation

Inventors: Masaki Kan, Junichi Yamato, Yuji Kaneko, Yoshihiro Kajiki
Information Process Apparatus, Information Process Method, and Program

Publication number: 20100312767

Abstract: Provided is an information process apparatus including: an extraction unit which is configured to extract words in a predetermined word class from comments which predetermined users write about a predetermined item; a grouping unit which is configured to group the predetermined users by performing a multivariate analysis using the words extracted by the extraction unit; a storage unit which is configured to store the groups, the predetermined item, and the words in association with each other; a determination unit which is configured to determine which group a user who is to write a comment belongs to when the user is to write the comment about the predetermined item; and a reading unit which is configured to read from the storage unit words which are associated with the group determined by the determination unit and the predetermined item which the comment is to be written about.

Type: Application

Filed: May 14, 2010

Publication date: December 9, 2010

Inventor: Mari SAITO
System And Method For A Unified Semantic Ranking of Compositions of Ontological Subjects And The Applications Thereof

Publication number: 20100293166

Abstract: The present invention discloses methods, systems, and tools for unified semantic ranking of compositions of ontological subjects. The method breaks a composition into a plurality of partitions as well as its constituent ontological subjects (OSs) of different orders and builds a participation matrix indicating the participation of ontological subjects of the composition in other ontological subjects, i.e. the partitions, of the composition. Using the participation information of the OSs in each other, a similarity matrix is built from which the semantic importance ranks of the partitions of the composition are calculated. The method systematically enables the calculation of the semantic ranks of ontological subjects of different orders of the composition. Various systems for implementing the method and numerous applications and services are disclosed.
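
A toy version of the participation-matrix idea in the abstract above is sketched below. The abstract does not state how the similarity matrix or the ranks are computed, so two assumptions are swapped in: similarity between partitions is the cosine of their columns in a binary word-by-partition participation matrix, and a partition's rank score is the sum of its similarities to the other partitions. All function names are hypothetical.

```python
import math


def participation_matrix(partitions):
    """Rows are words, columns are partitions; 1 means the word occurs in the partition."""
    vocabulary = sorted(set().union(*partitions))
    matrix = [[1 if term in part else 0 for part in partitions] for term in vocabulary]
    return vocabulary, matrix


def similarity_matrix(matrix):
    """Cosine similarity between partition columns (assumed similarity measure)."""
    n = len(matrix[0]) if matrix else 0
    columns = [[row[j] for row in matrix] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def rank_partitions(partitions):
    """Return partition indexes ordered by descending score, plus the scores."""
    _, matrix = participation_matrix(partitions)
    scores = [sum(row) for row in similarity_matrix(matrix)]
    order = sorted(range(len(partitions)), key=lambda i: scores[i], reverse=True)
    return order, scores


partitions = [
    {"semantic", "rank"},
    {"semantic", "matrix"},
    {"rank", "matrix", "semantic"},
    {"weather"},
]
order, scores = rank_partitions(partitions)
```

In this example the third partition shares words with the most other partitions and ranks first, while the partition with no shared words scores zero.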

Type: Application

Filed: April 7, 2010

Publication date: November 18, 2010

Applicant: Hamid Hatami-Hanza

Inventor: Hamid Hatami-Hanza
