Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
  • Patent number: 8019765
    Abstract: To determine files associated with one or more workflows, a trace of accesses of files in at least one server is received. The files are grouped into at least one set of files, where the files in the set are accessed together more than a predetermined number of times in the trace. Files associated with the particular workflow are identified based on the at least one set.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: September 13, 2011
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Anna Povzner, Kimberly Keeton, Marcos K. Aguilera, Arif A. Merchant, Charles B. Morrey, III, Mustafa Uysal
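    Illustration (not part of the patent record): a minimal sketch of grouping files that are accessed together more than a threshold number of times. The abstract does not say how "accessed together" is measured, so this sketch assumes the trace has already been split into sessions (lists of file names) and links two files when they share more than `threshold` sessions. The names `group_files`, `sessions`, and `threshold` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def group_files(sessions, threshold):
    """Group files that co-occur in more than `threshold` sessions.

    Assumption: each session is an iterable of file names accessed together.
    """
    # Count how often each unordered pair of files appears in the same session.
    pair_counts = Counter()
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over files; only pairs above the threshold are linked.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count > threshold:
            parent[find(a)] = find(b)

    # Collect the connected components as sorted lists of file names.
    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

    Files that never exceed the threshold with any other file are left out of the result, which is one possible reading of the abstract rather than the claimed method.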
  • Publication number: 20110219003
    Abstract: A method for retrieving information from a document includes a process of grouping paragraphs in the document to form passages, and forming indexes relating to a number of words in the passages. The number of paragraphs in a passage is determined based on the number of paragraphs considered optimum for a writer to cover a particular topic. Passages are formed by merging each N consecutive paragraphs in the document, where N is an integer greater than 1. Thus, individual passages may include paragraphs that are identical to other passages.
    Type: Application
    Filed: May 16, 2011
    Publication date: September 8, 2011
    Inventor: Jiandong BI
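    Illustration (not part of the patent record): a minimal sketch of forming passages by merging each N consecutive paragraphs, so that neighbouring passages overlap, and then counting words per passage. The function names and the whitespace tokenization are assumptions made for this example only.

```python
from collections import Counter


def build_passages(paragraphs, n=2):
    """Merge every run of n consecutive paragraphs into one passage."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    if len(paragraphs) < n:
        # Too few paragraphs for a full window: return them as one passage.
        return [" ".join(paragraphs)] if paragraphs else []
    return [" ".join(paragraphs[i:i + n])
            for i in range(len(paragraphs) - n + 1)]


def index_word_counts(passages):
    """Build a per-passage word-count index (lowercased whitespace tokens)."""
    return [Counter(p.lower().split()) for p in passages]
```

    With three paragraphs and N = 2, the middle paragraph appears in both resulting passages, which is the overlap the abstract describes.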
  • Publication number: 20110211677
    Abstract: An online address book system having sufficient hardware and software to operate an address book user interface and to perform intelligent interpretations of voice and text inputs from users. The system includes at least one server software module that includes software to perform a plurality of functions. These include the ability to receive voice input data and separate user voice queries, wherein the software can arrange the data so as to create a data base that includes at least three access dimensions, including contact access, contact-relationship access and contact-time frame access, and so as to create a connectivity matrix based on a plurality of contact pair relationships applying connective recognition logic. The system provides a voice operated user interface that permits access to address book stored data based on user input selected from the group consisting of contact, a contact-relationship pair, a contact-time frame pair, and combinations thereof.
    Type: Application
    Filed: May 13, 2011
    Publication date: September 1, 2011
    Inventor: CHARLES M. BASNER
  • Publication number: 20110208708
    Abstract: Systems and methods for finding related terms based on three different sources are disclosed. Generally, a first plurality of distances is determined based on one or more received terms and a first plurality of terms derived from an algorithmic search list. A second plurality of distances is determined based on the one or more received terms and a second plurality of terms derived from a sponsored search list. A third plurality of distances is determined based on the one or more received terms and a third plurality of terms derived from search logs. The first, second, and third pluralities of distances are combined to derive a fourth plurality of distances. Finally, a plurality of related terms related to the one or more received terms is generated based on the fourth plurality of distances.
    Type: Application
    Filed: February 25, 2010
    Publication date: August 25, 2011
    Applicant: Yahoo! Inc.
    Inventors: Weiguo Liu, Qiong Zhang
  • Patent number: 8005841
    Abstract: Methods, systems, and products are disclosed for classifying content segments. A set of annotations is received that occur within a segment of time-varying content. Each annotation is scored to each node in an ontology. The segment is classified based on at least one of the scores.
    Type: Grant
    Filed: April 28, 2006
    Date of Patent: August 23, 2011
    Assignee: Qurio Holdings, Inc.
    Inventors: Richard J. Walsh, Alfredo C. Issa
  • Publication number: 20110202535
    Abstract: A method of identifying a provenance of a document is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster based, at least in part, on metadata associated with each of the documents to identify a source document. The method may also include generating a query response that includes the source document.
    Type: Application
    Filed: February 13, 2010
    Publication date: August 18, 2011
    Inventors: Vinay Deolalikar, Hernan Laffitte
  • Patent number: 7996406
    Abstract: Method and apparatus for detecting web-based electronic mail in network traffic is described. In some examples, web pages are extracted from the network traffic. Fields in each page of a group of the web pages that share a document structure are identified. A statistical analysis of the fields of each page in the group of web pages is performed to identify any electronic mail (e-mail) fields. The group of web pages is indicated to include web-based e-mail messages if the fields of each page in the group of web pages include at least one e-mail field.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: August 9, 2011
    Assignee: Symantec Corporation
    Inventors: Basant Rajan, Chirag Deepak Dalal, Navin Kabra
  • Publication number: 20110191344
    Abstract: An automatic organization into topics for a browsing history. In one embodiment, a system identifies groups of browsing actions as related, and clusters the browsing history (e.g. a web browsing history) into sessions based on heuristics used to determine relationships. Latent semantic analysis can be used to determine the relationships which can be considered topics. User interfaces for displaying or otherwise presenting these sessions can include icons representative of topics, and these icons can have different sizes depending on a frequency of web page visits within a topic. The topics can be displayed in time ranges or in a cover flow view or both time ranges and cover flow view.
    Type: Application
    Filed: February 3, 2010
    Publication date: August 4, 2011
    Inventors: Jing Jin, Kevin Decker, Timothy Hatcher, Raymond Sepulveda, Michael Thole
  • Publication number: 20110191345
    Abstract: An information processing apparatus (5) is provided comprising: a lexicon generation module (22) operable to process a set of documents (1) to identify key words (2) present in the documents; a link generation module (24) operable to generate network data (3) linking documents which share the same or semantically related key words identified by the lexicon generation module; and a network analysis module (26) operable to associate documents with metric values based upon the patterns of connectivity of the network data generated by the link generation module. The metric values associated with documents in the set can be utilized to select documents or groups of associated documents for further processing or indexing.
    Type: Application
    Filed: January 28, 2011
    Publication date: August 4, 2011
    Applicant: E-THERAPEUTICS PLC
    Inventor: Malcolm P. Young
  • Publication number: 20110184926
    Abstract: An expert list recommendation system is provided, including: a domain modeler for establishing an expert knowledge database according to a plurality of expert publications in different domains, receiving an inquired proposal, determining the academic field of the inquired proposal according to keywords of the inquired proposal and keyword sets of the expert publications in different domains stored in the expert knowledge database, and outputting a first domain expert list corresponding to the inquired proposal, wherein the first domain expert list comprises a first group of expert publications and a first group of expert names; and an expertise matcher for receiving the first domain expert list, comparing semantic relatedness between keywords of the inquired proposal and keywords corresponding to the first group of the expert publications of the first domain expert list to output a first expert list to a display device.
    Type: Application
    Filed: June 25, 2010
    Publication date: July 28, 2011
    Applicant: NATIONAL TAIWAN UNIVERSITY OF SCIENCE & TECHNOLOGY
    Inventors: Hahn-Ming LEE, Jan-Ming HO, Jerome YEH, Kai-Hsiang YANG, Tai-Liang KUO, Chun-Han CHEN
  • Patent number: 7987188
    Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.
    Type: Grant
    Filed: August 23, 2007
    Date of Patent: July 26, 2011
    Assignee: Google Inc.
    Inventors: Tyler J. Neylon, Kerry L. Hannan, Ryan T. McDonald, Michael Wells, Jeffrey C. Reynar
  • Publication number: 20110179036
    Abstract: Systems and methods are provided for creating abstracted, normalized, and reuseable and combinable representations of information contained in multiple documents and information of any supported format, and allowing for exporting of information in any other desired and supported format. Further the system and methods provide for uploading documents based on a known template, where the data members can be automatically recognized and the document stored in normalized format without end-user or developer intervention. Normalization of data is achieved transparently on upload and denormalization performed transparently on download. Further, embodiments provide for the reuse and recombination of data members to create entirely new representations.
    Type: Application
    Filed: December 16, 2010
    Publication date: July 21, 2011
    Inventors: Jason Townes French, Auston John Stewart
  • Publication number: 20110179035
    Abstract: A visualization-based interactive legal research tool that generates from a multi-dimensional citation network a semantics-constrained citation sub-network that focuses on one individual issue in which a user is interested, and puts the sub-network on an interactive user interface (“UI”), which allows the researcher to browse, navigate, and jump over to start new sub-networks on different issues that are relevant to original issues.
    Type: Application
    Filed: June 1, 2010
    Publication date: July 21, 2011
    Applicant: LEXISNEXIS, A DIVISION OF REED ELSEVIER INC.
    Inventors: Paul Zhang, Lavanya Koppaka
  • Patent number: 7984041
    Abstract: Methods and apparatus provide for a local search indexer to allow for an optimized search within a web server that returns accurate search results while maintaining independent control as to defining search patterns, search prioritization, and updated content available for search. Specifically, the local search indexer organizes content according to a hierarchical directory structure at a web server. The hierarchical directory structure includes at least one directory level that provides at least one directory for storing the content. The local search indexer builds a search index associated with the directory and stores the search index at the web server. The search index is populated with indexed content based on an update of the content stored in the directory. The local search indexer employs a search engine, at the web server, to process search queries against the indexed content to provide a search result that includes the update of the content.
    Type: Grant
    Filed: July 9, 2007
    Date of Patent: July 19, 2011
    Assignee: Oracle America, Inc.
    Inventor: Yogesh Y Patil
  • Publication number: 20110173201
    Abstract: This invention relates to a method and an apparatus for determining a reliability indicator for at least one set of signatures obtained from clinical data collected from a group of samples. The signatures are obtained by detecting characteristics in the clinical data from the group of samples, and each of the signatures generates a first set of stratification values that stratify the group of samples. At least one additional and parallel stratification source to the signatures obtained from the group of samples is provided, the at least one additional and parallel stratification source being independent from the signatures and generating a second set of stratification values. A comparison is done for each respective sample, where the first stratification values are compared with true reference stratification values, and where the second stratification values are compared with the true reference stratification values.
    Type: Application
    Filed: September 24, 2009
    Publication date: July 14, 2011
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
    Inventors: Angel Janevski, Nilanjana Banerjee, Yasser Alsafadi, Vinay Varadan
  • Publication number: 20110173200
    Abstract: An apparatus for authoring data in a communication system includes: an extraction unit configured to receive media corresponding to contents and extract contents information regarding the contents from the received media; a generation unit configured to generate a DMB ECG XML-based metadata comprising the extracted contents information; and a processing unit configured to visualize particulars of the DMB ECG XML-based metadata through a user interface and process the user interface so that the DMB ECG XML-based metadata is generated and edited on a template.
    Type: Application
    Filed: November 12, 2010
    Publication date: July 14, 2011
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Seung-Jun YANG, Min-Sik Park, Han-Kyu Lee, Jin-Woo Hong
  • Publication number: 20110167053
    Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
    Type: Application
    Filed: March 15, 2011
    Publication date: July 7, 2011
    Applicant: Microsoft Corporation
    Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
  • Patent number: 7971150
    Abstract: A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.
    Type: Grant
    Filed: September 25, 2001
    Date of Patent: June 28, 2011
    Assignee: Telstra New Wave Pty Ltd.
    Inventors: Bhavani Raskutti, Adam Kowalczyk
  • Patent number: 7970767
    Abstract: One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
    Type: Grant
    Filed: April 30, 2007
    Date of Patent: June 28, 2011
    Assignee: Accenture Global Services Limited
    Inventors: Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
  • Publication number: 20110119272
    Abstract: Determining a semantic relationship is disclosed. Source content is received. Cluster analysis is performed at least in part by using at least a portion of the source content. At least a portion of a result of the cluster analysis is used to determine the semantic relationship between two or more content elements comprising the source content.
    Type: Application
    Filed: January 25, 2011
    Publication date: May 19, 2011
    Applicant: APPLE INC.
    Inventors: Philip Andrew Mansfield, Michael Robert Levy, Yuri Khramov, Darryl Will Fuller
  • Patent number: 7945864
    Abstract: An operation assisting apparatus includes: an option-function distance storage unit that stores a semantic distance between each of the options displayed on a menu screen and each of functions positioned at an end in the hierarchical structure; an operation history storage unit that stores the operation history of the options sequentially selected by the user; an estimation unit that estimates, based on a semantic distance between a selection option selected by the user and each of the functions, and a semantic distance between an unselected selection option that has been selectable but not selected and each of the functions, a degree of probability that the function is the function desired by the user; and an operational assistance determination unit that determines, based on the result of the estimation, a detail of an output such that functions with higher probability will be presented with higher precedence in selectability.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: May 17, 2011
    Assignee: Panasonic Corporation
    Inventors: Tsuyoshi Inoue, Makoto Nishizaki, Satoshi Matsuura
  • Patent number: 7945564
    Abstract: A computing system and method receive a query; separate a plurality of information sources into individual elements of content (EOC); tag each EOC with metadata that indicate source, date, and other relevant information; pattern match each EOC; calculate the respective distance function from every EOC to every other EOC; and output EOC to a set of virtual buffers (404) containing appropriately related EOC less than a given distance value. The method further creates virtual summary buffers (406); then concatenates the EOC in each virtual buffer (404); applies a comparative analysis filter (318) to remove redundant sub-elements; and presents the results as summary digests (408).
    Type: Grant
    Filed: August 14, 2008
    Date of Patent: May 17, 2011
    Assignee: International Business Machines Corporation
    Inventors: Arnon Amir, Gal Ashour, Brian K. Blanchard, Matthew Denesuk, Reiner Kraft
  • Patent number: 7945555
    Abstract: The present invention provides a method and system for categorizing content published on the Internet. The method comprises gathering one or more feeds associated with the content. The method further comprises extracting contextual information from the one or more feeds. Thereafter, the content is categorized into one or more general web-based categories belonging to a set of general web-based categories. The categorizing step further comprises performing a semantic analysis of the contextual information that yields a keyword string. The content is classified into the one or more general web-based categories based on the keyword string. Finally, the set of general web-based categories is translated to a set of pre-defined categories, such that one or more general web-based categories are translated to a pre-defined category that is relevant to an end user.
    Type: Grant
    Filed: December 27, 2007
    Date of Patent: May 17, 2011
    Assignee: Yume, Inc.
    Inventors: Ayyappan Sankaran, Jayant Kadambi, Matthew D Shaver
  • Patent number: 7941418
    Abstract: A computer-implemented method of generating a dynamic corpus includes generating web threads, based upon corresponding sets of words dequeued from a word queue, to obtain web thread resulting URLs. The web thread resulting URLs are enqueued in a URL queue. Multiple text extraction threads are generated, based upon documents downloaded using URLs dequeued from the URL queue, to obtain text files. New words are randomly obtained from the text files, and the randomly obtained words from the text files are enqueued in the word queue. This process is iteratively performed, resulting in a dynamic corpus.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: May 10, 2011
    Assignee: Microsoft Corporation
    Inventor: Carlos Alejandro Arguelles
  • Publication number: 20110106807
    Abstract: Described within are systems and methods for disambiguating entities, by generating entity profiles and extracting information from multiple documents to generate a set of entity profiles, determining equivalence within the set of entity profiles using similarity matching algorithms, and integrating the information in the correlated entity profiles. Additionally, described within are systems and methods for representing entities in a document in a Resource Description Framework and leveraging the features to determine the similarity between a plurality of entities. An entity may include a person, place, location, or other entity type.
    Type: Application
    Filed: November 1, 2010
    Publication date: May 5, 2011
    Applicant: JANYA, INC
    Inventors: Rohini K. Srihari, Harish Srinivasan, Richard Smith, John Chen
  • Publication number: 20110093343
    Abstract: Methods and systems are given for representing and generating contents from pre-existed and pre-built contents for a given content. Methods are given for transforming information representation from one medium, type, or language to another medium, type and language. Exemplary embodiment is given for transforming the semantics of a given text or spoken language to a visual representation or combination of them. The systems and methods generate new contents in general and multimedia contents in particular in response to or for representing an input composition utilizing pre-existed and pre-built contents of various types, languages, and forms. The associated client server systems over the communication network are also given for generating contents for the contents given by the clients.
    Type: Application
    Filed: October 20, 2010
    Publication date: April 21, 2011
    Inventor: Hamid Hatami-Hanza
  • Publication number: 20110082863
    Abstract: A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.
    Type: Application
    Filed: December 15, 2010
    Publication date: April 7, 2011
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: WALTER CHANG, NADIA GHAMRAWI
  • Publication number: 20110078146
    Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
    Type: Application
    Filed: September 20, 2010
    Publication date: March 31, 2011
    Applicant: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
  • Publication number: 20110077048
    Abstract: The invention relates to a system for data correlation, having: a receiving device 1 having an image acquisition element 10 and a data set generator 12 for generating at least one object data set from at least one acquired first image, which represents a physical object, and an identification label, which uniquely determines an object-related acquisition procedure, and at least one information data set from at least one acquired second image, which represents coded information related to the physical object, and the identification label; a correlation device 2 for the extraction 20 of the coded information from the information data set, for the semantic analysis 22 of the extracted information, and for the generation of at least one combination data set from the results of the semantic analysis, the extracted information, and the at least one object data set with the same identification label as the extracted information data set; and a user device 3 for the storage and further use of the combination data
    Type: Application
    Filed: March 3, 2009
    Publication date: March 31, 2011
    Applicant: Linguatec Sprachtechnologien GmbH
    Inventor: Reinhard Busch
  • Patent number: 7917514
    Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
    Type: Grant
    Filed: June 28, 2006
    Date of Patent: March 29, 2011
    Assignee: Microsoft Corporation
    Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
  • Publication number: 20110072020
    Abstract: A method for reverse geocoding location information obtained by a wireless communications device comprises determining the location information for a location, communicating the location information to a reverse geocoding server that reverse-geocodes the location information to generate location description data for a bounding region that geographically surrounds the location, receiving the location description data from the reverse geocoding server for the bounding region containing the location, and caching the location description data for the bounding region in a memory cache on the device. When the current location remains within one or more bounding regions cached on the device, location description data is fetched from the cache, thus improving application responsiveness. Only when the current location is no longer within the bounding region(s) does the device communicate a new request to the reverse geocoding server.
    Type: Application
    Filed: September 18, 2009
    Publication date: March 24, 2011
    Applicant: RESEARCH IN MOTION LIMITED
    Inventors: Ngoc Bich Ngo, Russell Norman Owen
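    Illustration (not part of the patent record): a minimal sketch of caching location descriptions by bounding region so that the server is contacted only when the current location leaves every cached region. The class name, the `fetch_region` callable, and the `(south, west, north, east)` bounds layout are assumptions for this example; the abstract does not specify them.

```python
class ReverseGeocodeCache:
    """Cache of (bounding region, description) pairs held on the device."""

    def __init__(self, fetch_region):
        # fetch_region is a hypothetical stand-in for the reverse geocoding
        # server: it takes (lat, lon) and returns (bounds, description),
        # where bounds is (south, west, north, east).
        self._fetch_region = fetch_region
        self._regions = []

    def describe(self, lat, lon):
        # Serve from the cache while the location stays inside a cached region.
        for (south, west, north, east), description in self._regions:
            if south <= lat <= north and west <= lon <= east:
                return description
        # Otherwise ask the server and remember the region it returns.
        bounds, description = self._fetch_region(lat, lon)
        self._regions.append((bounds, description))
        return description
```

    This sketch ignores regions that cross the antimeridian and never evicts entries; a real device cache would need both.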
  • Publication number: 20110072021
    Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.
    Type: Application
    Filed: September 21, 2009
    Publication date: March 24, 2011
    Applicant: YAHOO! INC.
    Inventors: Yumao Lu, Lei Duan, Fan Li, Benoit Dumoulin, Xing Wei
  • Patent number: 7912816
    Abstract: In one embodiment, input is received from a user defining a classification and an analytic for the classification. Multiple classifications and analytics may be defined by a user. A definition of relevance parameters is determined that characterize the classification and a set of analytics measures associated with the analytic. The definition may be for the classification. Unstructured data and structured data are analyzed based on the definition of the relevance parameters to determine relevant data in the unstructured data and the structured data. The relevant data is data that is determined to be relevant to the classification defined by the user. An index of the terms from the relevant data is determined. The index is useable by an analytics tool to provide results for queries of the unstructured data and structured data. The query may be used within the classification such that targeted results are provided using the index and the relevant data to the classification.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: March 22, 2011
    Assignee: Alumni Data Inc.
    Inventors: Aloke Guha, Joan Wrabetz
  • Publication number: 20110066618
    Abstract: Methods, apparatuses, and systems are provided to determine a response to a user submitted query based, at least in part, on a relationship between and/or among a plurality of terms of the query.
    Type: Application
    Filed: September 14, 2009
    Publication date: March 17, 2011
    Applicant: Yahoo! Inc.
    Inventors: Borkur Sigurbjornsson, Vanessa Murdock, Roelof van Zwol, Maarten Clements
  • Publication number: 20110066619
    Abstract: Architecture for enabling a user to automatically recover documents and other information associated with work contexts and recover documents and other information artifacts associated with a specific project. The architecture enables monitoring and recording of activity information related to user interactions with information artifacts pertaining to a particular work context. The user can select a document having a portion of work content (e.g., a term or other type of reference item in a document) related to the work context. A lexical analysis is performed on the activity information and the reference item to identify lexical similarities. A list of candidate items (e.g., related documents) is inferred from the information artifacts based on the lexical similarities. The candidate items related to the work context are presented to the user, who can select specific items to reestablish the work context.
    Type: Application
    Filed: September 16, 2009
    Publication date: March 17, 2011
    Applicant: Microsoft Corporation
    Inventors: George Perantatos, Kuldeep Karnawat, John S. Wana
  • Patent number: 7908253
    Abstract: Data indexing using polyarchical indexing codes and automatically generated expansion paths. For a piece of data, an indexing code is received relating to a particular categorization or other indexing parameter. Based upon the indexing code, one or more expansion sets of codes are retrieved and applied to the piece of data. The expansion sets of codes may include indexing codes that relate to hierarchical levels of indexing. The expansion sets of codes may also include different expansion paths through the hierarchical levels of indexing. The polyarchical codes may include multiple cross-categorization of the data across the same or different levels of categories. They may also include multiple expansion paths in different directions across hierarchical levels of categories or indexing.
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: March 15, 2011
    Assignee: Factiva, Inc.
    Inventors: Jonathan Guy Grenside Cooke, Andrew Richard Young
  • Patent number: 7908171
    Abstract: The present invention provides an information providing system including an information registration unit capable of registering a front keyword for use in relation to content or content information to be provided to a user terminal, and back keywords set in relation to the front keyword; an advertisement registration unit capable of registering advertisement information for use in relation to the back keyword; and an information providing unit capable of providing the advertisement information to the user terminal. The advertisement registration unit is capable of selecting specific advertisement information through an auction transaction. The information providing unit is capable of displaying keyword buttons enabling keyword selection in a display screen at the user terminal.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: March 15, 2011
    Assignees: Sony Corporation, Plat-Ease Corporation
    Inventors: Kazuhiro Fukuda, Tetsuo Maruyama, Tetsu Sumita
  • Patent number: 7904457
    Abstract: Improved techniques for flow analysis in messaging systems are disclosed. For example, a method for finding correlations between messages of a system based on content includes the following steps. For one or more executions of the system, obtaining the messages of the system, wherein each message has a schema associated therewith. The messages are categorized into groups, wherein each group has a common schema. Pairs of messages from disparate groups are found wherein, for the messages of a pair, there is a feature in common in their contents.
    Type: Grant
    Filed: May 30, 2007
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Wim De Pauw, Robert L. Hoch, Yi Huang
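The grouping and pairing steps described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes that a message is a flat dictionary, that its "schema" is simply its set of field names, and that a "feature in common" means a shared field value. The names `schema_of` and `correlate` and the sample messages are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def schema_of(message):
    """Assumption: the schema is the sorted tuple of the message's field names."""
    return tuple(sorted(message))


def correlate(messages):
    """Group messages by schema, then pair messages from different groups
    whose contents share at least one value."""
    groups = defaultdict(list)
    for idx, msg in enumerate(messages):
        groups[schema_of(msg)].append(idx)

    pairs = []
    # Only compare messages that belong to disparate groups.
    for (_, ids_a), (_, ids_b) in combinations(groups.items(), 2):
        for i in ids_a:
            for j in ids_b:
                shared = set(messages[i].values()) & set(messages[j].values())
                if shared:
                    pairs.append((i, j, shared))
    return pairs


# Hypothetical trace of messages from one execution.
messages = [
    {"order_id": "A17", "customer": "acme"},
    {"order_id": "B22", "customer": "globex"},
    {"invoice": "INV-9", "ref": "A17"},
    {"shipment": "S-3", "ref": "B22", "carrier": "ups"},
]
result = correlate(messages)
```

Here the first order is paired with the invoice and the second order with the shipment, because each pair shares an identifier value even though the field names differ.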
  • Patent number: 7904458
    Abstract: The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints.
    Type: Grant
    Filed: December 26, 2009
    Date of Patent: March 8, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Nikolaos Koudas, Divesh Srivastava, Sudipto Guha, Dimitrios Gunopulos, Michail Vlachos
  • Publication number: 20110047148
    Abstract: A semantically integrated knowledge retrieval, management, delivery and presentation system.
    Type: Application
    Filed: March 26, 2010
    Publication date: February 24, 2011
    Inventor: Nosa Omoigui
  • Patent number: 7885973
    Abstract: A computer method and system generates inquiries. The method and system provide a plurality of templates. Each template outlines a respective inquiry and is associated with one or more semantic types or contexts. Each template has one or more parameters for defining a query instance of the respective inquiry. User input selects a template from the plurality and specifies values for the parameters of the user selected template. Using the user selected template and the user-specified parameter values, an instance of a query is produced. Each template is associated with semantic types during template construction. The semantic types may be based on classes in an ontology. Template construction may include templatizing prior existing or other queries to create respective templates. In application or use of a template, query generation may be during modeling of a certain domain, and the produced query is for information about the certain domain.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: February 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Nishanth R. Sastry, Steven I. Ross, Daniel M. Gruen, Susanne C. Hupfer
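As a rough illustration of template-based query generation, the sketch below stores parameterized templates tagged with semantic types and fills in user-supplied values. The template registry, the SPARQL-like query text, and the function names `templates_for` and `instantiate` are all assumptions made for this example; the abstract does not specify a query language or data structure.

```python
from string import Template

# Hypothetical template registry: each template carries the semantic types
# it applies to and a parameterized query body.
TEMPLATES = {
    "find_by_type": {
        "semantic_types": {"Person", "Organization"},
        "text": Template("SELECT ?x WHERE { ?x a :$type ; :locatedIn :$place }"),
    },
}


def templates_for(semantic_type):
    """Return the names of templates associated with a semantic type."""
    return sorted(
        name for name, t in TEMPLATES.items()
        if semantic_type in t["semantic_types"]
    )


def instantiate(name, **params):
    """Produce a query instance from a selected template and parameter values.
    Raises KeyError if a required parameter is missing."""
    return TEMPLATES[name]["text"].substitute(**params)


query = instantiate("find_by_type", type="Person", place="Boston")
```

The user first narrows the choice with `templates_for`, then supplies parameter values to `instantiate` to obtain a concrete query.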
  • Publication number: 20110022598
    Abstract: The disclosed embodiments of computer systems and techniques utilize an ensemble semantics framework to combine knowledge acquisition systems that yield significantly higher quality resources than each system in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features from, for example, a webcrawl, query logs, and wisdom of the crowd sources. This results in improved query interpretation and greater relevancy in providing search results and advertising, for example.
    Type: Application
    Filed: July 24, 2009
    Publication date: January 27, 2011
    Applicant: YAHOO! INC.
    Inventors: Marco Pennacchiotti, Patrick Pantel
  • Patent number: 7877349
    Abstract: Methods and apparatus to evaluate the semantic proximity between reference free-form text entry and a candidate free-form text request.
    Type: Grant
    Filed: April 1, 2010
    Date of Patent: January 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Francois Huet, Gray Salmon Norton
  • Patent number: 7877388
    Abstract: A method (and system) for clustering a plurality of items. Each of the items includes information. The method includes inputting the plurality of items, which are provided into a clustering process. The method also inputs an initial organization structure into the clustering process. The initial organization structure includes one or more categories, at least one of the categories being associated with one of the items. The method processes the plurality of items based upon at least the initial organization structure and the information in each of the items; and determines a resulting organization structure based upon the processing. The resulting organization structure relates to the initial organization structure.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: January 25, 2011
    Assignee: Stratify, Inc.
    Inventors: John O. Lamping, Ramana Venkata, Shashidhar Thakur, Samdeer Siruguri
  • Patent number: 7873640
    Abstract: A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.
    Type: Grant
    Filed: March 27, 2007
    Date of Patent: January 18, 2011
    Assignee: Adobe Systems Incorporated
    Inventors: Walter Chang, Nadia Ghamrawi
  • Patent number: 7870124
    Abstract: Techniques for processing reference-based SQL/XML operators are provided. Instead of extracting copies of one or more nodes from XML data, a reference-based operator returns a reference to a node. Such a reference is used to determine, for example, whether the corresponding node comes logically before, after, or is the same as another node. An SQL/XML query that includes a reference-based operator may be the original query, or may be generated (e.g., rewritten) from a non-SQL/XML query, such as an XQuery query. One or more physical rewrites may be performed on the SQL/XML query, depending on how the XML data is stored and/or whether an XML index exists for the XML data.
    Type: Grant
    Filed: December 13, 2007
    Date of Patent: January 11, 2011
    Assignee: Oracle International Corporation
    Inventors: Zhen Hua Liu, Hui Joe Chang, James W. Warner
  • Publication number: 20110004595
    Abstract: According to embodiments, a diagnostic report search supporting apparatus and a diagnostic report searching apparatus each have a report registering part, a structuring processing part, a related-term analyzing part, a counting part, and a keyword extracting part. The structuring processing part extracts terms from a sentence written in a diagnostic report, and classifies the terms into predetermined kinds. The related-term analyzing part generates combinations each composed of two or more terms based on the plurality of terms having been extracted. The counting part counts the number of occurrences of identical combinations among the plurality of combinations, and extracts combinations that occur a predetermined number of times or more. The keyword extracting part extracts a combination including a desired keyword, and extracts a term other than the desired keyword as a related keyword.
    Type: Application
    Filed: June 23, 2010
    Publication date: January 6, 2011
    Applicants: Kabushiki Kaisha Toshiba, TOSHIBA MEDICAL SYSTEMS CORPORATION
    Inventors: Hiromasa YAMAGISHI, Hikaru Futami, Kenichi Niwa
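The counting and keyword-extraction steps in this abstract can be sketched as follows. This is an illustration under simplifying assumptions: terms are taken as already extracted from each report (the structuring and classification step is skipped), combinations are limited to pairs, and the threshold `min_count` and the sample reports are hypothetical.

```python
from collections import Counter
from itertools import combinations


def related_keywords(reports, keyword, min_count=2):
    """Count term pairs across reports, keep pairs seen at least `min_count`
    times, and return the partner terms of `keyword` in those pairs."""
    counts = Counter()
    for terms in reports:
        # Sort and deduplicate so each unordered pair has one canonical form
        # and is counted at most once per report.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1

    related = set()
    for pair, n in counts.items():
        if n >= min_count and keyword in pair:
            related.update(t for t in pair if t != keyword)
    return sorted(related)


# Hypothetical term lists, one per diagnostic report.
reports = [
    ["lung", "nodule", "calcification"],
    ["lung", "nodule"],
    ["liver", "cyst"],
    ["lung", "effusion"],
]
related = related_keywords(reports, "lung")
```

With these inputs only the pair of "lung" and "nodule" reaches the threshold, so "nodule" is the single related keyword returned for "lung".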
  • Patent number: 7860867
    Abstract: An information managing system includes a parameter setting unit for setting a parameter representative of an attribute of a user and information to be retrieved, and an information relevance space generator for generating an information relevance space representative of information indicating a relevance between the user and the information to be retrieved, based on the parameter set by the parameter setting unit.
    Type: Grant
    Filed: December 20, 2006
    Date of Patent: December 28, 2010
    Assignee: NEC Corporation
    Inventors: Masaki Kan, Junichi Yamato, Yuji Kaneko, Yoshihiro Kajiki
  • Publication number: 20100312767
    Abstract: Provided is an information process apparatus including: an extraction unit which is configured to extract words in a predetermined word class from comments which predetermined users write about a predetermined item; a grouping unit which is configured to group the predetermined users by performing a multivariate analysis using the words extracted by the extraction unit; a storage unit which is configured to store the groups, the predetermined item, and the words in association with each other; a determination unit which is configured to determine which group a user who is to write a comment belongs to when the user is to write the comment about the predetermined item; and a reading unit which is configured to read from the storage unit words which are associated with the group determined by the determination unit and the predetermined item which the comment is to be written about.
    Type: Application
    Filed: May 14, 2010
    Publication date: December 9, 2010
    Inventor: Mari SAITO
  • Publication number: 20100293166
    Abstract: The present invention discloses methods, systems, and tools for unified semantic ranking of compositions of ontological subjects. The method breaks a composition into a plurality of partitions as well as its constituent ontological subjects of different orders and builds a participation matrix indicating the participation of ontological subjects of the composition in other ontological subjects, i.e. the partitions, of the composition. Using the information on the participation of the ontological subjects (OSs) in each other, a similarity matrix is built from which the semantic importance ranks of the partitions of the composition are calculated. The method systematically enables the calculation of the semantic ranks of ontological subjects of different orders of the composition. Various systems for implementing the method and numerous applications and services are disclosed.
    Type: Application
    Filed: April 7, 2010
    Publication date: November 18, 2010
    Applicant: Hamid Hatami-Hanza
    Inventor: Hamid Hatami-Hanza
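The participation-matrix idea in this abstract can be illustrated with a small sketch. It makes several assumptions the abstract does not confirm: ontological subjects are single words, partitions are sentences represented as word sets, similarity between two partitions is the number of words they share, and a partition's rank score is the sum of its similarities to all other partitions. All function names and the sample data are hypothetical.

```python
def participation_matrix(partitions):
    """Rows are words, columns are partitions; an entry is 1 if the word
    participates in (appears in) the partition, else 0."""
    vocab = sorted({w for p in partitions for w in p})
    matrix = [[1 if w in p else 0 for p in partitions] for w in vocab]
    return vocab, matrix


def partition_similarity(matrix):
    """Similarity of partitions a and b = number of shared words,
    i.e. the (a, b) entry of PM^T * PM."""
    n = len(matrix[0])
    sim = [[0] * n for _ in range(n)]
    for row in matrix:
        for a in range(n):
            for b in range(n):
                sim[a][b] += row[a] * row[b]
    return sim


def rank_partitions(sim):
    """Score each partition by its total similarity to the others
    (the diagonal self-similarity is excluded), highest first."""
    scores = [sum(row) - row[i] for i, row in enumerate(sim)]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return scores, order


# Hypothetical composition split into three partitions.
partitions = [
    {"semantic", "ranking", "matrix"},
    {"semantic", "matrix", "partition"},
    {"weather", "report"},
]
vocab, pm = participation_matrix(partitions)
sim = partition_similarity(pm)
scores, order = rank_partitions(sim)
```

The first two partitions share two words and therefore score higher than the third, which shares nothing with the rest of the composition.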