Based On Term Frequency Of Appearance Patents (Class 707/750)

Method for monitoring abnormal state of internet information

Patent number: 8185537

Abstract: The present application discloses a method for monitoring an abnormal state of Internet information. The method includes obtaining frequency data for current date common words appearing on the current date web pages, combining with a hot words dictionary that Internet users focus on to determine a list of current date keywords related to the Internet information, determining a weight of each current date keyword, determining an abnormal threshold of the current date keywords, and detecting an abnormal level of the current date keywords to determine current date hot Internet information. The disclosed method further calculates an abnormal level of keywords by monitoring the change in the hot words frequency in the Internet information, and generates a warning for the abnormal level of hot words frequency change, which allows the Internet users to respond at the first moment.
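
The thresholding step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the threshold rule (historical mean plus k standard deviations), the function name `abnormal_keywords`, and the sample data are all assumptions made for illustration, and the per-keyword weighting mentioned in the abstract is left out.

```python
from collections import Counter
from statistics import mean, pstdev

def abnormal_keywords(history, today_words, hot_words, k=3.0):
    """Flag hot words whose count today exceeds an assumed threshold.

    history: non-empty list of dicts mapping word -> count on a past day.
    The threshold (mean + k * population std) is an illustrative choice.
    """
    # Keep only today's words that appear in the hot-words dictionary.
    today = Counter(w for w in today_words if w in hot_words)
    alerts = {}
    for word, count in today.items():
        past = [day.get(word, 0) for day in history]
        threshold = mean(past) + k * pstdev(past)
        if count > threshold:
            alerts[word] = (count, threshold)
    return alerts

# Hypothetical data: three past days and today's word stream.
history = [
    {"flood": 2, "election": 5},
    {"flood": 3, "election": 4},
    {"flood": 2, "election": 6},
]
today_words = ["flood"] * 12 + ["election"] * 5 + ["weather"] * 9
alerts = abnormal_keywords(history, today_words, hot_words={"flood", "election"})
print(alerts)
```

In this sample only "flood" is flagged: "election" stays within its usual range and "weather" is not in the hot-words dictionary.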

Type: Grant

Filed: April 24, 2008

Date of Patent: May 22, 2012

Assignee: Peking University

Inventors: Xun Liang, Hua Chen, Jian Yang
Dynamic keyword suggestion and image-search re-ranking

Patent number: 8185526

Abstract: A content-based re-ranking (CBR) process may be performed on query results based on a selected keyword that is extracted from previous query results, and thereby increase a relevancy of search results. A search engine may perform the CBR process using a target image that is selected from a plurality of image search results, the CBR to identify re-ranked image search results. Keywords may be extracted from the re-ranked image search results. A portion of the keywords may be outputted as suggested keywords and made selectable by a user. Finally, a refined CBR process may be performed based on the target image and a received selection of a suggested keyword, the refined CBR to output the refined image search results.

Type: Grant

Filed: January 21, 2010

Date of Patent: May 22, 2012

Assignee: Microsoft Corporation

Inventors: Fang Wen, Jian Sun
Electronic data retrieving apparatus

Patent number: 8180772

Abstract: An electronic data retrieving apparatus is provided that increases the retrieval accuracy without deteriorating the retrieval efficiency by reflecting differences between the numbers of word appearances due to genres of electronic data in the setting of the retrieval words. The electronic data retrieving apparatus according to the present invention sets the retrieval words of the electronic data not only as a word appearing on a retrieval word setting table of the recorded electronic data for a predetermined number of times (e.g., three times) or more but also a word appearing on the retrieval word setting table and appearing on a retrieval word setting reference table for a predetermined number of times (e.g., three times) or more.
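
The selection rule described here can be sketched as follows. This is a minimal sketch that assumes each table is simply a list of words; the function name `select_retrieval_words` and the sample words are hypothetical, and only the threshold of three comes from the abstract's example.

```python
from collections import Counter

def select_retrieval_words(doc_words, reference_words, min_count=3):
    """Pick retrieval words for one recorded item.

    A word qualifies if it appears at least min_count times in the item's
    own table, or if it appears there at all and appears at least
    min_count times in the genre reference table.
    """
    doc_counts = Counter(doc_words)
    ref_counts = Counter(reference_words)
    selected = set()
    for word, n in doc_counts.items():
        if n >= min_count or ref_counts[word] >= min_count:
            selected.add(word)
    return selected

# Hypothetical tables for a sports programme.
doc_words = ["goal"] * 3 + ["referee", "stadium"]
reference_words = ["referee"] * 4 + ["stadium"] + ["penalty"] * 5
selected = select_retrieval_words(doc_words, reference_words)
print(sorted(selected))
```

Here "referee" is selected despite appearing once in the item because it is frequent in the reference table, while "penalty" is excluded because it never appears in the item itself.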

Type: Grant

Filed: February 13, 2009

Date of Patent: May 15, 2012

Assignee: Sharp Kabushiki Kaisha

Inventors: Hiroshi Murakami, Yoshio Nishimoto
System and method for using external references to validate a data object's classification / consolidation

Patent number: 8180779

Abstract: A computer system and method for validating data object classification and consolidation using external references. The external references may be web pages, product catalogs, external databases, URLs, search results provided by a search engine or subsets or combinations of any of these to validate a classification or consolidation of records. Embodiments validate a data object classification or consolidation decision by searching external data sources, such as databases, the Internet etc. for references to the transactional data object and determining a confidence level based on the original data object and the unstructured information reference, URL, or search result for example. Decisions may be verified or denied based on the comparison of the external references related to each data object. Embodiments of the invention save substantial labor in validating business data objects and make data more reliable across enterprise systems.

Type: Grant

Filed: December 30, 2005

Date of Patent: May 15, 2012

Assignee: SAP AG

Inventors: Yoram Horowitz, Avi Malamud
System and method for automatically publishing data items associated with an event

Patent number: 8176032

Abstract: Systems and methods are disclosed to automatically publish data items associated with a news event. In one example embodiment, a method comprises monitoring search queries associated with a search query category, detecting a change in a search request frequency associated with the search query category with respect to a baseline frequency, determining an event associated with the search query category, identifying one or more data items associated with the event, and generating a visual representation of a relationship between the one or more data items and the event. The search query category may be associated with at least one search term and a baseline frequency.

Type: Grant

Filed: October 22, 2009

Date of Patent: May 8, 2012

Assignee: eBay Inc.

Inventors: Dan Shen, Xiaodi Zhang, Qiang Wang, Helen Hang Ye, Jin Yu Lou
AUGMENTING QUERIES WITH SYNONYMS FROM SYNONYMS MAP

Publication number: 20120109978

Abstract: Methods, systems, and apparatus, including computer program products, operable to perform operations including receiving through a user interface with an interface language a search query having query terms; using the interface language to select one or more mappings and using the selected mappings to simplify each query term; and applying each simplified query term to a synonyms map to identify possible synonyms with which to augment the search query. In alternative embodiments, the operations include generating a synonyms map from a corpus of documents; where the synonyms map maps each of multiple keys to one or more corresponding variants, where each variant is associated with one or more of document languages. In alternative embodiments, the operations include generating a synonyms map from documents by applying document language-dependent mappings to words in the documents to generate keys for the map.

Type: Application

Filed: January 12, 2012

Publication date: May 3, 2012

Applicant: GOOGLE INC.

Inventor: Ruchira S. Datta
METHOD FOR ASSISTING IN MAKING A DECISION ON BIOMETRIC DATA

Publication number: 20120109976

Abstract: The present invention relates to a method for assisting a user in making a decision to compare biometric data of an individual with data from a database relating to a large number of individuals, and biometric data is acquired for an individual concerned, that this data is encoded, that the data items are compared in pairs with corresponding data from the database, that, for each comparison score the duplicate occurrence frequency/non-duplicate occurrence frequency ratio is established, that the product of all the available ratios is calculated, that this product is standardized, that the standardized ratio is compared to a pre-set threshold, that the values greater than the pre-set threshold are kept and that this result is submitted to the user for him to validate it as appropriate.
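
The ratio-product step can be illustrated with a minimal sketch. The abstract does not say how the product is standardized, so the mapping `product / (1 + product)` below is an assumption, as are the frequency tables, the three score bins, the threshold of 0.9, and all names.

```python
from math import prod

# Hypothetical occurrence frequencies per score bin (low, medium, high),
# as might be estimated from known duplicate and non-duplicate pairs.
DUP_FREQ = [0.05, 0.25, 0.70]
NONDUP_FREQ = [0.80, 0.15, 0.05]

def likelihood_ratio(score):
    """Duplicate / non-duplicate frequency ratio for a score in [0, 1]."""
    bin_index = min(int(score * len(DUP_FREQ)), len(DUP_FREQ) - 1)
    return DUP_FREQ[bin_index] / NONDUP_FREQ[bin_index]

def standardized_product(scores):
    """Multiply the per-score ratios, then map the product into (0, 1).

    The mapping p / (1 + p) is an assumed standardization.
    """
    product = prod(likelihood_ratio(s) for s in scores)
    return product / (1.0 + product)

def shortlist(candidates, threshold=0.9):
    """Keep candidates whose standardized value exceeds the threshold."""
    kept = {}
    for name, scores in candidates.items():
        value = standardized_product(scores)
        if value > threshold:
            kept[name] = value
    return kept

# Hypothetical comparison scores for two database records.
candidates = {
    "record_17": [0.9, 0.8, 0.95],
    "record_42": [0.2, 0.5, 0.1],
}
kept = shortlist(candidates)
print(kept)
```

Only the shortlisted records would then be shown to the user for validation, which is the decision-support role the abstract describes.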

Type: Application

Filed: November 2, 2006

Publication date: May 3, 2012

Applicant: THALES

Inventor: Jean Beaudet
KEYWORD DETERMINATION BASED ON A WEIGHT OF MEANINGFULNESS

Publication number: 20120109977

Abstract: Example embodiments relate to keyword determination based on a weight of meaningfulness. In example embodiments, a computing device may determine a number of occurrences of a word in a particular document and may then determine a weight of meaningfulness for the word based on the number of occurrences. The computing device may then add the word to a set of keywords for the document based on the weight of meaningfulness.

Type: Application

Filed: November 2, 2010

Publication date: May 3, 2012

Inventors: Helen Balinsky, Alexander Balinsky, Steven J. Simske
Method and vector analysis for a document

Patent number: 8171026

Abstract: The invention provides a document representation method and a document analysis method including extraction of important sentences from a given document and/or determination of similarity between two documents. The inventive method detects terms that occur in the input document, segments the input document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors, each vector including as its element values according to occurrence frequencies of the terms occurring in the document segments. The method further calculates eigenvalues and eigenvectors of a square sum matrix in which a rank of the respective document segment vector is represented by R and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining the importance.

Type: Grant

Filed: April 16, 2009

Date of Patent: May 1, 2012

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Takahiko Kawatani
Methods for improving the diversity of image search results

Patent number: 8171043

Abstract: Techniques are described to increase the diversity or focus of image search results. A user submits an original query to search for images. A server generates a first results set by executing the original query using metadata associated with each image. The server selects, from the first results set, a specified number of results ranked highest and generates a list of terms from the metadata of each of the results selected. The terms may be only the tags of the results. The server generates an updated query using terms in the list that may be weighted based on the frequency of the term in the list or include only a specified number of the highest occurring terms in the list. The server generates a second results set by executing the updated query using metadata associated with each image. The second results set is then stored and displayed to the user.

Type: Grant

Filed: October 24, 2008

Date of Patent: May 1, 2012

Assignee: Yahoo! Inc.

Inventors: Vanessa Murdock, Roelof Van Zwol, Lluis Garcia Pueyo, Georgina Ramirez Camps
Index optimization for ranking using a linear model

Patent number: 8171031

Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.

Type: Grant

Filed: January 19, 2010

Date of Patent: May 1, 2012

Assignee: Microsoft Corporation

Inventors: Vladimir Tankovich, Dmitriy Meyerzon, Mihai Petriuc
Temporally-aware evaluative score

Patent number: 8166050

Abstract: A method includes processing a performance query to a dimensional data model by processing dimension coordinates that exist within the dimensional data model, wherein the dimension coordinates have a first particular grain (“finer grain”) that is finer than a second particular grain (“coarser grain”), the method to determine an evaluative score for a particular finer grain value based on performance facts for dimension coordinates associated with the particular finer grain value. Performance parameters are determined relative to a particular coarser grain value, against which to measure the performance facts associated with the finer grain value, including processing the temporal relationships of finer grain values to coarser grain values for the dimension coordinates. The evaluative score is determined for the particular finer grain value based on performance facts of dimension coordinates having the particular finer grain value, in view of the determined performance parameters.

Type: Grant

Filed: February 8, 2011

Date of Patent: April 24, 2012

Assignee: Merced Systems, Inc

Inventor: Todd O. Dampier
Computation of term dominance in text documents

Patent number: 8166051

Abstract: An improved entropy-based term dominance metric is useful for characterizing a corpus of text documents and for comparing the term dominance metrics of a first corpus of documents to a second corpus having a different number of documents.

Type: Grant

Filed: February 3, 2009

Date of Patent: April 24, 2012

Assignee: Sandia Corporation

Inventors: Travis L. Bauer, Zachary O. Benz, Stephen J. Verzi
Techniques for computing similarity measurements between segments representative of documents

Patent number: 8166049

Abstract: Keyword frequency data for a plurality of document-derived segments is represented in a matrix form in which each segment is represented as a vector of dimensionality equal to the number of keywords. The matrix may be subdivided into a plurality of sub-matrices, each preferably corresponding to a non-overlapping portion of the plurality of keywords. When determining a similarity measurement between any pair of segments, at least a portion of the keyword frequency data for each sub-matrix's non-overlapping keywords is used to determine a sub-matrix dot product for the pair of segments. The resulting plurality of sub-matrix dot products are then summed together in order to provide the similarity measurement. Keywords that are synonyms of each other may be accommodated through the modification of keyword frequency data. Where the keyword frequency data in the matrix representation is relatively sparse, compressed views of the matrix representation may be provided.
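
The summed sub-matrix dot product can be sketched as below, assuming each segment is a plain list of keyword counts and the sub-matrices are contiguous, equally sized keyword blocks. The function name `blocked_similarity` and the sample vectors are hypothetical.

```python
def blocked_similarity(u, v, block_size):
    """Sum per-block dot products over non-overlapping keyword blocks."""
    total = 0
    for start in range(0, len(u), block_size):
        block_u = u[start:start + block_size]
        block_v = v[start:start + block_size]
        # Dot product restricted to this block's keywords.
        total += sum(a * b for a, b in zip(block_u, block_v))
    return total

# Hypothetical keyword-frequency vectors for two segments (six keywords).
u = [2, 0, 1, 3, 0, 1]
v = [1, 1, 0, 2, 4, 1]
print(blocked_similarity(u, v, block_size=2))
```

Because the blocks do not overlap, the sum equals the ordinary dot product of the full vectors whatever the block size, so each block can be processed separately without changing the similarity.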

Type: Grant

Filed: May 28, 2009

Date of Patent: April 24, 2012

Assignee: Accenture Global Services Limited

Inventors: Jagadeesh Chandra Bose Rantham Prabhakara, Ashwin Nayak, Anitha Chandran
Method and system for classifying postings in a forum

Patent number: 8165997

Abstract: A method for classifying a previously unclassified posting that includes extracting a plurality of terms from the previously unclassified posting on an application forum, calculating a term answer probability and a term comment probability for each term of the plurality of terms. The term answer probability defines a probability that the term is in an answer posting assigned to an answer class, and the term comment probability defines a probability that the term is in a comment posting assigned to a comment class. The method further includes performing a Bayesian analysis using the term answer probability and the term comment probability for each term of the plurality of terms to select a posting class for the previously unclassified posting. The posting class is either the answer class or the comment class. The posting class is assigned to the previously unclassified posting.
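
The Bayesian step can be illustrated with a small naive Bayes classifier. This is a sketch under stated assumptions: terms are whitespace-separated lowercase tokens, probabilities use add-one (Laplace) smoothing, and the training postings and function names are hypothetical. The abstract does not specify these details.

```python
from collections import Counter
from math import log

LABELS = ("answer", "comment")

def train(postings):
    """postings: list of (text, label) pairs with label in LABELS."""
    term_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for text, label in postings:
        class_counts[label] += 1
        term_counts[label].update(text.lower().split())
    return term_counts, class_counts

def classify(text, term_counts, class_counts):
    """Return the label with the higher log posterior for the posting."""
    vocab = set().union(*(term_counts[label] for label in LABELS))
    total_postings = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in LABELS:
        total_terms = sum(term_counts[label].values())
        score = log(class_counts[label] / total_postings)  # class prior
        for term in text.lower().split():
            # Smoothed probability that the term occurs in this class.
            p = (term_counts[label][term] + 1) / (total_terms + len(vocab))
            score += log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled postings from a support forum.
model = train([
    ("you can fix this by entering the form number", "answer"),
    ("try entering the deduction on schedule a", "answer"),
    ("thanks that worked", "comment"),
    ("i have the same problem thanks", "comment"),
])
print(classify("thanks a lot", *model))
print(classify("try entering the form number", *model))
```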

Type: Grant

Filed: July 27, 2009

Date of Patent: April 24, 2012

Assignee: Intuit Inc.

Inventors: Igor A. Podgorny, Howard Chen, Floyd J. Morgan, Amit Rohatgi
System and method for adaptive categorization for use with dynamic taxonomies

Patent number: 8161028

Abstract: A system, method and computer program product provides a solution to a class of categorization problems using a semi-supervised clustering approach, the method performing a Soft Seeded k-means algorithm, which makes effective use of the side information provided by seeds with a wide range of confidence levels, even when they do not provide complete coverage of the pre-defined categories. The semi-supervised clustering is achieved through the introduction of a seed re-assignment penalty measure and a model selection measure.

Type: Grant

Filed: December 5, 2008

Date of Patent: April 17, 2012

Assignee: International Business Machines Corporation

Inventors: Jianying Hu, Aleksandra Mojsilovic, Moninder Singh
Document-based synonym generation

Patent number: 8161041

Abstract: One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.

Type: Grant

Filed: February 10, 2011

Date of Patent: April 17, 2012

Assignee: Google Inc.

Inventors: Oleksandr Grushetskyy, Steven D. Baker
Contextual mobile content placement on a mobile communication facility

Patent number: 8156128

Abstract: In embodiments of the present invention improved capabilities are described for displaying mobile content in association with a website on a mobile communication facility based at least in part on receiving a website request from a mobile carrier gateway, receiving contextual information relating to the requested website, associating the received contextual information with a mobile content, and, finally, displaying the mobile content with the website on a mobile communication facility.

Type: Grant

Filed: June 12, 2009

Date of Patent: April 10, 2012

Assignee: Jumptap, Inc.

Inventors: Jorey Ramer, Dennis Doughty, Adam Soroca
Facilitating display of an interactive and dynamic cloud of terms related to one or more input terms

Patent number: 8150829

Abstract: According to certain embodiments, facilitating display of terms includes facilitating display of a graphical user interface. One or more first input terms entered into a user entry window of the graphical user interface are received. One or more first output terms related to the first input terms are determined. Display of a first graphical cloud comprising the first output terms is facilitated. The first input terms are modified to yield one or more second input terms. One or more second output terms related to the second input terms are determined. Display of a second graphical cloud comprising the second output terms is facilitated.

Type: Grant

Filed: April 7, 2009

Date of Patent: April 3, 2012

Assignee: Fujitsu Limited

Inventors: Yannis Labrou, Stergios Stergiou, David L. Marvit, Albert Reinhardt
Ranking authors and their content in the same framework

Patent number: 8150860

Abstract: One or more server devices may simultaneously calculate first ranking scores for a group of users and second ranking scores for a group of comments authored by the group of users. The calculating may occur during a same process. The one or more server devices may further provide one of a first ranked list that includes information identifying the group of users, the information identifying the group of users being ordered based on the first ranking scores, or a second group of comments of the group of comments, the comments in the second group of comments being ordered based on the second ranking scores.

Type: Grant

Filed: August 12, 2009

Date of Patent: April 3, 2012

Assignee: Google Inc.

Inventors: Michal Cierniak, Na Tang
Classifying text into hierarchical categories

Patent number: 8145636

Abstract: Systems, methods and program products for classifying text. A system classifies text into first subject matter categories. The system identifies one or more second subject matter categories in a collection of second subject matter categories, each of the second categories is a hierarchical classification of a collection of confirmed valid search results for queries, in which at least one query for each identified second category includes a term in the text. The system filters the identified categories by excluding identified categories whose ancestors are not among the first categories. The system selects categories from the filtered categories based on one or more thresholds in which a threshold specifies a degree of relatedness between a selected category and the text. The selected categories are a sufficient basis for recommending content to a user, the content being associated with one or more of the selected categories.

Type: Grant

Filed: March 13, 2009

Date of Patent: March 27, 2012

Assignee: Google Inc.

Inventors: Glen M. Jeh, Beverly Yang
System and method for determining a composite score for categorized search results

Patent number: 8145618

Abstract: A system and method for scoring documents is described. One or more documents are identified responsive to a search criteria. A text match score indicating a quality of match of the identified documents is determined. A category match score is determined over categories. A document-categories score is determined indicating a quality of match between an identified document and a plurality of categories. A search criteria-categories score is determined indicating a quality of match between the search criteria and the categories. An overall score is determined based on the text match score and the category match score.

Type: Grant

Filed: October 11, 2010

Date of Patent: March 27, 2012

Assignee: Google Inc.

Inventors: Karl Pfleger, Brian Larson
Method for selecting electronic advertisements using machine translation techniques

Patent number: 8145649

Abstract: A system for selecting electronic advertisements from an advertisement pool to match the surrounding content is disclosed. To select advertisements, the system takes an approach to content match that takes advantage of machine translation technologies. The system of the present invention implements this goal by means of simple and efficient machine translation features that are extracted from the surrounding context to match with the pool of potential advertisements. Machine translation features are used as features for training a machine learning model. In one embodiment, a ranking SVM (Support Vector Machine) is trained to identify advertisements relevant to a particular context. The trained machine learning model can then be used to rank advertisements for a particular context by supplying the machine learning model with the machine translation feature measures for the advertisements and the surrounding context.

Type: Grant

Filed: December 16, 2010

Date of Patent: March 27, 2012

Assignee: Yahoo! Inc.

Inventors: Vanessa Murdock, Massimiliano Ciaramita, Vassilis Plachouras
INFORMATION RETRIEVAL METHOD, INFORMATION RETRIEVAL APPARATUS, AND COMPUTER PRODUCT

Publication number: 20120072434

Abstract: An information retrieval apparatus includes an acquiring unit that acquires a numerical value defining a boundary of a numerical range; a detecting unit that detects a number of places in and a head numeral of the numerical value; an extracting unit that extracts from a bit string group, a bit string indicating whether a numerical value in a numerical value group having the number of places and the head numeral is present in files subject to retrieval; a specifying unit that specifies a file corresponding to a bit in the extracted bit string, the bit indicating the presence of a numerical value of the numerical value group; a determining unit that determines whether a numerical value in the specified file meets the boundary condition; and a designating unit that, based on a determination by the determining unit designates the specified file to have a numerical value within the numerical range.

Type: Application

Filed: November 30, 2011

Publication date: March 22, 2012

Applicant: FUJITSU LIMITED

Inventors: Masahiro Kataoka, Hiroyuki Torii, Masahiro Kurishima, Hideo Kasai
Methods and apparatuses for searching content

Patent number: 8140511

Abstract: Embodiments of methods and apparatuses for searching contents, including structured search are described herein. Embodiments of the present invention use tree structures (or more generally, graph structures), layout structures, and/or content category information to capture within search results relevant content that would otherwise be missed, to reduce the incidence of false positives within search results, and to improve the accuracy of rankings within search results. Embodiments of the present invention further use tree structures (or more generally, graph structures), layout structures, and/or content category information to extend search results to include sub-document constituents. Embodiments of the present invention also support the use of distribution properties as criteria for ranking search results.

Type: Grant

Filed: April 10, 2009

Date of Patent: March 20, 2012

Assignee: Zalag Corporation

Inventor: Samuel S. Epstein
System and methods for ranking documents based on content characteristics

Patent number: 8140526

Abstract: A system is described for assessing information in natural language contents. A user interface receives an object name as a query term and a value for a customized ranking parameter from a user. A computer storage stores an object-specific data set related to the object name, wherein the object-specific data set includes a plurality of property names and association-strength values. A computer processing system can count a first frequency of a first property name and count a second frequency of a second property name in a document containing text in a natural language, calculate a relevance score as a function of the first frequency and the second frequency, and rank the plurality of documents using their respective relevance scores, and return one or more documents to the user based on the ranking of the plurality of documents. The function is in part defined by the customized ranking parameter.
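
One possible shape for the scoring function is sketched below. The abstract says only that the score is a function of the property-name frequencies and is partly defined by a user-supplied ranking parameter. The specific form (association strength times frequency raised to a positive exponent `alpha`), the property data, and the function names are assumptions for illustration.

```python
def relevance_score(text, properties, alpha=1.0):
    """Score a document against an object-specific data set.

    properties: dict mapping property name -> association strength.
    alpha: assumed user-chosen exponent (> 0) that damps or boosts
    the effect of repeated mentions.
    """
    words = text.lower().split()
    return sum(strength * (words.count(name) ** alpha)
               for name, strength in properties.items())

def rank_documents(documents, properties, alpha=1.0):
    """Return document indices ordered from most to least relevant."""
    scores = [relevance_score(doc, properties, alpha) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: -scores[i])

# Hypothetical data set for the object name "computer".
properties = {"cpu": 0.9, "keyboard": 0.6}
documents = [
    "the cpu fan cools the cpu",
    "keyboard keyboard keyboard and cpu",
    "a story about gardens",
]
ranking = rank_documents(documents, properties)
print(ranking)
```

With `alpha` below 1, repeated mentions of one property count for less, so documents covering several properties tend to rise in the ranking.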

Type: Grant

Filed: February 3, 2010

Date of Patent: March 20, 2012

Inventor: Guangsheng Zhang
Automatic classification of defects

Patent number: 8140514

Abstract: A method of automatically classifying defects. The method generally includes the steps of (A) receiving information for a current defect, (B) extracting field values from the current defect, (C) counting a number of occurrences of one or more keywords in the current defect, (D) determining one or more new keywords occurring in the current defect and storing the one or more new keywords in a database and (E) creating one or more linkages in the database between a first record corresponding to the current defect and one or more second records corresponding to previous defects based upon one or more similarities between the first and the second records.

Type: Grant

Filed: November 26, 2008

Date of Patent: March 20, 2012

Assignee: LSI Corporation

Inventors: Khanh Nguyen, Seonmi Anderson, Michael L. Peterson
Apparatus, method and program for text mining

Patent number: 8140337

Abstract: Disclosed is an apparatus that includes a text input device that inputs text data provided with confidence measures, as subject for mining, a language processing unit that performs language analysis of the input text data provided with the confidence measures, a confidence measure exploiting characteristic word count unit that counts the characteristic words in the input text to provide a count result and that exploits the statistical information and the confidence measures provided in the input text to correct the count result obtained, a characteristic measure calculation unit that calculates the characteristic measure of each characteristic word from the corrected count result, a mining result output device that outputs the characteristic measure of each characteristic word obtained, a user operation input device for a user to input setting for language processing of the input text and setting for a technique for calculating the characteristic measure being found, a mining process management unit that transmits

Type: Grant

Filed: July 18, 2007

Date of Patent: March 20, 2012

Assignee: NEC Corporation

Inventors: Satoshi Nakazawa, Satoshi Morinaga
Discovering query intent from search queries and concept networks

Patent number: 8135721

Abstract: A system is described for discovering query intent based on search queries and concept networks. The system may construct frequency vectors from log data corresponding to a submitted query and at least one related query submitted to one or more search engines. The system may also construct a query intent vector based on the frequency vectors. The query intent vector may include frequency scores that represent the intent of the query.

Type: Grant

Filed: October 21, 2010

Date of Patent: March 13, 2012

Assignee: Yahoo! Inc.

Inventors: Deepa B. Joshi, John J. Thrall
Homology searching method

Patent number: 8135720

Abstract: An apparatus for controlling devices for searching homology of queries in a base sequence in parallel, includes: a memory for storing a base sequence and an appearing frequency of each of first strings each having a fixed length appearing in the base sequence; and a processor for executing a process including: obtaining queries for searching homology in the base sequence; retrieving each of second strings each having a longer fixed length than that of the first strings and partially appearing in each of the queries; determining an approximate appearing frequency of each of the second strings on the basis of the appearing frequency of the first strings; evaluating for each of the query sequences a load of task for searching homology; and allocating each task for searching homology for each of the queries among the devices on the basis of the result of evaluation of the load of each task.

Type: Grant

Filed: November 2, 2009

Date of Patent: March 13, 2012

Assignee: Fujitsu Limited

Inventor: Akira Naruse
Rapid automatic keyword extraction for information retrieval and analysis

Patent number: 8131735

Abstract: Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
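
The steps in this abstract can be sketched in a few lines. This is a minimal sketch, not the patented implementation: the stop-word list is a tiny illustrative one, the word score is taken as degree divided by frequency (one of the functions the abstract allows), and the keyword score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

def candidate_keywords(text):
    """Split text at delimiters and stop words into candidate phrases."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        phrase = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if phrase:
                    candidates.append(tuple(phrase))
                phrase = []
            else:
                phrase.append(word)
        if phrase:
            candidates.append(tuple(phrase))
    return candidates

def extract_keywords(text, top_k=2):
    """Score candidates and return the top_k (keyword, score) pairs."""
    candidates = candidate_keywords(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(candidates)}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

text = ("Rapid automatic keyword extraction is a method for keyword "
        "extraction in the analysis of documents.")
top = extract_keywords(text)
print(top)
```

Dividing degree by frequency favors words that mostly occur inside longer candidate phrases, which is why the four-word candidate outscores the shorter ones in this sample.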

Type: Grant

Filed: September 9, 2009

Date of Patent: March 6, 2012

Assignee: Battelle Memorial Institute

Inventors: Stuart J Rose, Wendy E Cowley, Vernon L Crow, Nicholas O Cramer
Tuning of relevancy ranking for federated search

Patent number: 8131716

Abstract: Determining a relevancy ranking score is disclosed. An indication is received that a relevancy ranking score algorithm is to be tuned to a selected preference. The relevancy ranking score algorithm is updated based at least in part on the selected preference, wherein the relevancy ranking score of a search result resulting from a search query is based at least in part on one or more constraints of the search query.

Type: Grant

Filed: July 12, 2010

Date of Patent: March 6, 2012

Assignee: EMC Corporation

Inventors: Pierre-Yves Chevalier, Bruno Roustant
SYSTEM AND METHOD FOR GENERATING A RELATIONSHIP NETWORK

Publication number: 20120054206

Abstract: A computer-implemented system and process for generating a relationship network is disclosed. The system provides a set of data items to be related and generates variable length data vectors to represent the relationships between the terms within each data item. The system can be used to generate a relationship network for documents, images, or any other type of file. This relationship network can then be queried to discover the relationships between terms within the set of data items.

Type: Application

Filed: July 25, 2011

Publication date: March 1, 2012

Applicant: The Regents of the University of California

Inventors: Kasian Franks, Cornelia A. Myers, Raf M. Podowski
Distributing content indices

Patent number: 8117215

Abstract: A query-centric system and process for distributing reverse indices for a distributed content system. Relevance ranking techniques in organizing distributed system indices. Query-centric configuration subprocesses (1) analyze query data, partitioning terms for reverse index server(s) (RIS), (2) distribute each partitioned data set by generally localizing search terms for the RIS that have some query-centric correlation, and (3) generate and maintain a map for the partitioned reverse index system terms by mapping the terms for the reverse index to a plurality of different index server nodes. Indexing subprocess element builds distributed reverse indices from content host indices. Routines of the query execution use the map derived in the configuration to more efficiently return more relevant search results to the searcher.

Type: Grant

Filed: September 24, 2010

Date of Patent: February 14, 2012

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: George H. Forman, Zhichen Xu
Music artist retrieval system and method of retrieving music artist

Patent number: 8117214

Abstract: The present invention provides a music artist retrieval system which makes it possible for users to automatically retrieve an unknown music artist similar to the user's favorite artist while actually reproducing and confirming a piece of music of the unknown artist. A music artist similarity map storing section (13) computes a plurality of similarities for a plurality of music artists and makes a music artist similarity map for the plurality of music artists based on the plurality of similarities, then stores the music artist similarity map. Here, the similarities are computed between one of the plurality of music artists and the other music artists based on features of the respective music artists. A similar artists selecting and displaying section (17) displays on a display a plurality of indications related to one music artist and two or more music artists whose similarities are close to the one music artist, based on the music artist similarity map.

Type: Grant

Filed: October 5, 2007

Date of Patent: February 14, 2012

Assignee: National Institute of Advanced Industrial Science and Technology

Inventors: Elias Pampalk, Masataka Goto
Determining veracity of data in a repository using a semantic network

Patent number: 8108410

Abstract: A mechanism for determining the veracity of data in a repository. Responsive to receiving a search query from a user, a semantic network is created from the documents in the repository. A determination is made as to whether data from a first document in the semantic network conflicts with data from a second document in the semantic network. If a conflict exists, a determination is made as to whether the data from the first document is obsolete in comparison to data from the second document. If the data from the first document is obsolete in comparison to data from the second document, a portion of the first document corresponding to the obsolete data is automatically annotated with the data from the second document to form an annotated first document. A search result list is then provided to the user comprising the second document and the annotated first document.

Type: Grant

Filed: August 6, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventors: Ann Margaret Strosaker, Michael Thomas Strosaker
Pangenetic web user behavior prediction system

Patent number: 8108406

Abstract: Computer based systems, methods, software and databases are presented in which correlations between web item preferences, behaviors and pangenetic (genetic and epigenetic) attributes of individuals are used for pangenetic based user behavior prediction in which predictions of a user's online behavior can be generated based on the user's pangenetic makeup. Data masking can be used to maintain privacy of sensitive portions of the pangenetic data.

Type: Grant

Filed: December 30, 2008

Date of Patent: January 31, 2012

Assignee: Expanse Networks, Inc.

Inventors: Andrew Alexander Kenedy, Charles Anthony Eldering
Information retrieval apparatus

Patent number: 8108407

Abstract: An information retrieval apparatus, which can present to a user only a related word matching a user search intent, includes: an associative dictionary storage unit for storing words included in plural pieces of text to be searched and relevance degrees between the words; an appearance frequency storage unit for storing an appearance frequency that is the number of pieces of text in which the words stored in the associative dictionary storage unit appear, among the plural pieces of text to be searched; and a related word obtaining unit that obtains a related word to be presented to the user, from the relevance degree between the search word entered by the user and another word among the words, the appearance frequency, and the user search intent.

Type: Grant

Filed: November 6, 2007

Date of Patent: January 31, 2012

Assignee: Panasonic Corporation

Inventors: Takashi Tsuzuki, Kenji Mizutani, Kazutoyo Takata, Satoshi Matsuura
Determining top combinations of items to present to a user

Patent number: 8108409

Abstract: Embodiments of the present invention pertain to determining top combinations of items to present to a user. According to one embodiment, data that includes information describing a plurality of combinations of records is accessed. Each record describes a plurality of items. The data is analyzed using a branch and bound search procedure to determine top combinations of items based on a specified metric and a specified number. According to one embodiment, the metric is value enabled and the specified number determines how many combinations of items are associated with the top combinations of items.

Type: Grant

Filed: July 19, 2007

Date of Patent: January 31, 2012

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Julie W. Drew, Juan Antonio R. Garay, Krishna Venkatraman
Method and system for matching data sets of non-standard formats

Patent number: 8103679

Abstract: A system and method is described for receiving a plurality of non-standardized data sets and generating respective plurality of standardized profiles that can be used for efficiently comparing and matching one profile against the other plurality of profiles. One application of this invention is to convert job seekers' resumes and job postings into respective profiles and then permitting either a job seeker to search for job postings that most closely match the job seeker's resume or, conversely, permitting an employer to search for job seekers whose resumes most closely match the employer's job posting.

Type: Grant

Filed: August 8, 2007

Date of Patent: January 24, 2012

Assignee: CareerBuilder, LLC

Inventors: Andrew B. Cranfill, Jason Elliott
SYSTEM, METHOD AND COMPUTER PROGRAM FOR PREPARING DATA FOR ANALYSIS

Publication number: 20120011132

Abstract: A method of preparing data for analysis, comprising the steps of receiving an initial data set including a plurality of records, each of the plurality of records including an identifier attribute and an associative attribute that identifies a further one or more records; receiving the further one or more records identified by the associative attribute in each of the plurality of records; and associating the further one or more records with the initial data set to form a final data set.

Type: Application

Filed: July 8, 2011

Publication date: January 12, 2012

Applicant: Patent Analytics Holding Pty Ltd

Inventor: Doris Spielthenner
Automatic index term augmentation in document retrieval

Patent number: 8095533

Abstract: Disclosed are methods and systems for automatically assigning index terms to electronic documents such as Web pages or sites in a manner which may be used to facilitate the retrieval of electronic documents of interest. The method involves determining co-occurrences of terms in other documents with the electronic document, and selecting terms as index terms based upon those scores. The method permits the efficient retrieval of electronic documents.

Type: Grant

Filed: November 9, 2004

Date of Patent: January 10, 2012

Assignee: Apple Inc.

Inventor: Jay Michael Ponte
Full-text relevancy ranking

Patent number: 8095529

Abstract: A method and system for ranking relevancy of metadata associated with media on a computer network, such as multimedia and streaming media, include categorizing the metadata into sets of metadata. The categories are broad categories relating to areas such as who, what, when, and where, such as artist, media type, and creation date, creation location. Weights are assigned to each set of metadata. Weights are related to technical information such as bit rate, duration, sampling rate, frequency of occurrence of a specific term, etc. A score is calculated for ranking the relevancy of each set of metadata. The score is calculated in accordance with the assigned weight and category. This score is available for search systems (e.g., search engines) and/or users to determine the relative ranking of search results.

Type: Grant

Filed: January 4, 2005

Date of Patent: January 10, 2012

Assignee: AOL Inc.

Inventors: Theodore George Diamond, Daniel Allen Hendrick, Eric Carl Rehm, Melissa Anne Riesland
Method, system, and apparatus for validation

Patent number: 8095544

Abstract: In a method for validating data, a text of a document is received. At least one fact is extracted from the text. At least one expert refinement is merged with the at least one fact to create at least one modified fact. The at least one modified fact is provided for a review. An expert refinement to the at least one modified fact is captured in response to the review. A superset document based on the at least one pre-existing refinement and the expert refinement is stored.

Type: Grant

Filed: May 30, 2003

Date of Patent: January 10, 2012

Assignee: Dictaphone Corporation

Inventors: Keith W. Boone, Sunitha Chaparala, Sean Gervais, Robert G. Titemore, Harry J. Ogrinc, Jeffrey G. Hopkins, Roubik Manoukian, Cameron Fordyce
Book content item search

Patent number: 8095546

Abstract: Methods, systems, and apparatus, including computer program products are provided for ranking distinct book content items based on implicit links to other distinct book content items. The implicit links are defined based on the identification of matching features in the distinct book content items. In some implementations, the matching features are uncommon phrases in textual content of the distinct book content items. Edges representing implicit links are generated between distinct nodes representing distinct book content items in a weighted graph. Search results for distinct book content items can be ordered based on the edges connected to the distinct nodes in the weighted graph that represent the distinct book content items.

Type: Grant

Filed: January 9, 2009

Date of Patent: January 10, 2012

Assignee: Google Inc.

Inventors: Shumeet Baluja, Yushi Jing
Fast algorithms and metrics for comparing hierarchical clustering information trees and numerical vectors

Patent number: 8095543

Abstract: In various embodiments, a method for determining a similarity between two data sets is disclosed, the steps of which include determining a first list of data clusters for a first hierarchically-organized data set, determining a second list of data clusters for a second hierarchically-organized data set, and determining a similarity between the first and second data sets by calculating a maximum flow between the first list of data clusters and the second list of data clusters.

Type: Grant

Filed: July 31, 2008

Date of Patent: January 10, 2012

Assignee: The United States of America as represented by the Secretary of the Navy

Inventor: Anjum Gupta
System and methodology for a multi-site search engine

Patent number: 8095545

Abstract: Techniques for query processing in a multi-site search engine are described. During an indexing phase, each site of a multi-site search engine indexes a set of assigned web resources and each site calculates, for each term in the set of assigned web resources, a site-specific upper bound ranking score on the contribution of the term to the search engine ranking function for a query containing the term. During a propagation phase, all sites exchange their site-specific upper bound ranking scores with each other. In response to a site receiving a query, the site determines the set of locally matching resources and compares the ranking score of a locally matching resource with the site-specific upper bound ranking scores for the terms of the query that were received during the propagation phase and determines whether to communicate the query to other sites.

Type: Grant

Filed: October 14, 2008

Date of Patent: January 10, 2012

Assignee: Yahoo! Inc.

Inventors: Luca Telloli, Flavio Junqueria, Aristides Gionis, Vassilis Plachouras, Ricardo Baeza-Yates
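
Illustrative sketch (not part of the patent record): the abstract of patent 8095545 describes each site comparing its local ranking scores with per-term upper bounds received from other sites to decide whether a query needs to be forwarded. The Python below is a rough sketch of one way such a decision could work. It assumes an additive ranking function (a site's best possible score is the sum of its per-term bounds) and a top-k cutoff; both are assumptions for illustration, and `sites_to_query` is a hypothetical name.

```python
def sites_to_query(query_terms, local_scores, remote_bounds, k=1):
    """Return the remote sites that could still place a result in the top k.

    query_terms   -- terms of the incoming query
    local_scores  -- ranking scores of the locally matching resources
    remote_bounds -- {site: {term: upper-bound score contribution}}
    k             -- number of results wanted (k >= 1)
    """
    top = sorted(local_scores, reverse=True)[:k]
    # With fewer than k local matches, any remote result could be needed.
    threshold = top[-1] if len(top) == k else float("-inf")

    targets = []
    for site, bounds in remote_bounds.items():
        # Assumed additive scoring: a term the site never indexed contributes 0.
        upper = sum(bounds.get(term, 0.0) for term in query_terms)
        if upper > threshold:
            targets.append(site)
    return sorted(targets)
```

Under these assumptions, a site whose summed bound cannot exceed the k-th best local score is skipped, which saves a network round trip without changing the final top-k result.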
Searching related documents

Patent number: 8090722

Abstract: Systems, methods, and other embodiments associated with logically expanding a document and determining the relevance of the logically expanded document to a query are described. One method embodiment includes searching an index to locate a document identifier for a document in which a query term appears. The method includes determining whether the index entry includes an expansion identifier, and, if so, producing a logically expanded document. The logically expanded document may include both a document associated with the document identifier and a document associated with the expansion identifier. The method may then determine a relevance value of the logically expanded document with respect to the query and may provide a signal corresponding to the relevance value.

Type: Grant

Filed: March 21, 2007

Date of Patent: January 3, 2012

Assignee: Oracle International Corporation

Inventors: Muralidhar Krishnaprasad, Meeten Bhavsar
Document analysis and multi-word term detector

Patent number: 8090724

Abstract: A term analyzer receives an ordered collection of text-based terms. The ordered collection can contain terms from a document that have been filtered to remove “noise” such as stopwords. The term analyzer analyzes groupings of consecutive text-based terms in the ordered collection to identify occurrences of different combinations of text-based terms in the ordered collection. In addition, the term analyzer maintains frequency information representing the occurrences of the different combinations of text-based terms in the collection. The frequency information can then be used to determine relatively significant keywords and/or keyword phrases in the document. In an example configuration, the term analyzer creates a tree in which a first term in a given grouping of the groupings is defined as a parent node in the tree and a second term in the given grouping is defined as a child node of the parent node in the tree.

Type: Grant

Filed: November 28, 2007

Date of Patent: January 3, 2012

Assignee: Adobe Systems Incorporated

Inventors: Michael J. Welch, Walter Chang
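
Illustrative sketch (not part of the patent record): the abstract of patent 8090724 describes analyzing groupings of consecutive terms, keeping frequency information for each combination, and storing a grouping's first term as a parent node with the second term as its child. The Python below is a minimal sketch of that structure using nested dictionaries. The node layout, the `max_len` and `min_count` parameters, and the function names are assumptions made for illustration.

```python
def build_term_tree(terms, max_len=2):
    """Count consecutive term groupings of up to max_len terms in a tree.

    Each node is {"count": int, "children": {term: node}}; the first term
    of a grouping is a child of the root and the parent of the second term.
    """
    def new_node():
        return {"count": 0, "children": {}}

    root = new_node()
    for start in range(len(terms)):
        node = root
        # Walk down the tree along the grouping that starts at this position.
        for term in terms[start:start + max_len]:
            node = node["children"].setdefault(term, new_node())
            node["count"] += 1
    return root


def frequent_phrases(root, min_count=2):
    """List (phrase, count) pairs seen at least min_count times."""
    results = []

    def walk(node, prefix):
        for term, child in node["children"].items():
            phrase = prefix + (term,)
            if child["count"] >= min_count:
                results.append((" ".join(phrase), child["count"]))
            walk(child, phrase)

    walk(root, ())
    # Most frequent first, ties broken alphabetically.
    return sorted(results, key=lambda pc: (-pc[1], pc[0]))
```

The input is expected to be an ordered list of terms that has already had stop words removed, as the abstract describes. Raising `max_len` extends the tree to longer phrases at the cost of more nodes.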
Method and system for matching data sets of non-standard formats

Patent number: 8090725

Abstract: A system and method is described for receiving a plurality of non-standardized data sets and generating respective plurality of standardized profiles that can be used for efficiently comparing and matching one profile against the other plurality of profiles. One application of this invention is to convert job seekers' resumes and job postings into respective profiles and then permitting either a job seeker to search for job postings that most closely match the job seeker's resume or, conversely, permitting an employer to search for job seekers whose resumes most closely match the employer's job posting.

Type: Grant

Filed: April 16, 2010

Date of Patent: January 3, 2012

Assignee: CareerBuilder, LLC

Inventor: Andrew B. Cranfill
