Inverted Lists (epo) Patents (Class 707/E17.086)

Extracting keywords from a document

Patent number: 10387568

Abstract: An unsupervised keyword extraction process is disclosed. A single input document can be analyzed to identify multiple candidate keywords by utilizing splitting terms. A keyword score is calculated for each of the candidate keywords. The keyword score for a particular candidate keyword is determined based on the length of the candidate keywords that contain the candidate keyword and the frequency of the words appearing in the candidate keywords. One or more keywords having the highest keyword scores are selected as the extracted keywords. The extracted keywords can be used in applications, such as refining search results, providing suggested search terms, or improving the match rate of a network page at a search engine.
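As a rough illustration of the scoring idea in this abstract, the sketch below splits text into candidate keywords at splitting terms and scores each candidate from the lengths of the candidates its words appear in and from word frequency. The splitting-term list, the degree/frequency formula (borrowed from the RAKE family of methods), and all function names are assumptions; the abstract does not give the patented formula.

```python
import re
from collections import defaultdict

# Hypothetical splitting terms; the abstract does not list the actual ones.
SPLITTING_TERMS = {"a", "an", "the", "of", "and", "or", "in", "for",
                   "is", "are", "to", "by", "with", "on"}

def candidate_keywords(text):
    """Split text into candidate phrases at punctuation and splitting terms."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in SPLITTING_TERMS:
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates

def score_keywords(candidates):
    """Score each candidate as the sum of its words' degree/frequency ratios.

    Assumed formula: a word's degree is the total length of the candidates
    that contain it; its frequency is how often it occurs in candidates.
    """
    freq = defaultdict(int)
    degree = defaultdict(int)
    for cand in candidates:
        for word in cand:
            freq[word] += 1
            degree[word] += len(cand)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(c): sum(word_score[w] for w in c) for c in set(candidates)}

def extract_keywords(text, top_n=3):
    """Return the top_n candidates by score, ties broken alphabetically."""
    scores = score_keywords(candidate_keywords(text))
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]
```

Calling `extract_keywords(document_text, top_n=5)` on a single document returns its five highest-scoring candidate phrases; no training corpus is needed, which matches the unsupervised, single-document setting the abstract describes.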

Type: Grant

Filed: September 19, 2016

Date of Patent: August 20, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Weiwei Cheng, Amanda Dee Bottorff, Sandeep Ranganathan

Method for determining relevant search results

Patent number: 9990442

Abstract: Systems and methods for determining search results. The method may include receiving an at least partial search term, and identifying keywords based on the at least partial search term, wherein each keyword has an associated keyword measure based on the number of times each keyword has been previously searched for within a predetermined time period. For each keyword, search results associated with the keyword may be identified, wherein each result has an associated search measure. A relevance measure may be determined for each result using the keyword measure and the search measure, and used to provide at least one of the results as a search result for the at least partial search term.
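The ranking step in this abstract could look something like the sketch below. It assumes that keywords are matched to the partial term by prefix, that the relevance measure is the product of the keyword measure and the search measure, and that a result reachable through several keywords keeps its best score. None of these choices is stated in the abstract, and the function and parameter names are hypothetical.

```python
def suggest_results(partial, keyword_counts, results_by_keyword, limit=3):
    """Rank results for a partially typed search term.

    keyword_counts: keyword -> number of recent searches (keyword measure).
    results_by_keyword: keyword -> {result: search measure}.
    """
    relevance = {}
    for keyword, keyword_measure in keyword_counts.items():
        # Assumption: a keyword matches if it starts with the partial term.
        if not keyword.startswith(partial):
            continue
        for result, search_measure in results_by_keyword.get(keyword, {}).items():
            # Assumption: relevance is the product of the two measures.
            score = keyword_measure * search_measure
            relevance[result] = max(relevance.get(result, 0), score)
    ranked = sorted(relevance.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]
```

With this combination rule, a popular keyword lifts all of its results, while the per-result search measure orders results within a keyword.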

Type: Grant

Filed: September 13, 2016

Date of Patent: June 5, 2018

Assignee: S.L.I. SYSTEMS, INC.

Inventor: Shaun William Ryan

PSEUDO-DOCUMENTS TO FACILITATE DATA DISCOVERY

Publication number: 20130275436

Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents, each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects, each of which relationally points back to its associated structure within the database. Searches can then be performed against the pseudo-documents, which in turn return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
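A minimal sketch of the pseudo-document idea, assuming the database is a dictionary of tables, each row becomes one pseudo-document, and the back-pointer is a (table, key) pair. The data layout and every name here are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict

def make_pseudo_documents(database):
    """Render each row as a text pseudo-document that points back to its row."""
    docs = []
    for table, rows in database.items():
        for key, row in rows.items():
            text = " ".join(str(value) for value in row.values())
            docs.append({"source": (table, key), "text": text})
    return docs

def build_index(docs):
    """Build a token -> pseudo-document id inverted index."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in doc["text"].lower().split():
            index[token].add(doc_id)
    return index

def search(docs, index, term):
    """Return matching row keys, grouped by the structure (table) they came from."""
    grouped = defaultdict(list)
    for doc_id in sorted(index.get(term.lower(), set())):
        table, key = docs[doc_id]["source"]
        grouped[table].append(key)
    return dict(grouped)
```

Grouping the hits by table mirrors the abstract's point that one result set can contain sub-sets associated with different structures.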

Type: Application

Filed: April 11, 2012

Publication date: October 17, 2013

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Lev Novik, John C. Platt

USING AN INVERTED INDEX TO PRODUCE AN ANSWER TO A QUERY

Publication number: 20130097126

Abstract: In response to a query having a search term, an inverted index that is defined on a set of attributes of a database structure is accessed, where the inverted index associates values of the set of attributes with corresponding references to rows of the database structure. It is determined whether any of the attributes in the set is in the search term. In response to determining that any of the attributes in the set is in the search term, the inverted index is used to produce an answer to the query.
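The sketch below shows one way to read this abstract: an inverted index maps (attribute, value) pairs to row references, and the index is consulted only when the query names an indexed attribute. Modelling the search term as an (attribute, value) pair and falling back to a full scan otherwise are assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(rows, attributes):
    """Map each (attribute, value) pair to the ids of the rows that hold it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for attr in attributes:
            if attr in row:
                index[(attr, row[attr])].add(row_id)
    return index

def answer_query(rows, index, indexed_attributes, search_term):
    """Answer an (attribute, value) query, using the index when it applies."""
    attr, value = search_term
    if attr in indexed_attributes:
        # The attribute is covered by the index: look the rows up directly.
        row_ids = sorted(index.get((attr, value), set()))
    else:
        # Assumed fallback: scan every row when the attribute is not indexed.
        row_ids = [i for i, row in enumerate(rows) if row.get(attr) == value]
    return [rows[i] for i in row_ids]
```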

Type: Application

Filed: October 17, 2011

Publication date: April 18, 2013

Inventor: D. Blair ELZINGA

METHOD AND APPARATUS FOR CREATING A SEARCH INDEX FOR A COMPOSITE DOCUMENT AND SEARCHING SAME

Publication number: 20130007004

Abstract: A tool for generating at least one search index for a composite document, wherein the composite document comprises multiple component documents. The search index is generated by extracting characters from the document, segregating the characters into tokens of one or more characters, and determining location information of the tokens. The location information can include the page number of the component document and X, Y page coordinates for the tokens. The tool also provides a user interface that allows for searching of the composite document using at least one of the generated indexes. The user interface allows the user to enter one or more search terms and to select the criteria that will be used during the search. Results are presented to the user via a list of document names that are also hyperlinks to the document. The results documents are listed in order of relevancy, and fragments of text that contain the searched terms are also available to the user, for each document.
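To make the location-aware index concrete, the sketch below assumes character extraction and tokenization have already produced (token, page, x, y) tuples for each component document. Relevancy is approximated by hit count, which is an assumption; the abstract does not say how relevancy is computed, and all names are hypothetical.

```python
from collections import Counter, defaultdict

def build_location_index(components):
    """Map each token to the (document, page, x, y) locations where it occurs.

    components: document name -> list of (token, page, x, y) tuples.
    """
    index = defaultdict(list)
    for doc_name, words in components.items():
        for text, page, x, y in words:
            index[text.lower()].append((doc_name, page, x, y))
    return index

def search(index, term):
    """Return document names containing the term, most hits first."""
    hits = index.get(term.lower(), [])
    counts = Counter(doc for doc, _page, _x, _y in hits)
    return [doc for doc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Because every posting keeps its page number and coordinates, a user interface built on this index could jump to, or highlight, the exact position of each hit.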

Type: Application

Filed: June 30, 2011

Publication date: January 3, 2013

Applicant: Landon IP, Inc.

Inventors: Krishmin RAI, George V. SHRECK

CLUSTERING A COLLECTION USING AN INVERTED INDEX OF FEATURES

Publication number: 20120150867

Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.
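The clustering loop described in this abstract can be sketched as follows. Each data element is modelled as a set of features, and features are ranked by how many elements contain them; that ranking criterion is an assumption, since the abstract only says the features are ranked.

```python
from collections import defaultdict

def cluster_by_features(elements):
    """Cluster elements by walking a ranked feature list over an inverted index.

    elements: element id -> set of features.
    Returns: feature -> set of element ids in that feature's cluster.
    """
    # Inverted index: feature -> ids of the elements containing it.
    index = defaultdict(set)
    for elem_id, features in elements.items():
        for feature in features:
            index[feature].add(elem_id)

    # Assumed ranking: most frequent feature first, ties broken alphabetically.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))

    clusters = {}
    assigned = set()  # elements that have a previously selected feature
    for feature in ranked:
        members = index[feature] - assigned
        if members:
            clusters[feature] = members
            assigned |= members
    return clusters
```

Each element lands in exactly one cluster, labelled by the highest-ranked feature it contains, and the whole pass needs only set lookups against the inverted index.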

Type: Application

Filed: December 13, 2010

Publication date: June 14, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng

METHODS FOR INDEXING AND SEARCHING BASED ON LANGUAGE LOCALE

Publication number: 20120109970

Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variants of the search term that have the same meaning as the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings lists are returned to the client, where the item identifiers identify one or more files that contain the variants of the search term represented by the state machine.
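A much-simplified sketch of the lookup in this abstract: the state machine is reduced to a linear chain with one node per character position, each node holding the characters treated as equivalent in the current locale. The equivalence tables below are invented for illustration and are not real locale data; the function names are hypothetical as well.

```python
from itertools import product

# Hypothetical per-locale equivalence tables (illustrative only).
LOCALE_VARIANTS = {
    "de": {"a": ["a", "ä"], "o": ["o", "ö"], "u": ["u", "ü"]},
    "en": {},
}

def build_state_machine(term, locale):
    """Return one node per character, each listing its accepted variants."""
    table = LOCALE_VARIANTS.get(locale, {})
    return [table.get(ch, [ch]) for ch in term]

def lookup(term, locale, inverted_index):
    """Traverse every path through the machine and merge the postings lists."""
    machine = build_state_machine(term, locale)
    item_ids = set()
    for path in product(*machine):
        item_ids |= inverted_index.get("".join(path), set())
    return sorted(item_ids)
```

Enumerating every path is exponential in the number of ambiguous positions; a production design would more likely walk the machine against a sorted term dictionary, but the abstract does not describe that step.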

Type: Application

Filed: October 27, 2010

Publication date: May 3, 2012

Applicant: APPLE INC.

Inventors: John M. Hörnkvist, Eric R. Koebler

Automatic generation of ontologies using word affinities

Patent number: 8171029

Abstract: In one embodiment, generating an ontology includes accessing an inverted index that comprises inverted index lists for words of a language. An inverted index list corresponding to a word indicates pages that include the word. A word pair comprises a first word and a second word. A first inverted index list and a second inverted index list are searched, where the first inverted index list corresponds to the first word and the second inverted index list corresponds to the second word. An affinity between the first word and the second word is calculated according to the first inverted index list and the second inverted index list. The affinity describes a quantitative relationship between the first word and the second word. The affinity is recorded in an affinity matrix, and the affinity matrix is reported.
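The affinity computation can be illustrated with the sketch below. It assumes a Jaccard-style ratio (pages containing both words divided by pages containing either word) as the affinity; the abstract only says that the affinity is calculated from the two inverted index lists, so the formula and the names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_inverted_index(pages):
    """Map each word to the set of page ids that include it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def affinity(index, first, second):
    """Assumed affinity: co-occurrence pages over pages with either word."""
    first_pages = index.get(first, set())
    second_pages = index.get(second, set())
    either = first_pages | second_pages
    return len(first_pages & second_pages) / len(either) if either else 0.0

def affinity_matrix(index):
    """Record the affinity of every unordered word pair."""
    words = sorted(index)
    return {(a, b): affinity(index, a, b) for a, b in combinations(words, 2)}
```

Because the matrix is keyed by unordered pairs, it stores each affinity once; a directional variant would need ordered pairs and a different denominator.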

Type: Grant

Filed: October 1, 2008

Date of Patent: May 1, 2012

Assignee: Fujitsu Limited

Inventors: David L. Marvit, Jawahar Jain, Stergios Stergiou, Yannis Labrou

INDEXING MULTIPLE TYPES OF DATA TO FACILITATE RAPID RE-INDEXING OF ONE OR MORE TYPES OF DATA

Publication number: 20110219008

Abstract: A method and indexing system indexes the content of a body of documents into a content index, and the metadata of the documents into a metadata index which is a parallel index to the content index. The metadata is copied into a data store that is easily accessible by the indexing system and is stored in native form. The indexing system can dynamically re-index the metadata from the native metadata in the data store to produce a new metadata index which is used to replace the original metadata index. Search queries received by a search engine associated with the indexing system are applied to both the content and metadata index and the results are merged for return.
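A small sketch of the parallel-index arrangement, assuming both content and metadata are plain strings and that merging results means taking the union of matching document ids. The class and method names are hypothetical.

```python
from collections import defaultdict

def build_index(texts):
    """Build a token -> document id inverted index from id -> text."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

class ParallelIndex:
    """Content index plus a separately rebuildable metadata index."""

    def __init__(self, contents, metadata):
        # Keep the metadata in native form so it can be re-indexed later
        # without touching the (usually much larger) content index.
        self.metadata_store = dict(metadata)
        self.content_index = build_index(contents)
        self.metadata_index = build_index(self.metadata_store)

    def reindex_metadata(self, doc_id, new_metadata):
        """Update stored metadata and replace the metadata index wholesale."""
        self.metadata_store[doc_id] = new_metadata
        self.metadata_index = build_index(self.metadata_store)

    def search(self, term):
        """Query both indexes and merge the results (assumed: set union)."""
        term = term.lower()
        hits = self.content_index.get(term, set()) | self.metadata_index.get(term, set())
        return sorted(hits)
```

The point of the split is that `reindex_metadata` never re-reads document content, so frequently changing metadata stays cheap to refresh.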

Type: Application

Filed: March 8, 2010

Publication date: September 8, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: DAVID O. BEEN, MICHAEL BUSCH, OSAMU FURUSAWA, FREDERICK S. GRENNAN, FUMIHIKO TERUI, JUSTO L. PEREZ

Method and apparatus for searching a hierarchical database and an unstructured database with a single search query

Publication number: 20090119257

Abstract: Techniques for searching a hierarchical database and an unstructured database with a single search query are described herein. In one embodiment, a single search query is received that has syntax identifying an unstructured search string within a structured search query to automatically cause a search of the inverted index and use of the result to automatically search the hierarchical database. The unstructured search string is extracted from the single search query and an inverted index is searched according to the unstructured search string, wherein the inverted index includes virtual documents created from data stored in the hierarchical database, wherein each virtual document includes a unique identifier from the hierarchical database used to designate the data in the hierarchical database from which that virtual document was created, wherein a result of the inverted index search includes the unique identifiers of the virtual documents that meet the search.

Type: Application

Filed: November 2, 2007

Publication date: May 7, 2009

Inventor: Christopher Waters

Automatic Generation Of Ontologies Using Word Affinities

Publication number: 20090094262

Abstract: In one embodiment, generating an ontology includes accessing an inverted index that comprises inverted index lists for words of a language. An inverted index list corresponding to a word indicates pages that include the word. A word pair comprises a first word and a second word. A first inverted index list and a second inverted index list are searched, where the first inverted index list corresponds to the first word and the second inverted index list corresponds to the second word. An affinity between the first word and the second word is calculated according to the first inverted index list and the second inverted index list. The affinity describes a quantitative relationship between the first word and the second word. The affinity is recorded in an affinity matrix, and the affinity matrix is reported.

Type: Application

Filed: October 1, 2008

Publication date: April 9, 2009

Applicant: Fujitsu Limited

Inventors: David L. Marvit, Jawahar Jain, Stergios Stergiou, Yannis Labrou