Inverted Index Patents (Class 707/742)
-
Patent number: 8819026Abstract: Systems and methods are disclosed for tracking an object as it traverses a sequential chain. The relationships between the object, its movement through space and time, and the entities associated with the object at a discrete point of time are captured by a sequential chain. A unique identifier may be created that is continuously modified as the object traverses the sequential chain. The unique identifier may be used to capture relationship information between the object and its related entities and movements.Type: GrantFiled: August 25, 2011Date of Patent: August 26, 2014Assignee: SCR Technologies, Inc.Inventor: Randal B. Fischer
-
Techniques for representing keywords in an encrypted search index to prevent histogram-based attacks
Patent number: 8819451Abstract: A method and system for cryptographically indexing, searching for, and retrieving documents is provided. In some embodiments, an encryption system is provided that generates a document index that allows users to retrieve documents by performing encrypted queries for keywords associated with the documents. In some embodiments, each keyword maps to the same number of encrypted document identifiers. In some embodiments, an extractor graph is employed to map an indication of each keyword to a number of buckets storing encrypted document identifiers. In some embodiments, an order-preserving encryption system is provided. The encryption system uses an ordered index that maps encrypted instances of ordered attribute values to documents that are associated with those values. The ordered index enables queries containing query operators that rely on order, such as less than (“<”) or greater than (“>”), to be successfully performed on encrypted attribute values.Type: GrantFiled: May 28, 2009Date of Patent: August 26, 2014Assignee: Microsoft CorporationInventors: Satyanarayana V. Lokam, Ajay Manchepalli, Balasubramanyan Ashok, Sandeep P. Karanth, Raghav Bhaskar
-
Publication number: 20140236962Abstract: Systems and methods for regularly updating portions of a merged index are provided. Initially, upon receiving an indication that modifications have occurred to content of web-based documents, dynamic update of index (DUI) objects that identify the documents and expose the modified content are composed by ascertaining relative positions of the modified content within the documents, and packaging identifiers of the documents, the relative positions, and metadata underlying the modified content into a message. The DUI objects are applied to an overloading index that maintains structured records of recent modifications. In particular, portions of the overloading index are targeted utilizing the document identifiers and the relative positions specified by the DUI object, thereby updating the targeted portions within the overloading index corresponding to the modified content without rewriting the entire overloading index.Type: ApplicationFiled: May 2, 2014Publication date: August 21, 2014Applicant: Microsoft CorporationInventors: Abhas Kumar, Pratibha Permandla, Gaurav Sareen, Anna Timasheva, Deepak Shankar
-
Patent number: 8805808Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.Type: GrantFiled: June 25, 2013Date of Patent: August 12, 2014Assignee: SAP AGInventors: Frederik Transier, Franz Faerber
-
Publication number: 20140214853Abstract: Systems and methods are disclosed for tracking an object as it traverses a sequential chain. The relationships between the object, its movement through space and time, and the entities associated with the object at a discrete point of time are captured by a sequential chain. A unique identifier may be created that is continuously modified as the object traverses the sequential chain. The unique identifier may be used to capture relationship information between the object and its related entities and movements.Type: ApplicationFiled: April 1, 2014Publication date: July 31, 2014Applicant: SCR Technologies, Inc.Inventor: Randal B. Fischer
-
Patent number: 8775435Abstract: Systems and methods for processing an index are described. A postings list of items containing a particular term is ordered in a desired retrieval order, e.g., most recent first. The ordered items are inserted into an inverted index in the desired retrieval order, resulting in an ordered inverted index from which items may be efficiently retrieved in the desired retrieval order. During retrieval, items may first be retrieved from a live index, and the retrieved items from the live and ordered indexes may be merged. The retrieved items may also be filtered in accordance with the items' file grouping parameters.Type: GrantFiled: September 13, 2011Date of Patent: July 8, 2014Assignee: Apple Inc.Inventors: Wayne Loofbourrow, John Martin Hoernkvist, Eric Richard Koebler, Yan Arrouye
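The retrieval flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the (timestamp, item id) posting layout, and the use of timestamps as the "desired retrieval order" are all assumptions made for illustration. Both indexes keep their postings newest first, so a lazy merge yields results in retrieval order without a full sort.

```python
import heapq
from collections import defaultdict
from itertools import islice


def build_ordered_index(items):
    """Build term -> postings, each postings list sorted newest first.

    `items` is an iterable of (item_id, timestamp, text) tuples; this
    layout is a hypothetical choice for the sketch.
    """
    index = defaultdict(list)
    for item_id, timestamp, text in items:
        for term in set(text.lower().split()):
            index[term].append((timestamp, item_id))
    for postings in index.values():
        postings.sort(reverse=True)  # desired retrieval order: most recent first
    return dict(index)


def search(term, ordered_index, live_index, limit=10):
    """Merge postings from the live and ordered indexes, newest first."""
    merged = heapq.merge(
        live_index.get(term, []),
        ordered_index.get(term, []),
        reverse=True,  # both inputs are already sorted in descending order
    )
    return [item_id for _, item_id in islice(merged, limit)]
```

Because the merge is lazy, a query that needs only the first few results reads only the heads of the two postings lists.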
-
Patent number: 8762387Abstract: The disclosed embodiments provide a system that processes data. During operation, the system obtains a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics. Next, the system creates, in a data segment comprising the records, an inverted index for a column in the records based on a cardinality of the column. Finally, the system compresses the inverted index based on a jump value associated with record identifiers in the column.Type: GrantFiled: July 31, 2013Date of Patent: June 24, 2014Assignee: LinkedIn CorporationInventors: Dhaval Patel, Sanjay Dubey, Praveen N. Naga, Volodymyr Zhabiuk, Jintae Jung
-
Publication number: 20140164388Abstract: A document index is generated from a set of documents and is used to identify documents that match one or more queries. A tree is generated for each document with a node corresponding to each object of the document. The nodes of the generated trees are merged or combined to generate the document index, which is itself a tree. In addition, an inverted index is generated for each node of the index that identifies the tree(s) that the node originated from. When a query is received, the query is first executed against the document index tree: during the execution, proper set operations are applied to the inverted indices associated with the nodes matched by the query. The resulting set identifies the documents that may match the query. The query is then executed on the identified documents.Type: ApplicationFiled: December 10, 2012Publication date: June 12, 2014Applicant: Microsoft CorporationInventors: Li Zhang, Mihai Budiu, Yuan Yu, Gordon D. Plotkin
-
Patent number: 8751505Abstract: Method, system, and computer program product for indexing and searching entity-relationship data are provided. The method includes: defining a logical document model for entity-relationship data including: representing an entity as a document containing the entity's searchable content and metadata; dually representing the entity as a document and as a category; and representing each relationship instance for the entity as a category set that contains categories of all participating entities in the relationship. The method also includes: translating entity-relationship data into the logical document model; and indexing the entity-relationship data of the populated logical document model as an inverted index. The method may include searching indexed entity-relationship data using a faceted search, wherein the categories are all categories required for supporting faceted navigation.Type: GrantFiled: March 11, 2012Date of Patent: June 10, 2014Assignee: International Business Machines CorporationInventors: David Carmel, Haggai Roitman, Sivan Yogev
-
Patent number: 8744839Abstract: Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.Type: GrantFiled: September 22, 2011Date of Patent: June 3, 2014Assignee: Alibaba Group Holding LimitedInventors: Haibo Sun, Yang Yang, Yining Chen
-
Patent number: 8738631Abstract: A process is disclosed for the computer management of inverted lists and inverted indices, in which the standard representation and processing of inverted lists is changed in order to achieve a simpler, more compact and more efficient architecture.Type: GrantFiled: September 24, 2013Date of Patent: May 27, 2014Inventor: Giovanni M. Sacco
-
Publication number: 20140129567Abstract: In the present invention, scope search can be effectively performed in a database having encrypted registration information. A plurality of values, first identification information to identify the plurality of values, and a key are accepted as input. A value group is generated from the plurality of values. The value group is treated as a word group, and a secure index is generated from the word group, the first identification information, and the key. On the basis of a value to be retrieved and a key, trapdoor information for the value to be retrieved is generated. With respect to the generated secure index, a secure index assessment process is performed using the trapdoor information. When the value to be retrieved is assessed to be contained in the secure index as a result of the assessment process, second identification information to identify the secure index is output.Type: ApplicationFiled: July 27, 2012Publication date: May 8, 2014Applicant: c/o NEC CorporationInventors: Toshinori Araki, Isamu Teranishi
-
Publication number: 20140129566Abstract: A geographic document retrieval method (GDR) can be executed by a computer system to index, retrieve and rank geographical documents. Textual and spatial attributes of geographical documents are indexed separately using inverted index and spatial index, respectively. Spatial attributes of a document are represented as one or more contiguously closed regions of arbitrary shapes. Upon receiving an input query carrying a geographic representation of a location using arbitrary regions, the GDR method retrieves one or more documents by executing an overlap test between arbitrary regions from the query and the arbitrary regions associated with the documents.Type: ApplicationFiled: April 19, 2013Publication date: May 8, 2014Applicant: xAd, Inc.Inventor: xAd, Inc.
-
Publication number: 20140101167Abstract: The present disclosure relates to techniques for establishing an inverted indexing system and related data processing. The techniques may include writing, by a computing device, inverted indexes of a massive amount of data records into at least one inverted file. The computing device may then write description information of the written inverted file into a description file associated with the inverted file, and establish the inverted indexing system based on the inverted file and the description file of the inverted file. The techniques enhance efficiency in establishing the inverted indexing system and in processing data using the systems.Type: ApplicationFiled: October 3, 2013Publication date: April 10, 2014Applicant: Alibaba Group Holding LimitedInventor: Jian Qin
-
Patent number: 8688718Abstract: The disclosed embodiments provide a method and system for processing data. During operation, the system obtains a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics. Next, the system creates a data segment comprising at least one of a forward index and an inverted index for a column in the records. The system then stores the data segment in network-accessible storage and assigns the data segment to a partition. Finally, the system enables querying of the data segment through a query node associated with the partition.Type: GrantFiled: July 31, 2013Date of Patent: April 1, 2014Assignee: LinkedIn CorporationInventors: Sanjay Dubey, Dhaval Patel, Praveen N. Naga, Volodymyr Zhabiuk
-
Patent number: 8682902Abstract: According to one embodiment, a storage device includes an interface, a first and second memory blocks and a controller. The interface receives a content search request. The first memory block stores files and inverted files corresponding to contents included in the files. The second memory block stores a file search table. The controller creates the inverted file for each content included in the files and stores IDs of the files including the content in the inverted file. The controller obtains, by search of the content, a corresponding inverted file from the inverted files stored in the first memory block and stores, in the file search table, the IDs of the files included in the obtained inverted file. The controller outputs the IDs of the files stored in the file search table from the interface as a search result for the content search request.Type: GrantFiled: November 20, 2012Date of Patent: March 25, 2014Assignee: Kabushiki Kaisha ToshibaInventors: Kosuke Tatsumura, Atsuhiro Kinoshita
-
Publication number: 20140059054Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhanced parallel latent Dirichlet allocation (PLDA+). A PLDA+ system is a system of multiple processors that are configured to generate topics from multiple documents. The multiple processors are designated as two types: document processors and matrix processors. The documents are distributed among the document processors. Generated topics are distributed among the matrix processors. Tasks performed on the document processors and matrix processors are segregated into two types of tasks: computation-bound tasks and communication-bound tasks. Computation-bound tasks are CPU intensive tasks; communication-bound tasks are network intensive tasks. Data placement and pipeline strategies are employed such that the computation-bound tasks and the communication-bound tasks are distributed to the processors in a balanced manner, and performed in parallel.Type: ApplicationFiled: May 11, 2011Publication date: February 27, 2014Inventors: Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang
-
Patent number: 8655888Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.Type: GrantFiled: December 22, 2011Date of Patent: February 18, 2014Assignee: International Business Machines CorporationInventors: Marcus F. Fontoura, Ronny Lempel, Runping Qi, Jason Y. Zien
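As a rough illustration of range-based posting lists, the Python sketch below buckets numeric values into fixed-width ranges and answers a range query by scanning only the buckets that overlap it. The fixed bucket width, the function names, and the choice to store one (document id, value) pair per value are assumptions for this sketch, not details taken from the patent.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed fixed range width; the abstract does not specify one


def build_range_postings(doc_values, width=BUCKET_WIDTH):
    """Map each range of consecutive values to a posting list.

    `doc_values` maps a document id to the integer values it contains.
    Each posting is (doc_id, value), so a query can check the exact value.
    """
    postings = defaultdict(list)
    for doc_id, values in doc_values.items():
        for value in values:
            postings[value // width].append((doc_id, value))
    return dict(postings)


def range_query(postings, low, high, width=BUCKET_WIDTH):
    """Return ids of documents holding at least one value in [low, high]."""
    hits = set()
    for bucket in range(low // width, high // width + 1):
        for doc_id, value in postings.get(bucket, []):
            if low <= value <= high:  # needed for partially covered buckets
                hits.add(doc_id)
    return hits
```

A query touches roughly (high - low) / width posting lists instead of one list per distinct value, which is the trade-off the bucket width controls.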
-
Patent number: 8645388Abstract: A method for processing a query includes providing an inverted multi-path index for storing path-value pairs. Each path-value pair references at least one structured document stored in a database system, and comprises an index path expression of an indexed element and an indexed value associated with the indexed element. The method includes receiving a clause including a path expression-value pair comprising a path expression associated with an element, determining that the clause can be processed by the inverted multi-path index, processing the clause to identify a path-value pair in the inverted multi-path index matching the path expression-value pair of the clause, and identifying the structured document referenced by the matching path-value pair.Type: GrantFiled: June 16, 2011Date of Patent: February 4, 2014Assignee: EMC CorporationInventors: Edward C. Bueche, Francisco Borges, Petr Pleshachkov, Shanshan Quan, Marc Brette, Venkatesan Chandrasekaran
-
Publication number: 20140032567Abstract: The present document relates to a system and method for searching a document using one or more search terms. In particular, the present document relates to a resource efficient method for searching a document within a database of documents. A method for determining an inverse index on an electronic device including a database is described. The inverse index is configured to map a plurality of text data entities from the database to a search term. The method includes determining a plurality of relevance vectors for a plurality of text data entities from the database. Determining a relevance vector for a text data entity from the database includes: selecting N terms which are descriptive of the text data entity; and determining the relevance vector from the selected N terms. Furthermore, the method includes determining the inverse index comprising a plurality of records.Type: ApplicationFiled: July 29, 2013Publication date: January 30, 2014Applicant: ExB Asset Management GmbHInventors: Ramin ASSADOLLAHI, Stefan BORDAG
-
Patent number: 8615519Abstract: Methods and systems for providing an inverted index for a dataset are disclosed. The inverted index includes a position vector, with fields that correspond to values in the indexed dataset. The fields include data to be used in determining where each value appears in the dataset. The position vector is populated differently for different value types. A 1:1 value appears once in the dataset; a 1:n value appears multiple times. For a 1:1 value, the position vector stores information for where that value appears. For a 1:n value, the position vector stores a pointer, e.g. a memory reference, that identifies a list of locations where the value appears. The list can be encoded or otherwise compressed. A set of indicators can be stored for the fields indicating whether the field has 1:n or 1:1 value information. The indicator is used to control interpretation of the information in a field.Type: GrantFiled: March 29, 2012Date of Patent: December 24, 2013Assignee: SAP AGInventor: Alexander Froemmgen
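A small Python sketch may help picture the position vector. It assumes the column holds dense integer value ids and uses a list index as the "pointer" into a separate data area; those choices, and all names, are hypothetical. Values that occur once store their row directly, values that occur several times store a pointer, and a parallel indicator list says which interpretation applies.

```python
def build_position_index(column):
    """Build (position_vector, is_multi, data_area) for a column of value ids.

    `column[row]` is the value id stored at that row (assumed dense ints >= 0).
    """
    occurrences = {}
    for row, value_id in enumerate(column):
        occurrences.setdefault(value_id, []).append(row)

    size = max(occurrences) + 1 if occurrences else 0
    position_vector = [None] * size  # one field per value id
    is_multi = [False] * size        # indicator: does the field hold a pointer?
    data_area = []                   # row lists for 1:n values

    for value_id, rows in occurrences.items():
        if len(rows) == 1:
            position_vector[value_id] = rows[0]         # 1:1, row stored inline
        else:
            position_vector[value_id] = len(data_area)  # 1:n, pointer to list
            is_multi[value_id] = True
            data_area.append(rows)
    return position_vector, is_multi, data_area


def lookup(value_id, position_vector, is_multi, data_area):
    """Return every row where `value_id` appears."""
    field = position_vector[value_id]
    if field is None:
        return []
    return data_area[field] if is_multi[value_id] else [field]
```

The indicator list plays the role of the per-field flags described in the abstract; the row lists in the data area are left uncompressed here.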
-
Publication number: 20130339369Abstract: The present disclosure provides techniques to solve problems (e.g., the low efficiency and a waste of resources) derived from conventional methods. These techniques may include extracting, by a computing device, the first N keywords appearing the most in target information published by target users as target words, and creating an inverted index based on information on a page of the target users and the target words, wherein the inverted index includes a target field and a page information field, and N is an integer. The computing device may receive an inquiry phrase and determine target users matching the inquiry phrase in the inverted index based on the inquiry phrase. The computing device may calculate a relevance between the matched target users and the inquiry phrase through the target field and the page information field, and return a certain result based on the relevance.Type: ApplicationFiled: June 17, 2013Publication date: December 19, 2013Inventors: Yaobing Li, Wei Zheng, Huaxing Jin, Feng Lin
-
Patent number: 8610605Abstract: In one aspect, methods and systems for variable-block length encoding of data, such as an inverted index for a file are disclosed. These methods and systems provide for relatively fast encoding and decoding, while also providing for compact storage. Other aspects include a nearly 1:1 inverted index comprising a position vector and a data store, wherein values that have a unique location mapping are represented directly in the position vector, while for 1:n values (n>1), the position vector can include a pointer, and potentially some portion of information that would typically be stored in the data area, in order to fully use fixed width portions of the position vector (where a maximum pointer size is smaller than a maximum location identifier size).Type: GrantFiled: March 29, 2012Date of Patent: December 17, 2013Assignee: SAP AGInventor: Alexander Froemmgen
-
Patent number: 8612479Abstract: Systems and methods are described that detect fraud in existing logs of raw data. There can be several disparate logs, each including data of disparate data types and generated by different and possibly unrelated software enterprise applications. The fraud management system aggregates and organizes the raw log data, extends the raw data with reference data, archives the data in a manner that facilitates efficient access and processing of the data, allows for investigation of potentially fraudulent usage scenarios, and uses the results of the investigation to identify patterns of data that correspond to high risk usage scenarios and/or process steps. In subsequent processing, archived data can be compared against the identified patterns corresponding to high risk usage scenarios to detect matches, and the invention thereby automatically detects high risk usage scenarios and issues appropriate alerts and reports.Type: GrantFiled: May 15, 2007Date of Patent: December 17, 2013Assignee: FIS Financial Compliance Solutions, LLCInventors: Jwahar R. Bammi, Bagepalli C. Krishna, Robert Posniak, Joseph Walsh
-
Patent number: 8577891Abstract: In response to a search query having a search term received from a client, a current language locale is determined. A state machine is built based on the current language locale, where the state machine includes one or more nodes to represent variance of the search term having identical meaning of the search term. Each node of the state machine is traversed to identify one or more postings lists of an inverted index corresponding to each node of the state machine. One or more item identifiers obtained from the one or more postings list are returned to the client, where the item identifiers identify one or more files that contain the variance of the search term represented by the state machine.Type: GrantFiled: October 27, 2010Date of Patent: November 5, 2013Assignee: Apple Inc.Inventors: John M. Hörnkvist, Eric R. Koebler
-
Publication number: 20130290345Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.Type: ApplicationFiled: June 25, 2013Publication date: October 31, 2013Applicant: SAP AGInventors: Frederik Transier, Franz Faerber
-
Patent number: 8566324Abstract: A process is disclosed for the computer management of inverted lists and inverted indices, in which the standard representation and processing of inverted lists is changed in order to achieve a simpler, more compact and more efficient architecture.Type: GrantFiled: September 12, 2010Date of Patent: October 22, 2013Inventor: Giovanni M Sacco
-
Publication number: 20130275436Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents which, in turn, returns a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.Type: ApplicationFiled: April 11, 2012Publication date: October 17, 2013Applicant: Microsoft CorporationInventors: Surajit Chaudhuri, Lev Novik, John C. Platt
-
Publication number: 20130262470Abstract: In an inverted list of each node in a taxonomy, among each node, an inverted list of the highest node is a list of integer values indicating an identifier of search subject data, and an inverted list of a node other than the highest node, in place of the identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than the node. Furthermore, a list of integer values in an inverted list of each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before the integer value in the block is converted into a bit string of a variable length integer code.Type: ApplicationFiled: June 16, 2011Publication date: October 3, 2013Applicant: NEC CORPORATIONInventors: Yukitaka Kusumura, Hironori Mizuguchi, Dai Kusui, Yusuke Muraoka
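The block-wise compression step described here can be sketched in Python as follows. The sketch covers only the differential and variable-length coding, not the taxonomy structure, and it assumes a simple variable-byte code (7 payload bits per byte, high bit marking the last byte); the abstract does not say which variable-length code is used, and the block size and function names are likewise illustrative.

```python
def vbyte_encode(number):
    """Encode a non-negative int, low 7 bits first; high bit marks the last byte."""
    out = bytearray()
    while number >= 128:
        out.append(number & 0x7F)
        number >>= 7
    out.append(number | 0x80)
    return bytes(out)


def encode_blocks(sorted_ids, block_size=4):
    """Split an ascending id list into blocks and delta-encode each block.

    Differences are taken only within a block, so the first id of every
    block is stored in full and each block can be decoded independently.
    """
    blocks = []
    for start in range(0, len(sorted_ids), block_size):
        buffer = bytearray()
        previous = 0
        for value in sorted_ids[start:start + block_size]:
            buffer += vbyte_encode(value - previous)
            previous = value
        blocks.append(bytes(buffer))
    return blocks


def decode_block(buffer):
    """Invert encode_blocks for a single block."""
    ids, value, shift, previous = [], 0, 0, 0
    for byte in buffer:
        if byte & 0x80:  # final byte of this number
            value |= (byte & 0x7F) << shift
            previous += value
            ids.append(previous)
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return ids
```

Restarting the differences at each block boundary costs a few extra bytes but lets a reader skip to any block without decoding the ones before it.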
-
Publication number: 20130262471Abstract: A catalog record is bridged to information stored in at least one inverted index by receiving an application user interface call associated with a predetermined filter request including a record identifier identifying a record in a relational database. A bitset is generated based on item identifiers in the record. The bitset is applied to at least one inverted index to obtain metadata associated with the item identifiers.Type: ApplicationFiled: March 27, 2013Publication date: October 3, 2013Applicant: The Echo Nest CorporationInventors: Brian Whitman, Tyler Williams, Hui Ted Cao
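One way to picture applying a bitset to an inverted index is the Python sketch below. It assumes item identifiers are small non-negative integers and that the inverted index maps a metadata term to a bitset of the items carrying it; the term names and function names are hypothetical, and Python integers stand in for a real bitset type.

```python
def to_bitset(item_ids):
    """Pack non-negative integer item ids into one int used as a bitset."""
    bits = 0
    for item_id in item_ids:
        bits |= 1 << item_id
    return bits


def from_bitset(bits):
    """Unpack a bitset into an ascending list of item ids."""
    ids, position = [], 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


def filter_postings(inverted_index, term, record_item_ids):
    """Keep only the postings for `term` that belong to the catalog record."""
    mask = to_bitset(record_item_ids)
    return from_bitset(inverted_index.get(term, 0) & mask)
```

The intersection is a single bitwise AND, so restricting a term's postings to one record's items does not require walking either list item by item.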
-
Patent number: 8549000Abstract: Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.Type: GrantFiled: November 14, 2011Date of Patent: October 1, 2013Assignee: Google Inc.Inventor: Adam J. Weissman
-
Publication number: 20130254211Abstract: Techniques and tools are described for matching documents against monitors. An index can be generated from a plurality of monitors, where the index represents the query logic of the plurality of monitors. The index can be searched using the documents as search queries. The searching can comprise matching the documents against the monitors using the query logic represented in the index. An index can be distributed to a plurality of computing devices to be searched at the plurality of computing devices, where each computing device searches a subset of a plurality of documents against the full index. Searching at the plurality of computing devices can be performed in parallel, and results can be aggregated at a central location.Type: ApplicationFiled: March 23, 2012Publication date: September 26, 2013Applicant: Jive Software, Inc.Inventor: Lance Riedel
-
Patent number: 8538969Abstract: A data format is optimized for storing data such as website traffic data. The data format enables easy access to and filtering of data, for example in generating website traffic reports. The data format also provides significant data compression. A method for generating a data file according to the data format employs linear compression and indexing to efficiently store the data. Data stored according to the format can be easily retrieved, particularly when a known value is specified and particular entries matching the known value are sought.Type: GrantFiled: November 14, 2005Date of Patent: September 17, 2013Assignee: Adobe Systems IncorporatedInventor: Michael Paul Bailey
-
Publication number: 20130238631Abstract: Method, system, and computer program product for indexing and searching entity-relationship data are provided. The method includes: defining a logical document model for entity-relationship data including: representing an entity as a document containing the entity's searchable content and metadata; dually representing the entity as a document and as a category; and representing each relationship instance for the entity as a category set that contains categories of all participating entities in the relationship. The method also includes: translating entity-relationship data into the logical document model; and indexing the entity-relationship data of the populated logical document model as an inverted index. The method may include searching indexed entity-relationship data using a faceted search, wherein the categories are all categories required for supporting faceted navigation.Type: ApplicationFiled: March 11, 2012Publication date: September 12, 2013Applicant: International Business Machines CorporationInventors: David Carmel, Haggai Roitman, Sivan Yogev
-
Patent number: 8533489Abstract: A Searchable Symmetric Encryption (SSE) mechanism is described which allows efficient dynamic updating of encrypted index information. The encrypted index information includes pointer information that is encrypted using a malleable encryption scheme. The SSE mechanism updates the encrypted index information by modifying at least one instance of the pointer information without decrypting the pointer information, and thereby without revealing the nature of the changes being made. In one implementation, the SSE mechanism includes a main indexing structure and a deletion indexing structure. An updating operation involves patching applied to both the main indexing structure and deletion indexing structure.Type: GrantFiled: September 29, 2010Date of Patent: September 10, 2013Assignee: Microsoft CorporationInventors: Thomas M. Roeder, Seny F. Kamara
-
Publication number: 20130232129Abstract: A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.Type: ApplicationFiled: June 4, 2012Publication date: September 5, 2013Applicant: MICROSOFT CORPORATIONInventors: Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Dong Xin
-
Patent number: 8527512Abstract: A method performs a database query in a relational database, the query being carried out by a database engine and being based on user-defined search criteria. The method includes retrieving a number N of properties of a record within a main database table, the number N being higher than zero, creating a search criteria option for each of the N properties, creating a search criteria table for every search criteria option, creating an index for every search criteria table, and performing the database query based on a user-defined combination of a plurality of the search criteria options.Type: GrantFiled: September 17, 2009Date of Patent: September 3, 2013Assignee: Siemens AktiengesellschaftInventor: Frédéric Depreter
-
Patent number: 8510304Abstract: A transactionally consistent indexer is a tiered middleware framework component that updates a transactional index for a data blob according to a data transaction requested by an application. The transactionally consistent indexer determines index entries to be added or removed from a transaction index based on the application request. The transactionally consistent indexer further inserts each index entry to be added into the transaction index. With respect to each index entry to be removed, the transactionally consistent indexer uses a time stamp or version number of the index entry for ensuring optimistic concurrency during deletion. The transactionally consistent indexer then updates a data blob that is associated with each index entry to be added or each index entry to be removed based on the application request.Type: GrantFiled: August 27, 2010Date of Patent: August 13, 2013Assignee: Amazon Technologies, Inc.Inventors: Gregory J. Briggs, Vincent M. Rohr
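The version check on deletion described in this abstract can be illustrated with a minimal sketch. Everything named here (`TransactionIndex`, `VersionConflict`, the in-memory dictionary, the global version counter) is hypothetical and is not taken from the patent; it only shows the general idea of refusing a delete when the entry has changed since it was read.

```python
import itertools


class VersionConflict(Exception):
    """Raised when an index entry changed after the caller read it."""


class TransactionIndex:
    """Toy in-memory index; each entry carries a version number (assumed design)."""

    def __init__(self):
        self._entries = {}                    # key -> (blob_id, version)
        self._versions = itertools.count(1)   # monotonically increasing versions

    def insert(self, key, blob_id):
        """Add or overwrite an entry and return its new version."""
        version = next(self._versions)
        self._entries[key] = (blob_id, version)
        return version

    def read(self, key):
        """Return (blob_id, version) for an existing entry."""
        return self._entries[key]

    def delete(self, key, expected_version):
        """Remove the entry only if it still has the version the caller saw."""
        _, current = self._entries[key]
        if current != expected_version:
            raise VersionConflict(
                f"{key!r}: expected v{expected_version}, found v{current}"
            )
        del self._entries[key]


index = TransactionIndex()
stale = index.insert("title:foo", "blob-1")
fresh = index.insert("title:foo", "blob-2")   # simulates a concurrent writer
try:
    index.delete("title:foo", stale)          # rejected: version is out of date
except VersionConflict:
    index.delete("title:foo", fresh)          # retry with the current version
```

A real indexer would perform the compare-and-delete atomically in its backing store; the dictionary here stands in for that store.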
-
Patent number: 8510306Abstract: A method, system, and computer program product for faceted search with relationships between categories are provided. The method includes: having a document set of multiple documents, each document having associated categories to which it belongs; grouping multiple categories associated with a document into a category set based on a relationship between the multiple categories; associating the category set with the document; and indexing the category set for retrieval of documents from categories sharing a category set. Indexing the category set includes: having an index entry for a textual representation of a category, wherein the index entry includes a single occurrence for each document to which the category is attached; and adding to a document occurrence a payload containing a serialization of an identifier of the category sets, associated with the document, to which the category belongs.Type: GrantFiled: May 30, 2011Date of Patent: August 13, 2013Assignee: International Business Machines CorporationInventors: David Carmel, Haggai Roitman, Sivan Yogev
-
Full text search capabilities integrated into distributed file systems— incrementally indexing files
Patent number: 8504565Abstract: A hierarchical distributed search mechanism is integrated into a distributed file system. Traditional file system APIs (create, open, close, read, write, link, rename, delete, . . . ) and the over-the-wire protocols employed to project these APIs into remote client sites (CIFS, NFS, DDS, Appletalk) are extended to enable the dynamic creation of temporary directories containing links to objects identified by search engines (executing at sites “close” to “their” data) as meeting the search criteria specified by the first parameter of a search function call. The search function, derived from the standard file system API function create, is added to the file system API.Type: GrantFiled: September 9, 2005Date of Patent: August 6, 2013Inventor: William M. Pitts
-
Patent number: 8504555Abstract: A computing device includes one or more rich internet application (RIA) client engines. Each RIA client engine includes a corresponding private RIA storage area. The computing device also includes a per-RIA public storage area for each RIA. The per-RIA public storage area including a subset of data items in the private RIA storage area of the corresponding RIA client engine. A search engine of the computing device may search the data items in the one or more per-RIA public storage areas and link to content in the private RIA storage area of the corresponding RIA client engine at a given data item matching a search request.Type: GrantFiled: June 25, 2008Date of Patent: August 6, 2013Assignee: Microsoft CorporationInventor: Jonathan C. Hawkins
-
Patent number: 8498972Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.Type: GrantFiled: December 16, 2010Date of Patent: July 30, 2013Assignee: SAP AGInventors: Frederik Transier, Franz Faerber
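As a rough illustration of keeping terms and term separators in two separate inverted indexes, the sketch below tokenizes text with a regular expression and intersects postings from both indexes at query time. The tokenization rule, the function names, and the document-level (position-free) postings are all assumptions made for the example, not details from the patent.

```python
import re
from collections import defaultdict

# Assumed tokenization: runs of word characters are terms,
# runs of other non-space characters are separators.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def split_tokens(text):
    """Return (terms, separators) found in the text, lowercased."""
    terms, separators = [], []
    for token in TOKEN_RE.findall(text.lower()):
        if re.fullmatch(r"\w+", token):
            terms.append(token)
        else:
            separators.append(token)
    return terms, separators


def build_indexes(docs):
    """Build one inverted index for terms and another for separators."""
    term_index, sep_index = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        terms, separators = split_tokens(text)
        for term in terms:
            term_index[term].add(doc_id)
        for sep in separators:
            sep_index[sep].add(doc_id)
    return term_index, sep_index


def search(query, term_index, sep_index):
    """Return ids of documents containing every term and separator in the query."""
    terms, separators = split_tokens(query)
    postings = [term_index.get(t, set()) for t in terms]
    postings += [sep_index.get(s, set()) for s in separators]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {1: "e-mail server", 2: "email server", 3: "mail, server"}
term_index, sep_index = build_indexes(docs)
hits = search("e-mail", term_index, sep_index)   # only document 1 has "e", "mail" and "-"
```

This sketch matches at document level only; checking that the separator actually sits between the two terms would require positional postings, which are omitted here.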
-
Patent number: 8489610Abstract: An information re-organization system includes a plurality of counters associated with meaning attributes, and a re-organization incentive notification unit that, when the information stored in a preset storage unit has been updated, updates the value of the counter whose meaning attribute is associated with the updated contents. The information re-organization system also includes an information re-organization processor that, when the value of the updated counter meets one of a number of predetermined conditions for information re-organization, executes on the information stored in the preset storage unit the re-organization processing corresponding to that condition.Type: GrantFiled: March 27, 2009Date of Patent: July 16, 2013Assignee: NEC CorporationInventor: Masaki Kan
-
Patent number: 8489597Abstract: A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.Type: GrantFiled: September 1, 2004Date of Patent: July 16, 2013Assignee: Ori Software Development Ltd.Inventors: Moshe Shadmon, Neal Sample, Brian Cooper, Michael J. Franklin
-
Publication number: 20130151534Abstract: The addition of relative term positions, temporal positions, and segment identifiers to an inverted index allows for temporal and phrase queries of multimedia assets. Segment identifiers enable any search results to be examined in context. The system makes advantageous use of Lucene's binary payload functionality to store temporal data and segment identifiers as additional binary data for each term instance in the inverted index. The payloads are made up of three variable-length integers, which account for twelve extra bytes of metadata, which are stored for each term instance. A content database on a Master/Administrator server node provides the indexes for search into content in response to user events, returning results in JSON format. The search results may then be used to locate and present content segments to a user containing both requested search term results and the time location within the multimedia asset in which the search term(s) is found.Type: ApplicationFiled: December 10, 2012Publication date: June 13, 2013Applicant: Digitalsmiths, Inc.Inventor: Digitalsmiths, Inc.
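To make the payload layout in this abstract concrete, here is a minimal sketch of packing three variable-length integers into one binary payload. It is plain Python and does not call Lucene; the 7-bits-per-byte encoding, the field order, and the use of milliseconds for the temporal position are assumptions for illustration only.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("varints here are non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode one varint; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_payload(position, time_ms, segment_id):
    """Pack the three per-term values (assumed order) into one payload."""
    return b"".join(encode_varint(v) for v in (position, time_ms, segment_id))


def decode_payload(payload):
    """Unpack a payload produced by encode_payload."""
    values, offset = [], 0
    for _ in range(3):
        value, offset = decode_varint(payload, offset)
        values.append(value)
    return tuple(values)


payload = encode_payload(position=7, time_ms=93500, segment_id=12)
position, time_ms, segment_id = decode_payload(payload)
```

Small values cost one byte each under this scheme, so the payload grows only when positions, timestamps, or segment identifiers become large.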
-
Publication number: 20130151533Abstract: Described herein are various technologies pertaining to provision of query suggestions to a user independent of a query log. Key phrases are automatically identified in documents of a document corpus, and a forward index and inverted index are generated. The forward index indexes key phrases by documents, and the inverted index indexes documents by key phrases. A query is received from a user, and documents relevant to the query are retrieved. Key phrases in the retrieved documents are identified via the forward index, and a subset of the key phrases are selected as query suggestions by determining coverage of the key phrases as identified in the inverted index.Type: ApplicationFiled: December 7, 2011Publication date: June 13, 2013Applicant: Microsoft CorporationInventors: Uppinakuduru Raghavendra Udupa, Bhole Abhijit Narendra, Anuj Kumar Goyal, Bjørn Olstad
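The interplay of the forward and inverted indexes described here can be sketched as follows. The coverage measure used (how many retrieved documents contain a phrase), the tie-breaking rule, and all function names are assumptions for the example; the abstract does not specify how coverage is computed.

```python
from collections import defaultdict


def build_indexes(doc_phrases):
    """Build a forward index (doc -> phrases) and an inverted index (phrase -> docs)."""
    forward = {doc: set(phrases) for doc, phrases in doc_phrases.items()}
    inverted = defaultdict(set)
    for doc, phrases in forward.items():
        for phrase in phrases:
            inverted[phrase].add(doc)
    return forward, dict(inverted)


def suggest(retrieved_docs, forward, inverted, k=3):
    """Return up to k key phrases ranked by coverage of the retrieved documents."""
    retrieved = set(retrieved_docs)
    # Candidate phrases come from the forward index of the retrieved documents.
    candidates = set()
    for doc in retrieved:
        candidates |= forward.get(doc, set())
    # Assumed coverage: number of retrieved documents that contain the phrase.
    ranked = sorted(candidates, key=lambda p: (-len(inverted[p] & retrieved), p))
    return ranked[:k]


doc_phrases = {
    1: ["inverted index", "posting list"],
    2: ["inverted index", "bloom filter"],
    3: ["bloom filter"],
}
forward, inverted = build_indexes(doc_phrases)
suggestions = suggest([1, 2], forward, inverted, k=2)
```

Because suggestions are derived from the document corpus itself, this approach needs no query log, which matches the motivation stated in the abstract.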
-
Patent number: 8463742Abstract: Managing data in a data storage system includes: receiving data to be stored in the data storage system; computing values corresponding to different respective portions of the received data; generating identifiers corresponding to different respective portions of the received data, with an identifier corresponding to a particular portion of data including the computed value corresponding to the particular portion of data and metadata indicating a location where the particular portion of data is being stored in the data storage system; storing at least some of the identifiers in an index until the index reaches a predetermined size; and in response to determining that a first identifier corresponding to a first portion of data, received after the index reached the predetermined size, was not already stored in the index before the first portion of data was received, storing the first identifier in the index and designating for removal at least a second identifier corresponding to a second portion of data to beType: GrantFiled: May 27, 2011Date of Patent: June 11, 2013Assignee: Permabit Technology Corp.Inventors: Jered J. Floyd, Michael Fortson, Assar Westerlund, Jonathan Coburn
-
Publication number: 20130138636Abstract: The present disclosure introduces a method and an apparatus for searching images. With respect to each image in an image searching database, respective labels of respective images are generated based on description information corresponding to the respective images. A corresponding relationship between the generated respective labels and the respective images is stored. Based on a received image searching request, description information corresponding to an image for search in the image searching request is obtained. Based on the description information of the image for search, the label of the image for search is generated. Based on the stored corresponding relationship between the respective labels and the respective images, one or more images corresponding to the label of the image for search are determined. The determined one or more images are sent to the client terminal that sends the image searching request.Type: ApplicationFiled: November 21, 2012Publication date: May 30, 2013Applicant: Alibaba Group Holding LimitedInventor: Alibaba Group Holding Limited
-
Publication number: 20130138660Abstract: There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.Type: ApplicationFiled: January 18, 2013Publication date: May 30, 2013Inventor: Wolf Garbe
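One way to picture a multi-term search that stays on a single node is sketched below: the node responsible for the first query term checks the remaining terms against a Bloom filter stored alongside each posting. Attaching one filter per posting, the filter parameters, and all names are assumptions for illustration; the abstract does not fix these details.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; the size and hash count are arbitrary example values."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_node_index(docs):
    """Map each term to postings of (doc_id, Bloom filter of that doc's keywords)."""
    index = {}
    for doc_id, keywords in docs.items():
        bloom = BloomFilter()
        for keyword in keywords:
            bloom.add(keyword)
        for keyword in keywords:
            index.setdefault(keyword, []).append((doc_id, bloom))
    return index


def search(index, terms):
    """Answer a multi-term query using only the posting list of the first term."""
    first, rest = terms[0], terms[1:]
    return [
        doc_id
        for doc_id, bloom in index.get(first, [])
        if all(term in bloom for term in rest)   # may admit rare false positives
    ]


docs = {"a": {"peer", "search", "bloom"}, "b": {"peer", "network"}}
index = build_node_index(docs)
results = search(index, ["peer", "bloom"])
```

A Bloom filter never produces false negatives, so every true match is returned; occasional false positives are the price of not exchanging posting lists between nodes.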
-
Publication number: 20130097174Abstract: Tools and techniques are described for calculating the valence of expressions within documents. These tools may provide methods that include receiving input documents for processing, and extracting expressions from the documents for valence analysis, with scope relationships occurring between terms contained in the expressions. The methods may calculate valences of the expressions, based on the scope relationships between terms in the expressions.Type: ApplicationFiled: December 3, 2012Publication date: April 18, 2013Applicant: MICROSOFT CORPORATIONInventor: Microsoft Corporation