Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)
-
Publication number: 20120203787
Abstract: An information management apparatus includes: a data receiving section, a collected data storage section, an aggregating section, a feature extracting section, a determining section, and an evaluation data storage section. The data receiving section periodically receives action data showing an action of a user. The collected data storage section stores the action data received by the data receiving section every user. The aggregating section generates a data set every user by aggregating action data containing an approximate content, of the action data stored in the collected data storage section. The feature extracting section extracts an index and a reference showing privacy confidentiality of the data set as a feature to incorporate in the data set. The determining section determines whether or not the privacy confidentiality of the feature of the data set is equal to or higher than a predetermined level. The evaluation data storage section stores the data set which passed the determining section.
Type: Application
Filed: October 7, 2010
Publication date: August 9, 2012
Applicant: NEC CORPORATION
Inventor: Shinya Miyakawa
-
Publication number: 20120203804
Abstract: A method for incrementally unloading classes using a region-based garbage collector is described. In one embodiment, such a method includes maintaining a remembered set for a class set. The remembered set indicates whether instances of the class set are contained in one or more regions in memory, and in which regions the instances are contained. Upon performing an incremental garbage collection process for a subset of the regions in memory, the method examines the remembered set to determine whether the class set includes instances in regions outside of the subset. If the remembered set indicates that the class set includes instances outside of the subset of regions, the method identifies the class set as “live.” This will preclude unloading the class set from the subset of regions. A corresponding computer program product and apparatus are also disclosed herein.
Type: Application
Filed: March 28, 2012
Publication date: August 9, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Peter W. Burka, Jeffrey M. Disher, Daryl J. Maier, Aleksandar Micic, Ryan A. Sciampacone
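The remembered-set check in this abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function name `live_class_sets`, the dictionary layout, and the region numbers are hypothetical, and a real collector would still have to trace references inside the collected regions before unloading anything.

```python
def live_class_sets(remembered_sets, collected_regions):
    """Return the class sets that must be treated as live for this increment.

    remembered_sets: dict mapping a class-set name to the set of region ids
                     that contain instances of that class set (assumed layout).
    collected_regions: region ids covered by the current incremental collection.
    """
    collected = set(collected_regions)
    live = set()
    for class_set, regions in remembered_sets.items():
        # Any instance in a region outside the collected subset keeps the
        # class set alive, so it cannot be unloaded during this increment.
        if set(regions) - collected:
            live.add(class_set)
    return live


remembered = {"A": {1, 2}, "B": {2}, "C": {5}, "D": {1, 7}}
live = live_class_sets(remembered, collected_regions={1, 2})
candidates = set(remembered) - live  # still subject to tracing in regions 1 and 2
```

Only class sets whose instances all lie inside the collected regions remain candidates for unloading.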
-
Publication number: 20120203741
Abstract: Provided are techniques for selecting a first group of indexes to form a current generation of indexes, selecting indexes from the first group biased to indexes with higher fitness values from the current generation of indexes, forming sub-groups of indexes using the selected indexes, determining fitness values of each of the sub-groups based on the fitness value of each of the indexes, selecting a subset of the sub-groups; and placing the indexes in the selected sub-groups into a new generation of indexes.
Type: Application
Filed: April 17, 2012
Publication date: August 9, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gaurav Mehrotra, Abhinay R. Nagpal, Sandeep R. Patil, Rulesh F. Rebello
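One generation step of the kind this abstract describes might be sketched as follows. Everything specific here is an assumption made for illustration: the function name `next_generation`, the use of fitness-weighted random sampling for the biased selection, and summing member fitness to score a sub-group. The abstract does not say how fitness values are computed or combined.

```python
import random


def next_generation(indexes, fitness, group_size=2, n_groups=4, keep=2, seed=0):
    """Run one selection step over candidate database indexes.

    indexes: list of index names in the current generation.
    fitness: dict mapping index name to a positive fitness value.
    """
    rng = random.Random(seed)
    # Select indexes with a bias toward higher fitness values.
    weights = [fitness[name] for name in indexes]
    picked = rng.choices(indexes, weights=weights, k=group_size * n_groups)
    # Form sub-groups from the selected indexes.
    groups = [picked[i:i + group_size] for i in range(0, len(picked), group_size)]
    # Score each sub-group from the fitness of its members (sum is an assumption).
    ranked = sorted(groups, key=lambda g: sum(fitness[n] for n in g), reverse=True)
    # Place the indexes of the best sub-groups into the new generation.
    new_generation = []
    for group in ranked[:keep]:
        for name in group:
            if name not in new_generation:
                new_generation.append(name)
    return new_generation


candidates = ["idx_a", "idx_b", "idx_c", "idx_d"]
scores = {"idx_a": 1.0, "idx_b": 5.0, "idx_c": 2.0, "idx_d": 0.5}
generation = next_generation(candidates, scores)
```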
-
Publication number: 20120203803
Abstract: A method for incrementally unloading classes using a region-based garbage collector is described. In one embodiment, such a method includes maintaining a remembered set for a class set. The remembered set indicates whether instances of the class set are contained in one or more regions in memory, and in which regions the instances are contained. Upon performing an incremental garbage collection process for a subset of the regions in memory, the method examines the remembered set to determine whether the class set includes instances in regions outside of the subset. If the remembered set indicates that the class set includes instances outside of the subset of regions, the method identifies the class set as “live.” This will preclude unloading the class set from the subset of regions. A corresponding computer program product and apparatus are also disclosed herein.
Type: Application
Filed: February 8, 2011
Publication date: August 9, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Peter W. Burka, Jeffrey M. Disher, Daryl J. Maier, Aleksandar Micic, Ryan A. Sciampacone
-
Publication number: 20120197869
Abstract: There is provided a computer-implemented method of executing a query plan against a database. An exemplary method comprises accessing a first subset of rows of a database table using a direct access method for an index. The query plan may comprise the direct access method. The exemplary method also comprises determining a processing cost of accessing the first subset of rows. The exemplary method further comprises modifying the direct access method for the index in response to determining that the processing cost exceeds a specified threshold. Additionally, the exemplary method comprises accessing a second subset of rows of the database table using the modified direct access method.
Type: Application
Filed: April 16, 2012
Publication date: August 2, 2012
Inventors: David W. Birdsall, Yung-Li L. Jow, Goetz Graefe
-
Publication number: 20120197852
Abstract: In particular embodiments, a method includes accessing sensor data from sensor nodes in a sensor network and aggregating the sensor data for communication to an indexer in the sensor network. The aggregation of the sensor data includes deduplicating the sensor data; validating the sensor data; formatting the sensor data; generating metadata for the sensor data; and time-stamping the sensor data. The metadata identifies one or more pre-determined attributes of the sensor data. The method also includes communicating the aggregated sensor data to the indexer in the sensor network. The indexer is configured to index the aggregated sensor data according to a multi-dimensional array for querying of the aggregated sensor data along with other aggregated sensor data. One or more first ones of the dimensions of the multi-dimensional array include time and one or more second ones of the dimensions of the multi-dimensional array include one or more of the pre-determined sensor-data attributes.
Type: Application
Filed: January 28, 2011
Publication date: August 2, 2012
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Debojyoti Dutta, Mainak Sen, Manoj Kumar PANDEY, Tarun Banka, Raja Suresh Krishna Balakrishnan
-
Publication number: 20120197853
Abstract: A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. A bit-by-bit data comparison is then performed of the region surrounding the anchor point in the received data set with the region surrounding an anchor point stored within a pattern database to identify forward/backward delta values. The duplicate data identified by the anchor point, forward and backward delta values is then replaced in the received data set with a storage indicator.
Type: Application
Filed: April 10, 2012
Publication date: August 2, 2012
Inventors: Ling Zheng, Roger Stager, Craig Johnston, Don Trimmer, Yuval Frandzel
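The anchor-and-delta idea can be sketched in Python as below. This is a simplified illustration under stated assumptions: the comparison is byte-by-byte rather than bit-by-bit, the anchor is located with a plain substring search, and the names `match_around_anchor`, `dedupe_region`, and the `<REF>` marker are hypothetical stand-ins for the storage indicator.

```python
def match_around_anchor(data, data_anchor, pattern, pattern_anchor):
    """Count matching bytes backward and forward from two aligned anchor offsets."""
    backward = 0
    while (data_anchor - backward - 1 >= 0
           and pattern_anchor - backward - 1 >= 0
           and data[data_anchor - backward - 1] == pattern[pattern_anchor - backward - 1]):
        backward += 1
    forward = 0
    while (data_anchor + forward < len(data)
           and pattern_anchor + forward < len(pattern)
           and data[data_anchor + forward] == pattern[pattern_anchor + forward]):
        forward += 1
    return backward, forward


def dedupe_region(data, data_anchor, pattern, pattern_anchor, marker=b"<REF>"):
    """Replace the duplicate span around the anchor with a storage indicator."""
    backward, forward = match_around_anchor(data, data_anchor, pattern, pattern_anchor)
    start, end = data_anchor - backward, data_anchor + forward
    return data[:start] + marker + data[end:], (backward, forward)


incoming = b"xxHELLOANCHORWORLDyy"
stored = b"HELLOANCHORWORLD"
reduced, deltas = dedupe_region(
    incoming, incoming.find(b"ANCHOR"), stored, stored.find(b"ANCHOR")
)
```

The anchor offset plus the two delta values is enough to reconstruct which stored bytes the marker stands for.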
-
Publication number: 20120197898
Abstract: In particular embodiments, a method includes, from an indexer in a sensor network, accessing a set of sensor data that includes sensor data aggregated together from sensors in the sensor network, one or more time stamps for the sensor data, and metadata for the sensor data identifying one or more pre-determined attributes of the sensor data. The method includes, at the indexer, generating an index of the set of sensor data according to a multi-dimensional array configured for querying of the set of sensor data along with a plurality of other sets of sensor data. One or more first ones of the dimensions of the multi-dimensional array include time, and one or more second ones of the dimensions of the multi-dimensional array include one or more of the pre-determined sensor-data attributes. The method includes, from the indexer, communicating the index of the set of sensor data for use in responding to one or more queries of the set of sensor data along with a plurality of other sets of sensor data.
Type: Application
Filed: January 28, 2011
Publication date: August 2, 2012
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Manoj Kumar Pandey, Tarun Banka, Debojyoti Dutta, Mainak Sen, Raja Suresh Krishna Balakrishnan
-
Publication number: 20120197899
Abstract: A method and apparatus for recommending a short message recipient. The method includes parsing history short messages of a user to generate data associated with contacts, constructing a semantic association database by using the data, identifying a critical object in a new short message text of the user, analyzing an association between the critical object and the contacts by using the semantic association database, and recommending a short message recipient to the user according to a strength of association.
Type: Application
Filed: January 31, 2012
Publication date: August 2, 2012
Applicant: International Business Machines Corporation
Inventors: Ying Li, Jing Luo, Zhong Su, Xiao Xun Zhang
-
Publication number: 20120191702
Abstract: Adaptive index density in a database management system is provided, which includes receiving a number of partitions for an index for a database table, the index subject to creation. The adaptive index density also includes selecting a column from the database table, the column selected based upon an estimated frequency of execution of database queries for the column. The adaptive index density further includes calculating an estimated cost of executing each of the database queries for the column, and determining data to reside in each of the partitions of the index responsive to the estimated cost.
Type: Application
Filed: January 26, 2011
Publication date: July 26, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: John G. Musial, Abhinay R. Nagpal, Sandeep R. Patil, Yan W. Stein
-
Publication number: 20120189204
Abstract: Digital media content from files, streaming data, broadcast data, optical disks, or other storage devices can be linked to Internet information. Identifiers extracted from the media content can be used to direct Internet searches for more information related to the media content.
Type: Application
Filed: September 29, 2009
Publication date: July 26, 2012
Inventors: Brian D. Johnson, Michael J. Payne, David B. Andersen, Suri B. Medapati, Michael J. Espig, Cory J. Booth, Kevin J. Murphy, Sharad K. Garg, Barry O'Mahony
-
Publication number: 20120191675
Abstract: The present invention relates to an apparatus and method for eliminating duplication of a file in a distributed storage system. The apparatus and method for eliminating duplication of a file in a distributed storage system according to the present invention calculates a hash value of each chunk for an active file; calculates a secondary hash value by adding the hash values calculated for respective chunks; examines duplication of the file using the hash value of each chunk and the secondary hash value; and eliminates a duplicated file depending on a result of the examination.
Type: Application
Filed: November 4, 2010
Publication date: July 26, 2012
Applicant: PSPACE INC.
Inventors: Kyung-Soo Kim, Jae-Beom Cheon, Joo-Hyun Kim, Bong-sik Sihn, Bong-Joo Jin, Hyoung-Choul Kim, Young-Gyu Kim, Sun Choi, Gu-Yong Lee
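A two-level check of this kind might look like the Python sketch below. It rests on assumptions the abstract does not settle: SHA-256 as the chunk hash, a tiny fixed chunk size, and reading "adding the hash values" as an integer sum (modulo 2^256). The names `fingerprint`, `register`, and `is_duplicate` are hypothetical.

```python
import hashlib


def fingerprint(data, chunk_size=4):
    """Return (per-chunk hashes, secondary hash) for a file's bytes."""
    chunk_hashes = [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    # Secondary hash: sum of the chunk hashes as integers (an assumed reading).
    secondary = sum(int(h, 16) for h in chunk_hashes) % (1 << 256)
    return chunk_hashes, secondary


def register(data, known, chunk_size=4):
    """Record a stored file under its secondary hash."""
    chunk_hashes, secondary = fingerprint(data, chunk_size)
    known.setdefault(secondary, []).append(chunk_hashes)


def is_duplicate(data, known, chunk_size=4):
    """Cheap lookup by secondary hash, then confirm with the per-chunk hashes."""
    chunk_hashes, secondary = fingerprint(data, chunk_size)
    return chunk_hashes in known.get(secondary, [])


known_files = {}
register(b"abcdefgh", known_files)
same = is_duplicate(b"abcdefgh", known_files)
reordered = is_duplicate(b"efghabcd", known_files)
```

Because a sum ignores chunk order, the reordered file shares the secondary hash with the stored one, which is why the sketch confirms with the per-chunk hashes before declaring a duplicate.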
-
Publication number: 20120191695
Abstract: A local search engine geographically indexes information for searching by identifying a geocoded web page of a web site and identifying at least one geocodable web page of the web site. The system identifies a geocode contained within content of the geocoded web page of the web site. The geocode indicates a physical location of an entity associated with the web site. The system indexes content of the geocoded web page and content of the geocodable web page. The indexing including associating the geocode contained within content of the geocoded web page to the indexed content of the geocoded web page and the geocodable web page to allow geographical searching of the content of the web pages.
Type: Application
Filed: April 3, 2012
Publication date: July 26, 2012
Applicant: Local.com Corporation
Inventor: Xiongwu Xia
-
Publication number: 20120191721
Abstract: Method and system for processing a request associated with a user from a requesting node to an answering node in a telecommunications network. A repository is associated with the answering node, the repository including a data structure including a plurality of user profiles associated with a plurality of users. In the answering node a user profile of the plurality of user profiles is associated with the user. The method comprising the steps of assigning a unique user index to each user profile in the data structure, wherein the user index is representative of the location of the user profile within the data structure, communicating at least one user index to the requesting node, incorporating the user index in the request by the requesting node, transmitting the request from the requesting node to the answering node, and retrieving the user profile associated with the user associated with the request by the answering node on the basis of the user index.
Type: Application
Filed: June 12, 2009
Publication date: July 26, 2012
Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
Inventors: Rogier August Caspar Joseph Noldus, Jos Den Hartog
-
Publication number: 20120191673
Abstract: A method of coupling a user file name to a physical data file stored within a storage delivery network, includes: assigning a logical file identification value (LFID) to a data file stored in one or more storage nodes and storing the LFID in a computer readable memory; storing in the computer readable memory a node identification value (Node ID) indicative of where the data file is stored among a plurality of geographically distributed storage nodes and associating the Node ID with the LFID; and storing in the computer readable memory a file name for the data file created by a user and associating the file name with the LFID, wherein the LFID correlates the file name with the Node ID transparently to the user and allows the user to access the data file using just the file name.
Type: Application
Filed: March 30, 2012
Publication date: July 26, 2012
Applicant: Nirvanix, Inc.
Inventors: Scott P. CHATLEY, Thanh T. Phan, Robert S. Palumbo, Troy C. Gatchell, J. Gabriel Gallagher
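The LFID indirection can be pictured as two lookup tables, sketched here in Python. The class name `FileCatalog`, the method names, the counter-based LFID assignment, and the example node ids are all hypothetical; the abstract specifies only that the file name and the Node ID are each associated with the LFID.

```python
import itertools


class FileCatalog:
    """Maps user file names to storage nodes through a logical file id (LFID)."""

    def __init__(self):
        self._next_lfid = itertools.count(1)
        self.name_to_lfid = {}   # user-visible file name -> LFID
        self.lfid_to_nodes = {}  # LFID -> set of Node IDs holding the data file

    def store(self, file_name, node_ids):
        lfid = next(self._next_lfid)
        self.lfid_to_nodes[lfid] = set(node_ids)
        self.name_to_lfid[file_name] = lfid
        return lfid

    def move(self, file_name, node_ids):
        # Relocating the data changes only the LFID -> Node ID association,
        # so the user keeps using the same file name.
        self.lfid_to_nodes[self.name_to_lfid[file_name]] = set(node_ids)

    def locate(self, file_name):
        return self.lfid_to_nodes[self.name_to_lfid[file_name]]


catalog = FileCatalog()
lfid = catalog.store("report.pdf", ["node-eu-1"])
catalog.move("report.pdf", ["node-us-2", "node-ap-3"])
nodes = catalog.locate("report.pdf")
```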
-
Publication number: 20120191701
Abstract: Database tables can have different types of database indices defined for the database tables and different numbers of database indices. The efficiency of reading the indexes can vary with the different profiles of the indexes, which impacts the costs of access plans that use the indexes. Weights can be predefined to reflect the relative efficiencies of the different characteristics. Costs can be computed in accordance with a variety of techniques (e.g., based on edge traversals). The weights can be predefined to reduce costs, increase costs, or a combination thereof. A database management application or associated application or program can also refine or revise these weights based on statistical data gathered about the operation of the database and/or heuristics that are developed based on observations/research. The corresponding weights can be adjusted accordingly.
Type: Application
Filed: January 26, 2011
Publication date: July 26, 2012
Applicant: International Business Machines Corporation
Inventors: Abhinay R. Nagpal, Sandeep R. Patil, Gopikrishnan Varadarajulu
-
Publication number: 20120191667
Abstract: A method and system are disclosed for storage optimization. Data parts and metadata within a source data unit are identified and the data parts are compared with data which is already stored in the physical storage space. In case identical data parts are found within the physical storage, the data parts from the source data unit are linked to the identified data, while the data parts can be discarded, thereby reducing the required storage capacity. The metadata parts can be separately stored in a designated storage area.
Type: Application
Filed: January 20, 2011
Publication date: July 26, 2012
Applicant: INFINIDAT LTD.
Inventors: Haim KOPYLOVITZ, Julian SATRAN, Yechiel YOCHAI
-
Publication number: 20120191672
Abstract: Mechanisms are provided for efficiently improving a dictionary used for data deduplication. Dictionaries are used to hold hash key and location pairs for deduplicated data. Strong hash keys prevent collisions but weak hash keys are more computation and storage efficient. Mechanisms are provided to use both a weak hash key and a strong hash key. Weak hash keys and corresponding location pairs are stored in an improved dictionary while strong hash keys are maintained with the deduplicated data itself. The need for having uniqueness from a strong hash function is balanced with the deduplication dictionary space savings from a weak hash function.
Type: Application
Filed: March 30, 2012
Publication date: July 26, 2012
Applicant: DELL Products L.P.
Inventor: Vinod Jayaraman
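A minimal Python sketch of the weak-key dictionary with strong keys kept beside the data follows. The specific choices are assumptions for illustration only: a 16-bit truncation of CRC-32 as the weak hash, SHA-256 as the strong hash, and an in-memory list standing in for the deduplicated data store. The class name `DedupStore` is hypothetical.

```python
import hashlib
import zlib


class DedupStore:
    """Dictionary keyed by a weak hash; strong hashes live with the stored data."""

    def __init__(self):
        self.dictionary = {}  # weak key -> list of segment locations
        self.segments = []    # location -> (strong key, data)

    def put(self, data):
        weak = zlib.crc32(data) & 0xFFFF          # small, cheap dictionary key
        strong = hashlib.sha256(data).digest()    # kept next to the data itself
        for location in self.dictionary.get(weak, []):
            # A weak-key hit is only a candidate; the strong key stored with
            # the segment decides whether it really is the same data.
            if self.segments[location][0] == strong:
                return location
        self.segments.append((strong, data))
        location = len(self.segments) - 1
        self.dictionary.setdefault(weak, []).append(location)
        return location


store = DedupStore()
first = store.put(b"alpha")
second = store.put(b"alpha")
third = store.put(b"beta")
```

The dictionary stays small because it never holds the strong keys; they are read only when a weak key matches.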
-
Publication number: 20120185515
Abstract: For integrating diverse databases, a server and universal index are provided to support a lexicon of variable definitions and formatting information. Subscribing databases establish equivalences between local variables and variables in the universal index, either directly or with translation such as a format conversion. For managing qualifying, preliminary processes can analyze database schema and stored variable values to assess likely matches between variables and universal definitions in the lexicon, presented tentatively to the local operator for approval or rejection. Matches can become approved for use in interaction with other subscribing databases. Processes enable the universal lexicon to be revised, e.g., expanded when a variable does not appear to match an existing definition. The universal index server can function as a data intermediary, or as a source of index definitions. Databases can indicate their compliance with the index during transmission of variable data referenced to index definitions.
Type: Application
Filed: March 28, 2012
Publication date: July 19, 2012
Applicant: Database Logic Inc.
Inventors: Mark Warne Ferrel, Eric Kenneth Barnum
-
Publication number: 20120185446
Abstract: In one example embodiment, a method is illustrated as including retrieving item data from a plurality of listings, the item data filtered from noise data, constructing at least one base cluster having at least one document with common item data stored in a suffix ordering, compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters, and merging the compacted cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data.
Type: Application
Filed: February 3, 2012
Publication date: July 19, 2012
Inventors: Neelakantan Sundaresan, Kavita Ganesan, Roopnath Grandhi
-
Publication number: 20120185487
Abstract: A method for establishing content indexes includes: determining the size of a content space; determining a content address space according to the size of the content space; establishing the mapping relationship from the content space to the content address space and obtaining the content address; monitoring the corresponding content address and accepting the content publication or the content acquisition request of the content mapping space, by the content indexing node.
Type: Application
Filed: March 28, 2012
Publication date: July 19, 2012
Applicant: Huawei Technologies Co., Ltd.
Inventor: Liang Liang
-
Publication number: 20120179684
Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Application
Filed: January 12, 2011
Publication date: July 12, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Publication number: 20120179690
Abstract: A high-density, distance-measuring laser system and an associated computer that processes the data collected by the laser system. The computer determines a data partition structure and stores that structure as a header file for the scan before data is collected. As the scan progresses, the computer collects data points until a predetermined threshold is met, at which point a block of data consisting of the data points up to the threshold is written to disk. The computer indexes each data block using all three coordinates of its constituent data points using, preferably, a flexible index, such as an R-tree. When a data block is completely filled, it is written to disk preferably with its index and, as a result, each data block is ready for access and manipulation virtually immediately after having been collected. Also, each data block can be independently manipulated and read from disk.
Type: Application
Filed: March 20, 2012
Publication date: July 12, 2012
Applicant: LEICA GEOSYSTEMS AG
Inventors: Mark Damon Wheeler, Barry Joel Schwarz, Richard William Bukowski, Minghua Wu
-
Publication number: 20120179687
Abstract: A system and method to generate and maintain controlled growth DAG are described. The controlled growth DAG conveys information about objects captured by a capture system.
Type: Application
Filed: March 19, 2012
Publication date: July 12, 2012
Inventor: Weimin Liu
-
Publication number: 20120179688
Abstract: A method, apparatus, article of manufacture, and a memory structure for brokering information between a plurality of clients using identifiers defining a plurality of data constructs is disclosed. An exemplary method comprises accepting a new data construct from an authoring entity, assigning a globally unique identifier to the new data construct, storing the new data construct and the assigned globally unique identifier in a database, and brokering between the authoring entity and a second entity commercially distinct from the authoring entity to provide the second entity access to the new data construct by reference to the assigned globally unique identifier of the new data construct or to provide the authoring entity access to an at least one of a plurality of pre-existing data constructs for use with the new data construct by reference to a globally unique identifier of the existing data construct.
Type: Application
Filed: March 22, 2012
Publication date: July 12, 2012
Applicant: Herbert Stettin as Chapter II Trustee for Rothstein Rosenfeldt Adler, P.A.
Inventor: Baron R.K. Von Wolfsheild
-
Publication number: 20120179668
Abstract: A search index structure which extends a typical composite index by incorporating an index which is optimized for fast retrieval from storage and which eliminates data which is specific to phrase searching. Other data is represented in a manner which allows it to be calculated rather than stored. Associating variable length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the ID itself generates a compact, dense symbol set which is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves the encoding performance.
Type: Application
Filed: March 19, 2012
Publication date: July 12, 2012
Applicant: Microsoft Corporation
Inventors: Chadd Creighton Merrigan, Mihai Petriuc, Raif Khassanov, Artsiom Ivanovic Kokhan
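The delta and threshold steps can be shown with a small Python sketch; the Huffman stage itself is left out. The escape symbol `ESC`, the separate overflow list for large deltas, and the function names are assumptions made here, since the abstract does not say how removed deltas are stored.

```python
ESCAPE = "ESC"  # hypothetical symbol standing in for a removed large delta


def to_deltas(doc_ids):
    """Turn a sorted posting list of document IDs into gaps between neighbours."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def split_rare(deltas, threshold):
    """Keep small deltas as symbols; move deltas above the threshold aside."""
    symbols, overflow = [], []
    for delta in deltas:
        if delta > threshold:
            symbols.append(ESCAPE)
            overflow.append(delta)
        else:
            symbols.append(delta)
    return symbols, overflow


def from_deltas(symbols, overflow):
    """Rebuild the original document IDs from symbols plus overflow values."""
    large = iter(overflow)
    doc_ids, total = [], 0
    for symbol in symbols:
        total += next(large) if symbol == ESCAPE else symbol
        doc_ids.append(total)
    return doc_ids


postings = [3, 7, 8, 500, 502]
deltas = to_deltas(postings)
symbols, overflow = split_rare(deltas, threshold=16)
restored = from_deltas(symbols, overflow)
```

After the split, the symbol set handed to an entropy coder contains only small values plus one escape symbol, which is the dense alphabet the abstract refers to.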
-
Publication number: 20120173508
Abstract: One of the deficiencies of the existing search engines is that the search engines do not evaluate the trustfulness of comments before the searched comments are returned to end users. In addition, existing search engines overlook the analyzing and aggregating of the comments whose subjects are semantically, hierarchically related. Furthermore, as the use of non-textual comments has become popular nowadays, it is highly desirable that such search engines finding and providing comments have the capability to analyze, evaluate and aggregate both textual and non-textual comments, or heterogeneous comments in other words. The purpose of the invention is to overcome the abovementioned deficiencies of the existing search engines that find and provide comments.
Type: Application
Filed: October 11, 2011
Publication date: July 5, 2012
Inventor: Cheng Zhou
-
Publication number: 20120173535
Abstract: Techniques are provided for allowing external access by other users to private information that is maintained on local storage of a computer and owned by an information owner. The private information is uploaded from the local storage to an externally accessible information source that is accessible by the other users. A request from a user to access the private information is received by the owner, who determines whether to allow access to the private information. If so, the owner sends a private information sharing authorization to a collaboration orchestrator, which retrieves the private information from the external source and provides the private information to the user. The owner optionally requests to collaborate with the user before deciding whether to allow access to the private information. One or both of the identities of the owner and user can remain anonymous until agreeing on revealing identities. A system and program product are also provided.
Type: Application
Filed: January 5, 2011
Publication date: July 5, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Arun Ramakrishnan, Rohit Shetty
-
Publication number: 20120173504
Abstract: The present invention relies on the two-dimensional information in documents and encodes two-dimensional structures into a one-dimensional synthetic language such that two-dimensional documents can be searched at text search speed. The system comprises: an indexing module, a retrieval module, an encoder, a quantization module, a retrieval engine and a control module coupled by a bus. Electronic documents are first indexed by the indexing module and stored as a synthetic text library. The retrieval module then converts an input image to synthetic text and searches for matches to the synthetic text in the synthetic text library. The matches can be in turn used to retrieve the corresponding electronic documents. In one or more embodiments, the present invention includes a method for comparing the synthetic text to documents that have been converted to synthetic text for a match.
Type: Application
Filed: March 8, 2012
Publication date: July 5, 2012
Inventor: Jorge Moraleda
-
Publication number: 20120173537
Abstract: Systems and methods for retrieving household data based on an origination identifier. In an embodiment, an origination identifier of a communication is captured. The origination identifier is indexed into a master table comprising a plurality of records. Each of the records comprises an association between an origination identifier and a universal database linkage key, and each universal database linkage key comprises an index into one or more databases. A universal database linkage key associated with the captured origination identifier is retrieved and indexed into one or more databases. Household data associated with the captured origination identifier is retrieved from the one or more databases and communicated to at least one recipient.
Type: Application
Filed: December 23, 2011
Publication date: July 5, 2012
Applicant: TARGUS INFORMATION CORPORATION
Inventors: James D. Shaffer, George G. Moore
-
Publication number: 20120173539
Abstract: Methods and systems for managing an index database. In one exemplary method, an index database is stored on a machine readable volume with an operating system and the files which have been indexed, and then the volume is, after the storing, made available for distribution to licensees or customers. In this manner, the volume will include a previously created index database, allowing a user to begin use of the index database without having to perform an indexing operation.
Type: Application
Filed: March 13, 2012
Publication date: July 5, 2012
Inventors: Andrew Carol, Yan Arrouye, Dominic Giampaolo
-
Publication number: 20120173536
Abstract: A method to index recorded content at a media device includes extracting, at a remote service provider, event index data from an event being recorded at a media device and associating the event index data with locator code data of the event. The method further includes storing, at the remote service provider, the extracted event index data and the associated locator code data; searching the extracted event index data for a plurality of segments associated with the event, the search being associated with a search request; determining index display data for a presentation of the plurality of segments based on the search request; and transmitting, to the media device, the locator code data associated with the plurality of segments, and the index display data.
Type: Application
Filed: December 1, 2011
Publication date: July 5, 2012
Applicant: AT&T Intellectual Property I, LP
Inventors: Behzad Shahraray, David Gibbon, Lee Begeja, Zhu Liu, Richard V. Cox, Bernard S. Renger
-
Publication number: 20120166384
Abstract: According to some embodiments, a system, method, means, and/or computer program code are provided to facilitate a display of information on a client device. For example, a server may retrieve first enterprise data from an enterprise database and store the first enterprise data into a first client based cache at the server, the first client based cache being associated with a first user. Similarly, the server may retrieve second enterprise data from the enterprise database and store the second enterprise data into a second client based cache at the server, the second client based cache being associated with a second user. Subsequent to the storing of the first enterprise data, the server may receive a display request from a first client device associated with the first user and transmit the first enterprise data to the first client device.
Type: Application
Filed: December 22, 2010
Publication date: June 28, 2012
Inventors: Karl-Peter Nos, Andreas Riehl, Belenki Michael
-
Publication number: 20120166419
Abstract: A system, a program product and an associated method are provided for data processing management in a computing environment having at least a processor. The method comprises creating in the memory an invalidation index having a plurality of rows, each row further comprising a search key field, an ID list field for IDs of records associated with the database, and a count value field. Every time a new reference query is received, the processor searches for a row in said invalidation index with an already created search key and then decreases the count value of a counter when a match is found and, when a match is not found, creates a new search key and a new row in an associated invalidation index for said new key.
Type: Application
Filed: September 30, 2011
Publication date: June 28, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Miki Enoki, Yohsuke Ozawa, Hiroshi Horii
-
Publication number: 20120166444Abstract: A high level programming language provides a co-map communication operator that maps an input indexable type to an output indexable type according to a function. The function maps an index space corresponding to the output indexable type to an index space corresponding to the input indexable type. By doing so, the co-map communication operator lifts a function on an index space to a function on an indexable type to allow composability with other communication operators.Type: ApplicationFiled: December 23, 2010Publication date: June 28, 2012Applicant: MICROSOFT CORPORATIONInventors: Paul F. Ringseth, Yosseff Levanoni, Lingli Zhang, Weirong Zhu, Donald J. McCrady
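The co-map idea in the abstract above can be illustrated with a small sketch. This is not the patented operator, only a minimal Python analogy: the function name `co_map` and its parameters are hypothetical, and a plain list stands in for an "indexable type". The index function maps each output index back to an input index, and because the result is again indexable, calls can be composed.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def co_map(source: Sequence[T], index_fn: Callable[[int], int], out_len: int) -> List[T]:
    """Build an output whose element i is source[index_fn(i)].

    index_fn maps the OUTPUT index space to the INPUT index space
    (hypothetical helper, for illustration only).
    """
    return [source[index_fn(i)] for i in range(out_len)]

data = [10, 20, 30, 40]

# Reverse the list by mapping output index i to input index len - 1 - i.
reversed_view = co_map(data, lambda i: len(data) - 1 - i, len(data))

# Compose with a second co-map that keeps every other element.
every_other = co_map(reversed_view, lambda i: 2 * i, 2)
```

A real implementation would more likely return a lazy view than a copied list; the copy keeps the sketch short.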
-
Publication number: 20120166404Abstract: Systems, methods, and other embodiments associated with real-time text indexing are described. One example method includes receiving a document for indexing in a search system that includes a mature index and indexing the received document in a staging index. The staging index may be stored in direct access memory associated with query processing that does not degrade query performance even when postings become fragmented. The staging index and the mature text index are accessed to process queries on the search system. The example method may also include periodically merging the staging index into the mature index based on query feedback.Type: ApplicationFiled: December 28, 2010Publication date: June 28, 2012Applicant: ORACLE INTERNATIONAL CORPORATIONInventors: Ravi PALAKODETY, Wesley LIN, Mohammad FAISAL, Garret F. SWART
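A rough sketch of the staging/mature arrangement described above, under simplifying assumptions: both indexes are in-memory dictionaries, tokenization is whitespace splitting, and the merge is triggered manually rather than by query feedback. The class and method names are hypothetical.

```python
from collections import defaultdict

class TwoTierIndex:
    """Toy inverted index with a small staging tier and a mature tier."""

    def __init__(self):
        self.mature = defaultdict(set)   # term -> doc ids (large, merged)
        self.staging = defaultdict(set)  # term -> doc ids (recent documents)

    def add(self, doc_id, text):
        # New documents are indexed into the staging tier only.
        for term in text.lower().split():
            self.staging[term].add(doc_id)

    def search(self, term):
        # Queries consult both tiers and union the postings.
        term = term.lower()
        return self.mature.get(term, set()) | self.staging.get(term, set())

    def merge(self):
        # Fold staging postings into the mature tier, then empty staging.
        for term, doc_ids in self.staging.items():
            self.mature[term] |= doc_ids
        self.staging.clear()

index = TwoTierIndex()
index.add(1, "real-time text indexing")
hits_before_merge = index.search("indexing")
index.merge()
hits_after_merge = index.search("indexing")
```

The point of the sketch is that a document is searchable immediately after `add`, and that `merge` changes where the postings live without changing query results.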
-
Publication number: 20120166448Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index and/or indexing operations are adaptable to balance deduplication performance savings, throughput and resource consumption. The indexing service may employ hierarchical chunking using different levels of granularity corresponding to chunk size, a sampled compact index table that contains compact signatures for less than all of the hash index's (or subspace's) hash values, and/or selective subspace indexing based on similarity of a subspace's data to another subspace's data and/or to incoming data chunks.Type: ApplicationFiled: December 28, 2010Publication date: June 28, 2012Applicant: MICROSOFT CORPORATIONInventors: Jin Li, Sudipta Sengupta
-
Publication number: 20120166447Abstract: A data set may be distributed over many data stores, and a query may be distributively evaluated by several data stores with the results combined to form a query result (e.g., utilizing a MapReduce framework). However, such architectures may violate security principles by performing sophisticated processing, including the execution of arbitrary code, on the same machines that store the data. Instead of processing queries, a data store may be configured only to receive requests specifying one or more filtering criteria, and to provide the data items satisfying the filtering criteria. A compute node may apply a query by generating a request including one or more filter criteria, providing the request to a data node, and applying the remainder of the query (including sophisticated processing, and potentially the execution of arbitrary code) to the data items provided by the data node, thereby improving the security and efficiency of query processing.Type: ApplicationFiled: December 28, 2010Publication date: June 28, 2012Applicant: Microsoft CorporationInventors: Nir Nice, Daniel Sitton, Dror Kremer, Michael Feldman
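The split between data node and compute node can be sketched as follows. This is an illustration under assumptions, not the filing's design: filter criteria are modeled as simple field/value equality, and `DataNode`, `fetch`, and `run_query` are hypothetical names. The data node evaluates only declarative criteria; arbitrary code runs only on the compute side.

```python
class DataNode:
    """Stores items and answers filter-only requests (no caller code runs here)."""

    def __init__(self, items):
        self._items = list(items)

    def fetch(self, criteria):
        # criteria: dict of field -> required value (equality filters only).
        return [
            item for item in self._items
            if all(item.get(field) == value for field, value in criteria.items())
        ]

def run_query(node, criteria, compute):
    """Compute-node side: request filtered rows, then apply the rest of the query."""
    rows = node.fetch(criteria)
    return compute(rows)  # arbitrary processing happens off the data node

node = DataNode([
    {"region": "eu", "amount": 5},
    {"region": "us", "amount": 7},
    {"region": "eu", "amount": 11},
])
eu_total = run_query(node, {"region": "eu"}, lambda rows: sum(r["amount"] for r in rows))
```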
-
Publication number: 20120166445Abstract: A method and apparatus are provided for better web ad matching by combining relevance with consumer click feedback. In one example, the method includes receiving a query page, extracting features from the query page, re-weighting the query page, evaluating the query page in light of each ad in order to score each ad and pick substantially best ad matches of the indexed ads, and returning the substantially best ad matches to the consumer computer.Type: ApplicationFiled: March 7, 2012Publication date: June 28, 2012Inventors: Deepayan Chakrabarti, Deepak K. Agrawal, Vanja Josifovski
-
Publication number: 20120158801Abstract: A Java object is scan-missed during the mark phase of a garbage collection cycle. A list of any unscanned objects, comprising all objects of a particular object type, is created during a sweep phase of the garbage collection cycle. After the garbage collection cycle is completed, and the application resumes, for every PUTFIELD/GETFIELD operation on the object type that is part of a specific parent object, a comparison is made with the relevant information in the unscanned objects list. A scan-miss is identified by determining whether the current object being referenced by the application is a part of the unscanned object list that has been created during the sweep phase of the garbage collection cycle.Type: ApplicationFiled: December 15, 2010Publication date: June 21, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: AMAR DEVEGOWDA, CHARLES R. GRACIE, VENKATARAGHAVAN LAKSHMINARAYANACHAR
-
Publication number: 20120158676Abstract: Objects stored in a zip archive may be extracted in random-access fashion (without involving other objects stored in the zip archive) using the addresses of the objects stored in the central directory of the zip archive. However, zip archives often provide insufficient information to enable random access to the data within an object. This capability may be provided by segmenting the object into sections of a section size, and including in the zip archive a block table specifying, for respective sections, the block size of the corresponding block. A zip archive extractor may achieve random access to the object by using the block table to compute the addresses of blocks comprising the selected portion and extracting only those blocks. Backwards compatibility of the zip archive with other zip archive extractors may be preserved by including the block table within a zip extension of the central directory of the zip archive.Type: ApplicationFiled: December 17, 2010Publication date: June 21, 2012Applicant: Microsoft CorporationInventor: Thomas Alan Bouldin
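The block-table technique can be sketched without the zip container itself. The example below is an assumption-laden illustration: it compresses each fixed-size section independently with `zlib` and keeps the compressed sizes in a list, rather than writing a real zip archive or a central-directory extension. The helper names `pack` and `read_range` are hypothetical.

```python
import zlib

def pack(data: bytes, section_size: int):
    """Compress each section separately; return the blob and its block table."""
    blocks = [
        zlib.compress(data[i:i + section_size])
        for i in range(0, len(data), section_size)
    ]
    block_table = [len(block) for block in blocks]  # compressed size per section
    return b"".join(blocks), block_table

def read_range(blob: bytes, block_table, section_size: int, start: int, length: int) -> bytes:
    """Return data[start:start+length], decompressing only the blocks that overlap it."""
    if length <= 0:
        return b""
    first = start // section_size
    last = (start + length - 1) // section_size
    offset = sum(block_table[:first])  # address of the first needed block
    out = bytearray()
    for size in block_table[first:last + 1]:
        out += zlib.decompress(blob[offset:offset + size])
        offset += size
    skip = start - first * section_size  # bytes to drop from the first block
    return bytes(out[skip:skip + length])

payload = bytes(range(256)) * 40          # 10240 bytes of sample data
blob, table = pack(payload, 1024)
piece = read_range(blob, table, 1024, 3000, 500)
```

Only the two blocks covering bytes 3000 to 3499 are decompressed here; the other eight are never touched.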
-
Publication number: 20120158714Abstract: A system may include determination of a plurality of data structures associated with an entity, each of the plurality of data structures associated with a respective validity period, determination of a plurality of non-overlapping time periods based on the validity periods, the plurality of non-overlapping time periods collectively spanning all of the validity periods, determination, for each of the plurality of non-overlapping time periods, of a composite data structure based on each of the data structures associated with a validity period including the non-overlapping time period, assignment of a respective document identifier to each composite data structure, each document identifier indicating the entity, and indexing of the composite data structures within an index.Type: ApplicationFiled: December 16, 2010Publication date: June 21, 2012Inventor: Bruno Dumant
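The period-splitting step described above can be sketched as follows, assuming half-open validity periods `[start, end)` with comparable endpoints and dictionary-shaped data structures that are merged by a plain `update`. The function name and the document identifier format are hypothetical.

```python
def composite_documents(entity_id, records):
    """Split overlapping validity periods into non-overlapping ones.

    records: list of (start, end, data) with half-open periods [start, end).
    Returns a list of (doc_id, start, end, composite) tuples.
    """
    # Every start or end is a boundary between non-overlapping periods.
    bounds = sorted({t for start, end, _ in records for t in (start, end)})
    docs = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A record contributes if its validity period includes [lo, hi).
        active = [data for start, end, data in records if start <= lo and hi <= end]
        if not active:
            continue  # gap not covered by any validity period
        composite = {}
        for data in active:
            composite.update(data)
        docs.append((f"{entity_id}:{lo}-{hi}", lo, hi, composite))
    return docs

docs = composite_documents("emp42", [
    (1, 5, {"name": "A"}),
    (3, 8, {"dept": "X"}),
])
```

Each returned tuple could then be indexed as its own document, with the identifier pointing back to the entity.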
-
Publication number: 20120158677Abstract: This can relate to streaming compressed files via a non-volatile memory (“NVM”) of a media player. In particular, the NVM can stream compressed media files. The NVM can include an NVM controller and an NVM die storing the compressed media file. The NVM controller can read the compressed media file from the NVM die, decompress the media file, and send the decompressed media file to a digital-to-analog converter (“DAC”) for conversion to analog format. Since the decompression can be performed by the NVM itself, an application processor may be significantly removed from the media playback process. In some embodiments, it may only be necessary for the application processor to issue an initial read request and/or receive a completion confirmation from the NVM. This can result in significant power savings for the media player and can free the application processor for performing other functions of the media player.Type: ApplicationFiled: December 20, 2010Publication date: June 21, 2012Applicant: Apple Inc.Inventor: Shachar Ron
-
Publication number: 20120158732Abstract: A data marketplace infrastructure provides a crowd sourcing solution to development, discovery and publication of decision applications. Applications can be submitted from a user to a data warehouse in association with a data feed. One or more discovery properties are determined with regard to each application. The applications are made available to other client systems in association with the data feed. A relevant data feed and a relevant application can be identified based on satisfaction of a discovery request by the one or more determined discovery properties of the application. The application can be selected and downloaded to the user for evaluation and customization. The customized application can then be submitted to the data warehouse for publication with the other applications associated with the data feed.Type: ApplicationFiled: December 17, 2010Publication date: June 21, 2012Applicant: MICROSOFT CORPORATIONInventors: Vijay Mital, Max Uritsky, Suraj Poozhiyil, Moe Khosravy, Robert Fries
-
Publication number: 20120158731Abstract: The present invention extends to methods, systems, and computer program products for deriving document similarity indices. Embodiments of the invention include scalable and efficient mechanisms for deriving and updating a document similarity index for a plurality of documents. The number of maintained similarities can be controlled to conserve CPU and storage resources.Type: ApplicationFiled: December 16, 2010Publication date: June 21, 2012Applicant: Microsoft CorporationInventors: Sorin Gherman, Kunal Mukerjee, Adam Prout
-
Publication number: 20120158696Abstract: The claimed subject matter provides a method and a system for the efficient indexing of error tolerant set containment. An exemplary method comprises obtaining a frequency threshold and a query set. All tokens or token sets within the query set are determined, and then all minimal infrequent tokens or all minimal infrequent token sets of data records are found and used to build an index. The minimal infrequent tokens or minimal infrequent token sets are processed in a fixed order, and then a collection of signatures for each minimal infrequent token or token set is determined.Type: ApplicationFiled: December 21, 2010Publication date: June 21, 2012Applicant: Microsoft CorporationInventors: Arvind Arasu, Parag Agrawal, Kaushik Shriraghav
-
Publication number: 20120158733Abstract: A system for storing, managing, and accessing information on a network by providing an interface between a social network and a content network includes an applications platform. The system provides messaging and social networking facility incorporating enhanced instant messaging, file synchronization, network presence, interactive chat capabilities, text messaging, voice and video messaging, blogging, and email. The system includes a viewer, an indexing facility, and a storage facility. The viewer enables users to traverse content and provides services based upon context of time, place, structure, node, and observed user behavior. The viewer provides a means for users to interact with information on the network and services to manipulate information and transact activities. The indexing facility manages the structure of the network and tracks attributes and controlled vocabularies. The indexing facility supports navigation across the structure and resolves the logical index to a physical storage location.Type: ApplicationFiled: June 15, 2011Publication date: June 21, 2012Applicant: PEER FUSION LLCInventors: Robert E. McGILL, Clifford F. BOYLE, Jamie MAZUR, Jason MAZUR, Alex GERUS, Eugene BERKOV, Kunal BHOMICK
-
Publication number: 20120158800Abstract: A method of organizing data in a database system using a swarm database system that has one or more nodes comprising one or more processors and memory, the memory of the one or more nodes storing one or more programs to be executed by the one or more processors. The method includes identifying data to store in one or more tables on a bucket, wherein the bucket is an allocation of a partitioned storage in a node of the one or more nodes, and assigning to each of the identified data an identifier and a data storage hierarchical level of a plurality of hierarchical levels.Type: ApplicationFiled: December 16, 2011Publication date: June 21, 2012Inventors: Keith PETERS, Bryn Robert Dole, Michael Markson, Robert Michael Saliba, Rich Skrenta, Robert N. Truel, Gregory B. Lindahl
-
Publication number: 20120150862Abstract: A method for enhancing a search of a set of documents is described. The method allows a user to present a word of interest. The word is then matched to related words in a larger corpus of words and the related words are matched against an index of the document to identify words that appear in both the matched words and the document index. The word selected by the user may be taken from a previously generated index of the document or the word may be presented by the user based on a topic of interest.Type: ApplicationFiled: December 13, 2010Publication date: June 14, 2012Applicant: Xerox CorporationInventor: Steven J. Harrington
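The matching step in the abstract above amounts to a set intersection. The sketch below assumes the larger corpus is available as a dictionary from a word to its related words and that the document index is a set of terms; both structures and the name `expand_query` are hypothetical.

```python
def expand_query(word, related_words, document_index):
    """Return related words that also appear in the document's index."""
    candidates = related_words.get(word, set())
    return sorted(candidates & document_index)

# Hypothetical corpus-derived relations and a small document index.
related_words = {"printer": {"toner", "copier", "scanner", "ink"}}
document_index = {"toner", "ink", "paper", "tray"}

suggestions = expand_query("printer", related_words, document_index)
```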
-
Publication number: 20120150812Abstract: Content license storage is provided by holding, in a temporary license store on the content consumption device, a plurality of content licenses for a plurality of content streams, wherein each content license of the plurality of content licenses includes a removal date. The method further includes for each content license of the plurality of content licenses corresponding to a content stream of the plurality of content streams which is designated for archived playback, copying the content license into an embedded license store within the content stream to form an archived content stream. The method further includes removing one or more of the plurality of content licenses held at the temporary license store if the removal date included in the content license has been reached, while leaving each content license stored within an archived content stream even if the removal date has been reached.Type: ApplicationFiled: December 13, 2010Publication date: June 14, 2012Applicant: MICROSOFT CORPORATIONInventor: Quintin S. Burns
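The two-store behavior described above can be sketched with dictionaries. This is an illustration under assumptions: licenses and streams are plain dictionaries, the removal date counts as "reached" when it is on or before the current date, and the function names are hypothetical.

```python
from datetime import date

def archive_stream(stream, temp_store):
    """Copy the stream's license into an embedded store inside the stream."""
    stream["embedded_license"] = dict(temp_store[stream["id"]])

def purge_temp_store(temp_store, today):
    """Remove temporary licenses whose removal date has been reached."""
    expired = [sid for sid, lic in temp_store.items() if lic["removal_date"] <= today]
    for sid in expired:
        del temp_store[sid]

temp_store = {
    "s1": {"key": "k1", "removal_date": date(2012, 1, 1)},
    "s2": {"key": "k2", "removal_date": date(2012, 1, 1)},
}
archived = {"id": "s1"}  # stream designated for archived playback

archive_stream(archived, temp_store)
purge_temp_store(temp_store, date(2012, 6, 14))
```

After the purge, the temporary store is empty, while the archived stream still carries its embedded copy of the license even though the removal date has passed.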