Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)

E Subclasses

Of chemical information (epo) (Class 707/E17.003)

Of images (epo) (Class 707/E17.004)

Locality Aware, Two-Level Fingerprint Caching

Publication number: 20140101113

Abstract: The present disclosure provides for implementing a two-level fingerprint caching scheme for a client cache and a server cache. The client cache hit ratio can be improved by pre-populating the client cache with fingerprints that are relevant to the client. Relevant fingerprints include fingerprints used during a recent time period (e.g., fingerprints of segments that are included in the last full backup image and any following incremental backup images created for the client after the last full backup image), and thus are referred to as fingerprints with good temporal locality. Relevant fingerprints also include fingerprints associated with a storage container that has good spatial locality, and thus are referred to as fingerprints with good spatial locality. A pre-set threshold established for the client cache (e.g., threshold Tc) is used to determine whether a storage container (and thus fingerprints associated with the storage container) has good spatial locality.
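The pre-population and two-level lookup described in this abstract can be sketched roughly as follows. This is an illustration only, not the claimed method: the abstract does not say how the threshold Tc is applied, so the sketch assumes a container has good spatial locality when at least Tc of the client's recent fingerprints fall in it. The names `prepopulate_client_cache`, `lookup`, `container_of`, and `containers` are hypothetical.

```python
from collections import Counter


def prepopulate_client_cache(recent_fps, container_of, containers, tc):
    """Build a client cache from temporal and spatial locality.

    recent_fps   -- fingerprints from the last full backup and later incrementals
    container_of -- maps a fingerprint to its storage container id
    containers   -- maps a container id to the set of fingerprints it holds
    tc           -- assumed meaning: minimum number of recent fingerprints a
                    container must hold to count as having good spatial locality
    """
    cache = set(recent_fps)  # temporal locality
    hits = Counter(container_of[fp] for fp in recent_fps if fp in container_of)
    for container_id, count in hits.items():
        if count >= tc:  # spatial locality (assumed rule)
            cache.update(containers[container_id])
    return cache


def lookup(fingerprint, client_cache, server_cache):
    """Two-level lookup: try the client cache first, then the server cache."""
    if fingerprint in client_cache:
        return "client"
    if fingerprint in server_cache:
        return "server"
    return "miss"
```

A segment whose fingerprint is a "miss" at both levels would be treated as new data and sent to the server.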

Type: Application

Filed: October 8, 2012

Publication date: April 10, 2014

Applicant: SYMANTEC CORPORATION

Inventors: Xianbo Zhang, Haibin She, Chao Lei, Xiaobing Song, Shuai Cheng
RANKING SUPERVISED HASHING

Publication number: 20140095490

Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xu Sun, Jun Wang
RANKING SUPERVISED HASHING

Publication number: 20140095512

Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.

Type: Application

Filed: October 4, 2012

Publication date: April 3, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xu Sun, Jun Wang
EFFICIENT FILE RECLAMATION IN DEDUPLICATING VIRTUAL MEDIA

Publication number: 20140089269

Abstract: Expired files in the deduplicating virtual media are selectively erased using a backup application for notifying a backup repository of which expired files are no longer required. The space of the expired files is reclaimed for reuse. Virtual space of the expired files is reserved for allowing the backup application to seek past the reclaimed space to subsequent data in the deduplicating virtual media.

Type: Application

Filed: September 24, 2012

Publication date: March 27, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shay H. AKIRAV, Michael HIRSCH
Method and Apparatus for Enhancing Electronic Reading by Identifying Relationships between Sections of Electronic Text

Publication number: 20140089315

Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.

Type: Application

Filed: September 24, 2012

Publication date: March 27, 2014

Inventor: Philip R. Krause
Method and Apparatus for Enhancing Electronic Reading by Identifying Relationships between Sections of Electronic Text

Publication number: 20140089316

Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.

Type: Application

Filed: September 24, 2012

Publication date: March 27, 2014

Inventor: Philip R. Krause
LARGE SCALE FILE STORAGE IN CLOUD COMPUTING

Publication number: 20140089273

Abstract: Storing and retrieving files based on hashes for the files. One method for storing files includes: identifying a file; identifying a hash calculated based on the file; renaming the file based on the hash based on the file; and storing the file in a particular location based on the hash calculated based on the file. Another method for retrieving files includes: identifying a hash for a given file; using the hash, traversing a hierarchical file structure to find a location where the given file should be stored; determining that the file is at the location; and as a result, retrieving the file.
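A minimal sketch of the store and retrieve methods in this abstract, under stated assumptions: SHA-256 as the hash, and a two-level directory hierarchy keyed on the first hex characters of the digest. Neither choice comes from the abstract, and the helper names (`hashed_path`, `store`, `retrieve`) are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Hash the file contents in chunks (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hashed_path(root: Path, digest: str, levels: int = 2, width: int = 2) -> Path:
    """Map a digest to a location such as root/ab/cd/abcd... (assumed layout)."""
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)


def store(root: Path, src: Path) -> str:
    """Copy src into the store under a name and location derived from its hash."""
    digest = file_digest(src)
    dest = hashed_path(root, digest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():  # identical content is stored only once
        shutil.copyfile(src, dest)
    return digest


def retrieve(root: Path, digest: str) -> Optional[bytes]:
    """Walk the hierarchy to where the file should be; return it if present."""
    dest = hashed_path(root, digest)
    return dest.read_bytes() if dest.is_file() else None
```

Because the location is a pure function of the hash, retrieval needs no separate lookup table: the hash alone determines which directories to traverse.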

Type: Application

Filed: September 27, 2012

Publication date: March 27, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Ronen Borshack, Anil Francis Thomas, Erez Einav, Philip Ernst Taron
CONCURRENT ACCESS METHODS FOR TREE DATA STRUCTURES

Publication number: 20140074841

Abstract: In one embodiment, a non-transitory computer-readable medium stores instructions for implementing a file system, which include operations for acquiring an exclusive lock on a first node in an ordered tree data-structure, and adding an identifier and index of the first node to a path data structure. If the value of the index in the first node is non-zero, then each exclusive lock acquired between the first node and the root of the tree data structure is released. In any case, the operation proceeds to a second node, which is addressed at the index on the first node. In one embodiment, operations further include acquiring an exclusive lock on the second node, and, if the second node is a leaf node, performing updates to the second node, and then releasing each exclusive lock in the data-structure.

Type: Application

Filed: October 16, 2012

Publication date: March 13, 2014

Applicant: Apple Inc.

Inventors: David A. Majnemer, Wenguang Wang
REMOTE ARTIFACT REPOSITORY

Publication number: 20140074849

Abstract: System for generating a pseudo-repository. The system scans a directory to detect compiled binary files, and assembles an index of the compiled binary files based on metadata describing the compiled binary files. Then the system generates a pseudo-repository based on the index that maps each compiled binary file with at least one associated artifact, wherein the pseudo-repository responds to client requests for one of the binary files.

Type: Application

Filed: September 7, 2012

Publication date: March 13, 2014

Inventors: Ondrej Zizka, Lukas Fryc
VISUALIZATION OF DATA FROM CLUSTERS

Publication number: 20140074850

Abstract: Embodiments are directed towards the visualization of machine data received from computing clusters. Embodiments may enable improved analysis of computing cluster performance, error detection, troubleshooting, error prediction, or the like. Individual cluster nodes may generate machine data that includes information and data regarding the operation and status of the cluster node. The machine data is received from each cluster node for indexing by one or more indexing applications. The indexed machine data including the complete data set may be stored in one or more index stores. A visualization application enables a user to select one or more analysis lenses that may be used to generate visualizations of the machine data. The visualization application employs the analysis lens to produce visualizations of the computing cluster machine data.

Type: Application

Filed: October 25, 2012

Publication date: March 13, 2014

Applicant: Splunk Inc.

Inventors: Cary Glen Noel, Kirubakaran Pakkirisamy, Alex Raitz, Pierre Tsai
COMPRESSION OF TIMING DATA OF DATABASE SYSTEMS AND ENVIRONMENTS

Publication number: 20140067777

Abstract: Timing data associated with a database or database system can be stored in a reduced or compressed form which can be decompressed back to a full or original form. In doing so, timing data can be compressed by using a subset of a full set of possible values (e.g., a determined range which is more likely to occur) instead of using a full set of possible values. Timing data can also be compressed by eliminating redundant, insignificant duplicate and/or common values, for example, between one or more components (e.g., start and end times of a period of time) of the timing data.

Type: Application

Filed: September 6, 2012

Publication date: March 6, 2014

Inventors: Cameron Lewis, Elizabeth Brealey, Michael Reed
STORAGE AND RETRIEVAL OF SENSOR DATA AND COMPUTED PARAMETERS FOR USE IN CONDITION BASED MAINTENANCE SYSTEMS

Publication number: 20140067821

Abstract: A system and method for storing and accessing data in an embedded system of an aircraft extracts identifiers from headers in stored data, and stores the identifiers in a separately indexable array.

Type: Application

Filed: September 13, 2012

Publication date: March 6, 2014

Applicant: GE AVIATION SYSTEMS LLC

Inventor: Benjamin James Sykes
EFFICIENT XML TREE INDEXING STRUCTURE OVER XML CONTENT

Publication number: 20140067819

Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.

Type: Application

Filed: September 5, 2012

Publication date: March 6, 2014

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Anguel Novoselsky, Zhen Hua Liu, Thomas Baby
Hardware accelerated application-based pattern matching for real time classification and recording of network traffic

Patent number: 8666985

Abstract: An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to a packet capture repository when slots in a shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.

Type: Grant

Filed: March 15, 2012

Date of Patent: March 4, 2014

Assignee: Solera Networks, Inc.

Inventors: Matthew S. Wood, Joseph H. Levy, McKay Marston
INDEXING PREVIEW

Publication number: 20140052733

Abstract: Embodiments are directed towards previewing results generated from indexing raw data before the corresponding index data is added to an index store. Raw data may be received from a preview data source. After an initial set of configuration information is established, the preview data may be submitted to an index processing pipeline. A previewing application may generate preview results based on the preview index data and the configuration information. The preview results may enable previewing how the data is being processed by the indexing application. If the preview results are not acceptable, the configuration information may be modified. The preview application enables modification of the configuration information until the generated preview results are acceptable. If the configuration information is acceptable, the preview data may be processed and indexed in one or more index stores.

Type: Application

Filed: August 17, 2012

Publication date: February 20, 2014

Applicant: Splunk Inc.

Inventors: Mitchell Neuman Blank, JR., Leonid Budchenko, David Carasso, Micah James Delfino, Johnvey Hwang, Stephen Phillip Sorkin, Eric Timothy Woo
Virtual Machine Image Access De-Duplication

Publication number: 20140052698

Abstract: A system and an article of manufacture for de-duplicating virtual machine image accesses include identifying one or more identical blocks in two or more images in a virtual machine image repository, generating a block map for mapping different blocks with identical content into a same block, deploying a virtual machine image by reconstituting an image from the block map and fetching any unique blocks remotely on-demand, and de-duplicating virtual machine image accesses by storing the deployed virtual machine image in a local disk cache.

Type: Application

Filed: August 17, 2012

Publication date: February 20, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Han Chen, Alexei A. Karve, Minkyong Kim, Andrzej P. Kochut, Hui Lei, Jayaram Kallapalayam Radhakrishnan, Zhiming Shen, Zhe Zhang
Analytics Data Indexing System and Methods

Publication number: 20140052732

Abstract: Provided is a method for updating index data. The method includes receiving index data, including an index value indicative of user activity on a network site and an index time corresponding to a time used for calculating the index value, receiving an update index time corresponding to a time used for updating the index data, determining an updated index value using an exponential decay of the index value from the index time to the update index time, wherein the updated index value comprises a decayed value of the index value corresponding to the update time, and storing updated index data including the updated index value and the update index time.
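The exponential-decay update can be illustrated with a short sketch. The abstract does not give a decay constant, so the `half_life` parameter below is an assumption, as are the names `IndexData` and `decay_update`.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexData:
    value: float  # index value indicative of user activity
    time: float   # time at which the value was calculated, in seconds


def decay_update(data: IndexData, update_time: float, half_life: float) -> IndexData:
    """Decay the stored value from data.time to update_time.

    half_life is a hypothetical tuning parameter: the value halves every
    half_life seconds.
    """
    elapsed = update_time - data.time
    if elapsed < 0:
        raise ValueError("update_time must not precede the index time")
    decayed = data.value * math.exp(-math.log(2) * elapsed / half_life)
    return IndexData(value=decayed, time=update_time)
```

Storing the update time alongside the decayed value means the next update can decay from that point, so the full activity history never has to be replayed.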

Type: Application

Filed: August 26, 2011

Publication date: February 20, 2014

Inventor: William R. Softky
ESTIMATION OF DATA REDUCTION RATE IN A DATA STORAGE SYSTEM

Publication number: 20140052699

Abstract: Systems and methods for estimating a data reduction ratio for a data set are provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements; associating an identifier h_i for each of the plurality of m elements; associating an identifier h_e for each of the plurality of elements in the data set; tracking the number of times an element i appears in a base set that includes the plurality of m elements selected from the data set; calculating a value count_i that indicates the number of times an identifier h_e matches an identifier h_i; and estimating the data reduction ratio for the plurality of N elements in the data set, based on the number m of elements selected from the data set and the value count_i.
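One way to read the sampling scheme in this abstract is sketched below. The abstract does not state the final formula, so the estimator used here (the mean of 1/count over the sampled elements, which estimates the fraction of distinct elements) is an assumption rather than the claimed computation. SHA-256 identifiers, sampling with replacement, and the function name are likewise hypothetical.

```python
import hashlib
import random
from collections import Counter


def estimate_reduction_ratio(elements, m, seed=0):
    """Estimate distinct/total for `elements` from a sample of m elements.

    Phase 1: sample m elements and hash each one.
    Phase 2: scan the whole data set, counting how often each sampled
             hash occurs.
    Estimate: average of 1/count over the sample (assumed estimator).
    """
    rng = random.Random(seed)
    sample_hashes = [
        hashlib.sha256(elements[rng.randrange(len(elements))]).digest()
        for _ in range(m)
    ]
    base = set(sample_hashes)

    counts = Counter()
    for element in elements:  # single pass over all N elements
        h = hashlib.sha256(element).digest()
        if h in base:
            counts[h] += 1

    return sum(1.0 / counts[h] for h in sample_hashes) / m
```

Only the hashes of the sampled elements are kept in memory during the scan, so the memory cost depends on m rather than on N.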

Type: Application

Filed: August 20, 2012

Publication date: February 20, 2014

Applicant: International Business Machines Corporation

Inventors: Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov, Gil Vernik
AUTOMATED SUBSTITUTION OF TERMS BY COMPOUND EXPRESSIONS DURING INDEXING OF INFORMATION FOR COMPUTERIZED SEARCH

Publication number: 20140046951

Abstract: Methods, software and devices for indexing responses for later providing to users in response to queries are disclosed. For each stored response, representative queries are stored in association with that response, where each representative query represents a possible query for searching for information addressed by that response. Representative queries are selectively modified by substituting terms by corresponding chosen substitute expressions, where a substitute expression is chosen for a particular term in one of the representative queries based on past substitutions in others of said representative queries. For each response, a Boolean expression is formed from those representative queries associated with that response, as selectively modified, where the Boolean expression is satisfied by each of those representative queries.

Type: Application

Filed: August 8, 2012

Publication date: February 13, 2014

Applicant: Intelliresponse Systems Inc.

Inventors: Darren Redfern, Chad Ternent
DE-DUPLICATING ATTACHMENTS ON MESSAGE DELIVERY AND AUTOMATED REPAIR OF ATTACHMENTS

Publication number: 20140046911

Abstract: Systems and techniques of de-duplicating files and/or blobs within a file system are presented. In one embodiment, an email system is disclosed wherein the email system receives email messages comprising a set of associated attachments. The system determines whether the associated attachments have been previously stored in the email system, the state of the stored attachment, and if the state of the attachment is appropriate for sharing copies of the attachment, then providing a reference to the attachment upon a request to share the attachment. In another embodiment, the system may detect whether stored attachments are corrupted and, if so, attempt to repair the attachment, and possibly, prior to sharing references to the attachment.

Type: Application

Filed: August 13, 2012

Publication date: February 13, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Kristof Roomp, Gruia Pitigoi-Aron, Ivaylo Dimitrov, Brandon Pai, Cheng Ho, Kumar Pasumarthy, Lincoln Liu, Alok Dhariwal, John Rodrigues
AGGREGATING DATA IN A MEDIATION SYSTEM

Publication number: 20140040213

Abstract: Records received from one or more sources in a network are processed. For each of multiple intervals of time, a matching procedure is attempted on sets of one or more records, including comparing identifiers associated with different records to generate the sets and determining whether or not a completeness criterion is satisfied for one or more of the sets. The processing also includes, for at least some of the intervals of time, processing at least one complete set, consisting of one or more of the received records on which the matching procedure is first attempted during the interval of time and one or more records stored in a data store before the interval of time, and for at least some of the intervals of time, processing at least one incomplete set, consisting of one or more records stored in the data store before the interval of time.

Type: Application

Filed: August 2, 2012

Publication date: February 6, 2014

Applicant: Ab Initio Software LLC

Inventor: Larry Paul Rossi
DE-DUPLICATION USING A PARTIAL DIGEST TABLE

Publication number: 20140032507

Abstract: Data de-duplication is done on a data set. The data de-duplication is done using a partial digest table. Some digests are selectively removed from the partial digest table when a pre-determined condition occurs.
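A small sketch of de-duplication against a partial digest table follows. The abstract does not say what the pre-determined condition is or which digests are removed; the sketch assumes the condition is the table exceeding a fixed capacity and that the least recently used digest is dropped. All names are hypothetical.

```python
import hashlib
from collections import OrderedDict


class PartialDigestTable:
    """Bounded digest -> block address map (LRU eviction is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._table = OrderedDict()

    def lookup(self, digest):
        addr = self._table.get(digest)
        if addr is not None:
            self._table.move_to_end(digest)  # mark as recently used
        return addr

    def insert(self, digest, addr):
        self._table[digest] = addr
        self._table.move_to_end(digest)
        # Assumed pre-determined condition: table larger than its capacity.
        while len(self._table) > self.capacity:
            self._table.popitem(last=False)


def dedup_write(blocks, table, store):
    """Append unseen blocks to `store`; return one address per input block."""
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        addr = table.lookup(digest)
        if addr is None:
            addr = len(store)
            store.append(block)
            table.insert(digest, addr)
        refs.append(addr)
    return refs
```

Because the table is partial, a duplicate whose digest has already been removed is stored again; the design trades some lost de-duplication for a bounded table size.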

Type: Application

Filed: July 26, 2012

Publication date: January 30, 2014

Inventors: Douglas L. Voigt, Siamak Nazari
SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR REDUCING HASH TABLE WORKING-SET SIZE FOR IMPROVED LATENCY AND SCALABILITY IN A PROCESSING SYSTEM

Publication number: 20140032569

Abstract: System, method and computer program products for storing data by computing a plurality of hash functions of data values in a data item, and determining a corresponding memory location for one of the plurality of hash functions of data values in the data item. Each memory location is of a cacheline size wherein a data item is stored in a memory location. Each memory location can store a plurality of data items. A key portion of all data items is contiguously stored within the memory location, and a payload portion is contiguously stored within the memory location. Payload portions are packed as bit-aligned in a fixed-sized memory location, comprising a bucket in a bucketized hash table, each bucket sized to store multiple key portions and payload portions that are packed as bit-aligned in a fixed-sized bucket. Corresponding key portions are stored as compressed keys in said fixed-sized bucket.

Type: Application

Filed: July 25, 2012

Publication date: January 30, 2014

Applicant: International Business Machines Corporation

Inventors: Min-Soo Kim, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita
SYSTEM AND METHOD FOR COMBINING DEDUPLICATION AND ENCRYPTION OF DATA

Publication number: 20140032925

Abstract: The embodiments herein relate to data management and, more particularly, to global deduplication and encryption of data in data management systems. The user equipments (UE) are grouped under certain deduplication groups based on certain parameters such as rate of data exchange, frequency of data exchange, social closeness, work closeness, similarity of data and interests and so on, between those UEs. Further, specific deduplication and encryption parameters such as encryption method, encryption key, signature computation method, block computation method and so on are assigned to each group. Further, deduplication and encryption of data in each group is performed using the deduplication and encryption modes and parameters assigned to each group. The deduplication and encryption of data is performed in at least one of the UEs and/or a server. Further, the parameters used for deduplication and encryption are stored in specific databases and are encrypted for better security.

Type: Application

Filed: July 25, 2012

Publication date: January 30, 2014

Inventors: Ankur Panchbudhe, Anand A. Kekre
APPARATUS AND METHODS FOR USER GENERATED CONTENT INDEXING

Publication number: 20140032562

Abstract: A method and client device are disclosed for indexing content of a multimedia file. The method comprises using a client device to segment the content of the multimedia file into a plurality of segments and to determine structure-searchable data for each segment. Determining structure-searchable data for a segment comprises (1) identifying one or more features of respective multimedia types in the segment; (2) correlating each of the identified features to one or more respective keywords; and (3) calculating one or more respective relevance factors for each of the keywords, where at least one of the relevance factors is based on one or more characteristics of the client device. The method also comprises the client device transmitting the structure-searchable data (including the keywords, relevance factors, and respective media types of the identified features) to an indexing server.

Type: Application

Filed: July 26, 2012

Publication date: January 30, 2014

Applicant: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Tommy ARNGREN, David LINDEGREN, Joakim SÖDERBERG, Marika STÅLNACKE
Automated Remediation with an Appliance

Publication number: 20140032449

Abstract: In one embodiment, a method includes receiving information associated with the operation of one or more network devices, indexing the information for analysis, analyzing the information to determine a pattern in the information, generating one or more labels for at least a portion of the information based at least in part on the pattern, and making the information and labels available to a remediation system.

Type: Application

Filed: July 27, 2012

Publication date: January 30, 2014

Applicant: DELL PRODUCTS L.P.

Inventors: Martin Kacin, David Douglas Kloba
HISTORICAL VIEW OF OPEN FILES

Publication number: 20140019455

Abstract: Managing versions of an electronic entity comprising many independently managed, but mutually-dependent, subcomponents can be challenging. File management functionality is provided for use with an integrated development environment to produce a visual indication of the relationships among the subcomponents. The approach described herein provides an improvement over source code control systems and backup systems in the ability to revert the state of one or more files as their content existed at an historical time point. The technique does not require a user to predict in advance at which time points the content state of one or more files will be interesting as historical time points for future use.

Type: Application

Filed: July 12, 2012

Publication date: January 16, 2014

Applicant: Oracle International Corporation

Inventor: Neil James Cochrane
STORY ELEMENT INDEXING AND USES THEREOF

Publication number: 20140019893

Abstract: A story index of story elements is provided in which each story element is able to be referenced in a story by name and by language that does not include the name. The story index may also contain references to the same story elements in other associated stories, including other stories in a series or that are in a different type of media. An associated story presentation application program may enable a viewer to view the entries in the story index for a specified story element and to then view the specified story element at any of the referenced locations. The application may enable purchase or downloading of the associated stories.

Type: Application

Filed: July 11, 2012

Publication date: January 16, 2014

Applicant: Cellco Partnership d/b/a Verizon Wireless

Inventors: Agust K. GUDMUNDSSON, Virginia Benson Chanda
FILE SERVER AND FILE MANAGEMENT METHOD

Publication number: 20140019425

Abstract: The file server identifies two or more files, each including duplicated data among a plurality of files that have been stored into the logical storage device as a file group based on the file system information. The file server deletes copies of the duplicated data other than shared data that is one copy of the duplicated data included in the two or more files from the logical storage device. The file server makes a file, which is not a shared file of the file group, referring to the shared file that is a file configured by the shared data. The file server creates a group link that associates the m files that belong to the file group with each other.

Type: Application

Filed: July 10, 2012

Publication date: January 16, 2014

Inventors: Koji Honami, Masahiro Shimizu
Automated Electronic Discovery Collections and Preservations

Publication number: 20140012832

Abstract: A computer-implemented method, system and computer program product for collecting information from data sources by receiving a collection request at a collection tool to collect information, where the collection request includes data source information indicating a data source from which to retrieve the information. The data source information in the collection request is associated with one or more electronic data repositories in response to the data source indicated by the data source information being previously unidentified to the collection tool. The information is collected from the one or more associated electronic data repositories.

Type: Application

Filed: July 6, 2012

Publication date: January 9, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Roman Kisin, Andrey Pogodin, Pierre Raynaud-Richard
MINIMIZATION OF EPIGENETIC SURPRISAL DATA OF EPIGENETIC DATA WITHIN A TIME SERIES

Publication number: 20140006365

Abstract: A method, computer program product and system of minimizing epigenetic surprisal data either by comparing epigenetic surprisal data to a fixed baseline epigenetic data, so that all of the comparisons were made to the same baseline epigenetic data or by comparing epigenetic surprisal data to a rolling baseline of epigenetic surprisal data—that is, after each comparison the baseline is changed to the data from the time point which had been compared previously.

Type: Application

Filed: June 29, 2012

Publication date: January 2, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Robert R. Friedlander, James R. Kraemer
MEDIA STREAM INDEX MERGING

Publication number: 20140006364

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for merging media stream indexes of a media stream are described in this specification. In one aspect, a method includes receiving a first media stream index at a first server system, including a first list of sequentially arranged fragment identifiers corresponding to at least a portion of multiple sequentially arranged fragments. Fragment identifiers that are potentially missing from the first index can be identified. A second media stream index including a second list of sequentially arranged fragment identifiers corresponding to at least a portion of the multiple sequentially arranged fragments can be requested from a second server system. The first and second list of the sequentially arranged fragment identifiers can be compared and the first list of sequentially arranged fragment identifiers can be reconstructed based on the comparison.
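The gap detection and reconstruction steps can be sketched as follows, assuming for illustration that fragment identifiers are consecutive integers, which the abstract does not state. The function names are hypothetical, and fetching the second index from the second server is left out.

```python
def potentially_missing(fragment_ids):
    """Return identifiers absent between the first and last listed fragment.

    Assumes identifiers are consecutive integers (an illustrative assumption).
    """
    if not fragment_ids:
        return []
    present = set(fragment_ids)
    return [i for i in range(min(present), max(present) + 1) if i not in present]


def merge_indexes(first, second):
    """Rebuild the first list by comparing it with a second server's list."""
    return sorted(set(first) | set(second))
```

A server would call `potentially_missing` on its own index and, only if the result is non-empty, request the second index and call `merge_indexes`.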

Type: Application

Filed: June 28, 2012

Publication date: January 2, 2014

Applicant: ADOBE SYSTEMS INCORPORATED

Inventors: Glenn Eguchi, Asa Whillock, Kevin Streeter, Mohammed Pithapurwala, Noam Lorberbaum, Seth Hodgson, Srinivas Manapragada
METHOD AND APPARATUS FOR MULTIDIMENSIONAL DATA STORAGE AND FILE SYSTEM WITH A DYNAMIC ORDERED TREE STRUCTURE

Publication number: 20140006411

Abstract: An approach is provided to determine one or more dynamic ordered tree structures and transition tree structures (e.g., based on one or more transitions of a device) to facilitate querying and/or accessing data stores. An apparatus and method determines to generate at least one index structure, determines to associate index objects of the generated index structure with one or more data objects of at least one data store, determines to generate at least one transition index structure based on the at least one generated index structure, and determines to associate the transition index structure with index objects corresponding to one or more data objects of at least one data store based on a transition of a device. Also, the method and apparatus determines to generate at least one query, and determines to generate at least one transition index structure where a current index structure to resolve the query is absent.

Type: Application

Filed: June 29, 2012

Publication date: January 2, 2014

Applicant: Nokia Corporation

Inventors: Sergey Boldyrev, Pavandeep Kalra
OPTIMIZED DATA PLACEMENT FOR INDIVIDUAL FILE ACCESSES ON DEDUPLICATION-ENABLED SEQUENTIAL STORAGE SYSTEMS

Publication number: 20140006363

Abstract: Data deduplication for data storage tapes comprises determining the read throughput of a deduplicated set of individual files on a single data storage tape, and determining a placement of deduplicated file data on a single data storage tape to reduce an average number of per-file gaps on the tape. Deduplicated file data is placed on the single data storage tape based on said placement to increase an average read throughput for a deduplicated set of individual files.

Type: Application

Filed: June 29, 2012

Publication date: January 2, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: MIHAIL C. CONSTANTINESCU, ABDULLAH GHARAIBEH, MAOHUA LU, DAVID A. PEASE, ANURAG SHARMA
Low-Overhead Enhancement of Reliability of Journaled File System Using Solid State Storage and De-Duplication

Publication number: 20140006362

Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.
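
A simplified model of the write path described in this abstract might look like the sketch below. It is an assumption-laden illustration: SHA-256 stands in for the unspecified hash, Python lists stand in for the storage device and the journal, the class name DedupStore is hypothetical, and the hash is computed sequentially rather than in parallel with the file system write.

```python
import hashlib


class DedupStore:
    """Toy in-memory model of a hash-keyed de-duplication table with a journal."""

    def __init__(self):
        self.table = {}      # hash key -> block index in self.blocks
        self.blocks = []     # stands in for the storage device
        self.journal = []    # stands in for journal transactions

    def write(self, data: bytes) -> int:
        key = hashlib.sha256(data).hexdigest()
        if key in self.table:
            return self.table[key]       # duplicate: reuse the stored block
        self.blocks.append(data)         # new data: write it to storage
        self.journal.append(key)         # journal transaction carrying the hash key
        self.table[key] = len(self.blocks) - 1
        return self.table[key]


store = DedupStore()
a = store.write(b"hello")
b = store.write(b"world")
c = store.write(b"hello")    # same content as the first write
```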

Type: Application

Filed: June 28, 2012

Publication date: January 2, 2014

Applicant: International Business Machines Corporation

Inventors: Ranjit M. Noronha, Ajay K. Singh
INDEX REPLICATION IN DISTRIBUTED SEARCH ENGINES

Publication number: 20140006349

Abstract: Briefly, embodiments of methods or systems to replicate indexes in distributed search engines are described.

Type: Application

Filed: June 28, 2012

Publication date: January 2, 2014

Applicant: Yahoo! Inc.

Inventors: Vincent Leroy, Matthieu Morel, Flavio Junqueira
Correlation Engine and Method for Granular Meta-Content Having Arbitrary Non-Uniform Granularity

Publication number: 20130346414

Abstract: One disclosed method includes receiving correlation instructions related to a plurality of meta-content elements that are associated with a primary content. The primary content may be multimedia content such as, but not limited to, an audiovisual content. The method includes performing a correlation in response to receiving the instructions. The correlation is between the meta-content elements, where the meta-content elements each have an arbitrary granularity defining meta-content segments. The method returns a result based on the correlation. Another disclosed method includes receiving a request having correlation instructions related to a plurality of meta-content elements, where the meta-content elements are associated with a primary content. Again, each meta-content element has an arbitrary granularity defining meta-content segments.

Type: Application

Filed: June 21, 2012

Publication date: December 26, 2013

Applicant: General Instrument Corporation

Inventors: Alfonso Martinez Smith, Paul C. Davis, Joshua B. Hurwitz, Douglas A. Kuhlman, Hiren M. Mandalia, Loren J. Rittle, Krunal S. Shah
MEMORY COMPACTION MECHANISM FOR MAIN MEMORY DATABASES

Publication number: 20130346378

Abstract: The present invention extends to methods, systems, and computer program products for performing memory compaction in a main memory database. The main memory database stores records within pages which are organized in doubly linked lists within partition heaps. The memory compaction process uses quasi-updates to move records from a page to be emptied to an active page in a partition heap. The quasi-updates create a new version of the record in the active page, the new version having the same data contents as the old version of the record. The creation of the new version can be performed using a transaction that employs wait-for dependencies to allow the old version of the record to be read while the transaction is creating the new version, thereby minimizing the effect of the memory compaction process on other transactions in the main memory database.

Type: Application

Filed: June 21, 2012

Publication date: December 26, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Dimitrios Tsirogiannis, Per-Ake Larson
PACKING DEDUPLICATED DATA INTO FINITE-SIZED CONTAINERS

Publication number: 20130339316

Abstract: Deduplicated data is packed into finite-sized containers. A similarity score is calculated between files of the deduplicated data that are compared for similarity. The similarity score is used for grouping the similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to one of the finite-sized containers.

Type: Application

Filed: June 19, 2012

Publication date: December 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael HIRSCH, Thorsten KRAUSE
PROTOCOL COMPLIANT ARCHIVING

Publication number: 20130339308

Abstract: Disclosed herein are techniques for archiving data objects. It is determined whether a data object was rejected by an archiving module due to an information field thereof violating a protocol. If it is determined that the data object was rejected due to violation of the protocol, a compliant information field that complies with the protocol is generated such that the compliant information field causes the archiving module to permit archiving of the data object violating the protocol.

Type: Application

Filed: June 19, 2012

Publication date: December 19, 2013

Inventors: Richard Herschel Schwartz, Tarcio Constant, Scott Alan Lemieux
REDUCING DECOMPRESSION LATENCY IN A COMPRESSION STORAGE SYSTEM

Publication number: 20130339322

Abstract: In a compression processing storage system, using a pool of compression cores, the compression cores are assigned to process either compression operations, decompression operations, or decompression and compression operations, which are scheduled for processing. A maximum number of the compression cores are set for processing only the decompression operations, thereby lowering a decompression latency. A minimal number of the compression cores are allocated for processing the compression operations, thereby increasing compression latency. Upon reaching a throughput limit for the compression operations that causes the minimal number of the plurality of compression cores to reach a busy status, the minimal number of the plurality of compression cores for processing the compression operations is increased.

Type: Application

Filed: June 14, 2012

Publication date: December 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan AMIT, Amir LIDOR, Sergey MARENKOV, Rostislav RAIKHMAN
METHOD, SYSTEM, AND COMPUTER-READABLE MEDIUM FOR PROVIDING A SCALABLE BIO-INFORMATICS SEQUENCE SEARCH ON CLOUD

Publication number: 20130339321

Abstract: The present invention relates to a computer-implemented method, system, and computer-readable medium for providing a scalable bio-informatics sequence search on cloud. The method comprises the steps of partitioning genome data into a plurality of datasets; storing the plurality of datasets in a database; receiving at least one sequence search request input; searching for a genome sequence in the database corresponding to the search request input; and scaling the sequence search based on the sequence search request input.

Type: Application

Filed: June 13, 2012

Publication date: December 19, 2013

Applicant: Infosys Limited

Inventors: S/shri. Shyam Kumar Doddavula, Madhavi Rani, Anirban Ghosh, Akansha Jain, Santonu Sarkar, Mudit Kaushik, Harsh Vachhani
Linking Data Elements Based on Similarity Data Values and Semantic Annotations

Publication number: 20130332467

Abstract: Data elements from data sources and having a data value set are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element to yield a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.
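
One common way to realize fixed-size signatures followed by locality sensitive hashing is MinHash with banding, sketched below. The abstract does not name a specific scheme, so the choice of MinHash, the use of MD5 as the seeded hash, Jaccard similarity as the measure, and all function names and parameter values here are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, value: str) -> int:
    """Seeded hash of one data value (MD5 chosen only for determinism)."""
    return int(hashlib.md5(f"{seed}:{value}".encode()).hexdigest(), 16)


def signature(values, num_hashes=32):
    """MinHash signature: the minimum hash of the value set under each seed."""
    return tuple(min(_hash(seed, v) for v in values) for seed in range(num_hashes))


def candidate_pairs(signatures, bands=16):
    """Bucket signatures band by band; elements sharing any bucket are candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    pairs = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[sig[b * rows:(b + 1) * rows]].append(name)
        for names in buckets.values():
            pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Similarity index for a candidate pair of value sets."""
    return len(a & b) / len(a | b)


def link(elements, threshold=0.5):
    """Link candidate pairs whose similarity index is above the threshold."""
    sigs = {name: signature(vals) for name, vals in elements.items()}
    return {p for p in candidate_pairs(sigs)
            if jaccard(elements[p[0]], elements[p[1]]) > threshold}


elements = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"p", "q"}}
links = link(elements)
```

Every signature has the same length regardless of how many values an element has, which is what lets the banding step compare elements cheaply before the exact similarity measure is applied to the surviving candidates.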

Type: Application

Filed: July 8, 2012

Publication date: December 12, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mihaela Ancuta Bornea, Songyun Duan, Achille Belly Fokoue-Nkoutche, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas, Michael J. Ward
EFFICIENT PARTITIONING TECHNIQUES FOR MASSIVELY DISTRIBUTED COMPUTATION

Publication number: 20130332446

Abstract: A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can result in not every computing device providing data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they are performing a repartitioning, assign a repartitioning index to each individual data element, which represents the computing device to which such a data element is destined.
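
The hash-based case can be illustrated with a small sketch. It assumes integer keys and uses plain modulo in place of a real hash function; the names repartition_index and destinations are hypothetical. When the output partition count is a multiple of the input count, each input partition only sends data to a subset of the outputs.

```python
from collections import defaultdict


def repartition_index(key: int, num_outputs: int) -> int:
    """Destination partition for a data element (modulo stands in for a hash)."""
    return key % num_outputs


def destinations(keys, num_inputs, num_outputs):
    """Map each input partition to the set of output partitions it sends data to."""
    sends = defaultdict(set)
    for key in keys:
        source = key % num_inputs    # where the prior hash partitioning placed it
        sends[source].add(repartition_index(key, num_outputs))
    return dict(sends)


routes = destinations(range(100), num_inputs=2, num_outputs=4)
```

Here input partition 0 holds the even keys and therefore only ever targets outputs 0 and 2, while partition 1 only targets outputs 1 and 3.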

Type: Application

Filed: June 11, 2012

Publication date: December 12, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Jingren Zhou, Nicolas Bruno, Wei Lin
GENERATING CONTENT RECOMMENDATIONS

Publication number: 20130332462

Abstract: A system to generate content recommendations by identifying content and selecting a content entry for the content. The system comprises identifying a keyword in the content entry, generating a tag for the content based on the keyword, generating a plurality of recommendations based on the tag, and displaying the recommendations.

Type: Application

Filed: June 12, 2012

Publication date: December 12, 2013

Inventors: David Paul Billmaier, Jason Christopher Hall, Alexander Charies Barclay, John Max Kellum, Henry Hideyuki Yamamoto
SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES

Publication number: 20130318090

Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
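
A minimal sketch of a snippet-level index follows, assuming that each sentence is one snippet and that a match requires all query terms to occur in the same sentence. The tokenization rules and the function names build_index and search are illustrative assumptions, not details from the application.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Inverted index from word to (doc_id, sentence_no); sentences are the snippets."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for n, sentence in enumerate(sentences):
            for word in re.findall(r"\w+", sentence.lower()):
                index[word].add((doc_id, n))
    return index


def search(index, query):
    """Return documents in which all query terms co-occur in a single snippet."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return {doc_id for doc_id, _ in hits}


docs = {
    "d1": "Tape storage is slow. Disk cache helps.",
    "d2": "Tape cache helps. Storage is cheap.",
}
idx = build_index(docs)
result = search(idx, "tape cache")
```

Because matching happens per sentence, term proximity is enforced implicitly: "d1" contains both query terms but in different sentences, so only "d2" is returned.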

Type: Application

Filed: May 24, 2012

Publication date: November 28, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
DATA DEDUPLICATION USING SHORT TERM HISTORY

Publication number: 20130318050

Abstract: Exemplary system and computer program product embodiments for data deduplication using short term history in a computing environment are provided. In one embodiment, by way of example only, a hash value is calculated on data chunks for a read operation. The calculated hash value is stored in a storage media. The calculated hash value is looked up in the storage media to verify whether a current write operation was previously written and/or read. Additional system and computer program product embodiments are disclosed and provide related advantages.

Type: Application

Filed: May 24, 2012

Publication date: November 28, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan AMIT, Chaim KOIFMAN
CONTEXT SENSITIVE REUSABLE INLINE DATA DEDUPLICATION

Publication number: 20130311432

Abstract: A computer identifies a relationship among a subset of a set of data blocks, a basis of the relationship forming a context shared by the subset of data blocks. The computer selects a code data structure from a set of code data structures using the context. The context is associated with the code data structure, and the code data structure includes a set of codes. The computer computes, for a first data block in the subset of data blocks, a first code corresponding to a content of the first data block. The computer determines whether the first code matches a stored code in the code data structure. The computer replaces, responsive to the first code matching the stored code, the first data block with a reference to an instance of the first data block. The computer causes the reference to be stored in a target data processing system.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Vishal Chittranjan Aslot, Adekunle Bello, Brian W. Hart, Robert Wright Thompson
Determining a Cause of an Incident Based on Text Analytics of Documents

Publication number: 20130311479

Abstract: According to one embodiment of the present invention, a system analyzes one or more change records based on text analytics using dictionaries and rules for the analysis in order to generate an index of analyzed data that represents the one or more change records. The change records each include a change and corresponding time frame for occurrence of the change. Information from a request is applied to the index of analyzed data to determine one or more candidate causes for the incident and the corresponding time frame for occurrence of the change. A time associated with the request is correlated with the corresponding time frame for occurrence of the change to identify the one or more candidate causes in the one or more change records as causes for the incident. Embodiments of the present invention further include a method and computer program product for determining causes of an incident.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Applicant: International Business Machines Corporation

Inventors: Dhruv A. Bhatt, Kristin E. McNeil, Nitaben A. Patel
MINIMIZATION OF SURPRISAL DATA THROUGH APPLICATION OF HIERARCHY FILTER PATTERN

Publication number: 20130311435

Abstract: A method, computer product, and computer system of minimizing surprisal data comprising: at a source, reading and identifying characteristics of a genetic sequence of an organism; receiving an input of rank of at least two identified characteristics of the genetic sequence of the organism; generating a hierarchy of ranked, identified characteristics based on the rank of the at least two identified characteristics of the genetic sequence of the organism; comparing the hierarchy of ranked, identified characteristics to a repository of reference genomes; and if at least one reference genome from the repository matches the hierarchy of ranked, identified characteristics, breaking the matched reference genomes into pieces, combining pieces associated with the identified characteristics from at least one matched reference genome to form a filter pattern to be compared to the nucleotides of the genetic sequence of the organism, to obtain differences and create surprisal data.

Type: Application

Filed: June 8, 2012

Publication date: November 21, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Robert R. Friedlander, James R. Kraemer
