Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)
-
Publication number: 20140101113
Abstract: The present disclosure provides for implementing a two-level fingerprint caching scheme for a client cache and a server cache. The client cache hit ratio can be improved by pre-populating the client cache with fingerprints that are relevant to the client. Relevant fingerprints include fingerprints used during a recent time period (e.g., fingerprints of segments that are included in the last full backup image and any following incremental backup images created for the client after the last full backup image), and thus are referred to as fingerprints with good temporal locality. Relevant fingerprints also include fingerprints associated with a storage container that has good spatial locality, and thus are referred to as fingerprints with good spatial locality. A pre-set threshold established for the client cache (e.g., threshold Tc) is used to determine whether a storage container (and thus fingerprints associated with the storage container) has good spatial locality.
Type: Application
Filed: October 8, 2012
Publication date: April 10, 2014
Applicant: SYMANTEC CORPORATION
Inventors: Xianbo Zhang, Haibin She, Chao Lei, Xiaobing Song, Shuai Cheng
-
Publication number: 20140095490
Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.
Type: Application
Filed: September 28, 2012
Publication date: April 3, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Xu Sun, Jun Wang
-
Publication number: 20140095512
Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.
Type: Application
Filed: October 4, 2012
Publication date: April 3, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Xu Sun, Jun Wang
-
Publication number: 20140089269
Abstract: Expired files in the deduplicating virtual media are selectively erased using a backup application for notifying a backup repository of which expired files are no longer required. The space of the expired files is reclaimed for reuse. Virtual space of the expired files is reserved for allowing the backup application to seek past the reclaimed space to subsequent data in the deduplicating virtual media.
Type: Application
Filed: September 24, 2012
Publication date: March 27, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Shay H. AKIRAV, Michael HIRSCH
-
Publication number: 20140089315
Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
Type: Application
Filed: September 24, 2012
Publication date: March 27, 2014
Inventor: Philip R. Krause
-
Publication number: 20140089316
Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
Type: Application
Filed: September 24, 2012
Publication date: March 27, 2014
Inventor: Philip R. Krause
-
Publication number: 20140089273
Abstract: Storing and retrieving files based on hashes for the files. One method for storing files includes: identifying a file; identifying a hash calculated based on the file; renaming the file based on the hash; and storing the file in a particular location based on the hash calculated based on the file. Another method for retrieving files includes: identifying a hash for a given file; using the hash, traversing a hierarchical file structure to find a location where the given file should be stored; determining that the file is at the location; and as a result, retrieving the file.
Type: Application
Filed: September 27, 2012
Publication date: March 27, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Ronen Borshack, Anil Francis Thomas, Erez Einav, Philip Ernst Taron
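The store and retrieve steps in the abstract above might be sketched as follows. This is only an illustration under stated assumptions, not the method claimed in the application: the choice of SHA-256, the two-level directory layout, and the names path_for_hash, store_file and retrieve_file are all hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def path_for_hash(root, digest, levels=2, width=2):
    """Map a hex digest to a nested path, e.g. root/ab/cd/abcd... (assumed layout)."""
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest)


def store_file(root, src_path):
    """Hash the file, then copy it to a location and name derived from the hash."""
    with open(src_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()  # SHA-256 is an assumption
    dest = path_for_hash(root, digest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        shutil.copyfile(src_path, dest)
    return digest


def retrieve_file(root, digest):
    """Walk the hierarchy implied by the hash; return the bytes if the file is there."""
    candidate = path_for_hash(root, digest)
    if os.path.isfile(candidate):
        with open(candidate, "rb") as f:
            return f.read()
    return None
```

Because the location is a pure function of the hash, retrieval needs no separate lookup table in this sketch; identical content also lands at the same path, so it is stored once.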
-
Publication number: 20140074841
Abstract: In one embodiment, a non-transitory computer-readable medium stores instructions for implementing a file system, which include operations for acquiring an exclusive lock on a first node in an ordered tree data-structure, and adding an identifier and index of the first node to a path data structure. If the value of the index in the first node is non-zero, then each exclusive lock acquired between the first node and the root of the tree data structure is released. In any case, the operation proceeds to a second node, which is addressed at the index on the first node. In one embodiment, operations further include acquiring an exclusive lock on the second node, and, if the second node is a leaf node, performing updates to the second node, and then releasing each exclusive lock in the data-structure.
Type: Application
Filed: October 16, 2012
Publication date: March 13, 2014
Applicant: Apple Inc.
Inventors: David A. Majnemer, Wenguang Wang
-
Publication number: 20140074849
Abstract: System for generating a pseudo-repository. The system scans a directory to detect compiled binary files, and assembles an index of the compiled binary files based on metadata describing the compiled binary files. Then the system generates a pseudo-repository based on the index that maps each compiled binary file with at least one associated artifact, wherein the pseudo-repository responds to client requests for one of the binary files.
Type: Application
Filed: September 7, 2012
Publication date: March 13, 2014
Inventors: Ondrej Zizka, Lukas Fryc
-
Publication number: 20140074850
Abstract: Embodiments are directed towards the visualization of machine data received from computing clusters. Embodiments may enable improved analysis of computing cluster performance, error detection, troubleshooting, error prediction, or the like. Individual cluster nodes may generate machine data that includes information and data regarding the operation and status of the cluster node. The machine data is received from each cluster node for indexing by one or more indexing applications. The indexed machine data including the complete data set may be stored in one or more index stores. A visualization application enables a user to select one or more analysis lenses that may be used to generate visualizations of the machine data. The visualization application employs the analysis lens to produce visualizations of the computing cluster machine data.
Type: Application
Filed: October 25, 2012
Publication date: March 13, 2014
Applicant: Splunk Inc.
Inventors: Cary Glen Noel, Kirubakaran Pakkirisamy, Alex Raitz, Pierre Tsai
-
Publication number: 20140067777
Abstract: Timing data associated with a database or database system can be stored in a reduced or compressed form which can be decompressed back to a full or original form. In doing so, timing data can be compressed by using a subset of a full set of possible values (e.g., a determined range which is more likely to occur) instead of using a full set of possible values. Timing data can also be compressed by eliminating redundant, insignificant duplicate and/or common values, for example, between one or more components (e.g., start and end times of a period of time) of the timing data.
Type: Application
Filed: September 6, 2012
Publication date: March 6, 2014
Inventors: Cameron Lewis, Elizabeth Brealey, Michael Reed
-
Publication number: 20140067821
Abstract: A system and method for storing and accessing data in an embedded system of an aircraft extracts identifiers from headers in stored data, and stores the identifiers in a separately indexable array.
Type: Application
Filed: September 13, 2012
Publication date: March 6, 2014
Applicant: GE AVIATION SYSTEMS LLC
Inventor: Benjamin James Sykes
-
Publication number: 20140067819
Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.
Type: Application
Filed: September 5, 2012
Publication date: March 6, 2014
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Anguel Novoselsky, Zhen Hua Liu, Thomas Baby
-
Patent number: 8666985
Abstract: An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to a packet capture repository when slots in a shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
Type: Grant
Filed: March 15, 2012
Date of Patent: March 4, 2014
Assignee: Solera Networks, Inc.
Inventors: Matthew S. Wood, Joseph H. Levy, McKay Marston
-
Publication number: 20140052733
Abstract: Embodiments are directed towards previewing results generated from indexing raw data before the corresponding index data is added to an index store. Raw data may be received from a preview data source. After an initial set of configuration information is established, the preview data may be submitted to an index processing pipeline. A previewing application may generate preview results based on the preview index data and the configuration information. The preview results may enable previewing how the data is being processed by the indexing application. If the preview results are not acceptable, the configuration information may be modified. The preview application enables modification of the configuration information until the generated preview results are acceptable. If the configuration information is acceptable, the preview data may be processed and indexed in one or more index stores.
Type: Application
Filed: August 17, 2012
Publication date: February 20, 2014
Applicant: Splunk Inc.
Inventors: Mitchell Neuman Blank, JR., Leonid Budchenko, David Carasso, Micah James Delfino, Johnvey Hwang, Stephen Phillip Sorkin, Eric Timothy Woo
-
Publication number: 20140052698
Abstract: A system and an article of manufacture for de-duplicating virtual machine image accesses include identifying one or more identical blocks in two or more images in a virtual machine image repository, generating a block map for mapping different blocks with identical content into a same block, deploying a virtual machine image by reconstituting an image from the block map and fetching any unique blocks remotely on-demand, and de-duplicating virtual machine image accesses by storing the deployed virtual machine image in a local disk cache.
Type: Application
Filed: August 17, 2012
Publication date: February 20, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Han Chen, Alexei A. Karve, Minkyong Kim, Andrzej P. Kochut, Hui Lei, Jayaram Kallapalayam Radhakrishnan, Zhiming Shen, Zhe Zhang
-
Publication number: 20140052732
Abstract: Provided is a method for updating index data. The method includes receiving index data, including an index value indicative of user activity on a network site and an index time corresponding to a time used for calculating the index value, receiving an update index time corresponding to a time used for updating the index data, determining an updated index value using an exponential decay of the index value from the index time to the update index time, wherein the updated index value comprises a decayed value of the index value corresponding to the update time, and storing updated index data including the updated index value and the update index time.
Type: Application
Filed: August 26, 2011
Publication date: February 20, 2014
Inventor: William R. Softky
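The decay step in the abstract above can be illustrated with a short sketch. The abstract does not give a decay constant or time units, so the half_life parameter and the function name decay_index are assumptions made only for this example.

```python
import math


def decay_index(index_value, index_time, update_time, half_life):
    """Return (updated_value, update_time) after exponential decay.

    half_life is a hypothetical parameter: the elapsed time after which
    the index value has fallen to half of its stored value.
    """
    elapsed = update_time - index_time
    if elapsed < 0:
        raise ValueError("update_time must not precede index_time")
    decayed = index_value * math.exp(-math.log(2) * elapsed / half_life)
    # The updated index data pairs the decayed value with the new time.
    return decayed, update_time
```

Storing the update time next to the decayed value means the next update can decay from that pair alone, without replaying earlier activity.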
-
Publication number: 20140052699
Abstract: Systems and methods for estimating data reduction ratio for a data set are provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements; associating an identifier h_i for each of the plurality of m elements; associating an identifier h_e for each of the plurality of elements in the data set; tracking number of times an element i appears in a base set that includes the plurality of m elements selected from the data set; calculating a value count_i that indicates the number of times an identifier h_e matches an identifier h_i; and estimating data reduction ratio for the plurality of N elements in the data set, based on the number m of elements selected from the data set and the value count_i.
Type: Application
Filed: August 20, 2012
Publication date: February 20, 2014
Applicant: International Business Machines Corporation
Inventors: Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov, Gil Vernik
-
Publication number: 20140046951
Abstract: Methods, software and devices for indexing responses for later providing to users in response to queries are disclosed. For each stored response, representative queries are stored in association with that response, where each representative query represents a possible query for searching for information addressed by that response. Representative queries are selectively modified by substituting terms by corresponding chosen substitute expressions, where a substitute expression is chosen for a particular term in one of the representative queries based on past substitutions in others of said representative queries. For each response, a Boolean expression is formed from those representative queries associated with that response, as selectively modified, where the Boolean expression is satisfied by each of those representative queries.
Type: Application
Filed: August 8, 2012
Publication date: February 13, 2014
Applicant: Intelliresponse Systems Inc.
Inventors: Darren Redfern, Chad Ternent
-
Publication number: 20140046911
Abstract: Systems and techniques of de-duplicating files and/or blobs within a file system are presented. In one embodiment, an email system is disclosed wherein the email system receives email messages comprising a set of associated attachments. The system determines whether the associated attachments have been previously stored in the email system, the state of the stored attachment, and if the state of the attachment is appropriate for sharing copies of the attachment, then providing a reference to the attachment upon a request to share the attachment. In another embodiment, the system may detect whether stored attachments are corrupted and, if so, attempt to repair the attachment, and possibly, prior to sharing references to the attachment.
Type: Application
Filed: August 13, 2012
Publication date: February 13, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Kristof Roomp, Gruia Pitigoi-Aron, Ivaylo Dimitrov, Brandon Pai, Cheng Ho, Kumar Pasumarthy, Lincoln Liu, Alok Dhariwal, John Rodrigues
-
Publication number: 20140040213
Abstract: Records received from one or more sources in a network are processed. For each of multiple intervals of time, a matching procedure is attempted on sets of one or more records, including comparing identifiers associated with different records to generate the sets and determining whether or not a completeness criterion is satisfied for one or more of the sets. The processing also includes, for at least some of the intervals of time, processing at least one complete set, consisting of one or more of the received records on which the matching procedure is first attempted during the interval of time and one or more records stored in a data store before the interval of time, and for at least some of the intervals of time, processing at least one incomplete set, consisting of one or more records stored in the data store before the interval of time.
Type: Application
Filed: August 2, 2012
Publication date: February 6, 2014
Applicant: Ab Initio Software LLC
Inventor: Larry Paul Rossi
-
Publication number: 20140032507
Abstract: Data de-duplication is done on a data set. The data de-duplication is done using a partial digest table. Some digests are selectively removed from the partial digest table when a pre-determined condition occurs.
Type: Application
Filed: July 26, 2012
Publication date: January 30, 2014
Inventors: Douglas L. Voigt, Siamak Nazari
-
Publication number: 20140032569
Abstract: System, method and computer program products for storing data by computing a plurality of hash functions of data values in a data item, and determining a corresponding memory location for one of the plurality of hash functions of data values in the data item. Each memory location is of a cacheline size wherein a data item is stored in a memory location. Each memory location can store a plurality of data items. A key portion of all data items is contiguously stored within the memory location, and a payload portion is contiguously stored within the memory location. Payload portions are packed as bit-aligned in a fixed-sized memory location, comprising a bucket in a bucketized hash table, each bucket sized to store multiple key portions and payload portions that are packed as bit-aligned in a fixed-sized bucket. Corresponding key portions are stored as compressed keys in said fixed-sized bucket.
Type: Application
Filed: July 25, 2012
Publication date: January 30, 2014
Applicant: International Business Machines Corporation
Inventors: Min-Soo Kim, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita
-
Publication number: 20140032925
Abstract: The embodiments herein relate to data management and, more particularly, to global deduplication and encryption of data in data management systems. The user equipments (UE) are grouped under certain deduplication groups based on certain parameters such as rate of data exchange, frequency of data exchange, social closeness, work closeness, similarity of data and interests and so on, between those UEs. Further, specific deduplication and encryption parameters such as encryption method, encryption key, signature computation method, block computation method and so on are assigned to each group. Further, deduplication and encryption of data in each group is performed using the deduplication and encryption modes and parameters assigned to each group. The deduplication and encryption of data is performed in at least one of the UEs and/or a server. Further, the parameters used for deduplication and encryption are stored in specific databases and are encrypted for better security.
Type: Application
Filed: July 25, 2012
Publication date: January 30, 2014
Inventors: Ankur Panchbudhe, Anand A. Kekre
-
Publication number: 20140032562
Abstract: A method and client device are disclosed for indexing content of a multimedia file. The method comprises using a client device to segment the content of the multimedia file into a plurality of segments and to determine structure-searchable data for each segment. Determining structure-searchable data for a segment comprises (1) identifying one or more features of respective multimedia types in the segment; (2) correlating each of the identified features to one or more respective keywords; and (3) calculating one or more respective relevance factors for each of the keywords, where at least one of the relevance factors is based on one or more characteristics of the client device. The method also comprises the client device transmitting the structure-searchable data (including the keywords, relevance factors, and respective media types of the identified features) to an indexing server.
Type: Application
Filed: July 26, 2012
Publication date: January 30, 2014
Applicant: Telefonaktiebolaget LM Ericsson (publ)
Inventors: Tommy ARNGREN, David LINDEGREN, Joakim SÖDERBERG, Marika STÅLNACKE
-
Publication number: 20140032449
Abstract: In one embodiment, a method includes receiving information associated with the operation of one or more network devices, indexing the information for analysis, analyzing the information to determine a pattern in the information, generating one or more labels for at least a portion of the information based at least in part on the pattern, and making the information and labels available to a remediation system.
Type: Application
Filed: July 27, 2012
Publication date: January 30, 2014
Applicant: DELL PRODUCTS L.P.
Inventors: Martin Kacin, David Douglas Kloba
-
Publication number: 20140019455
Abstract: Managing versions of an electronic entity comprising many independently managed, but mutually-dependent, subcomponents can be challenging. File management functionality is provided for use with an integrated development environment to produce a visual indication of the relationships among the subcomponents. The approach described herein provides an improvement over source code control systems and backup systems in the ability to revert the state of one or more files as their content existed at an historical time point. The technique does not require a user to predict in advance at which time points the content state of one or more files will be interesting as historical time points for future use.
Type: Application
Filed: July 12, 2012
Publication date: January 16, 2014
Applicant: Oracle International Corporation
Inventor: Neil James Cochrane
-
Publication number: 20140019893
Abstract: A story index of story elements is provided in which each story element is able to be referenced in a story by name and by language that does not include the name. The story index may also contain references to the same story elements in other associated stories, including other stories in a series or that are in a different type of media. An associated story presentation application program may enable a viewer to view the entries in the story index for a specified story element and to then view the specified story element at any of the referenced locations. The application may enable purchase or downloading of the associated stories.
Type: Application
Filed: July 11, 2012
Publication date: January 16, 2014
Applicant: Cellco Partnership d/b/a Verizon Wireless
Inventors: Agust K. GUDMUNDSSON, Virginia Benson Chanda
-
Publication number: 20140019425
Abstract: The file server identifies two or more files, each including duplicated data among a plurality of files that have been stored into the logical storage device as a file group based on the file system information. The file server deletes copies of the duplicated data other than shared data that is one copy of the duplicated data included in the two or more files from the logical storage device. The file server makes a file, which is not a shared file of the file group, referring to the shared file that is a file configured by the shared data. The file server creates a group link that associates the m files that belong to the file group with each other.
Type: Application
Filed: July 10, 2012
Publication date: January 16, 2014
Inventors: Koji Honami, Masahiro Shimizu
-
Publication number: 20140012832
Abstract: A computer-implemented method, system and computer program product for collecting information from data sources by receiving a collection request at a collection tool to collect information, where the collection request includes data source information indicating a data source from which to retrieve the information. The data source information in the collection request is associated with one or more electronic data repositories in response to the data source indicated by the data source information being previously unidentified to the collection tool. The information is collected from the one or more associated electronic data repositories.
Type: Application
Filed: July 6, 2012
Publication date: January 9, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Roman Kisin, Andrey Pogodin, Pierre Raynaud-Richard
-
Publication number: 20140006365
Abstract: A method, computer program product and system of minimizing epigenetic surprisal data either by comparing epigenetic surprisal data to a fixed baseline epigenetic data, so that all of the comparisons were made to the same baseline epigenetic data or by comparing epigenetic surprisal data to a rolling baseline of epigenetic surprisal data; that is, after each comparison the baseline is changed to the data from the time point which had been compared previously.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Robert R. Friedlander, James R. Kraemer
-
Publication number: 20140006364
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for merging media stream indexes of a media stream are described in this specification. In one aspect, a method includes receiving a first media stream index at a first server system, including a first list of sequentially arranged fragment identifiers corresponding to at least a portion of multiple sequentially arranged fragments. Fragment identifiers that are potentially missing from the first index can be identified. A second media stream index including a second list of sequentially arranged fragment identifiers corresponding to at least a portion of the multiple sequentially arranged fragments can be requested from a second server system. The first and second list of the sequentially arranged fragment identifiers can be compared and the first list of sequentially arranged fragment identifiers can be reconstructed based on the comparison.
Type: Application
Filed: June 28, 2012
Publication date: January 2, 2014
Applicant: ADOBE SYSTEMS INCORPORATED
Inventors: Glenn Eguchi, Asa Whillock, Kevin Streeter, Mohammed Pithapurwala, Noam Lorberbaum, Seth Hodgson, Srinivas Manapragada
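One way to picture the compare-and-reconstruct step in the abstract above is the sketch below. It assumes, purely for illustration, that fragment identifiers are consecutive integers, so that a gap in the first list marks a potentially missing fragment; the abstract does not specify the identifier format, and the names missing_ids and reconstruct are hypothetical.

```python
def missing_ids(ids):
    """Identifiers absent between the smallest and largest entry (integer ids assumed)."""
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


def reconstruct(first, second):
    """Fill gaps in the first index with identifiers the second index confirms."""
    gaps = set(missing_ids(first))
    confirmed = gaps & set(second)
    return sorted(set(first) | confirmed)
```

In this sketch only gaps inside the range of the first list are filled, so identifiers that appear solely in the second list beyond that range are left out.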
-
Publication number: 20140006411
Abstract: An approach is provided to determine one or more dynamic ordered tree structures and transition tree structures (e.g., based on one or more transitions of a device) to facilitate querying and/or accessing data stores. An apparatus and method determines to generate at least one index structure, determines to associate index objects of the generated index structure with one or more data objects of at least one data store, determines to generate at least one transition index structure based on the at least one generated index structure, and determines to associate the transition index structure with index objects corresponding to one or more data objects of at least one data store based on a transition of a device. Also, the method and apparatus determines to generate at least one query, and determines to generate at least one transition index structure where a current index structure to resolve the query is absent.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Applicant: Nokia Corporation
Inventors: Sergey Boldyrev, Pavandeep Kalra
-
Publication number: 20140006363
Abstract: Data deduplication for data storage tapes comprises determining the read throughput of a deduplicated set of individual files on a single data storage tape, and determining a placement of deduplicated file data on a single data storage tape to reduce an average number of per-file gaps on the tape. Deduplicated file data is placed on the single data storage tape based on said placement to increase an average read throughput for a deduplicated set of individual files.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: MIHAIL C. CONSTANTINESCU, ABDULLAH GHARAIBEH, MAOHUA LU, DAVID A. PEASE, ANURAG SHARMA
-
Publication number: 20140006362
Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.
Type: Application
Filed: June 28, 2012
Publication date: January 2, 2014
Applicant: International Business Machines Corporation
Inventors: Ranjit M. Noronha, Ajay K. Singh
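The table lookup described in the abstract above might look roughly like the following in-memory sketch. It is a simplification under stated assumptions: the write is handled sequentially rather than in parallel with the file system request, Python lists stand in for the storage device and the journal, SHA-256 is an assumed hash, and the class name DedupStore is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy de-duplicating store; every structure here is an in-memory stand-in."""

    def __init__(self):
        self.table = {}    # de-duplication table: hash key -> block address
        self.blocks = []   # stand-in for the storage device
        self.journal = []  # stand-in for journal transactions

    def write(self, data):
        """Store data once per distinct content and return its block address."""
        key = hashlib.sha256(data).hexdigest()  # assumed hash function
        addr = self.table.get(key)
        if addr is None:
            # Hash key not in the table: write the data, journal the key,
            # then update the table to reference the new block.
            self.blocks.append(data)
            addr = len(self.blocks) - 1
            self.journal.append(key)
            self.table[key] = addr
        return addr
```

In this sketch the journal entry is appended before the table update, so that a rebuild of the table from the journal after a crash would still see every stored block.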
-
Publication number: 20140006349
Abstract: Briefly, embodiments of methods or systems to replicate indexes in distributed search engines are described.
Type: Application
Filed: June 28, 2012
Publication date: January 2, 2014
Applicant: Yahoo! Inc.
Inventors: Vincent Leroy, Matthieu Morel, Flavio Junqueira
-
Publication number: 20130346414
Abstract: One disclosed method includes receiving correlation instructions related to a plurality of meta-content elements that are associated with a primary content. The primary content may be multimedia content such as, but not limited to, an audiovisual content. The method includes performing a correlation in response to receiving the instructions. The correlation is between the meta-content elements, where the meta-content elements each have an arbitrary granularity defining meta-content segments. The method returns a result based on the correlation. Another disclosed method includes receiving a request having correlation instructions related to a plurality of meta-content elements, where the meta-content elements are associated with a primary content. Again, each meta-content element has an arbitrary granularity defining meta-content segments.
Type: Application
Filed: June 21, 2012
Publication date: December 26, 2013
Applicant: General Instrument Corporation
Inventors: Alfonso Martinez Smith, Paul C. Davis, Joshua B. Hurwitz, Douglas A. Kuhlman, Hiren M. Mandalia, Loren J. Rittle, Krunal S. Shah
-
Publication number: 20130346378Abstract: The present invention extends to methods, systems, and computer program products for performing memory compaction in a main memory database. The main memory database stores records within pages which are organized in doubly linked lists within partition heaps. The memory compaction process uses quasi-updates to move records from a page to be emptied to an active page in a partition heap. The quasi-updates create a new version of the record in the active page, the new version having the same data contents as the old version of the record. The creation of the new version can be performed using a transaction that employs wait-for dependencies to allow the old version of the record to be read while the transaction is creating the new version, thereby minimizing the effect of the memory compaction process on other transactions in the main memory database.Type: ApplicationFiled: June 21, 2012Publication date: December 26, 2013Applicant: MICROSOFT CORPORATIONInventors: Dimitrios Tsirogiannis, Per-Ake Larson
-
Publication number: 20130339316Abstract: Deduplicated data is packed into finite-sized containers. A similarity score is calculated between files of the deduplicated data that are compared for similarity. The similarity score is used for grouping the similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to a finite-sized container.Type: ApplicationFiled: June 19, 2012Publication date: December 19, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael HIRSCH, Thorsten KRAUSE
-
Publication number: 20130339308Abstract: Disclosed herein are techniques for archiving data objects. It is determined whether a data object was rejected by an archiving module due to an information field thereof violating a protocol. If it is determined that the data object was rejected due to violation of the protocol, a compliant information field that complies with the protocol is generated such that the compliant information field causes the archiving module to permit archiving of the data object violating the protocol.Type: ApplicationFiled: June 19, 2012Publication date: December 19, 2013Inventors: Richard Herschel Schwartz, Tarcio Constant, Scott Alan Lemieux
-
Publication number: 20130339322Abstract: In a compression processing storage system, using a pool of compression cores, the compression cores are assigned to process either compression operations, decompression operations, or decompression and compression operations, which are scheduled for processing. A maximum number of the compression cores are set for processing only the decompression operations, thereby lowering a decompression latency. A minimal number of the compression cores are allocated for processing the compression operations, thereby increasing compression latency. Upon reaching a throughput limit for the compression operations that causes the minimal number of the plurality of compression cores to reach a busy status, the minimal number of the plurality of compression cores for processing the compression operations is increased.Type: ApplicationFiled: June 14, 2012Publication date: December 19, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan AMIT, Amir LIDOR, Sergey MARENKOV, Rostislav RAIKHMAN
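The allocation policy in this abstract (most cores reserved for decompression, a minimal set for compression that grows when those cores are saturated) can be sketched as follows. The class `CorePool`, its method names, and the rule of always leaving at least one core for decompression are hypothetical choices for illustration and are not taken from the application.

```python
class CorePool:
    """Toy model of splitting a fixed pool of cores between two kinds of work."""

    def __init__(self, total_cores: int, min_compress: int = 1):
        self.total = total_cores
        # Start with the minimal number of cores on compression work.
        self.compress_cores = min_compress

    @property
    def decompress_cores(self) -> int:
        # Every core not assigned to compression serves decompression.
        return self.total - self.compress_cores

    def on_compress_saturated(self) -> int:
        """Grow the compression share by one core when its cores are all busy."""
        # Assumption: always leave at least one core for decompression.
        if self.compress_cores < self.total - 1:
            self.compress_cores += 1
        return self.compress_cores


pool = CorePool(total_cores=4)
before = (pool.compress_cores, pool.decompress_cores)
pool.on_compress_saturated()
after = (pool.compress_cores, pool.decompress_cores)
```

With four cores the split moves from one compression core and three decompression cores to two and two after a single saturation event.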
-
Publication number: 20130339321Abstract: The present invention relates to a computer-implemented method, system and computer readable medium for providing a scalable bio-informatics sequence search on a cloud. The method comprises the steps of partitioning genome data into a plurality of datasets, storing the plurality of datasets in a database, receiving at least one sequence search request input, searching for a genome sequence in the database corresponding to the search request input, and scaling the sequence search based on the sequence search request input.Type: ApplicationFiled: June 13, 2012Publication date: December 19, 2013Applicant: Infosys LimitedInventors: S/shri. Shyam Kumar Doddavula, Madhavi Rani, Anirban Ghosh, Akansha Jain, Santonu Sarkar, Mudit Kaushik, Harsh Vachhani
-
Publication number: 20130332467Abstract: Data elements from data sources and having a data value set are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element to yield a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.Type: ApplicationFiled: July 8, 2012Publication date: December 12, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Mihaela Ancuta Bornea, Songyun Duan, Achille Belly Fokoue-Nkoutche, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas, Michael J. Ward
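One common way to realize fixed-size signatures, locality sensitive candidate selection, and a thresholded similarity check is MinHash with banding. The sketch below uses that technique as a stand-in and does not claim it is the method in the application. The constants `NUM_HASHES`, `BANDS`, and `THRESHOLD`, the choice of Jaccard similarity as the measure, and all function names are assumptions. Value sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32   # signature length (assumed)
BANDS = 8         # number of LSH bands (assumed), 4 rows per band
THRESHOLD = 0.5   # minimum similarity for linking (assumed)


def _h(seed, value):
    """Deterministic 64-bit hash of a value under a given seed."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values):
    """MinHash signature: one minimum per seeded hash function."""
    return tuple(min(_h(seed, v) for v in values) for seed in range(NUM_HASHES))


def candidate_pairs(signatures):
    """Pair up elements whose signatures agree on at least one whole band."""
    rows = NUM_HASHES // BANDS
    pairs = set()
    for band in range(BANDS):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[sig[band * rows:(band + 1) * rows]].append(name)
        for names in buckets.values():
            pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    return len(a & b) / len(a | b)


def link(elements):
    """Return candidate pairs whose Jaccard similarity exceeds THRESHOLD."""
    sigs = {name: signature(vals) for name, vals in elements.items()}
    return {
        (x, y)
        for x, y in candidate_pairs(sigs)
        if jaccard(elements[x], elements[y]) > THRESHOLD
    }


elements = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"p", "q"},
}
links = link(elements)
```

Elements `a` and `b` have identical value sets, so their signatures match in every band and they are linked. Element `c` shares no values with either, so any candidate pair involving it fails the threshold.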
-
Publication number: 20130332446Abstract: A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can result in not every computing device providing data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they are performing a repartitioning, assign a repartitioning index to each individual data element, which represents the computing device to which such a data element is destined.Type: ApplicationFiled: June 11, 2012Publication date: December 12, 2013Applicant: MICROSOFT CORPORATIONInventors: Jingren Zhou, Nicolas Bruno, Wei Lin
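The idea that a hash-based repartitioning need not send data from every device to every other device can be shown with a small sketch. It assumes the old and new partition counts are 2 and 4 and that the same hash function (CRC-32 here, chosen only for determinism) is used before and after. The function name `repartition_index` and the sample keys are hypothetical.

```python
import zlib


def repartition_index(key: str, num_partitions: int) -> int:
    """Index of the device a data element is destined for (hash modulo count)."""
    return zlib.crc32(key.encode()) % num_partitions


keys = [f"user{i}" for i in range(1000)]
old_n, new_n = 2, 4   # new count is a multiple of the old count (assumption)

# Record which (source partition, destination partition) flows actually occur.
flows = set()
for key in keys:
    flows.add((repartition_index(key, old_n), repartition_index(key, new_n)))
```

Because the new count is a multiple of the old one, a key's destination index modulo 2 equals its source index. Source partition 0 can therefore only feed destinations 0 and 2, and source partition 1 only destinations 1 and 3, so at most 4 of the 8 possible flows carry data.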
-
Publication number: 20130332462Abstract: A system to generate content recommendations by identifying content and selecting a content entry for the content. The system comprises identifying a keyword in the content entry, generating a tag for the content based on the keyword, generating a plurality of recommendations based on the tag, and displaying the recommendations.Type: ApplicationFiled: June 12, 2012Publication date: December 12, 2013Inventors: David Paul Billmaier, Jason Christopher Hall, Alexander Charies Barclay, John Max Kellum, Henry Hideyuki Yamamoto
-
Publication number: 20130318090Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.Type: ApplicationFiled: May 24, 2012Publication date: November 28, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
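A minimal sketch of how indexing at the snippet level makes proximity implicit: if every posting is a (document, sentence) pair, then requiring all query terms to share a posting means they occur in the same sentence. Treating a sentence as the snippet, the regular-expression tokenizer, and the function names `build_index` and `search` are assumptions for illustration, not details from the application.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of (doc_id, sentence_no) snippets containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for sent_no, sentence in enumerate(sentences):
            for word in re.findall(r"\w+", sentence.lower()):
                index[word].add((doc_id, sent_no))
    return index


def search(index, query):
    """Return documents that have some sentence containing every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    # Intersect postings so that all terms must share one snippet.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    # Combine sentence-level hits into a document-level result.
    return {doc_id for doc_id, _ in hits}


docs = {
    "d1": "Solid state storage is fast. Hash tables are useful.",
    "d2": "Hash keys go in storage tables.",
}
index = build_index(docs)
near = search(index, "hash storage")
both = search(index, "hash tables")
```

Document `d1` contains both "hash" and "storage" but in different sentences, so only `d2` matches the first query. Both documents match the second query.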
-
Publication number: 20130318050Abstract: Exemplary system and computer program product embodiments for data deduplication using short term history in a computing environment are provided. In one embodiment, by way of example only, a hash value is calculated on data chunks for a read operation. The calculated hash value is stored in a storage media. The calculated hash value is looked up in the storage media to verify if a current write operation was previously written and/or read. Additional system and computer program product embodiments are disclosed and provide related advantages.Type: ApplicationFiled: May 24, 2012Publication date: November 28, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan AMIT, Chaim KOIFMAN
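The short-term history idea can be sketched as a bounded table of recently seen chunk hashes that is filled on reads and consulted on writes. The class `ShortTermHistory`, the least-recently-used eviction policy, the default capacity, and the use of SHA-256 are assumptions made for this illustration. The abstract does not specify them.

```python
import hashlib
from collections import OrderedDict


class ShortTermHistory:
    """Bounded record of recently read or written chunk hashes."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.seen = OrderedDict()   # hash value -> address of the known chunk

    def _remember(self, digest, address):
        self.seen[digest] = address
        self.seen.move_to_end(digest)
        if len(self.seen) > self.capacity:
            # Evict the oldest entry to keep the history short (assumed policy).
            self.seen.popitem(last=False)

    def on_read(self, address, chunk: bytes):
        """Hash a chunk as it is read and store the hash."""
        self._remember(hashlib.sha256(chunk).hexdigest(), address)

    def on_write(self, address, chunk: bytes):
        """Return the address of an identical known chunk, or None if new."""
        digest = hashlib.sha256(chunk).hexdigest()
        existing = self.seen.get(digest)
        if existing is None:
            self._remember(digest, address)
        return existing


history = ShortTermHistory(capacity=2)
history.on_read(10, b"aaa")
dup_of_read = history.on_write(20, b"aaa")    # matches the chunk read at 10
fresh = history.on_write(30, b"bbb")          # never seen before
dup_of_write = history.on_write(31, b"bbb")   # matches the chunk written at 30
```

A write whose hash is already in the history can be turned into a reference to the existing chunk. Entries that fall out of the bounded history are simply treated as new data again.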
-
Publication number: 20130311432Abstract: A computer identifies a relationship among a subset of a set of data blocks, a basis of the relationship forming a context shared by the subset of data blocks. The computer selects a code data structure from a set of code data structures using the context. The context is associated with the code data structure, and the code data structure includes a set of codes. The computer computes, for a first data block in the subset of data blocks, a first code corresponding to a content of the first data block. The computer determines whether the first code matches a stored code in the code data structure. The computer replaces, responsive to the first code matching the stored code, the first data block with a reference to an instance of the first data block. The computer causes the reference to be stored in a target data processing system.Type: ApplicationFiled: May 21, 2012Publication date: November 21, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Vishal Chittranjan Aslot, Adekunle Bello, Brian W. Hart, Robert Wright Thompson
-
Publication number: 20130311479Abstract: According to one embodiment of the present invention, a system analyzes one or more change records based on text analytics using dictionaries and rules for the analysis in order to generate an index of analyzed data that represents the one or more change records. The change records each include a change and corresponding time frame for occurrence of the change. Information from a request is applied to the index of analyzed data to determine one or more candidate causes for the incident and the corresponding time frame for occurrence of the change. A time associated with the request is correlated with the corresponding time frame for occurrence of the change to identify the one or more candidate causes in the one or more change records as causes for the incident. Embodiments of the present invention further include a method and computer program product for determining causes of an incident.Type: ApplicationFiled: May 21, 2012Publication date: November 21, 2013Applicant: International Business Machines CorporationInventors: Dhruv A. Bhatt, Kristin E. McNeil, Nitaben A. Patel
-
Publication number: 20130311435Abstract: A method, computer product, and computer system of minimizing surprisal data comprising: at a source, reading and identifying characteristics of a genetic sequence of an organism; receiving an input of rank of at least two identified characteristics of the genetic sequence of the organism; generating a hierarchy of ranked, identified characteristics based on the rank of the at least two identified characteristics of the genetic sequence of the organism; comparing the hierarchy of ranked, identified characteristics to a repository of reference genomes; and if at least one reference genome from the repository matches the hierarchy of ranked, identified characteristics, breaking the matched reference genomes into pieces, combining pieces associated with the identified characteristics from at least one matched reference genome to form a filter pattern to be compared to the nucleotides of the genetic sequence of the organism, to obtain differences and create surprisal data.Type: ApplicationFiled: June 8, 2012Publication date: November 21, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Robert R. Friedlander, James R. Kraemer