Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)
  • Publication number: 20140101113
    Abstract: The present disclosure provides for implementing a two-level fingerprint caching scheme for a client cache and a server cache. The client cache hit ratio can be improved by pre-populating the client cache with fingerprints that are relevant to the client. Relevant fingerprints include fingerprints used during a recent time period (e.g., fingerprints of segments that are included in the last full backup image and any following incremental backup images created for the client after the last full backup image), and thus are referred to as fingerprints with good temporal locality. Relevant fingerprints also include fingerprints associated with a storage container that has good spatial locality, and thus are referred to as fingerprints with good spatial locality. A pre-set threshold established for the client cache (e.g., threshold Tc) is used to determine whether a storage container (and thus fingerprints associated with the storage container) has good spatial locality.
    Type: Application
    Filed: October 8, 2012
    Publication date: April 10, 2014
    Applicant: SYMANTEC CORPORATION
    Inventors: Xianbo Zhang, Haibin She, Chao Lei, Xiaobing Song, Shuai Cheng
  • Publication number: 20140095490
    Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Xu Sun, Jun Wang
  • Publication number: 20140095512
    Abstract: Aspects of the present invention provide a tool for hash-based indexing. In an embodiment, a ranked dataset having a plurality of data items is obtained. Every data item in the ranked dataset has a ranking with respect to every other data item in the ranked dataset. A ranking triplet matrix is created based on the ranked dataset. The ranking triplet matrix has a set of ranking triplets, each of which indicates the relative ranking for a pair of the data items in the ranked dataset. This ranking triplet can be merged with a hash table obtained using a standard hash function and the data items can be indexed based on the results.
    Type: Application
    Filed: October 4, 2012
    Publication date: April 3, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Xu Sun, Jun Wang
  • Publication number: 20140089269
    Abstract: Expired files in the deduplicating virtual media are selectively erased using a backup application for notifying a backup repository of which expired files are no longer required. The space of the expired files is reclaimed for reuse. Virtual space of the expired files is reserved for allowing the backup application to seek past the reclaimed space to subsequent data in the deduplicating virtual media.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 27, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shay H. AKIRAV, Michael HIRSCH
  • Publication number: 20140089315
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 27, 2014
    Inventor: Philip R. Krause
  • Publication number: 20140089316
    Abstract: An apparatus, method and article of manufacture of the present invention detects the presence of references to the same concept in separate sections of text, and, with no input required from the reader, presents the reader with information concerning the detected references to the concept. The information provided may comprise information related to the location of the reference to the concept in other sections of text, and the reader also is provided the ability to move from one reference to a concept directly to another reference to the same concept.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 27, 2014
    Inventor: Philip R. Krause
  • Publication number: 20140089273
    Abstract: Storing and retrieving files based on hashes for the files. One method for storing files includes: identifying a file; identifying a hash calculated based on the file; renaming the file based on the hash; and storing the file in a particular location based on the hash. Another method for retrieving files includes: identifying a hash for a given file; using the hash, traversing a hierarchical file structure to find a location where the given file should be stored; determining that the file is at the location; and as a result, retrieving the file.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Ronen Borshack, Anil Francis Thomas, Erez Einav, Philip Ernst Taron
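The store-and-retrieve flow in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the SHA-256 hash, the two-level directory fan-out, and the names `hash_path`, `store_file`, and `retrieve_file` are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def hash_path(root: Path, digest: str, levels: int = 2, width: int = 2) -> Path:
    """Map a hex digest to a location in a hierarchical directory tree.

    With the assumed defaults, digest "abcdef..." maps to root/ab/cd/abcdef...
    """
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)


def store_file(root: Path, source: Path) -> str:
    """Copy `source` into the store under a name derived from its hash."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    target = hash_path(root, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(source, target)
    return digest


def retrieve_file(root: Path, digest: str) -> Optional[bytes]:
    """Walk to where the file should be and return its bytes if present."""
    target = hash_path(root, digest)
    return target.read_bytes() if target.is_file() else None
```

Because the location is a pure function of the hash, retrieval needs no separate lookup table; the directory levels only keep any single directory from growing too large.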
  • Publication number: 20140074841
    Abstract: In one embodiment, a non-transitory computer-readable medium stores instructions for implementing a file system, which include operations for acquiring an exclusive lock on a first node in an ordered tree data structure, and adding an identifier and index of the first node to a path data structure. If the value of the index in the first node is non-zero, then each exclusive lock acquired between the first node and the root of the tree data structure is released. In any case, the operation proceeds to a second node, which is addressed at the index on the first node. In one embodiment, operations further include acquiring an exclusive lock on the second node, and, if the second node is a leaf node, performing updates to the second node, and then releasing each exclusive lock in the data structure.
    Type: Application
    Filed: October 16, 2012
    Publication date: March 13, 2014
    Applicant: Apple Inc.
    Inventors: David A. Majnemer, Wenguang Wang
  • Publication number: 20140074849
    Abstract: System for generating a pseudo-repository. The system scans a directory to detect compiled binary files, and assembles an index of the compiled binary files based on metadata describing the compiled binary files. Then the system generates a pseudo-repository based on the index that maps each compiled binary file with at least one associated artifact, wherein the pseudo-repository responds to client requests for one of the binary files.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Inventors: Ondrej Zizka, Lukas Fryc
  • Publication number: 20140074850
    Abstract: Embodiments are directed towards the visualization of machine data received from computing clusters. Embodiments may enable improved analysis of computing cluster performance, error detection, troubleshooting, error prediction, or the like. Individual cluster nodes may generate machine data that includes information and data regarding the operation and status of the cluster node. The machine data is received from each cluster node for indexing by one or more indexing applications. The indexed machine data including the complete data set may be stored in one or more index stores. A visualization application enables a user to select one or more analysis lenses that may be used to generate visualizations of the machine data. The visualization application employs the analysis lens to produce visualizations of the computing cluster machine data.
    Type: Application
    Filed: October 25, 2012
    Publication date: March 13, 2014
    Applicant: Splunk Inc.
    Inventors: Cary Glen Noel, Kirubakaran Pakkirisamy, Alex Raitz, Pierre Tsai
  • Publication number: 20140067777
    Abstract: Timing data associated with a database or database system can be stored in a reduced or compressed form which can be decompressed back to a full or original form. In doing so, timing data can be compressed by using a subset of a full set of possible values (e.g., a determined range which is more likely to occur) instead of using a full set of possible values. Timing data can also be compressed by eliminating redundant, insignificant duplicate and/or common values, for example, between one or more components (e.g., start and end times of a period of time) of the timing data.
    Type: Application
    Filed: September 6, 2012
    Publication date: March 6, 2014
    Inventors: Cameron Lewis, Elizabeth Brealey, Michael Reed
  • Publication number: 20140067821
    Abstract: A system and method for storing and accessing data in an embedded system of an aircraft extracts identifiers from headers in stored data, and stores the identifiers in a separately indexable array.
    Type: Application
    Filed: September 13, 2012
    Publication date: March 6, 2014
    Applicant: GE AVIATION SYSTEMS LLC
    Inventor: Benjamin James Sykes
  • Publication number: 20140067819
    Abstract: A method and apparatus are provided for building and using a persistent XML tree index for navigating an XML document. The XML tree index is stored separately from the XML document content, and thus is able to optimize performance through the use of fixed-sized index entries. The XML document hierarchy need not be constructed in volatile memory, so creating and using the XML tree index scales even for large documents. To evaluate a path expression including descendent or ancestral syntax, navigation links can be read from persistent storage and used directly to find the nodes specified in the path expression. The use of an abstract navigational interface allows applications to be written that are independent of the storage implementation of the index and the content. Thus, the XML tree index can index documents stored at least in a database, a persistent file system, or as a sequence of in memory.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Anguel Novoselsky, Zhen Hua Liu, Thomas Baby
  • Patent number: 8666985
    Abstract: An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to a packet capture repository when slots in a shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: March 4, 2014
    Assignee: Solera Networks, Inc.
    Inventors: Matthew S. Wood, Joseph H. Levy, McKay Marston
  • Publication number: 20140052733
    Abstract: Embodiments are directed towards previewing results generated from indexing raw data before the corresponding index data is added to an index store. Raw data may be received from a preview data source. After an initial set of configuration information is established, the preview data may be submitted to an index processing pipeline. A previewing application may generate preview results based on the preview index data and the configuration information. The preview results may enable previewing how the data is being processed by the indexing application. If the preview results are not acceptable, the configuration information may be modified. The preview application enables modification of the configuration information until the generated preview results are acceptable. If the configuration information is acceptable, the preview data may be processed and indexed in one or more index stores.
    Type: Application
    Filed: August 17, 2012
    Publication date: February 20, 2014
    Applicant: Splunk Inc.
    Inventors: Mitchell Neuman Blank, JR., Leonid Budchenko, David Carasso, Micah James Delfino, Johnvey Hwang, Stephen Phillip Sorkin, Eric Timothy Woo
  • Publication number: 20140052698
    Abstract: A system and an article of manufacture for de-duplicating virtual machine image accesses include identifying one or more identical blocks in two or more images in a virtual machine image repository, generating a block map for mapping different blocks with identical content into a same block, deploying a virtual machine image by reconstituting an image from the block map and fetching any unique blocks remotely on-demand, and de-duplicating virtual machine image accesses by storing the deployed virtual machine image in a local disk cache.
    Type: Application
    Filed: August 17, 2012
    Publication date: February 20, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Han Chen, Alexei A. Karve, Minkyong Kim, Andrzej P. Kochut, Hui Lei, Jayaram Kallapalayam Radhakrishnan, Zhiming Shen, Zhe Zhang
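The block-map idea in the abstract above can be illustrated with a small sketch. It assumes fixed-size blocks identified by SHA-256 digests and keeps everything in memory; the names `build_block_map` and `reconstitute` are hypothetical, and the remote on-demand fetch and local disk cache from the abstract are not modeled.

```python
import hashlib
from typing import Dict, List, Tuple


def build_block_map(
    images: Dict[str, bytes], block_size: int = 4096
) -> Tuple[Dict[str, List[str]], Dict[str, bytes]]:
    """Split each image into blocks and map identical blocks to one copy.

    Returns (block_map, store): block_map lists the block digests of each
    image in order; store holds a single copy of every distinct block.
    """
    block_map: Dict[str, List[str]] = {}
    store: Dict[str, bytes] = {}
    for name, data in images.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep the first copy only
            digests.append(digest)
        block_map[name] = digests
    return block_map, store


def reconstitute(name: str, block_map: Dict[str, List[str]],
                 store: Dict[str, bytes]) -> bytes:
    """Rebuild an image by concatenating its blocks in map order."""
    return b"".join(store[digest] for digest in block_map[name])
```

Two images that share most of their blocks then cost little more than one image in the store, which is the effect the abstract attributes to the block map.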
  • Publication number: 20140052732
    Abstract: Provided is a method for updating index data. The method includes receiving index data, including an index value indicative of user activity on a network site and an index time corresponding to a time used for calculating the index value, receiving an update index time corresponding to a time used for updating the index data, determining an updated index value using an exponential decay of the index value from the index time to the update index time, wherein the updated index value comprises a decayed value of the index value corresponding to the update time, and storing updated index data including the updated index value and the update index time.
    Type: Application
    Filed: August 26, 2011
    Publication date: February 20, 2014
    Inventor: William R. Softky
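The decay step described in the abstract above amounts to multiplying the stored value by an exponential factor of the elapsed time. The sketch below assumes the decay rate is expressed as a half-life; that parameter, the `IndexData` record, and the function name `decay_index` are assumptions for illustration and do not come from the application.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexData:
    value: float  # index value indicative of user activity
    time: float   # index time at which `value` was calculated (seconds)


def decay_index(data: IndexData, update_time: float, half_life: float) -> IndexData:
    """Return index data decayed from data.time to update_time.

    Uses value * exp(-rate * elapsed) with rate = ln(2) / half_life,
    so the value halves every `half_life` seconds (assumed parameter).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    rate = math.log(2) / half_life
    elapsed = update_time - data.time
    return IndexData(value=data.value * math.exp(-rate * elapsed), time=update_time)
```

Because exponential decay composes, decaying from t0 to t1 and then from t1 to t2 gives the same value as decaying once from t0 to t2, so only the latest value and its time need to be stored.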
  • Publication number: 20140052699
    Abstract: Systems and methods for estimating a data reduction ratio for a data set are provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements; associating an identifier h_i with each of the plurality of m elements; associating an identifier h_e with each of the plurality of elements in the data set; tracking the number of times an element i appears in a base set that includes the plurality of m elements selected from the data set; calculating a value count_i that indicates the number of times an identifier h_e matches an identifier h_i; and estimating the data reduction ratio for the plurality of N elements in the data set, based on the number m of elements selected from the data set and the value count_i.
    Type: Application
    Filed: August 20, 2012
    Publication date: February 20, 2014
    Applicant: International Business Machines Corporation
    Inventors: Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov, Gil Vernik
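One way to read the sampling scheme in the abstract above is sketched here. The abstract does not state the estimator itself, so the formula used (the mean of 1/count over the sampled elements), the definition of the ratio as distinct elements divided by total elements, the SHA-256 identifiers, and the names `identifier` and `estimate_reduction_ratio` are all assumptions for illustration.

```python
import hashlib
import random
from collections import Counter
from typing import Sequence


def identifier(element: bytes) -> str:
    """Identifier for an element (assumed here to be a SHA-256 digest)."""
    return hashlib.sha256(element).hexdigest()


def estimate_reduction_ratio(elements: Sequence[bytes], m: int, seed: int = 0) -> float:
    """Estimate distinct/total for `elements` from a sample of size m."""
    if m < 1 or not elements:
        raise ValueError("need a non-empty data set and m >= 1")
    rng = random.Random(seed)
    # Base set: m elements drawn uniformly with replacement.
    sample = [identifier(elements[rng.randrange(len(elements))]) for _ in range(m)]
    wanted = set(sample)
    # Full pass: count how often each sampled identifier occurs in the data set.
    counts: Counter = Counter()
    for element in elements:
        h_e = identifier(element)
        if h_e in wanted:
            counts[h_e] += 1
    # An element occurring c times contributes 1/c; summed over the whole data
    # set these contributions equal the number of distinct elements.
    return sum(1.0 / counts[h_i] for h_i in sample) / m
```

Only the m sampled identifiers are kept in memory during the full pass, which is what makes this kind of estimate cheaper than building a complete identifier table.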
  • Publication number: 20140046951
    Abstract: Methods, software and devices for indexing responses for later providing to users in response to queries are disclosed. For each stored response, representative queries are stored in association with that response, where each representative query represents a possible query for searching for information addressed by that response. Representative queries are selectively modified by substituting terms by corresponding chosen substitute expressions, where a substitute expression is chosen for a particular term in one of the representative queries based on past substitutions in others of said representative queries. For each response, a Boolean expression is formed from those representative queries associated with that response, as selectively modified, where the Boolean expression is satisfied by each of those representative queries.
    Type: Application
    Filed: August 8, 2012
    Publication date: February 13, 2014
    Applicant: Intelliresponse Systems Inc.
    Inventors: Darren Redfern, Chad Ternent
  • Publication number: 20140046911
    Abstract: Systems and techniques of de-duplicating files and/or blobs within a file system are presented. In one embodiment, an email system is disclosed wherein the email system receives email messages comprising a set of associated attachments. The system determines whether the associated attachments have been previously stored in the email system and the state of the stored attachment; if the state of the attachment is appropriate for sharing copies of the attachment, the system provides a reference to the attachment upon a request to share the attachment. In another embodiment, the system may detect whether stored attachments are corrupted and, if so, attempt to repair the attachment, possibly prior to sharing references to the attachment.
    Type: Application
    Filed: August 13, 2012
    Publication date: February 13, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Kristof Roomp, Gruia Pitigoi-Aron, Ivaylo Dimitrov, Brandon Pai, Cheng Ho, Kumar Pasumarthy, Lincoln Liu, Alok Dhariwal, John Rodrigues
  • Publication number: 20140040213
    Abstract: Records received from one or more sources in a network are processed. For each of multiple intervals of time, a matching procedure is attempted on sets of one or more records, including comparing identifiers associated with different records to generate the sets and determining whether or not a completeness criterion is satisfied for one or more of the sets. The processing also includes, for at least some of the intervals of time, processing at least one complete set, consisting of one or more of the received records on which the matching procedure is first attempted during the interval of time and one or more records stored in a data store before the interval of time, and for at least some of the intervals of time, processing at least one incomplete set, consisting of one or more records stored in the data store before the interval of time.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Ab Initio Software LLC
    Inventor: Larry Paul Rossi
  • Publication number: 20140032507
    Abstract: Data de-duplication is done on a data set. The data de-duplication is done using a partial digest table. Some digests are selectively removed from the partial digest table when a pre-determined condition occurs.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Inventors: Douglas L. Voigt, Siamak Nazari
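A partial digest table of the kind the abstract above mentions might look like the sketch below. The abstract names neither the pre-determined condition nor the selection rule, so both are assumptions here: the condition is taken to be "the table exceeds a fixed capacity" and the digests removed are the least recently used ones. The class name `PartialDigestTable` is hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class PartialDigestTable:
    """Bounded digest -> location table that drops entries when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def lookup(self, digest: str) -> Optional[int]:
        """Return the stored location for a digest, or None on a miss."""
        location = self._entries.get(digest)
        if location is not None:
            self._entries.move_to_end(digest)  # mark as recently used
        return location

    def insert(self, digest: str, location: int) -> None:
        """Record a digest, removing old digests if the table is over capacity."""
        self._entries[digest] = location
        self._entries.move_to_end(digest)
        while len(self._entries) > self.capacity:  # assumed condition
            self._entries.popitem(last=False)      # assumed policy: drop LRU
```

A miss in a partial table does not prove the data is new; it only means a duplicate may be stored twice, trading some reduction for a smaller table.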
  • Publication number: 20140032569
    Abstract: System, method and computer program products for storing data by computing a plurality of hash functions of data values in a data item, and determining a corresponding memory location for one of the plurality of hash functions of data values in the data item. Each memory location is of a cacheline size wherein a data item is stored in a memory location. Each memory location can store a plurality of data items. A key portion of all data items is contiguously stored within the memory location, and a payload portion is contiguously stored within the memory location. Payload portions are packed as bit-aligned in a fixed-sized memory location, comprising a bucket in a bucketized hash table, each bucket sized to store multiple key portions and payload portions that are packed as bit-aligned in a fixed-sized bucket. Corresponding key portions are stored as compressed keys in said fixed-sized bucket.
    Type: Application
    Filed: July 25, 2012
    Publication date: January 30, 2014
    Applicant: International Business Machines Corporation
    Inventors: Min-Soo Kim, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita
  • Publication number: 20140032925
    Abstract: The embodiments herein relate to data management and, more particularly, to global deduplication and encryption of data in data management systems. The user equipments (UE) are grouped under certain deduplication groups based on certain parameters such as rate of data exchange, frequency of data exchange, social closeness, work closeness, similarity of data and interests and so on, between those UEs. Further, specific deduplication and encryption parameters such as encryption method, encryption key, signature computation method, block computation method and so on are assigned to each group. Further, deduplication and encryption of data in each group is performed using the deduplication and encryption modes and parameters assigned to each group. The deduplication and encryption of data is performed in at least one of the UEs and/or a server. Further, the parameters used for deduplication and encryption are stored in specific databases and are encrypted for better security.
    Type: Application
    Filed: July 25, 2012
    Publication date: January 30, 2014
    Inventors: Ankur Panchbudhe, Anand A. Kekre
  • Publication number: 20140032562
    Abstract: A method and client device are disclosed for indexing content of a multimedia file. The method comprises using a client device to segment the content of the multimedia file into a plurality of segments and to determine structure-searchable data for each segment. Determining structure-searchable data for a segment comprises (1) identifying one or more features of respective multimedia types in the segment; (2) correlating each of the identified features to one or more respective keywords; and (3) calculating one or more respective relevance factors for each of the keywords, where at least one of the relevance factors is based on one or more characteristics of the client device. The method also comprises the client device transmitting the structure-searchable data (including the keywords, relevance factors, and respective media types of the identified features) to an indexing server.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Tommy ARNGREN, David LINDEGREN, Joakim SÖDERBERG, Marika STÅLNACKE
  • Publication number: 20140032449
    Abstract: In one embodiment, a method includes receiving information associated with the operation of one or more network devices, indexing the information for analysis, analyzing the information to determine a pattern in the information, generating one or more labels for at least a portion of the information based at least in part on the pattern, and making the information and labels available to a remediation system.
    Type: Application
    Filed: July 27, 2012
    Publication date: January 30, 2014
    Applicant: DELL PRODUCTS L.P.
    Inventors: Martin Kacin, David Douglas Kloba
  • Publication number: 20140019455
    Abstract: Managing versions of an electronic entity comprising many independently managed, but mutually-dependent, subcomponents can be challenging. File management functionality is provided for use with an integrated development environment to produce a visual indication of the relationships among the subcomponents. The approach described herein provides an improvement over source code control systems and backup systems in the ability to revert the state of one or more files as their content existed at an historical time point. The technique does not require a user to predict in advance at which time points the content state of one or more files will be interesting as historical time points for future use.
    Type: Application
    Filed: July 12, 2012
    Publication date: January 16, 2014
    Applicant: Oracle International Corporation
    Inventor: Neil James Cochrane
  • Publication number: 20140019893
    Abstract: A story index of story elements is provided in which each story element is able to be referenced in a story by name and by language that does not include the name. The story index may also contain references to the same story elements in other associated stories, including other stories in a series or that are in a different type of media. An associated story presentation application program may enable a viewer to view the entries in the story index for a specified story element and to then view the specified story element at any of the referenced locations. The application may enable purchase or downloading of the associated stories.
    Type: Application
    Filed: July 11, 2012
    Publication date: January 16, 2014
    Applicant: Cellco Partnership d/b/a Verizon Wireless
    Inventors: Agust K. GUDMUNDSSON, Virginia Benson Chanda
  • Publication number: 20140019425
    Abstract: The file server identifies two or more files, each including duplicated data among a plurality of files that have been stored into the logical storage device as a file group based on the file system information. The file server deletes copies of the duplicated data other than shared data that is one copy of the duplicated data included in the two or more files from the logical storage device. The file server makes a file, which is not a shared file of the file group, refer to the shared file that is a file configured by the shared data. The file server creates a group link that associates the m files that belong to the file group with each other.
    Type: Application
    Filed: July 10, 2012
    Publication date: January 16, 2014
    Inventors: Koji Honami, Masahiro Shimizu
  • Publication number: 20140012832
    Abstract: A computer-implemented method, system and computer program product for collecting information from data sources by receiving a collection request at a collection tool to collect information, where the collection request includes data source information indicating a data source from which to retrieve the information. The data source information in the collection request is associated with one or more electronic data repositories in response to the data source indicated by the data source information being previously unidentified to the collection tool. The information is collected from the one or more associated electronic data repositories.
    Type: Application
    Filed: July 6, 2012
    Publication date: January 9, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roman Kisin, Andrey Pogodin, Pierre Raynaud-Richard
  • Publication number: 20140006365
    Abstract: A method, computer program product and system of minimizing epigenetic surprisal data either by comparing epigenetic surprisal data to a fixed baseline epigenetic data, so that all of the comparisons were made to the same baseline epigenetic data or by comparing epigenetic surprisal data to a rolling baseline of epigenetic surprisal data—that is, after each comparison the baseline is changed to the data from the time point which had been compared previously.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert R. Friedlander, James R. Kraemer
  • Publication number: 20140006364
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for merging media stream indexes of a media stream are described in this specification. In one aspect, a method includes receiving a first media stream index at a first server system, including a first list of sequentially arranged fragment identifiers corresponding to at least a portion of multiple sequentially arranged fragments. Fragment identifiers that are potentially missing from the first index can be identified. A second media stream index including a second list of sequentially arranged fragment identifiers corresponding to at least a portion of the multiple sequentially arranged fragments can be requested from a second server system. The first and second list of the sequentially arranged fragment identifiers can be compared and the first list of sequentially arranged fragment identifiers can be reconstructed based on the comparison.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Glenn Eguchi, Asa Whillock, Kevin Streeter, Mohammed Pithapurwala, Noam Lorberbaum, Seth Hodgson, Srinivas Manapragada
  • Publication number: 20140006411
    Abstract: An approach is provided to determine one or more dynamic ordered tree structures and transition tree structures (e.g., based on one or more transitions of a device) to facilitate querying and/or accessing data stores. An apparatus and method determines to generate at least one index structure, determines to associate index objects of the generated index structure with one or more data objects of at least one data store, determines to generate at least one transition index structure based on the at least one generated index structure, and determines to associate the transition index structure with index objects corresponding to one or more data objects of at least one data store based on a transition of a device. Also, the method and apparatus determines to generate at least one query, and determines to generate at least one transition index structure where a current index structure to resolve the query is absent.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: Nokia Corporation
    Inventors: Sergey Boldyrev, Pavandeep Kalra
  • Publication number: 20140006363
    Abstract: Data deduplication for data storage tapes comprises determining the read throughput of a deduplicated set of individual files on a single data storage tape, and determining a placement of deduplicated file data on a single data storage tape to reduce an average number of per-file gaps on the tape. Deduplicated file data is placed on the single data storage tape based on said placement to increase an average read throughput for a deduplicated set of individual files.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: MIHAIL C. CONSTANTINESCU, ABDULLAH GHARAIBEH, MAOHUA LU, DAVID A. PEASE, ANURAG SHARMA
  • Publication number: 20140006362
    Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Applicant: International Business Machines Corporation
    Inventors: Ranjit M. Noronha, Ajay K. Singh
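The write path described in the abstract of 20140006362 can be sketched roughly as follows. This is a minimal, single-threaded illustration, not the patented mechanism: the class name `DedupStore`, the in-memory dict standing in for the de-duplication table, the list standing in for the storage device, and the use of SHA-256 as the hash key are all assumptions. The abstract does not say what happens when the key is already present; returning the existing reference is an assumption too. The parallel hand-off of the write to the file system is omitted.

```python
import hashlib


class DedupStore:
    """Toy model of a hash-keyed de-duplication table with a journal (hypothetical names)."""

    def __init__(self):
        self.table = {}    # hash key -> block location (stand-in for the de-duplication table)
        self.blocks = []   # stand-in for the storage device
        self.journal = []  # append-only list of journaled hash keys

    def write(self, data: bytes) -> int:
        """Store data once per distinct content and return its block location."""
        key = hashlib.sha256(data).hexdigest()
        if key in self.table:
            # Assumed behaviour on a hit: reuse the existing block.
            return self.table[key]
        # Miss: write the data, journal the key, then update the table.
        self.blocks.append(data)
        location = len(self.blocks) - 1
        self.journal.append(key)
        self.table[key] = location
        return location
```

Writing the same bytes twice through `DedupStore.write` returns the same location and leaves a single stored block and a single journal entry.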
  • Publication number: 20140006349
    Abstract: Briefly, embodiments of methods or systems to replicate indexes in distributed search engines are described.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Applicant: Yahoo! Inc.
    Inventors: Vincent Leroy, Matthieu Morel, Flavio Junqueira
  • Publication number: 20130346414
    Abstract: One disclosed method includes receiving correlation instructions related to a plurality of meta-content elements that are associated with a primary content. The primary content may be multimedia content such as, but not limited to, an audiovisual content. The method includes performing a correlation in response to receiving the instructions. The correlation is between the meta-content elements, where the meta-content elements each have an arbitrary granularity defining meta-content segments. The method returns a result based on the correlation. Another disclosed method includes receiving a request having correlation instructions related to a plurality of meta-content elements, where the meta-content elements are associated with a primary content. Again, each meta-content element has an arbitrary granularity defining meta-content segments.
    Type: Application
    Filed: June 21, 2012
    Publication date: December 26, 2013
    Applicant: General Instrument Corporation
    Inventors: Alfonso Martinez Smith, Paul C. Davis, Joshua B. Hurwitz, Douglas A. Kuhlman, Hiren M. Mandalia, Loren J. Rittle, Krunal S. Shah
  • Publication number: 20130346378
    Abstract: The present invention extends to methods, systems, and computer program products for performing memory compaction in a main memory database. The main memory database stores records within pages which are organized in doubly linked lists within partition heaps. The memory compaction process uses quasi-updates to move records from a page to be emptied to an active page in a partition heap. The quasi-updates create a new version of the record in the active page, the new version having the same data contents as the old version of the record. The creation of the new version can be performed using a transaction that employs wait-for dependencies to allow the old version of the record to be read while the transaction is creating the new version, thereby minimizing the effect of the memory compaction process on other transactions in the main memory database.
    Type: Application
    Filed: June 21, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Dimitrios Tsirogiannis, Per-Ake Larson
  • Publication number: 20130339316
    Abstract: Deduplicated data is packed into finite-sized containers. A similarity score is calculated between similarly compared files of the deduplicated data. The similarity score is used for grouping the similarly compared files of the deduplicated data into subsets for destaging each of the subsets from a deduplication system to one of the finite-sized containers.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael HIRSCH, Thorsten KRAUSE
  • Publication number: 20130339308
    Abstract: Disclosed herein are techniques for archiving data objects. It is determined whether a data object was rejected by an archiving module due to an information field thereof violating a protocol. If it is determined that the data object was rejected due to violation of the protocol, a compliant information field that complies with the protocol is generated such that the compliant information field causes the archiving module to permit archiving of the data object violating the protocol.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Inventors: Richard Herschel Schwartz, Tarcio Constant, Scott Alan Lemieux
  • Publication number: 20130339322
    Abstract: In a compression processing storage system, using a pool of compression cores, the compression cores are assigned to process either compression operations, decompression operations, or decompression and compression operations, which are scheduled for processing. A maximum number of the compression cores are set for processing only the decompression operations, thereby lowering a decompression latency. A minimal number of the compression cores are allocated for processing the compression operations, thereby increasing compression latency. Upon reaching a throughput limit for the compression operations that causes the minimal number of the plurality of compression cores to reach a busy status, the minimal number of the plurality of compression cores for processing the compression operations is increased.
    Type: Application
    Filed: June 14, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan AMIT, Amir LIDOR, Sergey MARENKOV, Rostislav RAIKHMAN
  • Publication number: 20130339321
    Abstract: The present invention relates to a computer-implemented method, system and computer readable medium for providing a scalable bio-informatics sequence search on a cloud. The method comprises partitioning genome data into a plurality of datasets and storing the plurality of datasets in a database, receiving at least one sequence search request input, searching the database for a genome sequence corresponding to the search request input, and scaling the sequence search based on the sequence search request input.
    Type: Application
    Filed: June 13, 2012
    Publication date: December 19, 2013
    Applicant: Infosys Limited
    Inventors: S/shri. Shyam Kumar Doddavula, Madhavi Rani, Anirban Ghosh, Akansha Jain, Santonu Sarkar, Mudit Kaushik, Harsh Vachhani
  • Publication number: 20130332467
    Abstract: Data elements from data sources and having a data value set are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element to yield a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.
    Type: Application
    Filed: July 8, 2012
    Publication date: December 12, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mihaela Ancuta Bornea, Songyun Duan, Achille Belly Fokoue-Nkoutche, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas, Michael J. Ward
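One common way to realize the pipeline in the abstract of 20130332467 is MinHash signatures combined with banded locality-sensitive hashing. The sketch below uses that technique as an assumption; the abstract does not name a specific hash family or similarity measure. The constants `NUM_HASHES` and `BANDS`, the helper names, and Jaccard similarity as the "pre-determined measure" are all hypothetical choices, and every value set is assumed to be non-empty.

```python
import hashlib
from itertools import combinations

NUM_HASHES = 32               # fixed signature length (assumed)
BANDS = 8                     # number of LSH bands (assumed)
ROWS = NUM_HASHES // BANDS    # signature positions per band


def _hash(seed: int, value: str) -> int:
    """Deterministic seeded hash built from SHA-256."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(values: set) -> tuple:
    """MinHash signature: the minimum hash of the value set under each seed."""
    return tuple(min(_hash(seed, v) for v in values) for seed in range(NUM_HASHES))


def candidate_pairs(signatures: dict) -> set:
    """Elements whose signatures agree on at least one whole band become candidates."""
    buckets = {}
    for name, sig in signatures.items():
        for b in range(BANDS):
            band_key = (b, sig[b * ROWS:(b + 1) * ROWS])
            buckets.setdefault(band_key, []).append(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a: set, b: set) -> float:
    """Similarity measure assumed for this sketch."""
    return len(a & b) / len(a | b)


def link(elements: dict, threshold: float = 0.5) -> set:
    """Link candidate pairs whose similarity index is above the threshold."""
    sigs = {name: signature(vals) for name, vals in elements.items()}
    return {
        (a, b)
        for a, b in candidate_pairs(sigs)
        if jaccard(elements[a], elements[b]) > threshold
    }
```

Only pairs that collide in at least one band are scored, so the exact similarity measure is computed for a small subset of all possible pairs.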
  • Publication number: 20130332446
    Abstract: A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can result in not every computing device providing data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they are performing a repartitioning, assign a repartitioning index to each individual data element, which represents the computing device to which such a data element is destined.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Jingren Zhou, Nicolas Bruno, Wei Lin
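The hash-based case in the abstract of 20130332446 can be illustrated with a small sketch. It assumes, purely for illustration, that both the prior and the new partitioning use the same hash function reduced modulo the partition count, and that the new count is a multiple of the old one. Under those assumptions each source partition sends data to only a few destinations rather than to all of them. The function names and the use of CRC-32 are hypothetical.

```python
import zlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    """Deterministic hash (CRC-32) so results do not vary between runs."""
    return zlib.crc32(key.encode())


def repartition_index(key: str, new_count: int) -> int:
    """Index of the destination partition for one data element."""
    return stable_hash(key) % new_count


def destinations(keys, old_count: int, new_count: int) -> dict:
    """Map each source partition to the set of destination partitions it feeds."""
    fanout = defaultdict(set)
    for key in keys:
        source = stable_hash(key) % old_count
        fanout[source].add(repartition_index(key, new_count))
    return dict(fanout)
```

Going from 4 to 8 partitions, source partition `s` can only feed partitions `s` and `s + 4`, because `h % 8` is always either `h % 4` or `h % 4 + 4`.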
  • Publication number: 20130332462
    Abstract: A system to generate content recommendations by identifying content and selecting a content entry for the content. The system comprises identifying a keyword in the content entry, generating a tag for the content based on the keyword, generating a plurality of recommendations based on the tag, and displaying the recommendations.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 12, 2013
    Inventors: David Paul Billmaier, Jason Christopher Hall, Alexander Charies Barclay, John Max Kellum, Henry Hideyuki Yamamoto
  • Publication number: 20130318090
    Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
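The snippet-level indexing idea in the abstract of 20130318090 can be sketched as below. This is an illustrative toy, not the claimed method: it treats each sentence as a snippet, splits sentences with a simple regular expression, and combines results by counting matching snippets per document. All of those choices, and the function names, are assumptions.

```python
import re
from collections import defaultdict


def split_snippets(text: str) -> list:
    """Split text into sentence-like snippets (naive punctuation-based split)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Inverted index: word -> set of (doc_id, snippet_number) postings."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for number, snippet in enumerate(split_snippets(text)):
            for word in tokenize(snippet):
                index[word].add((doc_id, number))
    return index


def search(index: dict, query: str) -> dict:
    """Count, per document, the snippets that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return {}
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    scores = defaultdict(int)
    for doc_id, _ in hits:
        scores[doc_id] += 1
    return dict(scores)
```

Because a hit requires all query terms to fall in the same snippet, term proximity is enforced implicitly without storing word positions.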
  • Publication number: 20130318050
    Abstract: Exemplary system and computer program product embodiments for data deduplication using short term history in a computing environment are provided. In one embodiment, by way of example only, a hash value is calculated on data chunks for a read operation. The calculated hash value is stored in a storage media. The calculated hash value is looked up in the storage media to verify if a current write operation was previously written and/or read. Additional system and computer program product embodiments are disclosed and provide related advantages.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan AMIT, Chaim KOIFMAN
  • Publication number: 20130311432
    Abstract: A computer identifies a relationship among a subset of a set of data blocks, a basis of the relationship forming a context shared by the subset of data blocks. The computer selects a code data structure from a set of code data structures using the context. The context is associated with the code data structure, and the code data structure includes a set of codes. The computer computes, for a first data block in the subset of data blocks, a first code corresponding to a content of the first data block. The computer determines whether the first code matches a stored code in the code data structure. The computer replaces, responsive to the first code matching the stored code, the first data block with a reference to an instance of the first data block. The computer causes the reference to be stored in a target data processing system.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vishal Chittranjan Aslot, Adekunle Bello, Brian W. Hart, Robert Wright Thompson
  • Publication number: 20130311479
    Abstract: According to one embodiment of the present invention, a system analyzes one or more change records based on text analytics using dictionaries and rules for the analysis in order to generate an index of analyzed data that represents the one or more change records. The change records each include a change and corresponding time frame for occurrence of the change. Information from a request is applied to the index of analyzed data to determine one or more candidate causes for the incident and the corresponding time frame for occurrence of the change. A time associated with the request is correlated with the corresponding time frame for occurrence of the change to identify the one or more candidate causes in the one or more change records as causes for the incident. Embodiments of the present invention further include a method and computer program product for determining causes of an incident.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Applicant: International Business Machines Corporation
    Inventors: Dhruv A. Bhatt, Kristin E. McNeil, Nitaben A. Patel
  • Publication number: 20130311435
    Abstract: A method, computer product, and computer system of minimizing surprisal data comprising: at a source, reading and identifying characteristics of a genetic sequence of an organism; receiving an input of rank of at least two identified characteristics of the genetic sequence of the organism; generating a hierarchy of ranked, identified characteristics based on the rank of the at least two identified characteristics of the genetic sequence of the organism; comparing the hierarchy of ranked, identified characteristics to a repository of reference genomes; and if at least one reference genome from the repository matches the hierarchy of ranked, identified characteristics, breaking the matched reference genomes into pieces, combining pieces associated with the identified characteristics from at least one matched reference genome to form a filter pattern to be compared to the nucleotides of the genetic sequence of the organism, to obtain differences and create surprisal data.
    Type: Application
    Filed: June 8, 2012
    Publication date: November 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert R. Friedlander, James R. Kraemer