Indexing The Archive Patents (Class 707/673)
  • Publication number: 20100306217
    Abstract: A mechanism for separating content from noisy context in template-based documents for search indexing is disclosed. In one embodiment, a method includes selecting a plurality of documents for index comparison, identifying one or more identical elements found in each of the plurality of documents, and removing the one or more identical elements from consideration in an indexing process of the plurality of documents.
    Type: Application
    Filed: May 28, 2009
    Publication date: December 2, 2010
    Inventor: James P. Schneider
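
The abstract above describes dropping elements that appear in every compared document before indexing. The following is a minimal sketch of that idea, not the patented method: it assumes an "element" is a single line of text, and the function name `strip_shared_elements` and the sample pages are hypothetical.

```python
def strip_shared_elements(documents):
    """Return each document's lines with the lines common to ALL documents removed.

    Assumption: an "element" is one line of text. A real template detector
    would more likely compare markup nodes or text blocks.
    """
    split_docs = [doc.splitlines() for doc in documents]
    if len(split_docs) < 2:
        # With fewer than two documents there is nothing to compare against.
        return split_docs
    # Intersect the line sets to find elements present in every document.
    shared = set(split_docs[0])
    for lines in split_docs[1:]:
        shared &= set(lines)
    # Keep only the lines that are not shared template material.
    return [[line for line in lines if line not in shared] for lines in split_docs]


pages = [
    "Site Header\nArticle about B-trees\nSite Footer",
    "Site Header\nArticle about hashing\nSite Footer",
]
content = strip_shared_elements(pages)
```

Only the remaining lines in `content` would then be passed to the indexer, so the shared header and footer no longer contribute search terms.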
  • Publication number: 20100306238
    Abstract: Techniques are disclosed for generating an index that supports both incremental document indexing and incremental term indexing. Documents and search terms may be received for which an index is to be generated. From this information, an index may be generated, partitioned in a first dimension by documents to create master segments and in a second dimension by search terms to create slave segments. A request to update the index to include a new document or a new search term may be received. The new document or new search term may be added to the index without modifying the entire index. Further, document identifiers may be synchronized across all segments. Synchronization refers to maintaining consistency of document identifiers across segments, despite renumbering of document identifiers during certain operations such as merging segments.
    Type: Application
    Filed: May 29, 2009
    Publication date: December 2, 2010
    Applicant: International Business Machines Corporation
    Inventors: Sreeram V. Balakrishnan, Michael Busch
  • Publication number: 20100306222
    Abstract: A system and method for accelerating searches of B-trees. An auxiliary index that is optimized for use with a cache is used in conjunction with a B-tree. A hash type of auxiliary index maintains pointers to key entries in the B-tree leaf nodes. The hash type of index may be searched, and a resulting pointer is used to locate records of the B-tree, bypassing a search of the B-tree. A top level type of auxiliary index maintains pointers to leaf nodes or internal nodes of the B-tree. A top level index may be searched, and a search of the B-tree is performed beginning with the node found by using the top level index. A monitoring mechanism may automatically start, change, or discard the auxiliary index based on an amount of cache memory, types of searches, or other factors. The auxiliary index may be optimized for high performance in read only searches, while the B-tree provides transaction durability.
    Type: Application
    Filed: May 29, 2009
    Publication date: December 2, 2010
    Applicant: Microsoft Corporation
    Inventors: Craig Freedman, Cristian Diaconu, Michael Zwilling
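
The hash type of auxiliary index in the abstract above can be illustrated roughly as follows. This is a sketch under stated assumptions: a sorted list searched by bisection stands in for the B-tree, and the class names `SortedStore` and `HashAuxIndex` are hypothetical. It shows only the lookup path, not the monitoring mechanism or durability.

```python
import bisect


class SortedStore:
    """Stand-in for a B-tree: sorted keys, located by binary search."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda pair: pair[0])
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def search(self, key):
        """Return the position of key, or None if it is absent."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


class HashAuxIndex:
    """Hash index holding pointers (positions) to entries in the store."""

    def __init__(self, store):
        self.store = store
        self.slots = {}  # key -> position in the store

    def get(self, key):
        pos = self.slots.get(key)
        if pos is None:
            # Miss: fall back to the ordered search, then remember the pointer.
            pos = self.store.search(key)
            if pos is None:
                return None
            self.slots[key] = pos
        # Hit: the pointer locates the record without searching the store.
        return self.store.values[pos]


store = SortedStore([(3, "c"), (1, "a"), (2, "b")])
aux = HashAuxIndex(store)
first = aux.get(2)   # found through the ordered search, pointer cached
second = aux.get(2)  # served through the cached pointer
```

The sketch treats the store as read only; any insert or delete would shift positions and require the auxiliary index to be rebuilt or discarded.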
  • Publication number: 20100306273
    Abstract: An apparatus, system, and method are disclosed for efficient content indexing of streaming XML document content. A forest generator generates an XML pattern forest from a set of structured index path expressions. The XML pattern forest includes trees and twigs generated from structured index path expressions uniquely associated with a namespace indicator for an XML node. The XML node is identified in a stream of at least one XML document. A comparison module compares the XML node to nodes of trees and twigs of the XML pattern forest. A determination module determines a match between the XML node and an index node in one of a tree and a twig of the XML pattern forest. The index node has a path from an ancestor node to the index node that matches the axis steps of at least one of the structured index path expressions.
    Type: Application
    Filed: June 1, 2009
    Publication date: December 2, 2010
    Applicant: International Business Machines Corporation
    Inventors: James P. Branigan, David P. Charboneau, Simon K. Johnston
  • Publication number: 20100293145
    Abstract: A method includes identifying with a server a first range of data blocks in a storage device array corresponding to data files selected for replication, the first range of data blocks being managed by a source host device; mapping the first range of data blocks to a second range of data blocks in the storage device array managed by a destination host device; copying the data blocks from the first range that contain the data files selected for replication to the corresponding data blocks in the second range; deleting files in the copied data blocks of the second range that have not been selected for replication; and condensing the second range of data blocks.
    Type: Application
    Filed: July 2, 2009
    Publication date: November 18, 2010
    Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
    Inventors: Abhik Das, Rajesh Anantha Krishnaiyer
  • Patent number: 7836022
    Abstract: Provided are techniques for receiving a request to archive a child table. In response to receiving the request, a join operation is performed on the child table and parent archive data to create child archive data.
    Type: Grant
    Filed: October 12, 2005
    Date of Patent: November 16, 2010
    Assignee: International Business Machines Corporation
    Inventor: Allan E. Gillespie
  • Publication number: 20100287171
    Abstract: A method and system stores and retrieves data items associated with a primary key, using search indices at multiple storage locations. A server receives a primary key, identifies one or more segments of the primary key, and hashes each segment with one or more hash functions to obtain a sequence of hash values. The hash values are used as keys to index a chain of search indices that are stored in multiple storage locations. One or more of the hash values in the sequence are used to form a host name, and the host name is mapped to an address of a server that stores a first search index in the chain. The last search index in the chain contains the data items associated with the primary key, or provides a reference to one or more locations at which the data items can be found.
    Type: Application
    Filed: May 11, 2009
    Publication date: November 11, 2010
    Applicant: Red Hat, Inc.
    Inventor: James P. Schneider
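
The chained lookup in the abstract above can be sketched as follows. This is an illustration only: nested in-memory dictionaries stand in for search indices held at separate storage locations, and the `/` separator, the truncated SHA-256 digests, the `index.example` domain, and all function names are assumptions rather than details from the application.

```python
import hashlib


def segment_hashes(primary_key, sep="/"):
    """Split the key into segments and hash each one (short hex digests)."""
    return [
        hashlib.sha256(segment.encode("utf-8")).hexdigest()[:8]
        for segment in primary_key.split(sep)
        if segment
    ]


def host_for(hash_value, domain="index.example"):
    """Form a host name from a hash value; the domain is a placeholder."""
    return f"{hash_value}.{domain}"


def store_items(root, primary_key, items):
    """Walk or create one index per hash value; the last index holds the items."""
    hashes = segment_hashes(primary_key)
    if not hashes:
        raise ValueError("primary key has no segments")
    node = root
    for h in hashes[:-1]:
        node = node.setdefault(h, {})
    node[hashes[-1]] = items


def lookup(root, primary_key):
    """Follow the chain of indices; return None if any link is missing."""
    node = root
    for h in segment_hashes(primary_key):
        if not isinstance(node, dict) or h not in node:
            return None
        node = node[h]
    return node


root = {}
store_items(root, "users/42/profile", ["name=Ada"])
found = lookup(root, "users/42/profile")
first_host = host_for(segment_hashes("users/42/profile")[0])
```

In a distributed setting, `first_host` would be resolved to the server holding the first index in the chain, and each later hash value would key into the next index.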
  • Patent number: 7814074
    Abstract: The present invention provides for a system and method for assuring integrity of deduplicated data objects stored within a storage system. A data object is copied to secondary storage media, and a digital signature such as a checksum is generated of the data object. Then, deduplication is performed upon the data object and the data object is split into chunks. The chunks are combined when the data object is subsequently accessed, and a signature is generated for the reassembled data object. The reassembled data object is provided if the newly generated signature is identical to the originally generated signature, and otherwise a backup copy of the data object is provided from secondary storage media.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: October 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Matthew J. Anglin, David M. Cannon
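
The integrity check in the abstract above can be sketched as follows. This is a simplified illustration, not the patented system: it assumes fixed-size chunks, SHA-256 as the signature, and in-memory dictionaries for both the chunk store and the secondary copy. The class name `DedupStore` and the tiny `CHUNK_SIZE` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example deduplicates visibly


def checksum(data):
    """Signature of a byte string (SHA-256 here, as an assumption)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    def __init__(self):
        self.chunks = {}   # chunk digest -> chunk bytes, stored once
        self.recipes = {}  # object name -> (ordered chunk digests, signature)
        self.backup = {}   # object name -> full copy on "secondary storage"

    def put(self, name, data):
        # Copy to secondary storage and sign the object BEFORE deduplicating.
        self.backup[name] = data
        signature = checksum(data)
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = checksum(chunk)
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.recipes[name] = (digests, signature)

    def get(self, name):
        digests, signature = self.recipes[name]
        data = b"".join(self.chunks[d] for d in digests)
        # Serve the reassembled object only if its signature still matches.
        if checksum(data) == signature:
            return data
        return self.backup[name]


dedup = DedupStore()
dedup.put("report", b"abcdabcdxyz")
unique_chunks = len(dedup.chunks)
restored = dedup.get("report")
```

If a stored chunk is later corrupted, the recomputed signature no longer matches and `get` returns the secondary copy instead of the damaged reassembly.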
  • Patent number: 7783608
    Abstract: A storage system and method are provided that integrate the CAS name space with the NAS name space in GNS. The storage system implements archive application functionalities, such as: 1) The name space of CAS can be integrated with NASs under GNS; 2) The storage system is equipped with a CAS interface to receive CAS commands from an archive application; 3) The storage system is equipped with index creation and search functionalities; during file archiving from NAS to CAS, a detailed index is created; 4) During file archiving from NAS to CAS, default metadata for the archived file is added; and 5) During a file restore, the storage system can maintain a pointer to the location on CAS, and the pointer is used when the file is re-archived so that the original metadata and index can be reused for the re-archived file.
    Type: Grant
    Filed: August 9, 2007
    Date of Patent: August 24, 2010
    Assignee: Hitachi, Ltd.
    Inventor: Hidehisa Shitomi
  • Patent number: 7752192
    Abstract: The present invention provides a computer implemented method, an apparatus, and a computer usable program product for indexing data. A controller identifies a set of data to be indexed, wherein a set of data structure trees represents the set of data. The controller merges the set of data structure trees to form a unified tree, wherein the unified tree contains a node for each unit of data in the set of data. The controller assigns an identifier to the node for each unit of data in the set of data that describes the node within the unified tree. The controller then serializes the unified tree to form a set of sequential series that represents the set of data structure trees, wherein the set of sequential series forms an index for the set of data.
    Type: Grant
    Filed: March 2, 2007
    Date of Patent: July 6, 2010
    Assignee: International Business Machines Corporation
    Inventors: Xiaohui Gu, Lipyeow Lim, Haixun Wang, Min Wang
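
The merge, identify, and serialize steps in the abstract above can be sketched as follows. This is one possible reading, not the patented procedure: trees are assumed to be nested dictionaries mapping a node label to its children, identifiers are assumed to be preorder numbers in the unified tree, and the function names are hypothetical.

```python
def merge_trees(trees):
    """Merge trees into one unified tree by overlaying equal label paths."""
    unified = {}
    for tree in trees:
        _merge_into(unified, tree)
    return unified


def _merge_into(target, source):
    for label, children in source.items():
        _merge_into(target.setdefault(label, {}), children)


def _preorder_paths(tree, path=()):
    """Yield each node's label path in preorder, siblings sorted by label."""
    for label in sorted(tree):
        node_path = path + (label,)
        yield node_path
        yield from _preorder_paths(tree[label], node_path)


def assign_ids(unified):
    """Give every node of the unified tree an identifier (its preorder rank)."""
    return {path: i for i, path in enumerate(_preorder_paths(unified))}


def serialize(tree, ids):
    """Represent one original tree as a sequence of unified-tree identifiers."""
    return [ids[path] for path in _preorder_paths(tree)]


tree_one = {"a": {"b": {}, "c": {}}}
tree_two = {"a": {"c": {}, "d": {}}}
unified = merge_trees([tree_one, tree_two])
ids = assign_ids(unified)
index = [serialize(tree_one, ids), serialize(tree_two, ids)]
```

Because both sequences draw on the same identifier space, shared structure such as the `a` and `a/c` nodes appears under the same numbers in each sequence, which is what lets the sequences act as a common index.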
  • Patent number: 7747579
    Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
    Type: Grant
    Filed: November 28, 2006
    Date of Patent: June 29, 2010
    Assignee: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
  • Publication number: 20100138393
    Abstract: A modular computer storage system and method is provided for managing and directing data archiving functions, which is scalable and comprehends various storage media as well as diverse operating systems on a plurality of client devices. A client component is associated with one or more client devices for generating archival requests. A file processor directs one or more storage devices, through one or more media components, which control the actual physical level backup on various storage devices. Each media component creates a library indexing system for locating stored data. A management component coordinates the archival functions between the various client components and the file processor, including setting scheduling policies, aging policies, index pruning policies, drive cleaning policies, configuration information, and keeping track of running and waiting jobs.
    Type: Application
    Filed: February 9, 2010
    Publication date: June 3, 2010
    Applicant: CommVault Systems, Inc.
    Inventors: John Crescenti, Srinivas Kavuri, David Alan Oshinsky, Anand Prahlad
  • Patent number: 7716171
    Abstract: Managing backup data comprises accessing a snapshot of a data set, wherein the data set includes at least one object and the snapshot includes a replica of the data set, and adding to an index associated with the snapshot, with respect to each of one or more objects included in the snapshot, index data indicating at least where the object is located within the snapshot.
    Type: Grant
    Filed: August 18, 2005
    Date of Patent: May 11, 2010
    Assignee: EMC Corporation
    Inventor: Nathan Kryger
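
The snapshot index in the abstract above records where each object sits inside the snapshot. A minimal sketch follows, assuming the snapshot is a single byte string and the location is an (offset, length) pair; the function names and sample objects are hypothetical.

```python
def build_snapshot(objects):
    """Concatenate objects into one snapshot and index each object's location."""
    snapshot = bytearray()
    index = {}
    for name, data in objects.items():
        # Index data: where the object starts in the snapshot and how long it is.
        index[name] = (len(snapshot), len(data))
        snapshot += data
    return bytes(snapshot), index


def read_object(snapshot, index, name):
    """Use the index to pull one object out without scanning the snapshot."""
    offset, length = index[name]
    return snapshot[offset:offset + length]


data_set = {"a.txt": b"hello", "b.txt": b"backup data"}
snapshot, index = build_snapshot(data_set)
restored = read_object(snapshot, index, "b.txt")
```

With such an index, restoring a single object requires reading only its byte range rather than the whole replica.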
  • Patent number: 7707204
    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: April 27, 2010
    Assignee: Microsoft Corporation
    Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
  • Patent number: 7707188
    Abstract: A data archival system for the automated archiving of data files. The data archival system includes a central processing hub for receiving the data files, a data archival facility connected to the central processing hub; and an archival manager which is configured to cause the transmission of the data files from the central processing hub to the data archival facility in response to an archive request; the archival of the data files transmitted to the data archival facility in response to an archive request; the retrieval of the data files previously archived in response to a retrieval request; and the transmission of the retrieved data files from the data archival facility to the central processing hub.
    Type: Grant
    Filed: December 20, 2002
    Date of Patent: April 27, 2010
    Assignee: Schlumberger Technology Corporation
    Inventors: Yogendra C. Pandya, Cyril Laroche-Py
  • Patent number: 7668884
    Abstract: Systems and methods for data classification to facilitate and improve data management within an enterprise are described. The disclosed systems and methods evaluate and define data management operations based on data characteristics rather than data location, among other things. Also provided are methods for generating a data structure of metadata that describes system data and storage operations. This data structure may be consulted to determine changes in system data rather than scanning the data files themselves.
    Type: Grant
    Filed: November 28, 2006
    Date of Patent: February 23, 2010
    Assignee: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy A. Schwartz, David Ngo, Brian Brockway, Marcus S. Muller