Indexing The Archive Patents (Class 707/673)
  • Patent number: 8108355
    Abstract: To provide an index for a table in a database system, the index is partially sorted in an initial phase of building the index. Subsequently, in response to accessing portions of the index to process a database query, further sorting of the accessed portions of the index is performed.
    Type: Grant
    Filed: October 27, 2006
    Date of Patent: January 31, 2012
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Bin Zhang
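The abstract above describes an index that is only partially sorted when built and then sorted further as queries touch it. A minimal Python sketch of that idea follows. It assumes integer keys, a non-empty key list, and equal-width range partitions; the class name `LazySortedIndex` and its methods are hypothetical illustrations, not the patented implementation.

```python
import bisect


class LazySortedIndex:
    """Toy index: keys are range-partitioned up front and each
    partition is fully sorted only the first time a query reads it."""

    def __init__(self, keys, num_partitions=4):
        # Assumption: keys is a non-empty list of integers.
        self.lo = min(keys)
        span = max(keys) - self.lo
        # Width of each partition's key range (always at least 1).
        self.width = span // num_partitions + 1
        self.partitions = [[] for _ in range(num_partitions)]
        self.is_sorted = [False] * num_partitions
        # Initial phase: partial sort only. Each key lands in the right
        # partition but stays unordered inside it.
        for key in keys:
            self.partitions[self._slot(key)].append(key)

    def _slot(self, key):
        return (key - self.lo) // self.width

    def range_query(self, start, end):
        """Return all keys in [start, end], sorting touched partitions."""
        result = []
        first = max(0, self._slot(start))
        last = min(len(self.partitions) - 1, self._slot(end))
        for i in range(first, last + 1):
            part = self.partitions[i]
            if not self.is_sorted[i]:
                part.sort()  # further sorting, triggered by query access
                self.is_sorted[i] = True
            left = bisect.bisect_left(part, start)
            right = bisect.bisect_right(part, end)
            result.extend(part[left:right])
        return result
```

In this sketch, partitions that no query ever reads are never sorted, so the sorting cost is paid only for the parts of the index that are actually used.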
  • Patent number: 8099397
    Abstract: An apparatus, system, and method are disclosed for improved Portable Document Format (“PDF”) document archiving. The method includes scanning a source PDF document for a shared resource. The source PDF document includes a plurality of records. The shared resource includes a common resource referenced by way of a resource pointer associated with a record of the source PDF document. The method includes copying the shared resource to a resource group associated with the source PDF document. The method also includes short-circuiting a link between content for the shared resource and the resource pointer in each record that points to the shared resource. The method includes extracting a record from the source PDF document. The extracted record is void of content for the shared resource in response to the short-circuited link. Thus, records may be stored in a standalone format without excessive storage space requirements.
    Type: Grant
    Filed: August 26, 2009
    Date of Patent: January 17, 2012
    Assignee: International Business Machines Corporation
    Inventors: Gregory S. Felderman, Brian K. Hoyt
  • Patent number: 8090694
    Abstract: A method to index locally recorded content at a media device includes extracting, at a remote service provider, event index data from an event being locally recorded at a media device and associating the event index data with locator code data of the event. The method further includes storing, at the remote service provider, the extracted event index data and the associated locator code data; searching the extracted event index data for a plurality of segments associated with the event, the search being associated with a search request; determining index display data for a presentation of the plurality of segments based on the search request; and transmitting, to the media device, the locator code data associated with the plurality of segments, and the index display data.
    Type: Grant
    Filed: November 2, 2006
    Date of Patent: January 3, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Behzad Shahraray, David Gibbon, Lee Begeja, Zhu Liu, Richard V. Cox, Bernard S. Renger
  • Patent number: 8090695
    Abstract: As described herein, a high-availability server system includes at least a source server system and a target server system that dynamically restore message object search indexes. Both the source server system and the target server system store copies of a mailbox database and a search index for the mailbox database. As changes are requested to the mailbox database, events are added to event lists maintained at the source node and the target node. When the data storage system at the target server system enters an error state, the source server system sends to the target server system a set of data that the target server system can use to generate a copy of the search index. The target server system may then resume applying events in the event list to the search index. In this way, it may not be necessary to completely re-index the mailbox database at the target node.
    Type: Grant
    Filed: December 5, 2008
    Date of Patent: January 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Ashish Consul, Suryanarayana M. Gorti
  • Patent number: 8086571
    Abstract: A table lookup indexing system for the transmission of data packets in a network switch. Data is received in an input port and is divided into two parts, an index portion and a bucket portion. The index portion selects a particular bucket and the combination of the index portion and bucket portion selects a specific entry in the table.
    Type: Grant
    Filed: August 5, 2009
    Date of Patent: December 27, 2011
    Assignee: Broadcom Corporation
    Inventor: Govind Malalur
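One way to picture the two-part lookup in the abstract above is to split a key into bit fields. The sketch below is an assumption-laden illustration: the field widths `INDEX_BITS` and `BUCKET_BITS`, the choice of which bits form each portion, and all function names are hypothetical and are not taken from the patent.

```python
INDEX_BITS = 4   # assumed width of the index portion (selects a bucket)
BUCKET_BITS = 2  # assumed width of the bucket portion (selects a slot)


def split_key(key):
    """Split a key into (index portion, bucket portion)."""
    index_part = (key >> BUCKET_BITS) & ((1 << INDEX_BITS) - 1)
    bucket_part = key & ((1 << BUCKET_BITS) - 1)
    return index_part, bucket_part


def make_table():
    """Create 2**INDEX_BITS buckets, each with 2**BUCKET_BITS empty slots."""
    return [[None] * (1 << BUCKET_BITS) for _ in range(1 << INDEX_BITS)]


def store(table, key, value):
    index_part, bucket_part = split_key(key)
    table[index_part][bucket_part] = value


def lookup(table, key):
    # The index portion picks the bucket; both portions together
    # identify one specific entry.
    index_part, bucket_part = split_key(key)
    return table[index_part][bucket_part]
```

For example, with these assumed widths the key `0b101101` splits into index portion 11 and bucket portion 1.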
  • Patent number: 8082229
    Abstract: Various embodiments of a method, system and computer program product back up and recover a database. A database is distributed in a plurality of storage devices. A target designation designating a target database is received. One or more storage devices of the plurality of storage devices, storing at least a portion of the target database, are selected. A quiesce point is established by completing an ongoing transaction for the target database and inhibiting a further transaction. In response to establishing the quiesce point, a backup is generated by collectively copying data on each storage device of the one or more selected storage devices. The backup is recorded in association with a quiesce point indication that identifies the backed-up data of each of the one or more selected storage devices in accordance with the quiesce point.
    Type: Grant
    Filed: February 3, 2006
    Date of Patent: December 20, 2011
    Assignee: International Business Machines Corporation
    Inventors: Soh Kaijima, Takashi Saitoh, Kenji Seta
  • Publication number: 20110307451
    Abstract: A method and system for efficiently archiving and retrieving objects using a distributed network of devices wherein the users define attributes, distribution lists, subscribers to content and objects. The objects can be archived, searched for, tagged, indexed, attributed, restored and mined. Objects have signatures that indicate where they came from and where they are stored. Attributes include system attributes which may geo-reference objects. Attributes and signatures can be associated with alerts and notifications to subscribers who register interest in receiving alerts about objects, object signatures or attributes.
    Type: Application
    Filed: June 10, 2010
    Publication date: December 15, 2011
    Applicant: EnduraData, Inc.
    Inventors: Abderrahman Aba El Haddi, Anass Taouil, Jeffrey Brian Marckel, Zakaria Baani
  • Patent number: 8065277
    Abstract: Methods and systems for storing information extracted from a file are presented. These methods and systems can be used to store content and metadata extracted from a file, and to associate the content and metadata so a holistic image of the file may be maintained. Additionally, these methods and systems may allow the location of a file to be stored and associated with the content or metadata of the file. Methods, systems and databases of this type may be especially useful in avoiding duplication of data by allowing the content and metadata of files to be compared to previously stored content and metadata.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: November 22, 2011
    Inventors: Daniel John Gardner, Maurilio Torres
  • Patent number: 8065267
    Abstract: A step or means for associating a file with a cell in a table format by, for example, pasting an icon representing the file, wherein related data to be simultaneously referenced along with the data in the cell with which the file is associated is read according to a data entry positioning rule of a table format. Further, the step/means indicates merging file common condition data with record data in the file by adding the read related data to its each constituent record as the common condition value of the data file with which the corresponding cell is associated, and includes naming a file by converting a character string representing the read related data into a character string according to a prescribed rule and positioning it in a predetermined position in a template character string.
    Type: Grant
    Filed: January 12, 2006
    Date of Patent: November 22, 2011
    Inventor: Masatsugu Noda
  • Patent number: 8051045
    Abstract: Methods and apparatus, including computer program products, for archiving data from a database. One method includes identifying a data record to be archived; determining the contents of an archive record, the archive record having values for a first plurality of attributes in the data record; storing the archive record in a data archive; determining the contents of an index record, the index record comprising values for a second plurality of attributes in the data record; adding the index record to a dictionary-based archive index with a reference to the location of the archive record in the data archive; deleting the data record from the database; accepting a query for a desired archive record; and performing a search of the archive index to find the desired archive record.
    Type: Grant
    Filed: August 31, 2005
    Date of Patent: November 1, 2011
    Assignee: SAP AG
    Inventor: Hartmut K. Vogler
  • Patent number: 8046365
    Abstract: An apparatus stores one or more pieces of document information whose access rights are managed by an access right management apparatus, and generates an index of the stored document information. The apparatus receives user identification information and sends it, together with information identifying document information for which an index has not yet been generated, to the access right management apparatus. The apparatus receives access right information associated with the user from the access right management apparatus, and generates an index of the identified document information based on the received access right information.
    Type: Grant
    Filed: March 12, 2007
    Date of Patent: October 25, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventor: Shigemi Saito
  • Patent number: 8046361
    Abstract: An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the reference documents. To resolve ambiguity between multiple classifications, weighted classifications may be used where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification for the grouping of the documents referenced by the matching anchor texts with greatest frequency may be selected and output as the classification for the tag.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Yahoo! Inc.
    Inventors: Börkur Sigurbjörnsson, Roelof van Zwol, Simon E. Overell
  • Patent number: 8037035
    Abstract: A computer-readable, non-transitory medium stores a program that manages compressed file groups on a plurality of slave servers. The file groups include compressed files that are to be searched and have character strings. Each of the compressed file groups is expanded, using a Huffman tree that was used for compressing the compressed file group. A common compression parameter is generated based on appearance frequency, by summing, for each character, the appearance frequency in each of the compressed file groups. The expanded files are recompressed using the common Huffman tree such that sums of the access frequencies of the compressed files that are origins of the recompressed files are substantially equivalent among various slave servers. New archives including the re-compressed files are transmitted to the respective slave servers.
    Type: Grant
    Filed: January 28, 2009
    Date of Patent: October 11, 2011
    Assignee: Fujitsu Limited
    Inventors: Masahiro Kataoka, Tatsuhiro Sato, Takashi Tsubokura
  • Patent number: 8037031
    Abstract: A method and system for creating an index of content without interfering with the source of the content includes an offline content indexing system that creates an index of content from an offline copy of data. The system may associate additional properties or tags with data that are not part of traditional indexing of content, such as the time the content was last available or user attributes associated with the content. Users can search the created index to locate content that is no longer available or based on the associated attributes.
    Type: Grant
    Filed: December 20, 2010
    Date of Patent: October 11, 2011
    Assignee: CommVault Systems, Inc.
    Inventors: Parag Gokhale, Rajiv Kottomtharayil, Deepak R. Attarde, Jun H. Ahn
  • Publication number: 20110231372
    Abstract: In one embodiment, input is received from a user defining a classification and an analytic for the classification. Multiple classifications and analytics may be defined by a user. A definition of relevance parameters is determined that characterize the classification and a set of analytics measures associated with the analytic. The definition may be for the classification. Unstructured data and structured data are analyzed based on the definition of the relevance parameters to determine relevant data in the unstructured data and the structured data. The relevant data being data that is determined to be relevant to the classification defined by the user. An index of the terms from the relevant data is determined. The index is useable by an analytics tool to provide results for queries of the unstructured data and structured data. The query may be used within the classification such that targeted results are provided using the index and the relevant data to the classification.
    Type: Application
    Filed: March 21, 2011
    Publication date: September 22, 2011
    Inventors: Joan Wrabetz, Aloke Guha
  • Patent number: 8019731
    Abstract: A method and system for updating an archive of a computer file to reflect changes made to the file includes selecting one of a plurality of comparison methods as a preferred comparison method. The comparison methods include a first comparison method wherein the file is compared to an archive of the file and a second comparison method wherein a first set of tokens statistically representative of the file is computed and compared to a second set of tokens statistically representative of the archive of the file. The method further includes carrying out the preferred comparison method to generate indicia of differences between the file and the archive of the file for updating the archive of the file.
    Type: Grant
    Filed: September 22, 2010
    Date of Patent: September 13, 2011
    Assignee: Computer Associates Think, Inc.
    Inventor: Karl D. Forster
  • Patent number: 8018455
    Abstract: A multi-user animation process receives input from multiple remote clients to manipulate avatars through a modeled 3-D environment. Each user is represented by an avatar. The 3-D environment and avatar position/location data is provided to client workstations, which display a simulated environment visible to all participants. A text or speech-based bulletin board application is coupled to the animation process. The bulletin board application receives text or speech input from the multiple remote users and publishes the input in a public forum. The bulletin board application maintains multiple forums organized by topic. Access to or participation in particular forums is coordinated with the animation process, such that each user may be permitted access to a forum only when the user's avatar is located within a designated room or region of the modeled 3-D environment.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: September 13, 2011
    Inventor: Brian Mark Shuster
  • Patent number: 8015146
    Abstract: In a networked information system, a portion of the information processing is offloaded from servers to a storage system to reduce network traffic and conserve server resources. The information system includes a storage system storing files or objects and having a function which automatically extracts portions of text from the files and transmits the extracted text to the servers. The text extraction is responsive to file requests from the servers. The extracted text and files are stored on the storage system, decreasing the need to send entire files across the network. Thus, by transmitting smaller extracted text data instead of entire files over the network, network performance can be increased through the reduction of traffic. Additionally, the processing strain on physical resources of the servers can be reduced by extracting the text at the storage system rather than at the servers.
    Type: Grant
    Filed: June 16, 2008
    Date of Patent: September 6, 2011
    Assignee: Hitachi, Ltd.
    Inventor: Yasuyuki Mimatsu
  • Patent number: 8010501
    Abstract: A computer implemented method for transforming an inverted index of a collection of documents into a smaller inverted index of documents. The smaller index contains links to all and only to those documents appearing in a subset of the original collection of documents. The method avoids reprocessing the subset to create the smaller inverted index by intersecting each inverted list with the list of document references from the desired subset. If this intersection is empty then the list is removed from the new smaller index, otherwise the list containing only the intersected reference list is included in the new inverted index. The method is also extended to deal with creating multiple smaller inverted indexes and with propagating changes in the first collection of documents down into the smaller inverted index or indexes.
    Type: Grant
    Filed: September 4, 2007
    Date of Patent: August 30, 2011
    Assignee: Exalead
    Inventors: François Bourdoncle, Florian Douetteau, Stéphane Donze
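The intersection step in the abstract above can be sketched in a few lines of Python. This is a minimal illustration that assumes the inverted index is a plain dictionary mapping each term to a list of document IDs; the function name `restrict_inverted_index` is hypothetical.

```python
def restrict_inverted_index(index, subset_doc_ids):
    """Build a smaller inverted index covering only subset_doc_ids.

    No document is re-parsed: each posting list is intersected with the
    subset, and terms whose intersection is empty are dropped.
    """
    subset = set(subset_doc_ids)
    smaller = {}
    for term, postings in index.items():
        kept = [doc_id for doc_id in postings if doc_id in subset]
        if kept:  # an empty intersection removes the term entirely
            smaller[term] = kept
    return smaller
```

The cost is proportional to the total size of the posting lists rather than to the size of the documents, which is the saving the abstract points to.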
  • Patent number: 8001088
    Abstract: A scalable infrastructure indexes and tracks media data and metadata in a distributed, multi-user system. An indexer is associated with particular storage locations, such as a disk, or a directory on a disk, to maintain an index of media files or metadata stored in those storage locations. The indexer monitors activity on any storage location with which it is associated. Any additions, deletions or modifications to files in that storage location cause the indexer to update its index. This index then can be accessed by any of a number of applications in the same manner as conventional indexes. There may be different indexers for different storage locations. Separate indexers may be provided for media files and compositions that use those media files.
    Type: Grant
    Filed: April 4, 2003
    Date of Patent: August 16, 2011
    Assignee: Avid Technology, Inc.
    Inventor: Roger Tawa, Jr.
  • Patent number: 7996368
    Abstract: A device list is created including one or more device objects, wherein each device object represents a physical device coupled to a computer system, wherein each device object includes one or more device attributes of the physical device. The device list is indexed into using a device attribute.
    Type: Grant
    Filed: September 6, 2005
    Date of Patent: August 9, 2011
    Assignee: Cypress Semiconductor Corporation
    Inventors: Greg Nalder, Eric Luttmann
  • Patent number: 7996369
    Abstract: A computer process, called VGRAM, improves the performance of approximate string search algorithms in computers by using a carefully chosen dictionary of variable-length grams based on their frequencies in the string collection. A dynamic programming algorithm for computing a tight lower bound on the number of common grams shared by two similar strings in order to improve query performance is disclosed. A method for automatically computing a dictionary of high-quality grams for a workload of queries is also disclosed. Improvement on query performance is achieved by these techniques by a cost-based quantitative approach to deciding good grams for approximate string queries. One approach for answering approximate queries efficiently is based on discarding gram lists, and another is based on combining correlated lists. An indexing structure is reduced to a given amount of space, while retaining efficient query processing by using algorithms in a computer based on discarding gram lists and combining correlated lists.
    Type: Grant
    Filed: December 14, 2008
    Date of Patent: August 9, 2011
    Assignee: The Regents of the University of California
    Inventors: Chen Li, Bin Wang, Xiaochun Yang, Alexander Behm, Shengyue Ji, Jiaheng Lu
  • Patent number: 7996418
    Abstract: Technologies are described herein for suggesting long-tail tags. A first group of tags and a second group of tags are identified from a plurality of tags. The first group of tags includes frequently-assigned tags having a higher frequency of being assigned to an asset. The second group of tags includes long-tail tags having a lower frequency of being assigned to the asset than the frequently-assigned tags. The frequently-assigned tags and a sample of the long-tail tags are suggested to a user upon receiving a request from the user to tag the asset.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: August 9, 2011
    Assignee: Microsoft Corporation
    Inventors: Alex David Weinstein, Dmitry Yevgenyevich Ryabkov
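The tag-suggestion scheme in the abstract above can be sketched as follows. The cutoff between frequent and long-tail tags (`top_n`), the sample size, and the function name `suggest_tags` are assumptions made for illustration; the abstract does not specify how the two groups are separated.

```python
import random
from collections import Counter


def suggest_tags(assigned_tags, top_n=3, tail_sample_size=2, rng=None):
    """Suggest the most frequently assigned tags plus a random sample
    of the less frequently assigned (long-tail) tags."""
    rng = rng or random.Random()
    counts = Counter(assigned_tags)
    ranked = [tag for tag, _ in counts.most_common()]
    frequent = ranked[:top_n]    # first group: frequently assigned tags
    long_tail = ranked[top_n:]   # second group: long-tail tags
    sample = rng.sample(long_tail, min(tail_sample_size, len(long_tail)))
    return frequent + sample
```

Sampling the long tail, rather than always showing the same top tags, gives rarely used tags a chance to be seen and reused.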
  • Patent number: 7992036
    Abstract: An apparatus, system, and method are disclosed for restoring cluster server data at a volume level. A setup module opens at least one source volume of a cluster server for a volume-level restore, flushes each buffer for the at least one source volume, closes the at least one source volume, disables file system checks for the cluster disks, saves disk signatures of the cluster disks, and disables device-level checks for the cluster disks. A copy module copies data with a volume-level restore from the at least one snapshot volume to the at least one source volume. A reset module rewrites the saved disk signatures to the cluster disks, re-enables the device-level checks for the cluster disks, and resets at least one volume attribute on the at least one source volume.
    Type: Grant
    Filed: January 22, 2007
    Date of Patent: August 2, 2011
    Assignee: International Business Machines Corporation
    Inventors: Neeta Garimella, Delbert Barron Hoobler, III
  • Patent number: 7987165
    Abstract: An indexing system, including a server for providing access to at least one site, a server agent for creating an index file of data relating to the site, and a central index for storing index information from the index file. The server agent initiates communication with the central index to transfer the index file from the server agent to the central index.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: July 26, 2011
    Assignee: Youramigo Limited
    Inventors: Robert James Steele, David Martin Powers
  • Patent number: 7979398
    Abstract: Techniques provide a file plan including a plurality of containers, wherein each container is capable of providing management information for record information objects assigned to the container, wherein the record information objects represent documents, wherein one of the containers points to a physical record. An electronic record associated with the physical record is stored. The physical record is automatically associated with the electronic record by updating the file plan.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventor: Tod DeBie
  • Patent number: 7974973
    Abstract: Apparatus, methods, and computer readable medium for monitoring a database and for determining aggregate I/O wait times (i.e. for a ‘target’ index or table) associated with at least one I/O category selected from a plurality of I/O categories are disclosed herein.
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: July 5, 2011
    Assignee: Precise Software Solutions Inc.
    Inventors: Ehud Eshet, Rafi Balbirsky, Sigal Gelbart, Ori Rosen, Ilan Shiber
  • Patent number: 7974969
    Abstract: Apparatus, methods, and computer readable medium for monitoring a database and for determining an estimated index-overhead for a given index are provided. A description of database performance may be presented to a user in accordance with the determined index overhead. Furthermore, in some embodiments, apparatus, methods and computer-code for (i) determining fractional aggregate index-wait time in accordance with database statement execution plans and (ii) presenting a description of database performance in accordance with the fractional aggregate index-wait time are also disclosed.
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: July 5, 2011
    Assignee: Precise Software Solutions Inc.
    Inventors: Rafi Balbirsky, Ilanit Nulman
  • Patent number: 7970742
    Abstract: Techniques for history enabling a table in a database system so that past versions of rows of the history-enabled table are available for temporal querying. The table is history enabled by adding a start time column to the table and creating a history table for the history-enabled table. The history table's rows are copies of rows of the history-enabled table that have changed and include start time and end time fields whose values indicate a period in which the history table's row was in the history-enabled table. Temporal queries are performed on a view which is the union of the history-enabled table and the history table. The temporal queries are speeded up by period of time indexes in which the leaves are grouped based on time period size, identifiers are assigned to the groups, and the keys of the index include the group identifiers.
    Type: Grant
    Filed: December 1, 2005
    Date of Patent: June 28, 2011
    Assignee: Oracle International Corporation
    Inventors: Robert Hanckel, Jayanta Banerjee, Siva Ravada
  • Patent number: 7958093
    Abstract: A system and method for optimizing a storage system to support short data object lifetimes and highly utilized storage space are provided. With the system and method, data objects are clustered based on when they are anticipated to be deleted. When an application stores data, the application provides an indicator of the expected lifetime of the data, which may be a retention value, a relative priority of the data object, or the like. Data objects having similar expected lifetimes are clustered together in common data structures so that clusters of objects may be deleted efficiently in a single operation. Expected lifetimes may be changed by applications automatically. The system automatically determines how to handle these changes in expected lifetime using one or more of copying the data object, reclassifying the container in which the data object is held, and ignoring the change in expected lifetime for a time to investigate further changes in expected lifetime of other data objects.
    Type: Grant
    Filed: September 17, 2004
    Date of Patent: June 7, 2011
    Assignee: International Business Machines Corporation
    Inventors: Kay Schwendimann Anderson, Frederick Douglis, Nagui Halim, John Davis Palmer, Elizabeth Suzanne Richards, David Tao, William Harold Tetzlaff, John Michael Tracey, Joel Leonard Wolf
  • Patent number: 7950062
    Abstract: A system (and a method) is disclosed for fingerprinting based entity extraction using a rolling hash technique. The system is configured to receive an input stream of a predetermined length comprising characters, and a hash table having indexed entries. The system isolates, through a defined fixed window length, a set of characters of the input stream. A hash key is generated and used to index into the hash table. The system compares the isolated set of characters of the input stream with the entry corresponding to the index into the hash table to determine whether there is an exact match with the entry. The system slides the fixed window length one character to isolate another set of characters of the input stream in response to no exact match from the comparison. Alternatively, the system stores the input stream in response to an exact match from the comparison.
    Type: Grant
    Filed: August 3, 2007
    Date of Patent: May 24, 2011
    Assignee: Trend Micro Incorporated
    Inventors: Liwei Ren, Shu Huang
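The sliding-window matching in the abstract above resembles a rolling-hash search. The sketch below uses a polynomial rolling hash as a stand-in, since the abstract does not name a specific hash function. The constants `BASE` and `MOD`, the function names, and the choice to collect every match instead of stopping at the first one are all assumptions. Every known entry is assumed to have exactly `window` characters.

```python
BASE = 257            # assumed polynomial base
MOD = 1_000_000_007   # assumed modulus


def window_hash(text):
    """Polynomial hash of a whole string."""
    h = 0
    for ch in text:
        h = (h * BASE + ord(ch)) % MOD
    return h


def build_table(entries):
    """Hash table mapping hash key -> list of known fixed-length entries."""
    table = {}
    for entry in entries:
        table.setdefault(window_hash(entry), []).append(entry)
    return table


def find_entities(stream, table, window):
    """Slide a fixed-length window over stream, one character at a time,
    and return (position, text) for every exact match with a table entry."""
    matches = []
    if len(stream) < window:
        return matches
    high = pow(BASE, window - 1, MOD)  # weight of the outgoing character
    h = window_hash(stream[:window])
    for start in range(len(stream) - window + 1):
        candidate = stream[start:start + window]
        # The hash only selects a table slot; an exact comparison confirms it.
        if candidate in table.get(h, []):
            matches.append((start, candidate))
        if start + window < len(stream):
            # Roll the hash: drop the outgoing char, add the incoming one.
            h = (h - ord(stream[start]) * high) % MOD
            h = (h * BASE + ord(stream[start + window])) % MOD
    return matches
```

Because the hash is updated in constant time per step, the scan avoids rehashing each window from scratch.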
  • Patent number: 7945535
    Abstract: In one embodiment, there is provided a method for a media storage device to manage digital content. The method comprises determining if there is digital content to be categorized into one or more galleries; automatically categorizing said digital content into the one or more galleries; and for digital content categorized into a gallery with an auto-publish flag, sending at least one of said digital content and a derivative form of said digital content to a server.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: May 17, 2011
    Assignee: Microsoft Corporation
    Inventors: Michael J Toutonghi, Jaroslav Bengl
  • Patent number: 7937372
    Abstract: Managing backup data comprises mounting a snapshot of a file system. Each of the plurality of snapshots is taken at a particular time and each comprises a replica of the data set at that particular time. The mounted snapshot is accessed. For each of the one or more file system objects included in the accessed snapshot, index data is added which indicates that each of the one or more file system objects is located within the accessed snapshot. This information is added to an index associated with the snapshot so that it is able to be determined, using the index and without having to again mount the accessed snapshot, whether an object of interest is included in the snapshot.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: May 3, 2011
    Assignee: EMC Corporation
    Inventor: Nathan Kryger
  • Patent number: 7921101
    Abstract: A method and system are provided for maintaining an XML index in response to piece-wise modifications on indexed XML documents. The database server that manages the XML index determines which nodes are involved in the piece-wise modifications, and updates the XML index based on only those nodes. Index entries for nodes not involved in the piece-wise modifications remain unchanged.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: April 5, 2011
    Assignee: Oracle International Corporation
    Inventors: Ravi Murthy, Sivasankaran Chandrasekaran, Ashish Thusoo, Nipun Agarwal, Eric Sedlar
  • Patent number: 7912817
    Abstract: Data is decayed over time by a type of data item by identifying constituent units of each data item; creating a shelf-life criterion for the constituent units by assigning dimensions to each data item and to each constituent unit; for each of the data items of the data item type, establishing relationship factors for each data item to other data items, between constituent units within data items, and between data items; periodically calculating or updating a decomposability index for each constituent unit as a function of the priority dimensions and the data life dimensions by moving the index towards a threshold for constituent units which are reproducible; and subsequently, decaying the data by deleting from storage constituent units which have decomposability indices exceeding a configured threshold, thereby reducing the amount of storage occupied by a remaining plurality of data items.
    Type: Grant
    Filed: January 14, 2008
    Date of Patent: March 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Oriana Jeannette Love, Borna Safabakhsh
  • Patent number: 7908253
    Abstract: Data indexing using polyarchical indexing codes and automatically generated expansion paths. For a piece of data, an indexing code is received relating to a particular categorization or other indexing parameter. Based upon the indexing code, one or more expansion sets of codes are retrieved and applied to the piece of data. The expansion sets of codes may include indexing codes that relate to hierarchical levels of indexing. The expansion sets of codes may also include different expansion paths through the hierarchical levels of indexing. The polyarchical codes may include multiple cross-categorization of the data across the same or different levels of categories. They may also include multiple expansion paths in different directions across hierarchical levels of categories or indexing.
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: March 15, 2011
    Assignee: Factiva, Inc.
    Inventors: Jonathan Guy Grenside Cooke, Andrew Richard Young
  • Patent number: 7890471
    Abstract: The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure.
    Type: Grant
    Filed: July 19, 2007
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Wei Fan, Haixun Wang, Philip Shi-Lung Yu
  • Patent number: 7885937
    Abstract: A presence management system may communicate contact information with mapped values. Contact information may be stored in a hierarchical, extensible structure (“hierarchical extensible contact structure”). Devices in a presence management system utilize a mapping scheme to map contact values (e.g., e-mail address, phone number, etc.) to the appropriate field of the hierarchical extensible contact structure. When devices in the presence management system communicate information for thousands of contacts, employing mapped values to navigate the hierarchical extensible contact structure reduces the size of the messages, thus reducing resource consumption (e.g., bandwidth), particularly on the scale of an enterprise.
    Type: Grant
    Filed: October 2, 2007
    Date of Patent: February 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Gary M. Beadle, Michael L. Masterson
  • Patent number: 7882076
    Abstract: An arrangement for performing at least one of collecting and analyzing data from a tool cluster configured to process a set of substrates is provided. The arrangement includes a plurality of tools, of which at least one tool has a chamber for processing at least one of the set of substrates. The arrangement also includes a plurality of secondary servers configured to collect sensor data from the plurality of tools. The arrangement further includes a primary server communicably coupled with the plurality of secondary servers and configured to execute a database management system. The sensor data is indexed using a plurality of indexing applications on the plurality of secondary servers prior to being forwarded to the primary server for use by the database management system. Indexing includes associating a sensor data item with an identity of a server where the sensor data item is stored.
    Type: Grant
    Filed: December 14, 2006
    Date of Patent: February 1, 2011
    Assignee: Lam Research Corporation
    Inventors: Chad R. Weetman, Chung-Ho Huang
  • Patent number: 7882077
    Abstract: A method and system for creating an index of content without interfering with the source of the content includes an offline content indexing system that creates an index of content from an offline copy of data. The system may associate additional properties or tags with data that are not part of traditional indexing of content, such as the time the content was last available or user attributes associated with the content. Users can search the created index to locate content that is no longer available or based on the associated attributes.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 1, 2011
    Assignee: CommVault Systems, Inc.
    Inventors: Parag Gokhale, Rajiv Kottomtharayil, Deepak R. Attarde, Jun H. Ahn
  • Publication number: 20100332457
    Abstract: A segment encompasses a number of segment records less than the total number of records of a database. The segment records have values for a field of the database. Lowest and highest values of the segment records for the field, and a bitmap for the segment, can be determined and stored. Selected bits of the bitmap each correspond to a value for the field. Each selected bit is set to one where at least one segment record has the value to which the bit corresponds. An index relating to just the segment records can be determined and stored. The lowest and highest values, and the bitmap, are adapted to permit determination of whether the segment has to be loaded into memory to locate records that satisfy a query. The index is adapted to permit searching of the segment records after the segment has been loaded into the memory.
    Type: Application
    Filed: June 27, 2009
    Publication date: December 30, 2010
    Inventor: Goetz Graefe
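    One way to picture the per-segment summary described in this abstract is the sketch below. It is a simplified illustration under stated assumptions, not the published design: field values are assumed to be integers, bit i of the bitmap is assumed to stand for the value lowest + i, and the class and method names are hypothetical.

```python
class SegmentSummary:
    """Lowest value, highest value, and a value bitmap for one segment."""

    def __init__(self, values):
        # values: the integer field values of the records in this segment.
        self.lowest = min(values)
        self.highest = max(values)
        self.bitmap = 0
        for v in values:
            # Assumption: bit (v - lowest) is set when some record has value v.
            self.bitmap |= 1 << (v - self.lowest)

    def may_contain(self, value):
        """Decide from the summary alone whether the segment must be loaded."""
        if value < self.lowest or value > self.highest:
            return False
        return bool((self.bitmap >> (value - self.lowest)) & 1)

# Hypothetical segment holding three records.
summary = SegmentSummary([100, 105, 170])
```

    A query for a value that fails may_contain can skip loading the segment; only segments that pass would be loaded and then searched with their own index.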
  • Publication number: 20100332501
    Abstract: A system and method for on-demand indexing in a data management system is described. An index is generated when it is requested, such as when a database operation requires access to the index. If the index is loaded in memory, the index is retrieved from memory. Otherwise, the index is generated on-demand. A priority configuration identifies at least one priority index which is generated and loaded in memory. The priority configuration can identify priority indexes either directly or indirectly, such as by a threshold parameter.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Mark E. Hanson, Richard T. Endo, Simon D. Shipilfoygel, Emil Antonov, Xidong Zheng, Hayim Hendeles, David E. Brookler
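    The behaviour described here (priority indexes built up front, all others built when first requested) might look roughly like the following. This is a minimal sketch under assumptions: rows are plain dictionaries, the priority configuration is a direct list of field names rather than a threshold parameter, and the class name and the builds counter are hypothetical.

```python
class OnDemandIndexes:
    """Build field indexes lazily, except for configured priority fields."""

    def __init__(self, rows, priority_fields=()):
        self.rows = rows
        self.loaded = {}   # field name -> {value: [row positions]}
        self.builds = 0    # counts index builds, for illustration
        for field in priority_fields:
            self._build(field)

    def _build(self, field):
        idx = {}
        for pos, row in enumerate(self.rows):
            idx.setdefault(row[field], []).append(pos)
        self.loaded[field] = idx
        self.builds += 1
        return idx

    def get(self, field):
        # Return the in-memory index if present; otherwise generate it now.
        idx = self.loaded.get(field)
        if idx is None:
            idx = self._build(field)
        return idx

# Hypothetical table and priority configuration.
rows = [
    {"city": "Oslo", "year": 2009},
    {"city": "Rome", "year": 2009},
    {"city": "Oslo", "year": 2010},
]
store = OnDemandIndexes(rows, priority_fields=("city",))
```

    Here the "city" index exists as soon as the store is created, while the "year" index is generated only on the first call to store.get("year") and reused afterwards.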
  • Publication number: 20100332437
    Abstract: A system for generating a media playlist comprising a media management module operable to select a first media item from a plurality of media items stored in a media database for playback; and using raw user input data representing a measure of the popularity of the first media item, generate preference data representing a refined user preference for the first media item; wherein the preference data is used to determine a second media item from the plurality of media items for playback.
    Type: Application
    Filed: June 26, 2009
    Publication date: December 30, 2010
    Inventor: Ramin Samadani
  • Publication number: 20100332458
    Abstract: A system, method, and computer-readable medium for dynamic detection and management of data skew in parallel join operations are provided. Rows allocated to processing modules involved in a join operation are redistributed among the processing modules by a hash redistribution of the join attributes. Receipt by a processing module of an excessive number of redistributed rows having a skewed value on the join attribute is detected by a processing module which notifies other processing modules of the skewed value. Processing modules then terminate redistribution of rows having a join attribute value matching the skewed value and either store such rows locally or duplicate the rows. The processing module that has received an excessive number of redistributed rows removes rows having a skewed value of the join attribute from a redistribution spool allocated thereto and duplicates the rows to each of the processing modules.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Inventors: Yu Xu, Olli Pekka Kostamaa, Xin Zhou
  • Publication number: 20100325092
    Abstract: A computing system can archive information from internetworked computers, such as Internet content, for later retrieval. A server system processes content providers, such as DNS registries and web sites, to extract and store content, including text, image, audio, and video content. For web sites, HTML source code is stored along with a browser-rendered display file. The content is perpetually archived to create a historical record of information for each content provider. An interface is used to retrieve the archived content in response to queries.
    Type: Application
    Filed: August 31, 2010
    Publication date: December 23, 2010
    Inventor: Rodney D. Johnson
  • Publication number: 20100324993
    Abstract: In a computer-implemented method of providing digital content, a plurality of web pages is identified, where each of the identified web pages has an associated benefit to be accrued as a result of activity by a user on the identified web page. A search query that includes a search term is received, and one or more of the identified web pages is selected based on the benefits to be accrued as the result of the activity on the identified web pages and a relationship between the identified web pages and the search term. Representations of the selected one or more of the identified web pages are displayed on a display device.
    Type: Application
    Filed: June 19, 2009
    Publication date: December 23, 2010
    Applicant: Google Inc.
    Inventors: Varun Kacholia, Kedar Dhamdhere, Sugato Basu
  • Publication number: 20100325134
    Abstract: A system, method and program product for evaluating search algorithms. A method is provided that includes: defining a population of searches and database records from a search history database; applying a sampling method and direct sampling rates to each search/record pair in the population using a computing system, wherein search/record pairs having a higher variability relative to the population are assigned a relatively higher probability; randomly sampling a direct sample of search/record pairs with the computing system using the direct sampling rates to increase a likelihood of obtaining search/record pairs having the higher variability; running a search algorithm and measuring errors for the direct sample and/or for an associated indirect sample; and calculating an estimated error rate for the search algorithm using inverse probability weighting.
    Type: Application
    Filed: June 23, 2009
    Publication date: December 23, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Glenn J. Galfond
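    The final step of this abstract, estimating an error rate with inverse probability weighting, can be illustrated with a Horvitz-Thompson style estimator. The abstract does not say which estimator is used, so this choice is an assumption, as are the function name and the example numbers.

```python
def ipw_error_rate(sample, population_size):
    """Estimate a population error rate from an unequal-probability sample.

    sample: (is_error, inclusion_probability) for each sampled search/record
    pair. Each observed error is weighted by 1 / probability, so pairs that
    were unlikely to be sampled stand in for more of the population.
    """
    weighted_errors = 0.0
    for is_error, prob in sample:
        if not 0.0 < prob <= 1.0:
            raise ValueError("inclusion probability must be in (0, 1]")
        weighted_errors += is_error / prob
    return weighted_errors / population_size

# Hypothetical numbers: two pairs sampled at rate 0.5, two at rate 0.1.
sample = [(1, 0.5), (0, 0.5), (1, 0.1), (0, 0.1)]
estimate = ipw_error_rate(sample, population_size=100)
```

    In this example the error found at rate 0.5 counts as 2 errors and the one found at rate 0.1 counts as 10, giving 12 estimated errors in a population of 100.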
  • Publication number: 20100318532
    Abstract: A method for information retrieval includes extracting from a video document visual data items and textual data items that occur in the document at respective occurrence times. Indexing records, which index both the visual and the textual data items by their respective occurrence times, are constructed and stored in a memory.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: International Business Machines Corporation
    Inventors: Benjamin Sznajder, Jonathan Mamou
  • Publication number: 20100318499
    Abstract: A system, framework, and algorithms for data deduplication are described. A declarative language, such as a Datalog-type logic language, is provided. Programs in the language describe data to be deduplicated and soft and hard constraints that must/should be satisfied by data deduplicated according to the program. To execute the programs, algorithms for performing graph clustering are described.
    Type: Application
    Filed: June 15, 2009
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Arvind Arasu, Christopher Re, Dan Suciu
  • Publication number: 20100312752
    Abstract: A system, method, and computer program product for backing up data from a backup source to a central repository using deduplication, where the data comprises source data segments, is disclosed. A fingerprint cache comprising fingerprints of data segments stored in the central repository is received, where the data segments were previously backed up from the backup source. Source data fingerprints comprising fingerprints (e.g., hash values) of the source data segments are generated. The source data fingerprints are compared to the fingerprints in the fingerprint cache. The source data segments corresponding to fingerprints not in the fingerprint cache may not be currently stored in the central repository. After further queries to the central repository, one or more of the source data segments are sent to the central repository for storage responsive to comparison.
    Type: Application
    Filed: June 8, 2009
    Publication date: December 9, 2010
    Applicant: SYMANTEC CORPORATION
    Inventors: Mike Zeis, Weibao Wu
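    The comparison flow in this abstract can be sketched as follows. This is an illustrative sketch, not the product's implementation: SHA-256 is an assumed choice of hash (the abstract only says "hash values"), and plan_backup and the repository_has callback, which stands in for the further query to the central repository, are hypothetical names.

```python
import hashlib

def fingerprint(segment: bytes) -> str:
    """Fingerprint a data segment; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(segment).hexdigest()

def plan_backup(source_segments, fingerprint_cache, repository_has):
    """Return the segments that still need to be sent to the repository."""
    to_send = []
    for segment in source_segments:
        fp = fingerprint(segment)
        if fp in fingerprint_cache:
            continue  # known from a previous backup of this source
        if repository_has(fp):
            continue  # cache miss, but the repository already stores it
        to_send.append(segment)
    return to_send

# Hypothetical state: the repository holds two segments, the cache knows one.
repository = {fingerprint(b"alpha"), fingerprint(b"beta")}
cache = {fingerprint(b"alpha")}
pending = plan_backup([b"alpha", b"beta", b"gamma"], cache, repository.__contains__)
```

    Only the segment that is in neither the cache nor the repository ends up in pending; the cache check avoids a repository query for segments this source has already backed up.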