Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)
  • Publication number: 20130024432
    Abstract: A method for storing data in a storage system. In one embodiment, implementation of a method for storing data in compliance with a compression handling instruction includes: at a storage controller, receiving an object for storage within a data storage, wherein the object is in an original state; determining whether a compression handling instruction is received in association with the object; and executing the compression handling instruction when storing the object.
    Type: Application
    Filed: July 20, 2011
    Publication date: January 24, 2013
    Applicant: SYMANTEC CORPORATION
    Inventor: NIRANJAN PENDHARKAR
  • Publication number: 20130024459
    Abstract: A method for creating a search index is disclosed. A plurality of words found in one or more documents is identified. For each word of the plurality of words, one or more fields of the one or more documents in which the word can be found is identified. Using a computing device, a search index is created for each word of the plurality of words. The search index for each word of the plurality of words provides a mapping between the word and each occurrence of the word in each field of the one or more documents in which the word is found.
    Type: Application
    Filed: July 20, 2011
    Publication date: January 24, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicolai Bodd, Evan Matthew Roark, Michael Susæg
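A minimal Python sketch of the field-aware index described in 20130024459. It maps each word to the fields where the word occurs, and each field to its (document, position) occurrences. The input shape `{doc_id: {field_name: text}}`, lower-casing, whitespace tokenisation and the name `build_field_index` are assumptions for illustration. This is not the applicant's implementation.

```python
from collections import defaultdict


def build_field_index(documents):
    """Map each word to {field name: [(doc_id, word position), ...]}.

    `documents` is assumed to be {doc_id: {field_name: text}}; lower-casing and
    whitespace tokenisation are simplifying assumptions.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, fields in documents.items():
        for field_name, text in fields.items():
            for position, word in enumerate(text.lower().split()):
                index[word][field_name].append((doc_id, position))
    return index


docs = {
    1: {"title": "Patent search", "body": "search the index"},
    2: {"title": "Index design", "body": "a search index per word"},
}
index = build_field_index(docs)
print(dict(index["search"]))
```

Because positions are kept per field, a query can restrict matches to, for example, titles only, without rescanning the documents.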
  • Publication number: 20130024458
    Abstract: A method of representing and managing hierarchical relationship configuration in a computing facility is described. The method includes providing and storing a first index of the hardware identifiers assigned to each object in the computing facility; providing and storing a second index of ancestry identifiers of each object in the computing facility, the ancestry identifier of an object being the hardware identifier of an ancestor object at 1 to n hierarchy levels above the object; providing and storing a type information element for each ancestor object indicative of the type of that ancestor object; and identifying an ancestor object of a particular object in the computing facility by accessing the first index of hardware identifiers for the particular object, and identifying an ancestor of a particular type by accessing the ancestry identifiers and the type information element of the particular object.
    Type: Application
    Filed: July 19, 2011
    Publication date: January 24, 2013
    Applicant: SoftLayer Technologies, Inc.
    Inventors: Kelly Evan Morphis, Joshua Logan Reese
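A minimal Python sketch of the two-index scheme in 20130024458. A name-to-hardware-ID map plays the role of the first index. A precomputed ancestor list per hardware ID plays the role of the second index. A type map plays the role of the type information element. The class and method names and the example hierarchy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FacilityIndex:
    hw_ids: dict = field(default_factory=dict)    # object name -> hardware identifier (first index)
    ancestry: dict = field(default_factory=dict)  # hardware id -> ancestor hardware ids, nearest first (second index)
    types: dict = field(default_factory=dict)     # hardware id -> object type (type information element)

    def add(self, name, hw_id, obj_type, parent=None):
        self.hw_ids[name] = hw_id
        self.types[hw_id] = obj_type
        if parent is None:
            self.ancestry[hw_id] = []
        else:
            parent_hw = self.hw_ids[parent]
            # A child's ancestors are its parent followed by the parent's ancestors.
            self.ancestry[hw_id] = [parent_hw] + self.ancestry[parent_hw]

    def ancestor_of_type(self, name, wanted_type):
        """Resolve the name via the first index, then scan the precomputed ancestry list."""
        for ancestor in self.ancestry[self.hw_ids[name]]:
            if self.types[ancestor] == wanted_type:
                return ancestor
        return None


facility = FacilityIndex()
facility.add("dc1", 100, "datacenter")
facility.add("rack7", 200, "rack", parent="dc1")
facility.add("srv42", 300, "server", parent="rack7")
print(facility.ancestor_of_type("srv42", "datacenter"))
```

Storing every ancestor up front trades extra space for lookups that never walk parent pointers at query time.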
  • Publication number: 20130018853
    Abstract: Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.
    Type: Application
    Filed: December 1, 2011
    Publication date: January 17, 2013
    Applicant: Dell Products L.P.
    Inventors: Vinod Jayaraman, Goutham Rao
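A software-only Python sketch of the dictionary-and-refcount flow in 20130018853. It does not model the accelerator, suitcases or DMA. Chunk boundaries come from a gear-style rolling hash, which is an assumed choice, and the gear table, mask and size limits are hypothetical parameters. The fingerprint dictionary decides whether a chunk is stored or only referenced.

```python
import hashlib
import random

# Hypothetical gear table: one pseudo-random 32-bit value per possible byte.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Content-defined chunking: cut where the rolling gear hash has its low bits all zero."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class DedupStore:
    def __init__(self):
        self.dictionary = {}   # fingerprint -> chunk bytes (stand-in for persistent storage)
        self.refcount = {}     # fingerprint -> number of references
        self.filemaps = {}     # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Store `data` under `name`; return how many chunks were new."""
        new_chunks, fingerprints = 0, []
        for chunk in split_chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in self.dictionary:
                self.refcount[fp] += 1          # seen before: only update the reference count
            else:
                self.dictionary[fp] = chunk     # first occurrence: persist the chunk
                self.refcount[fp] = 1
                new_chunks += 1
            fingerprints.append(fp)
        self.filemaps[name] = fingerprints
        return new_chunks

    def read(self, name):
        return b"".join(self.dictionary[fp] for fp in self.filemaps[name])


store = DedupStore()
payload = bytes(random.Random(1).getrandbits(8) for _ in range(5000))
first = store.write("a.bin", payload)
second = store.write("b.bin", payload)
print(first, second)
```

Writing the same payload a second time adds no new chunks, only references.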
  • Publication number: 20130018854
    Abstract: A technique for routing data for improved deduplication in a storage server cluster includes computing, for each node in the cluster, a value collectively representative of the data stored on the node, such as a “geometric center” of the node. New or modified data is routed to the node which has stored data identical or most similar to the new or modified data, as determined based on those values. Each node stores a plurality of chunks of data, where each chunk includes multiple deduplication segments. A content hash is computed for each deduplication segment in each node, and a similarity hash is computed for each chunk from the content hashes of all segments in the chunk. A geometric center of a node is computed from the similarity hashes of the chunks stored in the node.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 17, 2013
    Applicant: NetApp, Inc.
    Inventor: Michael N. CONDICT
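A Python sketch of similarity-based routing as described in 20130018854. The abstract does not specify the similarity hash, the center computation or the distance metric. This sketch assumes three things:

- a SimHash-style per-bit majority vote over the segments' content hashes;
- a per-bit mean of those vectors as each node's "geometric center";
- squared Euclidean distance for choosing the nearest node.

All function names are hypothetical.

```python
import hashlib

BITS = 64


def content_hash(segment: bytes) -> int:
    """64-bit content hash of one deduplication segment."""
    return int.from_bytes(hashlib.sha256(segment).digest()[:8], "big")


def similarity_hash(segments) -> list:
    """SimHash-style vector: bit i is 1 if most segment hashes have bit i set (assumption)."""
    votes = [0] * BITS
    for seg in segments:
        h = content_hash(seg)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    return [1 if v > 0 else 0 for v in votes]


def geometric_center(sim_hashes) -> list:
    """Per-bit mean of a node's chunk similarity vectors."""
    return [sum(vec[i] for vec in sim_hashes) / len(sim_hashes) for i in range(BITS)]


def route(segments, centers: dict) -> str:
    """Send a new chunk to the node whose center is nearest (squared Euclidean distance)."""
    target = similarity_hash(segments)
    return min(centers, key=lambda node: sum((t - c) ** 2 for t, c in zip(target, centers[node])))


chunk_a = [b"alpha-%d" % i for i in range(8)]
chunk_b = [b"beta-%d" % i for i in range(8)]
centers = {
    "node-A": geometric_center([similarity_hash(chunk_a)]),
    "node-B": geometric_center([similarity_hash(chunk_b)]),
}
print(route(chunk_a, centers))
```

Chunks that share most of their segments vote for most of the same bits. Their similarity vectors therefore stay close, which is what lets a single center summarise a node.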
  • Publication number: 20130018857
    Abstract: A system and method for transparently compressing file system data using compression group descriptors is provided. When data contained within a compression group can be compressed beyond a predefined threshold value, a compression group descriptor is included in the compression group that signifies that the data for the group of level 0 blocks is compressed into a lesser number of physical data blocks. When performing a read operation, the file system first determines the appropriate compression group that contains the desired data and determines whether the compression group has been compressed. If so, the file system decompresses the data in the compression group before returning the decompressed data. If the magic value is not in the first pointer position, then the data within the compression group was previously stored in an uncompressed format, and the data may be returned without performing a decompression operation.
    Type: Application
    Filed: August 20, 2012
    Publication date: January 17, 2013
    Inventors: Jim Voll, Sandeep Yadav
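A minimal Python sketch of the read and write decision in 20130018857. The pointer list for a group starts with a marker value only when compression saves enough blocks. On reads, the first slot is checked before deciding whether to decompress. The marker string, block size, `min_saved_blocks` threshold, zlib codec and class name are all assumptions.

```python
import zlib

BLOCK_SIZE = 4096
MAGIC = "CG-COMPRESSED"   # hypothetical descriptor stored in the first pointer slot


class Volume:
    def __init__(self):
        self.blocks = {}      # physical block number -> block contents
        self.next_pbn = 0

    def _alloc(self, data):
        pbn = self.next_pbn
        self.blocks[pbn] = data
        self.next_pbn += 1
        return pbn

    @staticmethod
    def _split(data):
        return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

    def write_group(self, data, min_saved_blocks=2):
        """Return the group's pointer list, compressed only if it saves enough blocks."""
        raw = self._split(data)
        packed = self._split(zlib.compress(data))
        if len(raw) - len(packed) >= min_saved_blocks:
            return [MAGIC] + [self._alloc(b) for b in packed]
        return [self._alloc(b) for b in raw]

    def read_group(self, pointers):
        if pointers and pointers[0] == MAGIC:
            # Descriptor present: reassemble the physical blocks and decompress.
            return zlib.decompress(b"".join(self.blocks[p] for p in pointers[1:]))
        # No descriptor in the first slot: the group was stored uncompressed.
        return b"".join(self.blocks[p] for p in pointers)


vol = Volume()
text = b"log line\n" * 4000          # 36,000 bytes: 9 blocks raw, far fewer compressed
small = b"xyz"                        # 1 block raw; compression cannot save 2 blocks
group_a = vol.write_group(text)
group_b = vol.write_group(small)
```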
  • Publication number: 20130018847
    Abstract: A database archiving performance benefit determination system may include a data reduction module to ascertain a reduction value of data stored on a database, and a database setup module to ascertain a setup of the database. A performance modeling module may calculate a performance increase for a database application using the database based on the reduction value, the setup of the database, and at least one parameter representing the database application.
    Type: Application
    Filed: July 12, 2011
    Publication date: January 17, 2013
    Inventor: Yu Gong
  • Publication number: 20130018856
    Abstract: The present invention relates to compression of values and bitmaps, and methods thereof. Such methods are configured for operating on a computer system having a word length architecture of length WL and are based on the observation that the bits reserved for the run-length counter (i.e., the fill length field, FL, herein) are often not all used, since runs are seldom that long. Unlike other compression schemes (e.g., WAH), these methods may assign the unused bits to one or more position list fields (PL, PL1, PL2, PLs), thus boosting the compression ratio. Moreover, the total length (in number of bits) of the uncompressed data, comprising values or bitmaps, may be stored just once, preferably at the beginning of the compression. This dramatically reduces the storage requirements of the compression scheme, since the length of each bitmap word need not be tracked while performing the compression or the decompression.
    Type: Application
    Filed: July 29, 2010
    Publication date: January 17, 2013
    Applicant: Algorhyme A/S
    Inventors: Torben Bach Pedersen, Francois Deliege
  • Publication number: 20130018855
    Abstract: A method for data deduplication includes receiving a set of hashes derived from a data chunk of a set of input data chunks 310. The method includes sampling the set of hashes 320, using an index identifying data chunk containers that hold data chunks having a hash in the set of sampled hashes 330, and loading indexes for at least one of the identified data chunk containers 340. The method includes determining which of the hashes correspond to data chunks stored in data chunk containers corresponding to the loaded indexes 350 and deciding which of the set of input data chunks should be stored based at least in part on the determination.
    Type: Application
    Filed: October 8, 2010
    Publication date: January 17, 2013
    Inventors: Kave Eshghi, Mark D. Lillibridge, David M. Falkinder
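A Python sketch of the sample, look up, load and decide steps in 20130018855. A hash is "sampled" when its low bits are zero, which is an assumed sampling rule. The sparse index maps sampled hashes to container IDs. Only the most-voted containers' indexes are loaded before deciding what to store. The names and the `max_containers` limit are hypothetical.

```python
import hashlib


def chunk_hash(chunk: bytes) -> int:
    return int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")


class SparseIndexStore:
    def __init__(self, sample_bits=1, max_containers=2):
        self.sample_mask = (1 << sample_bits) - 1  # sample hashes whose low bits are zero (assumption)
        self.max_containers = max_containers
        self.sparse_index = {}   # sampled hash -> ids of containers holding that chunk
        self.containers = {}     # container id -> set of chunk hashes (per-container index)

    def _sample(self, hashes):
        return [x for x in hashes if x & self.sample_mask == 0]

    def ingest(self, chunks):
        """Return the input chunks that must be stored because no loaded container has them."""
        hashes = [chunk_hash(c) for c in chunks]
        votes = {}
        for sampled in self._sample(hashes):
            for cid in self.sparse_index.get(sampled, ()):
                votes[cid] = votes.get(cid, 0) + 1
        # Load the indexes of only the few most-voted containers.
        chosen = sorted(votes, key=votes.get, reverse=True)[: self.max_containers]
        known = set()
        for cid in chosen:
            known |= self.containers[cid]
        to_store = [c for c, x in zip(chunks, hashes) if x not in known]
        if to_store:
            cid = len(self.containers)
            self.containers[cid] = {chunk_hash(c) for c in to_store}
            for sampled in self._sample(self.containers[cid]):
                self.sparse_index.setdefault(sampled, set()).add(cid)
        return to_store


store = SparseIndexStore()
batch = [b"chunk-%d" % i for i in range(64)]
first = store.ingest(batch)
second = store.ingest(batch)
third = store.ingest(batch + [b"new-chunk"])
```

Only sampled hashes are kept in RAM. Full per-container indexes are loaded on demand, which is the point of sampling.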
  • Publication number: 20130013610
    Abstract: Provided are techniques for selecting row identifiers from an initial index structure storing rows of randomized indexes. The row identifiers are randomized. Groups are formed with the randomized row identifiers so that each group has a predetermined number of row identifiers. At least one group is selected from the groups. Indexes are retrieved from the initial index structure that correspond to the row identifiers in the selected at least one group. The retrieved indexes are encoded by adding product information to form new identifiers.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Nisanth M. Simon
  • Publication number: 20130013617
    Abstract: Techniques are described for processing a query to produce query results, the query specifying at least a first timestamp value. Embodiments receive the query for processing and access a database index containing a plurality of database keys. The database index contains one or more database index keys, each of which includes at least a timestamp value and a time zone value. Embodiments compare the first timestamp value specified in the query with a portion of one of the database index keys to locate at least a portion of the query results. More specifically, the compared portion of the database index key excludes the time zone value. The located portion of the query results is then retrieved.
    Type: Application
    Filed: July 7, 2011
    Publication date: January 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mengchu Cai, Stephen Yao Ching Chen, Ruiping Li, Wei Li, Robert W. Lyle
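A minimal Python sketch of comparing only the timestamp portion of a (timestamp, time zone) index key, as in 20130013617. It assumes timestamps are normalised to UTC before being stored in the key, so equal instants compare equal regardless of their original zones. The key layout and function names are hypothetical.

```python
import bisect
from datetime import datetime, timedelta, timezone


def index_key(local_time, row_id):
    """Key = (timestamp normalised to UTC, original zone offset in minutes, row id)."""
    offset_minutes = int(local_time.utcoffset().total_seconds() // 60)
    return (local_time.astimezone(timezone.utc), offset_minutes, row_id)


def lookup(index, ts):
    """Return row ids whose timestamp equals `ts`; the zone part of the key is not compared."""
    stamps = [key[0] for key in index]   # timestamp portion only (rebuilt per call for brevity)
    lo = bisect.bisect_left(stamps, ts)
    hi = bisect.bisect_right(stamps, ts)
    return [key[2] for key in index[lo:hi]]


cet = timezone(timedelta(hours=1))
est = timezone(timedelta(hours=-5))
index = sorted([
    index_key(datetime(2013, 1, 10, 13, 0, tzinfo=cet), "r1"),   # 12:00 UTC
    index_key(datetime(2013, 1, 10, 7, 0, tzinfo=est), "r2"),    # 12:00 UTC
    index_key(datetime(2013, 1, 10, 13, 0, tzinfo=timezone.utc), "r3"),
])
query = datetime(2013, 1, 10, 12, 0, tzinfo=timezone.utc)
print(lookup(index, query))
```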
  • Publication number: 20130013618
    Abstract: A method for reducing redundancy between two or more datasets of potentially very large size. The method improves upon current technology by oversubscribing the data structure that represents a digest of data blocks and by using positional information about matching data, so that very large datasets can be analyzed and their redundancies removed: having found a match on a digest, the method expands the match in both directions in order to detect and eliminate large runs of data, replacing duplicate runs with references to common data. The method is particularly useful for capturing the states of images of a hard disk. It permits several files to have their redundancy removed and the files to be reconstituted later. The method is appropriate for use on a WORM device and can also make use of the L2 cache to improve performance.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: Chrysalis Storage, LLC
    Inventors: Steve Heller, Ralph Shnelvar
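A Python sketch of "match on a digest, then expand in both directions" from 20130013618. It does not model the oversubscribed table or the cache-related optimisations. It assumes several things:

- Digests are taken only at block-aligned offsets of a reference buffer.
- The target is scanned at every byte offset.
- Each verified hit is grown backwards and forwards into the longest shared run.

The block size, op format and function names are hypothetical.

```python
import hashlib


def dedupe_against(reference: bytes, target: bytes, block=8):
    """Encode `target` as ("lit", bytes) and ("copy", ref_offset, length) operations."""
    table = {}
    for off in range(0, len(reference) - block + 1, block):
        table.setdefault(hashlib.sha256(reference[off:off + block]).digest(), off)
    ops, lit_start, i = [], 0, 0
    while i + block <= len(target):
        ref_off = table.get(hashlib.sha256(target[i:i + block]).digest())
        if ref_off is None or reference[ref_off:ref_off + block] != target[i:i + block]:
            i += 1
            continue
        # Extend backwards, but never into bytes already emitted.
        start_t, start_r = i, ref_off
        while start_t > lit_start and start_r > 0 and target[start_t - 1] == reference[start_r - 1]:
            start_t -= 1
            start_r -= 1
        # Extend forwards as far as both buffers agree.
        end_t, end_r = i + block, ref_off + block
        while end_t < len(target) and end_r < len(reference) and target[end_t] == reference[end_r]:
            end_t += 1
            end_r += 1
        if start_t > lit_start:
            ops.append(("lit", target[lit_start:start_t]))
        ops.append(("copy", start_r, end_t - start_t))
        lit_start = i = end_t
    if lit_start < len(target):
        ops.append(("lit", target[lit_start:]))
    return ops


def rebuild(reference, ops):
    """Reconstitute the target from literals and references into `reference`."""
    return b"".join(op[1] if op[0] == "lit" else reference[op[1]:op[1] + op[2]] for op in ops)


reference = b"".join(b"%04d" % i for i in range(200))
target = b"HEADER" + reference[100:500] + b"TAIL"
ops = dedupe_against(reference, target)
print(ops)
```

The run starts at an unaligned offset (100). The first digest hit lands at aligned offset 104, and backward extension recovers the missing 4 bytes.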
  • Publication number: 20130013605
    Abstract: In general, a value of a numerical attribute of a record stored in a data structure is received. A numerical range is generated that includes the value of the numerical attribute. An entry is stored, in an index associated with the data structure, that specifies a location of the record within the data structure and that includes a first index key and a second index key. The first index key corresponds to a value of an attribute of the record different from the numerical attribute, and the second index key corresponds to the generated numerical range.
    Type: Application
    Filed: July 5, 2012
    Publication date: January 10, 2013
    Inventor: Craig W. Stanfill
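A minimal Python sketch of the composite index entry described in 20130013605 and 20130013606. The first key is another attribute's value. The second key is a generated range that contains the record's numerical value. The entry points at the record's location. The fixed-width bucketing, `RANGE_WIDTH` and the function names are assumptions. The abstract does not say how the range is generated.

```python
import bisect

RANGE_WIDTH = 100   # hypothetical width of each generated numerical range


def numeric_range(value, width=RANGE_WIDTH):
    """Half-open range [lo, lo + width) containing `value`."""
    lo = (value // width) * width
    return (lo, lo + width)


def build_index(records, key_attr, num_attr):
    """Sorted entries of ((first index key, second index key), record location)."""
    return sorted(
        ((rec[key_attr], numeric_range(rec[num_attr])), location)
        for location, rec in enumerate(records)
    )


def candidates(index, key_value, num_value):
    """Locations whose entry matches exactly; callers re-check the precise numeric value."""
    target = (key_value, numeric_range(num_value))
    keys = [entry[0] for entry in index]
    return [loc for _, loc in index[bisect.bisect_left(keys, target):bisect.bisect_right(keys, target)]]


records = [
    {"customer": "acme", "amount": 120},
    {"customer": "acme", "amount": 180},
    {"customer": "acme", "amount": 250},
    {"customer": "zeta", "amount": 130},
]
index = build_index(records, "customer", "amount")
print(candidates(index, "acme", 150))
```

Keying on a range rather than the exact value groups nearby values under one entry. Candidates must then be filtered against the exact value.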
  • Publication number: 20130013602
    Abstract: Operating a database system comprises: storing a database table comprising a plurality of rows, each row comprising a key value and one or more attributes; storing a primary index for the database table, the primary index comprising a plurality of leaf nodes, each leaf node comprising one or more key values and respective memory addresses, each memory address defining the storage location of the respective key value; creating a new leaf node comprising one or more key values and respective memory addresses; performing a memory allocation analysis based upon the lowest key value of the new leaf node to identify a non-full memory page storing a leaf node whose lowest key value is similar to the lowest key value of the new leaf node; and storing the new leaf node in the identified non-full memory page.
    Type: Application
    Filed: March 16, 2012
    Publication date: January 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Markku J. Manner, Simo A. Neuvonen, Vilho T. Raatikka
  • Publication number: 20130013575
    Abstract: A current key is received at a current arrival time at a computer. An index in an array corresponding to the current key is determined using a hash function. A previous key and a previous arrival time are retrieved from the array at the index. The array is transformed by replacing the previous key and the previous arrival time with the current key and the current arrival time in the array at the index. The previous key and the previous arrival time are inserted into a nearest eligible sequential index in the array.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Mikkel Thorup
  • Publication number: 20130013619
    Abstract: Peer-to-peer redundant file server system and methods include clients that determine a target storage provider to contact for a particular storage transaction based on a pathname provided by the filesystem and a predetermined scheme such as a hash function applied to a portion of the pathname. Servers use the same scheme to determine where to store relevant file information so that the clients can locate the file information. The target storage provider may store the file itself and/or may store metadata that identifies one or more other storage providers where the file is stored. A file may be replicated in multiple storage providers, and the metadata may include a list of storage providers from which the clients can select (e.g., randomly) in order to access the file.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: OVERLAND STORAGE, INC.
    Inventors: Francesco Lacapra, Peter Wallace Steele, Bruno Sartirana, Ernest Ying Sue Hua, I Chung Joseph Lin, Samuel Sui-Lun Li, Nathanael John Diller, Thomas Reynold Ramsdell, Don Nguyen, Kyle Dinh Tran
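A Python sketch of a client deriving its target storage provider from a pathname, as outlined in 20130013619. It hashes the parent-directory portion of the path; the abstract says only "a portion of the pathname", so this choice is an assumption. It then picks a replica at random from that provider's metadata. `hashlib` is used instead of Python's built-in `hash()` because every client and server must compute the same value, and `hash()` of strings is randomised per process. The provider names and metadata shape are hypothetical.

```python
import hashlib
import posixpath
import random


def target_provider(pathname, providers):
    """Choose a provider by hashing the parent-directory part of the pathname (assumed scheme)."""
    parent = posixpath.dirname(pathname)
    digest = hashlib.sha256(parent.encode("utf-8")).digest()
    return providers[int.from_bytes(digest[:8], "big") % len(providers)]


def choose_replica(metadata, rng=random):
    """The metadata lists the providers holding replicas; the client picks one at random."""
    return rng.choice(metadata["replicas"])


providers = ["sp0", "sp1", "sp2", "sp3"]
owner = target_provider("/projects/alpha/report.txt", providers)
meta = {"replicas": ["sp1", "sp3"]}     # hypothetical metadata held by `owner`
print(owner, choose_replica(meta))
```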
  • Publication number: 20130013606
    Abstract: In general, a value of a numerical attribute of a record stored in a data structure is received. A numerical range is generated that includes the value of the numerical attribute. An entry is stored, in an index associated with the data structure, that specifies a location of the record within the data structure and that includes a first index key and a second index key. The first index key corresponds to a value of an attribute of the record different from the numerical attribute, and the second index key corresponds to the generated numerical range.
    Type: Application
    Filed: July 6, 2012
    Publication date: January 10, 2013
    Inventor: Craig W. Stanfill
  • Publication number: 20130007006
    Abstract: A search request received from a user is converted to a search request integer value using an operational portion of a chip in network equipment. The search request integer value is compared to representative data integer values that were previously converted from a dataset of search terms using the operational portion, the representative integer values being stored on the chip. If the comparing is successful, a signal is transmitted to a second database, the signal being used to determine a message to be transmitted to the user that corresponds to the representative data integer.
    Type: Application
    Filed: June 28, 2011
    Publication date: January 3, 2013
    Applicant: BROADCOM CORPORATION
    Inventor: Eddie Chung
  • Publication number: 20130007007
    Abstract: An approach is provided for providing a list-based interface to key-value stores. The library interface platform determines one or more key-value pairs of at least one key-value store, the one or more key-value pairs comprising one or more data entries. Next, the library interface platform causes, at least in part, an association of at least one list object with the one or more key-value pairs, one or more sub-list objects, or a combination thereof. Then, the library interface platform provides at least one interface for performing one or more operations on the at least one list object to interact with the one or more data entries, the one or more key-value pairs, the one or more sub-list objects, or a combination thereof.
    Type: Application
    Filed: June 29, 2011
    Publication date: January 3, 2013
    Applicant: Nokia Corporation
    Inventors: Zane Zheng Yan Pan, Fujian Yang, Kenneth D. McCracken
  • Publication number: 20130007003
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Publication number: 20130007008
    Abstract: A hash algorithm-based data storage method and apparatus are disclosed, including: pre-configuring L backend storage modules and a mapping relationship between identifiers of the backend storage modules and a modulo L operation; calculating a key value of data to be stored using a hash algorithm; performing a modulo L operation on the obtained key value and, using the mapping relationship between identifiers of the backend storage modules and the modulo L operation, outputting the key value and the corresponding data to the backend storage module whose identifier corresponds to the result of the modulo L operation; determining that a preconfigured hash table in the backend storage module does not contain the data to be stored; and storing the data to be stored and the corresponding key value. By using the present invention, requirements on storage devices can be lowered and storage efficiency can be improved.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 3, 2013
    Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: QING YUAN, JIANGUI ZHANG
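A minimal Python sketch of the hash, modulo-L and store-if-absent flow in 20130007008. A dict stands in for each backend's preconfigured hash table. SHA-256 truncated to 64 bits is an assumed hash algorithm. The class names are hypothetical.

```python
import hashlib


class BackendStorage:
    def __init__(self, ident):
        self.ident = ident
        self.table = {}            # the backend's preconfigured hash table: key -> data

    def store(self, key, data):
        """Store only if the hash table does not already contain the key."""
        if key in self.table:
            return False
        self.table[key] = data
        return True


class Router:
    def __init__(self, num_backends):
        self.L = num_backends
        # Preconfigured mapping: result of (key mod L) -> backend module.
        self.backends = {r: BackendStorage(f"backend-{r}") for r in range(num_backends)}

    def put(self, data: bytes):
        key = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
        backend = self.backends[key % self.L]
        return backend.ident, backend.store(key, data)


router = Router(4)
first = router.put(b"hello")
second = router.put(b"hello")
print(first, second)
```

Identical data always hashes to the same key and therefore the same backend. The duplicate check only needs to consult one table.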
  • Publication number: 20130007002
    Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: COMMVAULT SYSTEMS, INC.
    Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
  • Publication number: 20130006974
    Abstract: Systems and methods are provided for file searching on mobile devices. A system includes a user interface and a file query system. The user interface is for receiving a user-provided spatio-temporal query for use in searching for a particular file. The user-provided spatio-temporal query is provided by a user of a mobile device. The file query system is for determining information about the particular file responsive to the user-provided spatio-temporal query, and identifying from the information one or more files as a search result for the particular file.
    Type: Application
    Filed: September 10, 2012
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: DAKSHI AGRAWAL, JOEL W. BRANCH, FRANCK LE, SIHYUNG LEE, MUKESH K. MOHANIA
  • Publication number: 20130007000
    Abstract: Disclosed herein is a method and system for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of metadata indexes about unstructured data can be hardware-accelerated by processing streaming unstructured data through a reconfigurable logic device to generate the metadata about the unstructured data for the index.
    Type: Application
    Filed: April 9, 2012
    Publication date: January 3, 2013
    Applicant: EXEGY INCORPORATED
    Inventors: Ronald S. Indeck, David Mark Indeck
  • Publication number: 20130007005
    Abstract: A method for realizing fast response in a multimedia file control process is disclosed. The method includes creating a file playing time index of a multimedia file in a parallel processing manner at the same time as the multimedia file is opened. A playing device for realizing fast response in a media file control process is also disclosed, and includes a logic control module and a media file analyzing module, wherein the logic control module is configured to control the media file analyzing module to create the file playing time index of the multimedia file in a parallel manner when the multimedia file is opened, and the media file analyzing module is configured to create the file playing time index of the multimedia file according to the control of the logic control module.
    Type: Application
    Filed: December 24, 2010
    Publication date: January 3, 2013
    Applicant: ZTE CORPORATION
    Inventors: Youxin Chen, Wei Ma
  • Publication number: 20130006998
    Abstract: Provided are techniques for analyzing fields. Statistical metrics for each field in a data set are received. A general interestingness index is generated for each field using one or more combination functions that aggregate standardized interestingness sub-indexes. One or more fields are identified as interesting for further analysis using the general interestingness index. One or more expert recommendations for field transformations are constructed for the identified one or more fields.
    Type: Application
    Filed: June 29, 2011
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jing-Yun Shyr, Damir Spisic, Raymond Wright, Jing Xu, Xueying Zhang
  • Publication number: 20120330909
    Abstract: Systems, methods and computer readable media for storing data elements transmitted via data streams received from distributed devices connected via a network. The received data elements may be stored in block stores on the distributed devices. The stored data elements may be allocated to data blocks of a block store that have assigned block identifiers and further allocated to events of the data blocks. The received plurality of data streams may share the same stream schema, and indices may be generated based on the order of the event-allocated data elements. The stream schema of the received data streams may comprise a list of token names. Token names may be assigned to the event-allocated data elements. Indices may be generated for the event-allocated data elements based on the stream schema.
    Type: Application
    Filed: August 31, 2012
    Publication date: December 27, 2012
    Applicant: Red Lambda, Inc.
    Inventors: Robert Bird, Adam Leko, Matthew Whitlock
  • Publication number: 20120330907
    Abstract: A storage system 103 carries out first and second de-duplication processes in response to receiving a write request from a client. First, a determination is made as to whether a write target data item overlaps with any of the stored data items of a part of a stored data item group, which is a user data item group stored in a storage device 209, and if so, the write target data item is prevented from being stored in the storage device. Second, a determination is made as to whether a target stored data item, whose evaluation for overlap with the stored data items was not finished in the first de-duplication process, overlaps with another stored data item, and if so, the target stored data item or the same data item overlapping with the target stored data item is deleted from the storage device 209.
    Type: Application
    Filed: September 7, 2012
    Publication date: December 27, 2012
    Applicant: HITACHI, LTD.
    Inventors: Takaki NAKAMURA, Akira YAMAMOTO, Masaaki IWASAKI, Yohsuke ISHII, Nobumitsu TAKAOKA
  • Publication number: 20120330904
    Abstract: In accordance with one or more embodiments, an inode implemented file system may be utilized to support both offline and inline deduplication. When the first content is stored in the storage medium, one inode is used to associate a filename with the data blocks where the first content is stored. When a second content that is a duplicate of the first content is to be stored, then a parent inode is created to point to the data blocks in which a copy of the first content is stored. Further, two inodes are created, one representing the first content and the other representing the second content. Both inodes point to the same parent inode that points to the data blocks where the first content is stored.
    Type: Application
    Filed: June 27, 2011
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Michael Factor, Joseph Samuel Glider, Danny Harnik, Elliot K. Kolodner, Dalit Naor, Demyn Lee Plantenberg, Eran Rom, Sivan Tal, Paula Ta-Shma
  • Publication number: 20120330964
    Abstract: A tool for using an interconnected network of systems to create an index for a database table. An index advisor on a primary server recommends one or more indexes to improve efficiency. While resources of the primary server are being used by various queries and processes, the primary server sends the recommendations to a secondary server (with available resources) so that the recommended indexes may be built in parallel with the processes executing on the primary server. The secondary server builds the recommended indexes based on its own copies of the database tables. The secondary server sends the built indexes to the primary server, where the primary server must reconcile the indexes with any changes that took place to the database tables subsequent to the replication of the tables on the secondary server. The primary server makes the associations between the new indexes and the tables they were built for.
    Type: Application
    Filed: June 22, 2011
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brent Joseph Baude, Gregory Scott Hurlebaus, Jaroslaw Miszczyk, Gottfried Schimunek
  • Publication number: 20120330944
    Abstract: A method for processing a search query according to one embodiment includes receiving a search query containing terms; combining at least some consecutive terms in the search query to create biwords; looking up at least some of the terms and biwords in a search index for identifying sections of documents containing the at least some of the terms and/or biwords; generating a content score for each of the identified sections based at least in part on a number of the terms and biwords found in the sections of each document, wherein the biwords are given a higher priority than matched terms, wherein the priority affects the content score; and selecting and outputting an indicator of at least one of the sections, or portion thereof, based at least in part on the content score.
    Type: Application
    Filed: September 7, 2012
    Publication date: December 27, 2012
    Applicant: barnesandnoble.com LLC
    Inventors: Aditya Vailaya, Jiang Wu, Manish Rathi
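A minimal Python sketch of biword-aware section scoring from 20120330944. Consecutive query terms are joined into biwords. Both terms and biwords are looked up in a section index, and biword hits receive a higher weight. The weights, the one-point-per-distinct-hit scoring and the function names are assumptions.

```python
from collections import defaultdict

TERM_WEIGHT = 1.0
BIWORD_WEIGHT = 2.0   # hypothetical: biword hits count more than single-term hits


def biwords(terms):
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


def build_index(sections):
    """Map each term and each biword to the set of section ids containing it."""
    index = defaultdict(set)
    for sid, text in sections.items():
        terms = text.lower().split()
        for token in terms + biwords(terms):
            index[token].add(sid)
    return index


def best_section(query, index):
    """Score sections by weighted term and biword hits; return the top section or None."""
    terms = query.lower().split()
    scores = defaultdict(float)
    for term in terms:
        for sid in index.get(term, ()):
            scores[sid] += TERM_WEIGHT
    for bw in biwords(terms):
        for sid in index.get(bw, ()):
            scores[sid] += BIWORD_WEIGHT
    return max(scores, key=scores.get) if scores else None


sections = {"s1": "the white whale swims", "s2": "a whale that is white"}
index = build_index(sections)
print(best_section("white whale", index))
```

Both sections contain both terms. Only "s1" contains them adjacently, so the biword bonus breaks the tie.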
  • Publication number: 20120330988
    Abstract: In accordance with the teachings described herein, systems and methods are provided for performing index joins. A database management application may receive an instruction to perform an index join operation between columns in a first table and a second table, wherein the database management application does not have direct access to an index of the first table or the second table for performing the index join operation. A query may be automatically generated by the database management application, wherein the query includes a where clause equality expression that equates an indexed column of the second table with a parameter or updatable constant. The database management application may substitute a value from a row of the first table for the parameter or updatable constant, and cause the query to be executed on the index of the second table to fetch any one or more rows of the second table that satisfy the where clause by having an index value that matches the substituted value.
    Type: Application
    Filed: June 24, 2011
    Publication date: December 27, 2012
    Inventors: Douglass Adam Christie, Gordon Lyle Keener
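A Python sketch of the parameterised-query approach in 20120330988, using the standard-library `sqlite3` module. The application cannot reach the index directly. Instead it issues `WHERE indexed_col = ?` once per outer row, substituting that row's join value, and SQLite's planner can then serve each lookup from the index. The table, column and function names are hypothetical.

```python
import sqlite3


def index_join(conn, outer_rows, join_pos, inner_table, inner_col):
    """Nested-loop join that re-executes one parameterised equality query per outer row.

    `inner_table` and `inner_col` are interpolated into the SQL, so they must be trusted names.
    """
    sql = f"SELECT * FROM {inner_table} WHERE {inner_col} = ?"
    joined = []
    for outer in outer_rows:
        # Substitute this outer row's join value for the `?` parameter.
        for inner in conn.execute(sql, (outer[join_pos],)):
            joined.append(tuple(outer) + tuple(inner))
    return joined


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "disk"), (2, "tape"), (1, "cable")])
customers = [(1, "Acme"), (3, "Zeta")]   # rows of the first table, e.g. fetched from another source
rows = index_join(conn, customers, 0, "orders", "customer_id")
print(rows)
```

Binding the value as a parameter rather than formatting it into the SQL lets the same statement be reused for every outer row. It also avoids quoting problems.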
  • Publication number: 20120330910
    Abstract: A system and method for a block based differencing algorithm which includes the ability to limit memory requirements regardless of source file sizes by splitting the source file into optimally sized blocks. The invention allows the blocks to be processed in any order allowing in-place operation. Further, the present invention allows a second stage compressor to match the compressor blocks to those used by the differencing algorithm to optimize compressor and decompressor performance.
    Type: Application
    Filed: September 4, 2012
    Publication date: December 27, 2012
    Applicant: SMITH MICRO SOFTWARE, INC.
    Inventors: Serge Volkoff, Mark Armour, Darryl Lovato
  • Publication number: 20120330966
    Abstract: A modular data and storage management system. The system includes a time variance interface that provides for storage into a storage media of data that is received over time. The time variance interface of the modular data and storage management system provides for retrieval, from the storage media, of an indication of the data corresponding to a user specified date. The retrieved indication of the data provides a user with an option to access specific information relative to the data, such as content of files that are included in the data.
    Type: Application
    Filed: September 7, 2012
    Publication date: December 27, 2012
    Applicant: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Randy De Meno, Jeremy A. Schwartz, James J. McGuigan
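A minimal sketch of a time-variance lookup, assuming each storage operation keeps a full snapshot of file contents keyed by date. The class `TimeVarianceStore` is hypothetical; a real system would store an indication of the data (for example, an index of backed-up files) rather than whole contents.

```python
import bisect
from datetime import date


class TimeVarianceStore:
    """Keeps dated snapshots and answers 'what did the data look like on day D?'."""

    def __init__(self):
        self._dates = []      # snapshot dates, kept sorted
        self._snapshots = []  # file name -> content, parallel to _dates

    def store(self, when, files):
        i = bisect.bisect_right(self._dates, when)
        self._dates.insert(i, when)
        self._snapshots.insert(i, dict(files))

    def as_of(self, when):
        """Return the latest snapshot taken on or before `when`, or None."""
        i = bisect.bisect_right(self._dates, when)
        return self._snapshots[i - 1] if i else None


store = TimeVarianceStore()
store.store(date(2012, 1, 1), {"a.txt": "v1"})
store.store(date(2012, 3, 1), {"a.txt": "v2", "b.txt": "new"})
print(store.as_of(date(2012, 2, 15)))
```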
  • Publication number: 20120323923
    Abstract: A system for sorting tables comprises an interface operable to receive a first segment of an index column and a first segment of a key column from an on-disk database (ODDB), wherein a value in the index column represents a row of information in the ODDB, a value in the key column represents data to be sorted, each index value is associated with a key value, and the ODDB is operable to store the sorted index values and key values in the first segments. The system further comprises a processor communicatively coupled to the interface and operable to sort the index values and key values in the first segments by the key values according to sorting criteria, and to remove the sorted index values and key values in the first segments from an in-memory database in a sorting module. The interface is further operable to receive a second segment of the index column and a second segment of the key column from the ODDB.
    Type: Application
    Filed: June 14, 2011
    Publication date: December 20, 2012
    Applicant: Bank of America Corporation
    Inventor: Junan Duan
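A minimal sketch of segment-at-a-time sorting of paired index and key columns. Each segment is sorted by key in memory and kept as a sorted run (standing in for writing it back to the on-disk database), after which the next segment is loaded. The final k-way merge with `heapq.merge` is an assumption borrowed from standard external merge sort; the abstract itself only describes the per-segment step.

```python
import heapq


def sort_in_segments(index_col, key_col, segment_size):
    """Sort (index, key) pairs by key, holding one segment in memory at a time."""
    runs = []
    for start in range(0, len(index_col), segment_size):
        # "Receive" one segment of each column from the on-disk store.
        segment = list(zip(index_col[start:start + segment_size],
                           key_col[start:start + segment_size]))
        segment.sort(key=lambda pair: pair[1])
        # Stand-in for persisting the sorted run; the in-memory segment
        # is then released before the next one is loaded.
        runs.append(segment)
    # k-way merge of the sorted runs by key.
    return list(heapq.merge(*runs, key=lambda pair: pair[1]))


rows = sort_in_segments([0, 1, 2, 3, 4], ["d", "b", "e", "a", "c"], segment_size=2)
print(rows)
```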
  • Publication number: 20120323925
    Abstract: A computer-implemented system and method for generating an index to a captured media stream. The system includes an output device configured to play a media stream. The system further includes an automatic tagging system for generating at least one auto tag based on the content of the received media stream, the auto tag associated with a portion of the received media stream and a user driven tagging system for generating at least one user tag based on a command received from a user, the user tag associated with a portion of the received media stream being provided at the time the command is received. The system yet further includes a non-transitory storage medium for capturing the received media stream in a media data file associated with a media index file, the media index file including the at least one auto tag and the at least one user tag.
    Type: Application
    Filed: June 17, 2011
    Publication date: December 20, 2012
    Inventors: Jeffrey E. Fitzsimmons, Matthew Stockton, Marc Della Torre
  • Publication number: 20120323870
    Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
    Type: Application
    Filed: August 27, 2012
    Publication date: December 20, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Marios Hadjieleftheriou, Nick Koudas, Divesh Srivastava
  • Publication number: 20120317089
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Application
    Filed: April 17, 2012
    Publication date: December 13, 2012
    Inventor: Keith H. Randall
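A minimal sketch of the threshold-based scheduling step. It assumes the history log records, per URL, how many times the document was crawled and how many of those crawls found changed content; the score is taken to be linear in the estimated change frequency. Both the `HistoryEntry` layout and the scoring rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    url: str
    crawls: int   # number of times the document was fetched
    changes: int  # fetches at which the content had changed


def schedule(history, threshold, weight=1.0):
    """Return URLs whose change-frequency score meets the threshold."""
    due = []
    for entry in history:
        # Estimated change frequency; never-crawled documents count as always changing.
        freq = entry.changes / entry.crawls if entry.crawls else 1.0
        score = weight * freq  # hypothetical scoring function
        if score >= threshold:
            due.append(entry.url)
    return due


log = [HistoryEntry("http://a.example/", 10, 9),
       HistoryEntry("http://b.example/", 10, 1),
       HistoryEntry("http://c.example/", 0, 0)]
print(schedule(log, threshold=0.5))
```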
  • Publication number: 20120317105
    Abstract: The present invention provides a method and an apparatus for updating an index in a terminal and ranking search results based on the updated index. The method comprises: detecting whether a file has been modified; if the file has been modified, performing incremental indexing on the modified file to generate a new index file, wherein the incremental index includes the number of times the modified file has historically been selected; merging the new index file into the original index file; obtaining keywords input by the user; querying search results related to the keywords; ranking the search results according to their relevance to the keywords and the number of times the modified file has historically been selected; and displaying the ranked search results to the user. The present invention improves the user experience of the mobile terminal.
    Type: Application
    Filed: June 23, 2010
    Publication date: December 13, 2012
    Applicant: ZTE CORPORATION
    Inventors: Luo Bai, Zhongwei Ji, Bin Li, Rufu Weng
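A minimal sketch of incremental indexing and ranking by relevance plus selection history. Relevance is assumed to be the keyword's term count in the file, and the combined score is a hypothetical weighted sum `alpha * relevance + beta * times_selected`; change detection (for example, by modification time) is left out.

```python
from collections import Counter


def index_file(text, times_selected):
    """Build the incremental index entry for one modified file."""
    return {"terms": Counter(text.lower().split()), "selected": times_selected}


def merge(index, increment):
    """Merge new index entries into the main index, replacing stale ones."""
    index.update(increment)
    return index


def search(index, keywords, alpha=1.0, beta=0.5):
    """Rank matching files by keyword relevance plus historical selections."""
    scored = []
    for path, entry in index.items():
        relevance = sum(entry["terms"][k.lower()] for k in keywords)
        if relevance:
            scored.append((alpha * relevance + beta * entry["selected"], path))
    return [path for _, path in sorted(scored, reverse=True)]


main = merge({}, {"a.txt": index_file("budget plan", 0),
                  "b.txt": index_file("budget budget review", 0)})
# a.txt is later modified; it has been selected 6 times historically.
merge(main, {"a.txt": index_file("budget plan final", 6)})
print(search(main, ["budget"]))
```

Here b.txt mentions the keyword more often, but a.txt ranks first because of its selection history.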
  • Publication number: 20120310903
    Abstract: A method replicates data between instances of a distributed database. The method identifies at least two instances of the database at distinct geographic locations. The method tracks changes to the database by storing deltas. Each delta has a row identifier that identifies the piece of data modified, a sequence identifier that specifies the order in which the deltas are applied to the data, and an instance identifier that specifies where the delta was created. The method determines which deltas to send using an egress map that specifies which combinations of row identifier and sequence identifier have been acknowledged as received at other instances. The method builds a transmission matrix that identifies deltas that have not yet been acknowledged as received. The method then transmits deltas identified in the transmission matrix. After receiving acknowledgement that transmitted deltas have been incorporated into databases at other instances, the method updates the egress map.
    Type: Application
    Filed: August 17, 2012
    Publication date: December 6, 2012
    Inventor: Yonatan Zunger
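A minimal sketch of the egress map and transmission matrix, assuming the egress map is a per-destination set of acknowledged `(row_id, seq)` pairs. Skipping deltas whose instance identifier equals the destination (so a delta is never echoed back to where it was created) is an assumption, as are the helper names `transmission_matrix` and `record_ack`.

```python
from collections import namedtuple

Delta = namedtuple("Delta", "row_id seq instance value")


def transmission_matrix(deltas, egress_map, destinations):
    """For each destination, list the deltas it has not yet acknowledged."""
    matrix = {}
    for dest in destinations:
        acked = egress_map.get(dest, set())
        matrix[dest] = [d for d in deltas
                        if d.instance != dest  # assumption: don't echo to origin
                        and (d.row_id, d.seq) not in acked]
    return matrix


def record_ack(egress_map, dest, delivered):
    """After dest confirms it applied the deltas, mark them acknowledged."""
    egress_map.setdefault(dest, set()).update((d.row_id, d.seq) for d in delivered)


deltas = [Delta("r1", 1, "us", "a"), Delta("r1", 2, "eu", "b"), Delta("r2", 1, "us", "c")]
egress = {"eu": {("r1", 1)}}
matrix = transmission_matrix(deltas, egress, ["eu", "asia"])
record_ack(egress, "eu", matrix["eu"])  # "eu" acknowledges what it was sent
print(transmission_matrix(deltas, egress, ["eu"]))
```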
  • Publication number: 20120310947
    Abstract: A method for indexing a plurality of nodes using a computer system is provided. The computer system includes data storage and a processor coupled to the data storage. The method includes acts of storing the plurality of nodes in the data storage, each of the plurality of nodes having a hit count, a link count and an outcome, creating a qualitative index ordering a plurality of nodes according to the hit count, the link count and the outcome of each node and storing the qualitative index in the data storage. The hit count of each node indicates a number of times a case attribute associated with the node is presented to a user. The link count of each node indicates a number of times the case attribute associated with the node is affirmed as useful. The outcome of each node indicates a desirability of the outcome.
    Type: Application
    Filed: August 13, 2012
    Publication date: December 6, 2012
    Inventors: Paul J. Fortier, Theophano Mitsa, Nancy Dluhy
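A minimal sketch of a qualitative index over nodes carrying a hit count, link count, and outcome. The abstract does not say how the three values are combined; ranking by affirmation rate (link count divided by hit count) and breaking ties by outcome is a hypothetical choice.

```python
from dataclasses import dataclass


@dataclass
class Node:
    attribute: str
    hit_count: int   # times the case attribute was presented to a user
    link_count: int  # times it was affirmed as useful
    outcome: float   # desirability of the outcome (higher is better)


def qualitative_index(nodes):
    """Order nodes by affirmation rate, then outcome (hypothetical ranking)."""
    def rank(node):
        rate = node.link_count / node.hit_count if node.hit_count else 0.0
        return (rate, node.outcome)
    return sorted(nodes, key=rank, reverse=True)


nodes = [Node("fever", 10, 5, 0.7), Node("cough", 4, 4, 0.2), Node("rash", 10, 5, 0.9)]
print([n.attribute for n in qualitative_index(nodes)])
```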
  • Publication number: 20120310948
    Abstract: A method, system, and article are provided for evaluating regular expressions over large data collections. A general purpose index is built to handle complex regular expressions at the character level. Characters, character classes, and associated metadata are identified and stored in an index of a collection of documents. Given a regular expression, a query is generated based on the contents of the index. This query is executed over the index to identify a set of documents in the collection of documents over which the regular expression can be evaluated. Based upon the query execution, the identified set of documents is returned for evaluation by the regular expression responsive to execution of the query over the index.
    Type: Application
    Filed: August 14, 2012
    Publication date: December 6, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ting Chen, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan
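A minimal sketch of index-assisted regular expression evaluation, using a character-trigram index as a swapped-in technique (the abstract's index also covers character classes and metadata, which are not modeled). The caller supplies literals that every match must contain; deriving them automatically from the pattern is out of scope here. The index query intersects posting lists, and the regex then runs only on the surviving documents.

```python
import re
from collections import defaultdict


def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each character trigram to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def regex_search(docs, index, pattern, required_literals):
    """Filter candidates with the index, then evaluate the regex on them only."""
    candidates = set(docs)
    for literal in required_literals:
        for gram in trigrams(literal):  # literals under 3 chars do not filter
            candidates &= index.get(gram, set())
    rx = re.compile(pattern)
    return sorted(d for d in candidates if rx.search(docs[d]))


docs = {1: "error code 404 at /index", 2: "error code 500", 3: "all good"}
idx = build_index(docs)
hits = regex_search(docs, idx, r"error code 4\d\d", ["error code 4"])
print(hits)
```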
  • Publication number: 20120310907
    Abstract: Methods and systems provide a tool for prioritizing the ordering of outstanding indexing work in order to bring a particular portion of an indexing source up to date quickly and to reduce the likelihood of inconsistencies between an index-backed view and a direct view of a source. In accordance with the described embodiments, indexing of items can be prioritized based upon a user's view or metadata contained within a query. Further, in at least some embodiments, the tool can decide the order to index items based upon multiple prioritization requests.
    Type: Application
    Filed: August 15, 2012
    Publication date: December 6, 2012
    Applicant: Microsoft Corporation
    Inventors: Michael J. Novak, Christopher C. McConnell
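A minimal sketch of prioritized indexing work backed by a binary heap. The priority levels (0 for items in the user's current view, 1 for items named by query metadata, 2 otherwise) are hypothetical. When several prioritization requests name the same item, its most urgent priority wins; superseded heap entries are skipped lazily when popped.

```python
import heapq
import itertools


class IndexQueue:
    """Outstanding indexing work, served in priority order (lower is sooner)."""

    DEFAULT = 2

    def __init__(self, items):
        self._counter = itertools.count()  # FIFO tie-breaker within a priority
        self._best = {item: self.DEFAULT for item in items}
        self._heap = [(self.DEFAULT, next(self._counter), item) for item in items]
        heapq.heapify(self._heap)

    def prioritize(self, items, priority):
        """Apply one prioritization request; only raises urgency, never lowers it."""
        for item in items:
            if item in self._best and priority < self._best[item]:
                self._best[item] = priority
                heapq.heappush(self._heap, (priority, next(self._counter), item))

    def next_item(self):
        """Pop the most urgent pending item, skipping superseded entries."""
        while self._heap:
            priority, _, item = heapq.heappop(self._heap)
            if self._best.get(item) == priority:
                del self._best[item]
                return item
        return None


q = IndexQueue(["a.txt", "b.txt", "c.txt", "d.txt"])
q.prioritize(["c.txt"], 0)           # visible in the user's view
q.prioritize(["d.txt", "c.txt"], 1)  # named by a query's metadata
order = [q.next_item() for _ in range(4)]
print(order)
```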
  • Publication number: 20120310945
    Abstract: Apparatuses, computer readable media, methods, and systems are described for processing a workload record for each of a plurality of assessors, each of the workload records identifying an assessment previously assigned to a particular one of the assessors, calculating a complexity score for each of the assessments, calculating a workload index for each of the assessors based on the complexity score of the assessment previously assigned to that assessor, and assigning a new assessment to a particular one of the assessors based on the workload indexes.
    Type: Application
    Filed: June 6, 2011
    Publication date: December 6, 2012
    Applicant: BANK OF AMERICA CORPORATION
    Inventors: Karthik Reddy Mitta, Susheel Walia
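A minimal sketch of workload-index assignment, assuming the workload index is the sum of complexity scores of an assessor's current assessments and that the new assessment goes to the assessor with the lowest index. The complexity formula (weighted counts of controls and systems) is hypothetical.

```python
def complexity(assessment):
    """Hypothetical complexity score: weighted count of items to review."""
    return assessment["controls"] * 1.0 + assessment["systems"] * 2.0


def assign(workloads, new_assessment):
    """Assign to the assessor with the lowest workload index.

    workloads maps assessor name -> list of previously assigned assessments.
    """
    index = {name: sum(complexity(a) for a in jobs) for name, jobs in workloads.items()}
    chosen = min(index, key=index.get)
    workloads[chosen].append(new_assessment)
    return chosen, index


workloads = {"kim": [{"controls": 4, "systems": 1}],
             "raj": [{"controls": 1, "systems": 1}, {"controls": 2, "systems": 0}]}
chosen, index = assign(workloads, {"controls": 3, "systems": 2})
print(chosen, index)
```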
  • Publication number: 20120310884
    Abstract: Systems and methods for publishing datasets are provided herein. According to some embodiments, methods for publishing datasets may include receiving a request to publish a dataset to at least one of an internal environment located within a secured zone and an external environment located outside the secured zone, the request comprising at least one selection criterion; selecting the dataset based upon the at least one selection criterion, the dataset being selected from an index of collected datasets; and, responsive to the request, publishing the dataset to at least one of the internal environment and the external environment.
    Type: Application
    Filed: June 4, 2011
    Publication date: December 6, 2012
    Inventor: Robert Tennant
  • Publication number: 20120310890
    Abstract: Provided are systems and methods for use in data archiving. In one arrangement, compression techniques are provided wherein an earlier version of a data set (e.g., a file folder) is utilized as a dictionary of a compression engine to compress a subsequent version of the data set. This compression identifies changes between data sets and allows these differences to be stored without duplicating many common portions of the data sets. For a given version of a data set, new information is stored along with metadata used to reconstruct the version from each individual segment saved at different points in time. In this regard, the earlier data set and one or more references to stored segments of a subsequent data set may be utilized to reconstruct the subsequent data set.
    Type: Application
    Filed: May 2, 2012
    Publication date: December 6, 2012
    Applicant: DATA STORAGE GROUP, LLC
    Inventors: Brian Dodd, Michael Moore
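A minimal sketch of "earlier version as compression dictionary" using the preset-dictionary (`zdict`) support of Python's zlib module (Python 3.3+). DEFLATE uses at most about the last 32 KiB of a dictionary, so only the tail of the previous version is passed. The helper names are hypothetical, and the segment metadata the abstract describes is not modeled.

```python
import zlib


def compress_against(previous: bytes, current: bytes) -> bytes:
    """Compress a new version using the prior version as a preset dictionary."""
    # DEFLATE's window is 32 KiB, so only the tail of `previous` can be referenced.
    comp = zlib.compressobj(level=9, zdict=previous[-32768:])
    return comp.compress(current) + comp.flush()


def decompress_against(previous: bytes, blob: bytes) -> bytes:
    """Reverse compress_against; the identical dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=previous[-32768:])
    return decomp.decompress(blob) + decomp.flush()


v1 = b"".join(str(i).encode() + b"," for i in range(2000))  # earlier version
v2 = v1 + b"2000,"                                         # later version
with_dict = compress_against(v1, v2)
without_dict = zlib.compress(v2, 9)
print(len(with_dict), len(without_dict))
```

Since almost all of v2 already appears in v1, the dictionary-primed output is mostly back-references into the old version, which is the duplication-avoiding effect the abstract describes.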
  • Publication number: 20120303622
    Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
    Type: Application
    Filed: August 9, 2012
    Publication date: November 29, 2012
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
  • Publication number: 20120303627
    Abstract: A data processing system includes a plurality of processing stages. In response to a query, a membership structure is accessed to determine whether partially processed data from a particular one of the processing stages.
    Type: Application
    Filed: May 23, 2011
    Publication date: November 29, 2012
    Inventors: KIMBERLY KEETON, Charles B. Morrey, III, Craig A. Soules
  • Publication number: 20120303598
    Abstract: In one embodiment, a set of boundaries may be obtained, where the set of boundaries includes boundaries for each of one or more bins. The boundaries for each of the one or more bins may include a lower boundary and an upper boundary, wherein the set of boundaries of the one or more bins together defines a contiguous range of data values capable of being stored in the one or more bins. A data value may be obtained. The data value may be added to one of the one or more bins according to the boundaries of the one or more bins. It may be determined whether to modify the set of boundaries. The set of boundaries may be adjusted according to a result of the determining step.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 29, 2012
    Applicant: CAUSATA, INC.
    Inventors: Leonard Michael Newnham, Jason Derek McFall
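A minimal sketch of contiguous bins whose boundaries adapt as values arrive. The rule for deciding whether to modify the boundaries is a hypothetical one: when a bin holds more than `max_count` values, it is split at its midpoint, at most once per insertion. Values are kept per bin only so that a split can redistribute them exactly.

```python
import bisect


class AdaptiveBins:
    """Contiguous bins; bin i covers [bounds[i], bounds[i + 1])."""

    def __init__(self, lower, upper, max_count):
        self.bounds = [lower, upper]
        self.bins = [[]]              # values per bin, kept so splits are exact
        self.max_count = max_count

    def add(self, x):
        if not self.bounds[0] <= x < self.bounds[-1]:
            raise ValueError("value outside the binned range")
        i = bisect.bisect_right(self.bounds, x) - 1
        self.bins[i].append(x)
        # Decide whether to modify the boundaries (hypothetical rule).
        if len(self.bins[i]) > self.max_count:
            self._split(i)

    def _split(self, i):
        mid = (self.bounds[i] + self.bounds[i + 1]) / 2
        self.bounds.insert(i + 1, mid)
        old = self.bins[i]
        self.bins[i:i + 1] = [[v for v in old if v < mid], [v for v in old if v >= mid]]

    def counts(self):
        return [len(b) for b in self.bins]


h = AdaptiveBins(0, 100, max_count=2)
for x in [10, 20, 30, 80]:
    h.add(x)
print(h.bounds, h.counts())
```

Splitting only once per insertion avoids endless splitting when many identical values land in one bin, at the cost of a bin sometimes staying above `max_count`.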
  • Publication number: 20120303632
    Abstract: A computerized searchable repository stores documents as structured metadata parts and unstructured content parts using single instancing. A full text index used for keyword searching includes a metadata index and a content index. A linking structure includes metadata-to-content (MD to CT) links and content-to-metadata (CT to MD) linking entries, with each MD to CT link linking a metadata part of a document to each content part of the document, and each CT to MD linking entry having one or more CT to MD links collectively linking a content part to the metadata parts of the documents that include the content part. Indexing includes metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure. Content indexing is performed only if the content part does not match a content part already stored and indexed. Index entries each associate a key word or key value with corresponding metadata or content parts containing the key word or key value.
    Type: Application
    Filed: May 26, 2011
    Publication date: November 29, 2012
    Applicant: MIMOSA SYSTEMS, INC.
    Inventors: Rahul Kapoor, Sameer H. Ranade, Sherif M. Botros
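A minimal sketch of single-instance storage with conditional content indexing. Content parts are identified by SHA-256 digest (assumed collision-free) and are stored and keyword-indexed only the first time they appear; metadata-to-content and content-to-metadata links let a content hit resolve to every document that includes it. The `Repository` class and its whitespace tokenizer are hypothetical simplifications.

```python
import hashlib
from collections import defaultdict


class Repository:
    def __init__(self):
        self.content = {}                 # content hash -> text (single instance)
        self.md_index = defaultdict(set)  # metadata key value -> doc ids
        self.ct_index = defaultdict(set)  # keyword -> content hashes
        self.md_to_ct = {}                # doc id -> content hashes (MD to CT links)
        self.ct_to_md = defaultdict(set)  # content hash -> doc ids (CT to MD links)
        self.content_indexings = 0

    def add(self, doc_id, metadata, parts):
        for value in metadata.values():
            self.md_index[value].add(doc_id)
        hashes = []
        for text in parts:
            h = hashlib.sha256(text.encode()).hexdigest()
            if h not in self.content:     # content-index only unseen parts
                self.content[h] = text
                for word in text.lower().split():
                    self.ct_index[word].add(h)
                self.content_indexings += 1
            self.ct_to_md[h].add(doc_id)
            hashes.append(h)
        self.md_to_ct[doc_id] = hashes

    def search(self, word):
        """Doc ids whose metadata value or content contains `word`."""
        hits = set(self.md_index.get(word, set()))
        for h in self.ct_index.get(word.lower(), set()):
            hits |= self.ct_to_md[h]
        return hits


repo = Repository()
repo.add("mail-1", {"from": "alice"}, ["Quarterly budget attached"])
repo.add("mail-2", {"from": "bob"}, ["Quarterly budget attached"])
print(repo.content_indexings, sorted(repo.search("budget")))
```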