Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Publication number: 20150032703
    Abstract: A database statement can be identified in a software artifact that is configured to issue the database statement. At least one execution plan for the database statement can be retrieved, and reference(s) to database object(s) can be identified in the execution plan(s). Metadata from the reference(s) can be assembled, where the metadata can reflect one or more dependencies of the software artifact on the object(s). The metadata can be included in a data structure.
    Type: Application
    Filed: October 6, 2014
    Publication date: January 29, 2015
    Inventors: Kaarthik Sivashanmugam, David I. Noor
  • Publication number: 20150032702
    Abstract: Systems and methods for reconstructing unified data in an electronic storage network are provided which may include the identification and use of metadata stored centrally within the system. The metadata may be generated by a group of storage operation cells during storage operations within the network. The unified metadata is used to reconstruct data throughout the storage operation cells that may be missing, deleted or corrupt.
    Type: Application
    Filed: August 8, 2014
    Publication date: January 29, 2015
    Inventor: Parag GOKHALE
  • Publication number: 20150026135
    Abstract: For adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks. Input similarity elements are calculated for an input chunk. The input similarity elements are used to find similar data in a repository of data using a similarity search structure. A resolution level is calculated for storing the input similarity elements. The input similarity elements are stored in the calculated resolution level in the similarity search structure.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150026136
    Abstract: According to some embodiments, logic executing on a processor receives a request to compare a first file and a second file. Each file comprises records, attributes, and attribute values. An attribute value is a value that a record associates with a corresponding attribute. The logic receives a mapping file indicating a key and one or more selected attributes for comparison. The logic compares each record in the first file to its corresponding record in the second file, the corresponding record determined according to the key. For records that fail to match, the logic determines which of the selected attributes are unmatched. The logic communicates a report indicating a result of comparing the first file and the second file.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Inventors: Nitesh Rathod, Sindhuja Subramani, Christopher T. Walsh, James H. Peterson, Jayanta Sengupta, Scott Murray
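
The comparison described in the abstract above can be sketched as follows. This is only an illustrative reading of the abstract, not the claimed implementation: the function name `compare_files`, the list-of-dicts record representation, and the report layout are all assumptions, and the mapping file is reduced to two arguments (`key` and `selected`).

```python
def compare_files(first, second, key, selected):
    """Compare two files record by record, pairing records by `key`.

    `first` and `second` are lists of dicts (attribute -> attribute value).
    `selected` lists the attributes to compare, as a mapping file might.
    All names here are hypothetical; the abstract does not specify them.
    """
    second_by_key = {record[key]: record for record in second}
    report = {"matched": [], "unmatched": {}, "missing": []}
    for record in first:
        k = record[key]
        other = second_by_key.get(k)
        if other is None:
            # No corresponding record in the second file.
            report["missing"].append(k)
            continue
        # Collect the selected attributes whose values differ.
        diffs = [a for a in selected if record.get(a) != other.get(a)]
        if diffs:
            report["unmatched"][k] = diffs
        else:
            report["matched"].append(k)
    return report
```

For records that fail to match, the report lists which of the selected attributes differ, mirroring the last two steps of the abstract.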
  • Publication number: 20150026139
    Abstract: Mechanisms are provided for efficiently determining commonality in a deduplicated data set in a scalable manner regardless of the number of deduplicated files or the number of stored segments. Information is generated and maintained during deduplication to allow scalable and efficient determination of data segments shared in a particular file, other files sharing data segments included in a particular file, the number of files sharing a data segment, etc. Data need not be expanded or uncompressed. Deduplication processing can be validated and verified during commonality detection.
    Type: Application
    Filed: October 6, 2014
    Publication date: January 22, 2015
    Applicant: Dell Products L.P.
    Inventor: Vinod Jayaraman
  • Publication number: 20150026140
    Abstract: Provided are a computer program product, system, and method for merging entries in a deduplication index. An index has chunk signatures calculated from chunks of data in the data objects in the storage, wherein each index entry includes at least one of the chunk signatures and a reference to the chunk of data from which the signature was calculated. Entries in the index are selected to merge and a merge operation is performed on the chunk signatures in the selected entries to generate a merged signature. An entry is added to the index including the merged signature and a reference to the chunks in the storage referenced in the merged selected entries. The index of the signatures is used in deduplication operations when adding data objects to the storage.
    Type: Application
    Filed: October 6, 2014
    Publication date: January 22, 2015
    Inventors: Jonathan Amit, Corneliu M. Constantinescu, Joseph S. Gilder, Shai I. Tahar
  • Publication number: 20150026138
    Abstract: Systems, methods, and computer program products are provided for transmitting modified sets of data to, or deleting existing sets of data from, mobile wallet applications on mobile devices. Data set identifiers associated with existing sets of data, attributes defining existing sets of data, and other information associated with existing sets of data are stored on a server. A change request to modify or delete an existing set of data is received from a service provider system. The server is searched for an existing set of data corresponding to the existing set of data identified in the change request. The change request is processed and a modified set of data, or a request to delete the existing set of data, is transmitted to mobile devices that have previously received the existing set of data.
    Type: Application
    Filed: July 15, 2014
    Publication date: January 22, 2015
    Inventors: Todd A. Strickler, Hani Nadra
  • Publication number: 20150026137
    Abstract: Provided are a computer program product, system, and method for recovering from a pending uncompleted reorganization of a data set managing data sets in a storage. In response to an initiation of an operation to access a data set, an operation is initiated to complete a pending uncompleted reorganization of the data set in response to the data set being in a pending uncompleted reorganization state and no other process currently accessing the data set.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Inventors: Philip R. Chauvet, Charles J. House, David C. Reed, Max D. Smith
  • Patent number: 8938461
    Abstract: A computer product including a data structure for organizing a plurality of documents, and capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
    Type: Grant
    Filed: July 20, 2010
    Date of Patent: January 20, 2015
    Assignee: Equivio Ltd.
    Inventors: Yiftach Ravid, Amir Milo
  • Patent number: 8937562
    Abstract: This disclosure relates to synchronizing dictionaries of acceleration nodes in a computer network. For example, dictionaries of a plurality of acceleration nodes of a client-server network can be synchronized to each include one or more identical data items and data identifier pairs. Synchronization can include transmitting a particular data item, or a combination of a data item and an associated data identifier, to another acceleration node which includes it in its dictionary. A particular acceleration node can, instead of transmitting a data item, transmit an associated data identifier to another acceleration node. As all (or a subset) of the acceleration nodes can have an identical dictionary when employing the methods described herein, the particular acceleration node can use the same dictionary to communicate with all (or the subset of) other acceleration nodes of the computer network.
    Type: Grant
    Filed: July 29, 2013
    Date of Patent: January 20, 2015
    Assignee: SAP SE
    Inventor: Or Igelka
  • Patent number: 8938414
    Abstract: A data transformation system receives data from one or more external source systems and stores and transforms the data for providing to reporting systems. The data transformation system maintains multiple versions of data received from an external source system. The data transformation system can combine data from different versions of data and provide to the reporting system. As a result, external source systems that do not maintain data in a format appropriate for reporting systems and/or do not maintain sufficient historical data to generate different types of reports are able to generate these reports. The data transformation system can also enhance older versions of data stored in the system or exclude portions of data from reports. The data transformation system can purge older versions of data so that older data that is less frequently requested is maintained at a lower frequency than recent data.
    Type: Grant
    Filed: June 5, 2014
    Date of Patent: January 20, 2015
    Assignee: GoodData Corporation
    Inventor: Pavel Kolesnikov
  • Publication number: 20150019506
    Abstract: Data matches are calculated between input data and repository data via a digest based matching algorithm where in a first step digest matches, anchored at already verified matching positions in the input data and in the repository data, are extended to produce data matches. In a second step the remaining unmatched input digests are matched with repository digests and extended to produce further data matches.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019508
    Abstract: For producing secondary segmentations of data into blocks and corresponding digests for input data in a data deduplication system using a processor device in a computing environment, digests are calculated for an input data chunk using a primary segmentation into blocks. Secondary segmentations are produced for each of the data mismatches based on reference data, and used to calculate further data matches. The primary segmentation and the corresponding primary digests are stored for the input data chunk.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019503
    Abstract: For producing digest block segmentations based on reference segmentations in a data deduplication system using a processor device in a computing environment, digests are calculated for an input data chunk. Data matches and data mismatches are produced based on matching input digests with reference digests. Secondary digest block segmentations are obtained from similar reference intervals for each of the data mismatches and applied to the input data.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventors: Shay H. AKIRAV, Lior ARONOVICH, Michael HIRSCH, Yair TOAFF
  • Publication number: 20150019510
    Abstract: Applying a content defined maximum size bound on blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and a last candidate segmenting position of the same or higher hierarchy level, and then using the intermediate candidate segmenting positions of that interval if the size of the interval exceeds the maximum size bound, or discarding the intermediate candidate segmenting positions of that interval if the size of the interval does not exceed the maximum size bound.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Lior ARONOVICH
  • Publication number: 20150019499
    Abstract: Data matches are calculated between input data and repository data via a digest based matching algorithm where the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into a sequential array and into a search structure. Each of the matching digests found using the search structure is extended using the sequential array of reference digests. Repository data intervals are determined as similar to an input data interval. Reference digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches of input digests and the reference digests are found using the search structure. Each one of the found matches of the input digests and repository digests is extended using the sequential representation. Data matches are determined between the input data and the repository data using extended matches of digests.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019507
    Abstract: Repository data intervals are determined as similar to an input data interval. Repository digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches of input digests and the repository digests are found using the search structure. Each one of the found matches of the input digests and repository digests is extended using the sequential representation. Data matches are determined between the input data and the repository data using extended matches of digests. A compact index pointing to a position in the sequential representation of digests is incorporated into entries of the search structure.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019502
    Abstract: For read ahead of digests in similarity based data deduplication in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks and digest values are calculated for each of the data chunks. The positions and sizes of similar data intervals in a repository of data are found for each of the data chunks. The positions and the sizes of read ahead intervals are calculated based on the similar data intervals. The read ahead digests of the read ahead intervals are located and loaded into memory in a background read ahead process.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventors: Lior ARONOVICH, Michael HIRSCH
  • Publication number: 20150019511
    Abstract: Applying a content defined minimum size bound on blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and a last candidate segmenting position of same or higher hierarchy level, and then discarding the newly found candidate segmenting position if a size of an interval of data is lower than the minimum size bound, or retaining the newly found candidate segmenting position if the size of the interval of data is not lower than the minimum size bound or if there is no last candidate segmenting position of a same or higher hierarchy level as the newly found candidate segmenting position. When a last candidate segmenting position of a same or higher hierarchy level becomes available, the evaluation is reiterated to converge edge segmenting positions of the outputs of consecutive calculation units.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019513
    Abstract: The present subject matter relates to analysis of time-series data based on world events derived from unstructured content. According to one embodiment, a method comprises obtaining event information corresponding to at least one world event from unstructured content obtained from a plurality of data sources. The event information includes at least time of occurrence of the world event, time of termination of the world event, and at least one entity associated with the world event. Further, the method comprises retrieving time-series data pertaining to the entity associated with the world event from a time-series data repository. Based on the event information and the time-series data, the world event is aligned and correlated with at least one time-series event to identify at least one pattern indicative of cause-effect relationship amongst the world event and the time-series event.
    Type: Application
    Filed: July 10, 2014
    Publication date: January 15, 2015
    Inventors: Lipika DEY, Ishan VERMA, Arpit KHURDIYA, Diwakar MAHAJAN, Gautam SHROFF
  • Publication number: 20150019512
    Abstract: Systems and methods disclosed herein provide intelligent filtering of system log messages having low utility value. In providing the filtering, the systems and methods determine the utility value of a system log message and delete the message from the system log if the message is determined to be of low utility value. As such, embodiments herein provide a system log filter, which reduces the amount of data stored in the system log based on the utility value of the message.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventors: Jayanta Basak, Nagesh Panyam Chandrasekarasastry
  • Publication number: 20150019500
    Abstract: For conditional activation of similarity search in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks. A determination is made as to whether to apply the similarity search process for an input data chunk based on deduplication results of a previous input data chunk in the input data.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019501
    Abstract: For utilizing a global digests cache in deduplication processing in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks and digest values are calculated for each of the data chunks. The positions of similar repository data are found in a repository of data for each of the data chunks. The repository digests of the similar repository data are located and loaded into the global digests cache. The global digests cache contains digests previously loaded by other deduplication processes. The input digests of the input data are matched with the repository digests contained in the global digests cache for locating data matches.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventors: Shay H. AKIRAV, Lior ARONOVICH
  • Publication number: 20150019509
    Abstract: For adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, multiple resolution levels are configured for a similarity search. Input similarity elements are calculated in one resolution level for a chunk of input data. The input similarity elements of the one resolution level are used to find similar data in a repository of data where similarity elements of the stored similar repository data are of the multiple resolution levels.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019504
    Abstract: For calculation of digest segmentations for input data using similar data in a data deduplication system using a processor device in a computing environment, a stream of input data is partitioned into input data chunks. Similar repository intervals are calculated for each input data chunk. Anchor positions are determined between an input data chunk and the similar repository intervals, based on data matches between a previous input data chunk and previous similar repository intervals. Digest segmentations of the similar repository intervals are projected onto the input data chunk, starting at the anchor positions.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Publication number: 20150019505
    Abstract: Data matches are calculated in a data deduplication system by matching input and repository digests using a digest based data matching process where the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into two data structures. The two data structures include a sequential buffer containing digests in a sequence of occurrence in the data and a search structure for searching of the reference digests matching a version digest.
    Type: Application
    Filed: July 15, 2013
    Publication date: January 15, 2015
    Inventor: Lior ARONOVICH
  • Patent number: 8935713
    Abstract: Determining a video audience is disclosed, including: identifying a set of videos based at least in part on a received criterion; querying a video database to retrieve engagements associated with each of at least a subset of the set of videos; identifying a set of audience members associated with the engagements associated with each of the at least subset of the set of videos; and querying a user database to gather events associated with each of at least a subset of the set of audience members.
    Type: Grant
    Filed: May 24, 2013
    Date of Patent: January 13, 2015
    Assignee: Tubular Labs, Inc.
    Inventors: Robert L. Gabel, David A. Koblas, Allison J. Stern
  • Patent number: 8935487
    Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: January 13, 2015
    Assignee: Microsoft Corporation
    Inventors: Sudipta Sengupta, Biplob Debnath, Jin Li, Ronakkumar N. Desai, Paul Adrian Oltean
  • Patent number: 8935222
    Abstract: For optimizing a partition of a data block into matching and non-matching segments in data deduplication using a processor device in a computing environment, an optimal calculation operation is applied in polynomial time to the matching segments for selecting a globally optimal subset of a set of matching segments according to overhead considerations for minimizing an overall size of a deduplicated file by determining a trade off between a time complexity and a space complexity.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: January 13, 2015
    Assignee: International Business Machines Corporation
    Inventors: Michael Hirsch, Ariel J. Ish-Shalom, Shmuel T. Klein
  • Publication number: 20150012504
    Abstract: Data files in the data deduplication system are associated with a file identifier defined to have a first part identifier for denoting a location of the data file in a storage, and a second part identifier for uniquely identifying the data file in the data deduplication system over time.
    Type: Application
    Filed: July 8, 2013
    Publication date: January 8, 2015
    Inventors: Shay H. AKIRAV, Lior ARONOVICH, Rafael BUCHBINDER, Ariel J. ISH-SHALOM, Lior TAMARY
  • Publication number: 20150012503
    Abstract: For self-healing in a hash-based deduplication system using a processor device in a computing environment, deduplication digests of data and a corresponding list of the deduplication digests in a table of contents (TOC) are maintained for the self-healing of data that is lost or unreadable. The input data digests are compared to the TOC if directed to data that is lost or unreadable, and the input data digests are used to repair the lost or unreadable data.
    Type: Application
    Filed: July 8, 2013
    Publication date: January 8, 2015
    Inventors: Shay H. AKIRAV, Michael HIRSCH
  • Patent number: 8930687
    Abstract: In an encrypted storage system employing data deduplication, encrypted data units are stored with the respective keyed data digests. A secure equivalence process is performed to determine whether an encrypted data unit on one storage unit is a duplicate of an encrypted data unit on another storage unit. The process includes an exchange phase and a testing phase in which no sensitive information is exposed outside the storage units. If duplication is detected then the duplicate data unit is deleted from one of the storage units and replaced with a mapping to the encrypted data unit as stored on the other storage unit. The mapping is used at the one storage unit when the corresponding logical data unit is accessed there.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: January 6, 2015
    Assignee: EMC Corporation
    Inventors: Peter Alan Robinson, Eric Young
  • Patent number: 8930327
    Abstract: In production applications that process and transfer secure and sensitive customer data, the heap dump files of these applications, which may be useful for debugging production issues and bugs, may contain secure and sensitive information. Thus, to make the useful debugging information available in heap dumps from production applications without compromising secure client data to those assigned to debugging and fixing production issues, these heap dumps may be scrubbed of sensitive information without scrubbing information that is useful for debugging.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: January 6, 2015
    Assignee: salesforce.com, inc.
    Inventors: Fiaz Hossain, Zuye Zheng
  • Patent number: 8930328
    Abstract: Provided is a storage system including a storage device for storing data, and a controller for controlling data read/write in the storage device. The controller includes a processor for executing a program, and a memory for storing the program that is executed by the processor. The processor executes deduplication processing for converting a duplicate part of data that is stored in the storage device into shared data, and calculates a distributed capacity consumption, which represents a capacity of a storage area that is used by a user in the storage device, by using a size of the data prior to the deduplication processing and a count of pieces of data referring to the shared data that is referred to by this data.
    Type: Grant
    Filed: November 13, 2012
    Date of Patent: January 6, 2015
    Assignee: Hitachi, Ltd.
    Inventors: Jun Nemoto, Hitoshi Kamei, Atsushi Sutoh
  • Patent number: 8924366
    Abstract: Storage systems and methods are presented. In one embodiment, a variable length segment storage method comprises: receiving a data stream; performing a tailored segment process on the data stream, wherein at least one of a plurality of tailored segments includes corresponding data of at least one of a plurality of variable length segments and alignment padding to align with boundaries of a fixed length de-duplication scheme; performing a de-duplication process on the plurality of tailored segments; and storing information corresponding to the result of the de-duplication process. In one embodiment, the tailored segment process includes adjusting the alignment padding of the at least one of a plurality of tailored segments, wherein an adjustment in the alignment padding of the at least one of a plurality of tailored segments corresponds to a modification in the at least one of the plurality of variable length segments.
    Type: Grant
    Filed: September 16, 2011
    Date of Patent: December 30, 2014
    Assignee: Symantec Corporation
    Inventor: Graham Bromley
  • Publication number: 20140379670
    Abstract: Example systems and methods of deleting data stored in a database system are presented. In one example, a plurality of data items is received from an application and stored at the database system. Also received from the application and stored at the database system is deletion timing information for each of the data items. The deletion timing information for a data item may indicate when the data item is to be deleted from the database system. At least one of the data items may be deleted at the database system at a time indicated by its corresponding deletion timing information without assistance from the application.
    Type: Application
    Filed: June 19, 2013
    Publication date: December 25, 2014
    Applicant: SAP AG
    Inventor: Gernot Kuhr
  • Publication number: 20140379671
    Abstract: Disclosed is the technology for data scrubbing in a cluster-based storage system. This technology allows protecting data against failures of storage devices by periodically reading data object replicas and data object hashes stored in a plurality of storage devices and rewriting those data object replicas that have errors. The present disclosure addresses aspects of writing data object replicas and hashes, checking validity of data object replicas, and performing data scrubbing based upon results of the checking.
    Type: Application
    Filed: June 19, 2014
    Publication date: December 25, 2014
    Inventors: Frank E. Barrus, Tad Hunt
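
A minimal sketch of the scrubbing pass described in the abstract above, under simplifying assumptions: each storage device is modeled as a dictionary entry holding one replica, a single SHA-256 hash stands in for the stored data object hashes, and the function name `scrub` is hypothetical. The abstract does not state which hash function or repair policy is used.

```python
import hashlib


def scrub(replicas, stored_hash):
    """Check every replica against the stored hash and rewrite bad copies.

    `replicas` maps a device name to the bytes of its replica (assumed
    model). `stored_hash` is the hex SHA-256 recorded when the object was
    written. Returns the list of devices whose replica was rewritten.
    """
    def is_valid(data):
        return hashlib.sha256(data).hexdigest() == stored_hash

    # Find one replica that still matches the stored hash.
    good = next((data for data in replicas.values() if is_valid(data)), None)
    if good is None:
        raise ValueError("no valid replica available for repair")

    repaired = []
    for device, data in replicas.items():
        if not is_valid(data):
            replicas[device] = good  # rewrite the damaged replica
            repaired.append(device)
    return repaired
```

Running such a pass periodically is what lets damaged replicas be caught and rewritten while a valid copy still exists.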
  • Publication number: 20140379672
    Abstract: The file storage system includes a controller and a volume storing a plurality of files, the volume including a first directory storing a first file and a second file and a second directory storing a third file being created. The controller migrates actual data of the second file to the third file, sets up a management information of the second file so that the third file is referred to when the second file is read, and if the sizes of actual data of the first file and the actual data of the third file are identical and the binaries of the actual data of the first file and the actual data of the third file are identical, sets up a management information of the first file to refer to the third file when reading the first file.
    Type: Application
    Filed: September 11, 2014
    Publication date: December 25, 2014
    Inventors: Tomonori Esaka, Takaki Nakamura, Hitoshi Kamei, Masakuni Agetsuma
  • Patent number: 8918372
    Abstract: A set of metadata associated with backup data is obtained. A consistent hash key for the backup data is generated based at least in part on the set of metadata. The backup data is assigned to one of a plurality of deduplication nodes based at least in part on the consistent hash key.
    Type: Grant
    Filed: September 19, 2012
    Date of Patent: December 23, 2014
    Assignee: EMC Corporation
    Inventors: Feng Guo, Qiyan Chen, Mandavilli Navneeth Rao, Lintao Wan, Dong Xiang
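
One way to picture the node assignment in the abstract above is a conventional consistent-hashing ring. The sketch below is an assumption-laden illustration, not the patented method: the class name `NodeRing`, the use of SHA-256, the virtual-point count, and the way metadata fields are serialized into a key are all choices made here for the example.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class NodeRing:
    """Hypothetical consistent-hash ring over deduplication nodes."""

    def __init__(self, nodes, points_per_node=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node contributes several virtual points to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def assign(self, metadata):
        """Pick a node for backup data from its metadata (a dict)."""
        # Serialize fields in sorted order so the key is order-independent.
        key = _hash("|".join(f"{k}={metadata[k]}" for k in sorted(metadata)))
        index = bisect.bisect_right(self._points, key) % len(self._ring)
        return self._ring[index][1]
```

With this scheme, backups that share the same metadata always land on the same node, and removing a node only reassigns the data that was mapped to it.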
  • Publication number: 20140372386
    Abstract: A method and system comprising a duplication identifier module to analyze data input information to automatically identify duplicate expected inputs associated with a process are shown. The system includes logical process model information defining a logically structured series of process activities and data input information representing a plurality of expected inputs associated with respective process activities, with each expected input being indicative of expected collection of a corresponding data element during execution of the associated process activity. Each duplicate expected input comprises one of the plurality of expected inputs for which there is at least one other expected input with respect to a common corresponding data element.
    Type: Application
    Filed: August 27, 2014
    Publication date: December 18, 2014
    Inventors: Vikram Duvvoori, Satish Venkatesan Srinivasan, Prasad A. Chodavarapu, Ravindra S. Gajulapalli, Rajesh Ramesh Agrawal
  • Publication number: 20140372387
    Abstract: The present invention is directed to a method and mechanism for reducing the expense of data transmissions between a client and a server. According to an aspect, data prefetching is utilized to predictably retrieve information between the client and server. Another aspect pertains to data redundancy management for reducing the expense of transmitting and storing redundant data between the client and server. Another aspect relates to moved data structures for tracking and managing data at a client in conjunction with data redundancy management.
    Type: Application
    Filed: August 29, 2014
    Publication date: December 18, 2014
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Sreenivas GOLLAPUDI, Debashish CHATTERJEE
  • Patent number: 8914331
    Abstract: A computer-implemented system and method for identifying duplicate and near duplicate messages is provided. A set of messages is obtained. A body of one such message is compared with the body of each other message. Those messages having matching bodies are identified as exact duplicates. The exact duplicates are removed from the set. The remaining messages are sorted in order of message length and a shorter message is compared with a longer message. A determination is made that the body of the shorter message is included in the body of the longer message and the shorter message is marked as a near duplicate of the longer message.
    Type: Grant
    Filed: January 6, 2014
    Date of Patent: December 16, 2014
    Assignee: FTI Technology LLC
    Inventors: Kenji Kawai, David T. McDonald
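    Illustration (not taken from the patent): a minimal sketch of the steps the abstract describes, assuming messages are supplied as a mapping from a message id to its body text. The function name `find_duplicates` and the return shape are hypothetical.

```python
def find_duplicates(messages):
    """Return (exact, near) for a dict of message id -> body.

    exact maps each exact duplicate to the id of the message that is kept;
    near maps a shorter message to a longer message whose body contains it.
    """
    exact = {}
    first_seen = {}  # body -> id of the first message with that body
    for msg_id, body in messages.items():
        if body in first_seen:
            exact[msg_id] = first_seen[body]
        else:
            first_seen[body] = msg_id

    # Sort the remaining (unique) bodies by length, shortest first.
    remaining = sorted(first_seen.items(), key=lambda item: len(item[0]))

    near = {}
    for i, (short_body, short_id) in enumerate(remaining):
        for long_body, long_id in remaining[i + 1:]:
            if short_body in long_body:
                near[short_id] = long_id
                break
    return exact, near
```

    The containment check here is a plain substring test; a real system would likely normalize whitespace and quoting first, which this sketch does not attempt.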
  • Patent number: 8914338
    Abstract: A method for storing data in a data storage system by partitioning the data into a plurality of data chunks and generating representative data for each of the plurality of chunks by applying a predetermined algorithm to each chunk of the plurality of chunks. Subsequently, the representative data is compared and sorted. Representative data for base data chunks and representative data for other data chunks that can be stored relative to the base data chunks are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as those that can be stored relative to a base data chunk are stored in the data storage system as the difference between the data chunk and a base data chunk.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: December 16, 2014
    Assignee: EMC Corporation
    Inventors: Grant Wallace, Philip N. Shilane, Frederick Douglis
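    Illustration (not taken from the patent): a minimal sketch of storing chunks relative to base chunks. The "representative data" is assumed here to be the smallest hash over fixed-width windows of the chunk, and the delta format is an ad hoc list of copy/insert operations built with `difflib`; both choices, and every function name, are assumptions for this example.

```python
import hashlib
from difflib import SequenceMatcher


def representative(chunk: bytes, width: int = 8) -> int:
    # Smallest hash over all width-byte windows; similar chunks tend to share it.
    windows = [chunk[i:i + width] for i in range(max(1, len(chunk) - width + 1))]
    return min(int.from_bytes(hashlib.sha1(w).digest()[:8], "big") for w in windows)


def make_delta(base: bytes, target: bytes):
    # Describe target as copies from base plus literal inserts.
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": keep the new bytes
            ops.append(("insert", target[j1:j2]))
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def store(chunks):
    # Sorting by representative value places similar chunks next to each other.
    reps = [representative(c) for c in chunks]
    order = sorted(range(len(chunks)), key=lambda i: reps[i])
    stored, base_idx = {}, None
    for i in order:
        if base_idx is not None and reps[i] == reps[base_idx]:
            stored[i] = ("delta", base_idx, make_delta(chunks[base_idx], chunks[i]))
        else:
            base_idx = i
            stored[i] = ("base", chunks[i])
    return stored


def restore(stored, i) -> bytes:
    entry = stored[i]
    if entry[0] == "base":
        return entry[1]
    return apply_delta(stored[entry[1]][1], entry[2])
```

    Each run of equal representative values yields one base chunk stored whole and the rest stored as differences against it; `restore` rebuilds any chunk from that layout.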
  • Patent number: 8914343
    Abstract: Keys are obtained and aggregated by storing a primary object as an entry in a parent keys storage and a child keys storage, the entry identified as unvisited in each. An object evaluation process is then performed until all unique entries in the parent keys storage and all unique entries in the child keys storage have been visited and by committing the keys of at least one related object as an entry to the hierarchical database. The object evaluation process visits each unvisited object in the parent keys storage and child keys storage by selecting, for the unvisited object, objects in the parent direction that have not already been visited and objects in the child direction that have not already been visited and by inserting the keys of the selected related objects as entries in the parent keys storage or child keys storage.
    Type: Grant
    Filed: December 4, 2012
    Date of Patent: December 16, 2014
    Assignee: CA, Inc.
    Inventors: B. V. K. Venu Gopala Rao, Muruganandam Somasundaram, James L. Broadhurst, Timothy J. Weltzer
  • Publication number: 20140365448
    Abstract: Aspects of the subject matter described herein relate to paragraph snapping. In aspects, trending data is collected and prepared for sending to one or more target machines. Upon receiving the trending data, a target machine installs the trending data locally and deletes previously installed trending data. After installation, the trending data may be used to suggest text in response to input from a user. If a user selects suggested text, the text may be added to a local dictionary of the target machine.
    Type: Application
    Filed: June 5, 2013
    Publication date: December 11, 2014
    Inventors: Daniel Ethan Keller, David A. Stevens, Bryan Douglas Scott, David Earl Washington
  • Publication number: 20140365449
    Abstract: A computing device receives a plurality of writes; each write is comprised of chunks of data. The computing device records metrics associated with the deduplication of the chunks of data from the plurality of writes. The computing device generates groups based on associating each group with a portion of a range of the metrics, such that each of the chunks of data is associated with one of the groups, and a similar number of chunks of data are associated with each group. The computing device determines a deduplication affinity for each of the groups based on the chunks of data that are duplicates and at least one metric. The computing device sets a threshold for the deduplication affinity and, in response to any of the groups exceeding the threshold, excludes the chunks of data associated with a group exceeding the threshold from deduplication.
    Type: Application
    Filed: June 6, 2013
    Publication date: December 11, 2014
    Inventors: David D. Chambliss, Bhushan P. Jain, Maohua Lu
  • Publication number: 20140365450
    Abstract: A system configured to generate a macro-fingerprint from at least one predefined set of summaries is provided. The system includes data storage storing a first predefined set of summaries associated with a first region of data, each member of the first predefined set of summaries characterizing data within the first region of data; and at least one processor coupled to the data storage and configured to: read the first predefined set of summaries; select at least one first member from the first predefined set of summaries based on a value of the at least one first member; and store the at least one first member within a first macro-fingerprint. The first region of data may have a first size indicative of a quantity of data included in the first region of data. The macro-fingerprints are created from previously created smaller (micro) fingerprints without having to reread the data.
    Type: Application
    Filed: June 6, 2013
    Publication date: December 11, 2014
    Inventors: Ronald Ray Trimble, Jon Christopher Kennedy
  • Publication number: 20140365451
    Abstract: A method and system for cleaning up junk files on a mobile terminal are provided. The method comprises: scanning, by a mobile terminal client, a file system on a local mobile terminal to generate a list of file information; submitting, by the mobile terminal client, to a server side the list of file information; comparing, by the server side, the list of file information submitted by the client with an associated list of file information in a server side database and returning the comparison result; determining a request for cleaning up in the file system on the basis of the comparison result, and performing, by the mobile terminal client, an operation of cleaning up.
    Type: Application
    Filed: March 13, 2013
    Publication date: December 11, 2014
    Inventors: Yaowei Chen, Yu Lin, Shihong Zou
  • Patent number: 8909607
    Abstract: A computer identifies a relationship among a subset of a set of data blocks, a basis of the relationship forming a context shared by the subset of data blocks. The computer selects a code data structure from a set of code data structures using the context. The context is associated with the code data structure, and the code data structure includes a set of codes. The computer computes, for a first data block in the subset of data blocks, a first code corresponding to a content of the first data block. The computer determines whether the first code matches a stored code in the code data structure. The computer replaces, responsive to the first code matching the stored code, the first data block with a reference to an instance of the first data block. The computer causes the reference to be stored in a target data processing system.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: December 9, 2014
    Assignee: International Business Machines Corporation
    Inventors: Vishal Chittranjan Aslot, Adekunle Bello, Brian W. Hart, Robert Wright Thompson
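    Illustration (not taken from the patent): a minimal sketch of context-scoped deduplication, assuming blocks arrive already grouped by a context label and that the "code" for a block is its SHA-256 digest. The function name `deduplicate`, the input shape, and the output tuples are hypothetical.

```python
import hashlib


def deduplicate(blocks_by_context):
    """Replace repeated blocks with references, one code table per context.

    blocks_by_context: dict of context label -> list of bytes blocks.
    Returns a list of ("data", block) or ("ref", index_of_stored_instance).
    """
    code_tables = {}  # context -> {code: index of the stored instance}
    output = []
    for context, blocks in blocks_by_context.items():
        # Select the code table associated with this context.
        table = code_tables.setdefault(context, {})
        for block in blocks:
            code = hashlib.sha256(block).hexdigest()
            if code in table:
                # Matching code: emit a reference instead of the block.
                output.append(("ref", table[code]))
            else:
                table[code] = len(output)
                output.append(("data", block))
    return output
```

    Because each context has its own table in this sketch, a lookup only searches codes from related blocks; identical blocks that fall under different contexts are stored separately.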
  • Patent number: 8908911
    Abstract: Systems and methods are described herein for identifying and filtering redundant database entries associated with a visual search system. An example of a method of managing a database associated with a mobile device described herein includes identifying a captured image; obtaining an external database record from an external database corresponding to an object identified from the captured image; comparing the external database record to a locally stored database record; and locally discarding one of the external database record or the locally stored database record if the comparing indicates overlap between the external database record and the locally stored database record.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: December 9, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Charles Wheeler Sweet, III, Prince Gupta