Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)

GETTING DEPENDENCY METADATA USING STATEMENT EXECUTION PLANS

Publication number: 20150032703

Abstract: A database statement can be identified in a software artifact that is configured to issue the database statement. At least one execution plan for the database statement can be retrieved, and reference(s) to database object(s) can be identified in the execution plan(s). Metadata from the reference(s) can be assembled, where the metadata can reflect one or more dependencies of the software artifact on the object(s). The metadata can be included in a data structure.
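
As a rough illustration of the idea in this abstract, the sketch below asks SQLite for the execution plan of a statement and collects the tables the plan touches. The function name plan_dependencies, the artifact name report.py, the table names, and the regular expression are assumptions made for this example, not details from the application. The pattern only covers simple SCAN and SEARCH steps without aliases or subqueries.

```python
import re
import sqlite3

# Matches the table named in a plan step such as "SCAN orders" or
# "SEARCH TABLE customers USING ..." (older SQLite versions print "TABLE").
_TABLE_RE = re.compile(r"\b(?:SCAN|SEARCH)\s+(?:TABLE\s+)?(\w+)")


def plan_dependencies(conn, artifact_name, statement):
    """Build a small dependency record from the statement's query plan."""
    tables = set()
    for row in conn.execute("EXPLAIN QUERY PLAN " + statement):
        detail = row[3]  # fourth column holds the text of the plan step
        match = _TABLE_RE.search(detail)
        if match:
            tables.add(match.group(1))
    return {"artifact": artifact_name, "depends_on": sorted(tables)}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """
)
metadata = plan_dependencies(
    conn,
    "report.py",  # hypothetical software artifact that issues the statement
    "SELECT customers.name FROM orders "
    "JOIN customers ON customers.id = orders.customer_id",
)
```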

Type: Application

Filed: October 6, 2014

Publication date: January 29, 2015

Inventors: Kaarthik Sivashanmugam, David I. Noor
SYSTEMS AND METHODS OF UNIFIED RECONSTRUCTION IN STORAGE SYSTEMS

Publication number: 20150032702

Abstract: Systems and methods for reconstructing unified data in an electronic storage network are provided which may include the identification and use of metadata stored centrally within the system. The metadata may be generated by a group of storage operation cells during storage operations within the network. The unified metadata is used to reconstruct data throughout the storage operation cells that may be missing, deleted or corrupt.

Type: Application

Filed: August 8, 2014

Publication date: January 29, 2015

Inventor: Parag GOKHALE
ADAPTIVE SIMILARITY SEARCH RESOLUTION IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150026135

Abstract: For adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks. Input similarity elements are calculated for an input chunk. The input similarity elements are used to find similar data in a repository of data using a similarity search structure. A resolution level is calculated for storing the input similarity elements. The input similarity elements are stored in the calculated resolution level in the similarity search structure.

Type: Application

Filed: July 17, 2013

Publication date: January 22, 2015

Inventor: Lior ARONOVICH
Automated Data Validation

Publication number: 20150026136

Abstract: According to some embodiments, logic executing on a processor receives a request to compare a first file and a second file. Each file comprises records, attributes, and attribute values. An attribute value is a value that a record associates with a corresponding attribute. The logic receives a mapping file indicating a key and one or more selected attributes for comparison. The logic compares each record in the first file to its corresponding record in the second file, the corresponding record determined according to the key. For records that fail to match, the logic determines which of the selected attributes are unmatched. The logic communicates a report indicating a result of comparing the first file and the second file.
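
The comparison described in this abstract can be sketched as follows, assuming both files are CSV and the mapping is a plain dictionary with a key and a list of selected attributes. The function name compare_files, the sample columns, and the report layout are illustrative assumptions.

```python
import csv
import io


def compare_files(first, second, key, selected):
    """Return {key value: [unmatched attributes]} for records that differ."""
    second_by_key = {rec[key]: rec for rec in csv.DictReader(second)}
    report = {}
    for rec in csv.DictReader(first):
        other = second_by_key.get(rec[key])
        if other is None:
            # No corresponding record exists in the second file.
            report[rec[key]] = ["<missing record>"]
            continue
        unmatched = [attr for attr in selected if rec[attr] != other[attr]]
        if unmatched:
            report[rec[key]] = unmatched
    return report


first = io.StringIO("id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n")
second = io.StringIO("id,name,city\n1,Ann,Oslo\n2,Bob,Paris\n")
mapping = {"key": "id", "selected": ["name", "city"]}  # hypothetical mapping
report = compare_files(first, second, mapping["key"], mapping["selected"])
```

Records present only in the second file are not reported in this sketch.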

Type: Application

Filed: July 17, 2013

Publication date: January 22, 2015

Inventors: Nitesh Rathod, Sindhuja Subramani, Christopher T. Walsh, James H. Peterson, Jayanta Sengupta, Scott Murray
SCALABLE MECHANISM FOR DETECTION OF COMMONALITY IN A DEDUPLICATED DATA SET

Publication number: 20150026139

Abstract: Mechanisms are provided for efficiently determining commonality in a deduplicated data set in a scalable manner regardless of the number of deduplicated files or the number of stored segments. Information is generated and maintained during deduplication to allow scalable and efficient determination of data segments shared in a particular file, other files sharing data segments included in a particular file, the number of files sharing a data segment, etc. Data need not be expanded or uncompressed. Deduplication processing can be validated and verified during commonality detection.
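
One simple way to picture the bookkeeping in this abstract is a pair of maps maintained while files are deduplicated, so that commonality questions are answered without expanding any data. The class and method names below are assumptions for illustration only.

```python
from collections import defaultdict


class CommonalityIndex:
    """Tracks which files reference which deduplicated segments."""

    def __init__(self):
        self.files_by_segment = defaultdict(set)
        self.segments_by_file = defaultdict(set)

    def record(self, file_name, segment_ids):
        # Called during deduplication, once per file.
        for seg in segment_ids:
            self.files_by_segment[seg].add(file_name)
            self.segments_by_file[file_name].add(seg)

    def sharing_files(self, file_name):
        """Other files that share at least one segment with file_name."""
        others = set()
        for seg in self.segments_by_file[file_name]:
            others |= self.files_by_segment[seg]
        others.discard(file_name)
        return others

    def share_count(self, seg):
        """Number of files that reference the segment."""
        return len(self.files_by_segment[seg])
```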

Type: Application

Filed: October 6, 2014

Publication date: January 22, 2015

Applicant: Dell Products L.P.

Inventor: Vinod Jayaraman
MERGING ENTRIES IN A DEDUPLICATION INDEX

Publication number: 20150026140

Abstract: Provided are a computer program product, system, and method for merging entries in a deduplication index. An index has chunk signatures calculated from chunks of data in the data objects in the storage, wherein each index entry includes at least one of the chunk signatures and a reference to the chunk of data from which the signature was calculated. Entries in the index are selected to merge and a merge operation is performed on the chunk signatures in the selected entries to generate a merged signature. An entry is added to the index including the merged signature and a reference to the chunks in the storage referenced in the merged selected entries. The index of the signatures is used in deduplication operations when adding data objects to the storage.

Type: Application

Filed: October 6, 2014

Publication date: January 22, 2015

Inventors: Jonathan Amit, Corneliu M. Constantinescu, Joseph S. Gilder, Shai I. Tahar
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR MODIFYING AND DELETING DATA FROM A MOBILE DEVICE

Publication number: 20150026138

Abstract: Systems, methods, and computer program products are provided for transmitting modified sets of data to, or deleting existing sets of data from, mobile wallet applications on mobile devices. Data set identifiers associated with existing sets of data, attributes defining existing sets of data, and other information associated with existing sets of data are stored on a server. A change request to modify or delete an existing set of data is received from a service provider system. The server is searched for an existing set of data corresponding to the existing set of data identified in the change request. The change request is processed and a modified set of data, or a request to delete the existing set of data, is transmitted to mobile devices that have previously received the existing set of data.

Type: Application

Filed: July 15, 2014

Publication date: January 22, 2015

Inventors: Todd A. Strickler, Hani Nadra
RECOVERING FROM A PENDING UNCOMPLETED REORGANIZATION OF A DATA SET

Publication number: 20150026137

Abstract: Provided are a computer program product, system, and method for recovering from a pending uncompleted reorganization of a data set managing data sets in a storage. In response to an initiation of an operation to access a data set, an operation is initiated to complete a pending uncompleted reorganization of the data set in response to the data set being in a pending uncompleted reorganization state and no other process currently accessing the data set.

Type: Application

Filed: July 17, 2013

Publication date: January 22, 2015

Inventors: Philip R. Chauvet, Charles J. House, David C. Reed, Max D. Smith
Method for organizing large numbers of documents

Patent number: 8938461

Abstract: A computer product including a data structure for organizing a plurality of documents, and capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.

Type: Grant

Filed: July 20, 2010

Date of Patent: January 20, 2015

Assignee: Equivio Ltd.

Inventors: Yiftach Ravid, Amir Milo
Shared data de-duplication method and system

Patent number: 8937562

Abstract: This disclosure relates to synchronizing dictionaries of acceleration nodes in a computer network. For example, dictionaries of a plurality of acceleration nodes of a client-server network can be synchronized to each include one or more identical data items and data identifier pairs. Synchronization can include transmitting a particular data item, or a combination of a data item and an associated data identifier, to another acceleration node which includes it in its dictionary. A particular acceleration node can, instead of transmitting a data item, transmit an associated data identifier to another acceleration node. As all (or a subset) of the acceleration nodes can have an identical dictionary when employing the methods described herein, the particular acceleration node can use the same dictionary to communicate with all (or the subset of) other acceleration nodes of the computer network.

Type: Grant

Filed: July 29, 2013

Date of Patent: January 20, 2015

Assignee: SAP SE

Inventor: Or Igelka
Data abstraction layer for interfacing with reporting systems

Patent number: 8938414

Abstract: A data transformation system receives data from one or more external source systems and stores and transforms the data for providing to reporting systems. The data transformation system maintains multiple versions of data received from an external source system. The data transformation system can combine data from different versions of data and provide to the reporting system. As a result, external source systems that do not maintain data in a format appropriate for reporting systems and/or do not maintain sufficient historical data to generate different types of reports are able to generate these reports. The data transformation system can also enhance older versions of data stored in the system or exclude portions of data from reports. The data transformation system can purge older versions of data so that older data that is less frequently requested is maintained at a lower frequency than recent data.

Type: Grant

Filed: June 5, 2014

Date of Patent: January 20, 2015

Assignee: GoodData Corporation

Inventor: Pavel Kolesnikov
OPTIMIZING DIGEST BASED DATA MATCHING IN SIMILARITY BASED DEDUPLICATION

Publication number: 20150019506

Abstract: Data matches are calculated between input data and repository data via a digest based matching algorithm where in a first step digest matches, anchored at already verified matching positions in the input data and in the repository data, are extended to produce data matches. In a second step the remaining unmatched input digests are matched with repository digests and extended to produce further data matches.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
PRODUCING ALTERNATIVE SEGMENTATIONS OF DATA INTO BLOCKS IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019508

Abstract: For producing secondary segmentations of data into blocks and corresponding digests for input data in a data deduplication system using a processor device in a computing environment, digests are calculated for an input data chunk using a primary segmentation into blocks. Secondary segmentations are produced for each of the data mismatches based on reference data, and used to calculate further data matches. The primary segmentation and the corresponding primary digests are stored for the input data chunk.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
DIGEST BLOCK SEGMENTATION BASED ON REFERENCE SEGMENTATION IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019503

Abstract: For producing digest block segmentations based on reference segmentations in a data deduplication system using a processor device in a computing environment, digests are calculated for an input data chunk. Data matches and data mismatches are produced based on matching input digests with reference digests. Secondary digest block segmentations are obtained from similar reference intervals for each of the data mismatches and applied to the input data.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventors: Shay H. AKIRAV, Lior ARONOVICH, Michael HIRSCH, Yair TOAFF
APPLYING A MAXIMUM SIZE BOUND ON CONTENT DEFINED SEGMENTATION OF DATA

Publication number: 20150019510

Abstract: Applying a content defined maximum size bound on blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and a last candidate segmenting position of the same or higher hierarchy level, and then using the intermediate candidate segmenting positions of that interval if the size of the interval exceeds the maximum size bound, or discarding the intermediate candidate segmenting positions of that interval if the size of the interval does not exceed the maximum size bound.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Lior ARONOVICH
DIGEST BASED DATA MATCHING IN SIMILARITY BASED DEDUPLICATION

Publication number: 20150019499

Abstract: Data matches are calculated between input data and repository data via a digest based matching algorithm where the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into a sequential array and into a search structure. Each of the matching digests found using the search structure are extended using the sequential array of reference digests. Repository data intervals are determined as similar to an input data interval. Reference digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches of input digests and the reference digests are found using the search structure. Each one of the found matches of the input digests and repository digests are extended using the sequential representation. Data matches are determined between the input data and the repository data using extended matches of digests.
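
The two structures named in this abstract can be pictured with a list (the sequential array) and a dictionary (the search structure). In the sketch below a hit in the dictionary is extended forward along the list. Digests are treated as opaque values, and the function name and the forward-only extension are assumptions of the example.

```python
def match_digests(input_digests, reference_digests):
    """Return (input_index, reference_index, length) runs of equal digests."""
    sequential = list(reference_digests)  # digests in order of occurrence
    search = {}
    for pos, dig in enumerate(sequential):
        search.setdefault(dig, pos)  # keep the first position of each digest

    matches = []
    i = 0
    while i < len(input_digests):
        j = search.get(input_digests[i])
        if j is None:
            i += 1
            continue
        # Extend the match using the sequential array, one digest at a time.
        length = 1
        while (
            i + length < len(input_digests)
            and j + length < len(sequential)
            and input_digests[i + length] == sequential[j + length]
        ):
            length += 1
        matches.append((i, j, length))
        i += length
    return matches


runs = match_digests(["x", "b", "c", "d", "y"], ["a", "b", "c", "d"])
```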

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
OPTIMIZING HASH TABLE STRUCTURE FOR DIGEST MATCHING IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019507

Abstract: Repository data intervals are determined as similar to an input data interval. Repository digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches of input digests and the repository digests are found using the search structure. Each one of the found matches of the input digests and repository digests are extended using the sequential representation. Data matches are determined between the input data and the repository data using extended matches of digests. A compact index pointing to a position in the sequential representation of digests is incorporated into entries of the search structure.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
READ AHEAD OF DIGESTS IN SIMILARITY BASED DATA DEDUPLICATION

Publication number: 20150019502

Abstract: For read ahead of digests in similarity based data deduplication in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks and digest values are calculated for each of the data chunks. The positions and sizes of similar data intervals in a repository of data are found for each of the data chunks. The positions and the sizes of read ahead intervals are calculated based on the similar data intervals. The read ahead digests of the read ahead intervals are located and loaded into memory in a background read ahead process.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventors: Lior ARONOVICH, Michael HIRSCH
APPLYING A MINIMUM SIZE BOUND ON CONTENT DEFINED SEGMENTATION OF DATA

Publication number: 20150019511

Abstract: Applying a content defined minimum size bound on blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and a last candidate segmenting position of same or higher hierarchy level, and then discarding the newly found candidate segmenting position if a size of an interval of data is lower than the minimum size bound, or retaining the newly found candidate segmenting position if the size of the interval of data is not lower than the minimum size bound or if there is no last candidate segmenting position of a same or higher hierarchy level as the newly found candidate segmenting position. When a last candidate segmenting position of a same or higher hierarchy level becomes available, the evaluation is reiterated to converge edge segmenting positions of the outputs of consecutive calculation units.
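
A minimal sketch of the retain-or-discard rule in this abstract follows. It assumes candidates arrive as (position, level) pairs in increasing position order and that the "last candidate" means the last retained one. The re-evaluation across calculation units is not modeled, and the function name is hypothetical.

```python
def apply_min_bound(candidates, min_size):
    """Filter candidate segmenting positions against a minimum size bound."""
    retained = []
    for pos, level in candidates:
        # Last retained candidate of the same or a higher hierarchy level.
        prev = next(
            (p for p, lvl in reversed(retained) if lvl >= level), None
        )
        # Retain when no such candidate exists or the interval is big enough.
        if prev is None or pos - prev >= min_size:
            retained.append((pos, level))
    return retained


kept = apply_min_bound([(100, 1), (130, 1), (150, 2), (400, 1)], min_size=64)
```

Here the candidate at position 130 is discarded because it lies only 30 bytes after a retained candidate of the same level.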

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
TIME-SERIES ANALYSIS BASED ON WORLD EVENT DERIVED FROM UNSTRUCTURED CONTENT

Publication number: 20150019513

Abstract: The present subject matter relates to analysis of time-series data based on world events derived from unstructured content. According to one embodiment, a method comprises obtaining event information corresponding to at least one world event from unstructured content obtained from a plurality of data sources. The event information includes at least time of occurrence of the world event, time of termination of the world event, and at least one entity associated with the world event. Further, the method comprises retrieving time-series data pertaining to the entity associated with the world event from a time-series data repository. Based on the event information and the time-series data, the world event is aligned and correlated with at least one time-series event to identify at least one pattern indicative of cause-effect relationship amongst the world event and the time-series event.

Type: Application

Filed: July 10, 2014

Publication date: January 15, 2015

Inventors: Lipika DEY, Ishan VERMA, Arpit KHURDIYA, Diwakar MAHAJAN, Gautam SHROFF
SYSTEMS AND METHODS FOR FILTERING LOW UTILITY VALUE MESSAGES FROM SYSTEM LOGS

Publication number: 20150019512

Abstract: Systems and methods disclosed herein provide intelligent filtering of system log messages having low utility value. In providing the filtering, the systems and methods determine the utility value of a system log message and delete the message from the system log if the message is determined to be of low utility value. As such, embodiments herein provide a system log filter, which reduces the amount of data stored in the system log based on the utility value of the message.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventors: Jayanta Basak, Nagesh Panyam Chandrasekarasastry
REDUCING ACTIVATION OF SIMILARITY SEARCH IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019500

Abstract: For conditional activation of similarity search in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks. A determination is made as to whether to apply the similarity search process for an input data chunk based on deduplication results of a previous input data chunk in the input data.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
GLOBAL DIGESTS CACHING IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019501

Abstract: For utilizing a global digests cache in deduplication processing in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks and digest values are calculated for each of the data chunks. The positions of similar repository data are found in a repository of data for each of the data chunks. The repository digests of the similar repository data are located and loaded into the global digests cache. The global digests cache contains digests previously loaded by other deduplication processes. The input digests of the input data are matched with the repository digests contained in the global digests cache for locating data matches.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventors: Shay H. AKIRAV, Lior ARONOVICH
COMPATIBILITY AND INCLUSION OF SIMILARITY ELEMENT RESOLUTIONS

Publication number: 20150019509

Abstract: For adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, multiple resolution levels are configured for a similarity search. Input similarity elements are calculated in one resolution level for a chunk of input data. The input similarity elements of the one resolution level are used to find similar data in a repository of data where similarity elements of the stored similar repository data are of the multiple resolution levels.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
CALCULATION OF DIGEST SEGMENTATIONS FOR INPUT DATA USING SIMILAR DATA IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019504

Abstract: For calculation of digest segmentations for input data using similar data in a data deduplication system using a processor device in a computing environment, a stream of input data is partitioned into input data chunks. Similar repository intervals are calculated for each input data chunk. Anchor positions are determined between an input data chunk and the similar repository intervals, based on data matches between a previous input data chunk and previous similar repository intervals. Digest segmentations of the similar repository intervals are projected onto the input data chunk, starting at the anchor positions.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
DATA STRUCTURES FOR DIGESTS MATCHING IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150019505

Abstract: Data matches are calculated in a data deduplication system by matching input and repository digests using a digest based data matching process where the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into two data structures. The two data structures include a sequential buffer containing digests in a sequence of occurrence in the data and a search structure for searching of the reference digests matching a version digest.

Type: Application

Filed: July 15, 2013

Publication date: January 15, 2015

Inventor: Lior ARONOVICH
Determining audience members associated with a set of videos

Patent number: 8935713

Abstract: Determining a video audience is disclosed, including: identifying a set of videos based at least in part on a received criterion; querying a video database to retrieve engagements associated with each of at least a subset of the set of videos; identifying a set of audience members associated with the engagements associated with each of the at least subset of the set of videos; and querying a user database to gather events associated with each of at least a subset of the set of audience members.

Type: Grant

Filed: May 24, 2013

Date of Patent: January 13, 2015

Assignee: Tubular Labs, Inc.

Inventors: Robert L. Gabel, David A. Koblas, Allison J. Stern
Fast and low-RAM-footprint indexing for data deduplication

Patent number: 8935487

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.

Type: Grant

Filed: December 28, 2010

Date of Patent: January 13, 2015

Assignee: Microsoft Corporation

Inventors: Sudipta Sengupta, Biplob Debnath, Jin Li, Ronakkumar N. Desai, Paul Adrian Oltean
Optimizing a partition in data deduplication

Patent number: 8935222

Abstract: For optimizing a partition of a data block into matching and non-matching segments in data deduplication using a processor device in a computing environment, an optimal calculation operation is applied in polynomial time to the matching segments for selecting a globally optimal subset of a set of matching segments according to overhead considerations for minimizing an overall size of a deduplicated file by determining a trade off between a time complexity and a space complexity.

Type: Grant

Filed: January 2, 2013

Date of Patent: January 13, 2015

Assignee: International Business Machines Corporation

Inventors: Michael Hirsch, Ariel J. Ish-Shalom, Shmuel T. Klein
PROVIDING IDENTIFIERS TO DATA FILES IN A DATA DEDUPLICATION SYSTEM

Publication number: 20150012504

Abstract: Data files in the data deduplication system are associated with a file identifier defined to have a first part identifier for denoting a location of the data file in a storage, and a second part identifier for uniquely identifying the data file in the data deduplication system over time.

Type: Application

Filed: July 8, 2013

Publication date: January 8, 2015

Inventors: Shay H. AKIRAV, Lior ARONOVICH, Rafael BUCHBINDER, Ariel J. ISH-SHALOM, Lior TAMARY
SELF-HEALING BY HASH-BASED DEDUPLICATION

Publication number: 20150012503

Abstract: For self-healing in a hash-based deduplication system using a processor device in a computing environment, deduplication digests of data and a corresponding list of the deduplication digests in a table of contents (TOC) are maintained for the self-healing of data that is lost or unreadable. The input data digests are compared to the TOC if directed to data that is lost or unreadable, and the input data digests are used to repair the lost or unreadable data.
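
The self-healing behavior can be sketched with a digest list standing in for the TOC and a dictionary standing in for stored blocks. A digest that appears in the TOC but has no stored block models lost data. The class name, the SHA-256 choice, and the return strings are assumptions for this example.

```python
import hashlib


class SelfHealingStore:
    def __init__(self):
        self.toc = []     # digests of every block ever stored
        self.blocks = {}  # digest -> data; a missing entry models lost data

    def write(self, data):
        dig = hashlib.sha256(data).hexdigest()
        if dig not in self.toc:
            self.toc.append(dig)
            self.blocks[dig] = data
            return "stored"
        if dig not in self.blocks:
            # The TOC knows this digest but the data is gone: repair it
            # from the incoming copy.
            self.blocks[dig] = data
            return "healed"
        return "deduplicated"
```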

Type: Application

Filed: July 8, 2013

Publication date: January 8, 2015

Inventors: Shay H. AKIRAV, Michael HIRSCH
Secure distributed deduplication in encrypted data storage

Patent number: 8930687

Abstract: In an encrypted storage system employing data deduplication, encrypted data units are stored with the respective keyed data digests. A secure equivalence process is performed to determine whether an encrypted data unit on one storage unit is a duplicate of an encrypted data unit on another storage unit. The process includes an exchange phase and a testing phase in which no sensitive information is exposed outside the storage units. If duplication is detected then the duplicate data unit is deleted from one of the storage units and replaced with a mapping to the encrypted data unit as stored on the other storage unit. The mapping is used at the one storage unit when the corresponding logical data unit is accessed there.

Type: Grant

Filed: March 15, 2013

Date of Patent: January 6, 2015

Assignee: EMC Corporation

Inventors: Peter Alan Robinson, Eric Young
Method and system for scrubbing information from heap dumps

Patent number: 8930327

Abstract: In production applications that process and transfer secure and sensitive customer data, the heap dump files of these applications, which may be useful for debugging production issues and bugs, may contain secure and sensitive information. Thus, to make the useful debugging information available in heap dumps from production applications without compromising secure client data to those assigned to debugging and fixing production issues, these heap dumps may be scrubbed of sensitive information without scrubbing information that is useful for debugging.

Type: Grant

Filed: April 28, 2011

Date of Patent: January 6, 2015

Assignee: salesforce.com, inc.

Inventors: Fiaz Hossain, Zuye Zheng
Storage system, storage system control method, and storage control device

Patent number: 8930328

Abstract: Provided is a storage system including a storage device for storing data, and a controller for controlling data read/write in the storage device. The controller includes a processor for executing a program, and a memory for storing the program that is executed by the processor. The processor executes deduplication processing for converting a duplicate part of data that is stored in the storage device into shared data, and calculates a distributed capacity consumption, which represents a capacity of a storage area that is used by a user in the storage device, by using a size of the data prior to the deduplication processing and a count of pieces of data referring to the shared data that is referred to by this data.
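
One plausible reading of the capacity calculation is that each shared block's size is split evenly among the pieces of data that refer to it. The sketch below works through that arithmetic. The even split, the fixed block size, and the function name are assumptions rather than the patent's stated formula.

```python
from collections import Counter


def distributed_consumption(files, block_size):
    """files maps an owner to its block ids; returns the capacity charged."""
    # Count how many references each block has across all owners.
    refs = Counter(b for blocks in files.values() for b in blocks)
    return {
        owner: sum(block_size / refs[b] for b in blocks)
        for owner, blocks in files.items()
    }


usage = distributed_consumption(
    {"alice": ["b1", "b2"], "bob": ["b2", "b3"]}, block_size=4096
)
```

With three unique 4096-byte blocks, each owner is charged 6144 bytes, and the charges add up to the 12288 bytes actually stored.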

Type: Grant

Filed: November 13, 2012

Date of Patent: January 6, 2015

Assignee: Hitachi, Ltd.

Inventors: Jun Nemoto, Hitoshi Kamei, Atsushi Sutoh
Data storage deduplication systems and methods

Patent number: 8924366

Abstract: Storage systems and methods are presented. In one embodiment, a variable length segment storage method comprises: receiving a data stream; performing a tailored segment process on the data stream, wherein at least one of a plurality of tailored segments includes corresponding data of at least one of a plurality of variable length segments and alignment padding to align with boundaries of a fixed length de-duplication scheme; performing a de-duplication process on the plurality of tailored segments; and storing information corresponding to the result of the de-duplication process. In one embodiment, the tailored segment process includes adjusting the alignment padding of the at least one of a plurality of tailored segments, wherein an adjustment in the alignment padding of the at least one of a plurality of tailored segments corresponds to a modification in the at least one of the plurality of variable length segments.
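
The alignment padding step can be illustrated by padding each variable length segment up to the next multiple of a fixed block size. The function name, the zero-byte padding, and the block size of 4 are assumptions chosen to keep the example small.

```python
def tailor_segments(segments, block_size, pad=b"\x00"):
    """Pad each segment so its length is a multiple of block_size."""
    tailored = []
    for seg in segments:
        # Bytes needed to reach the next block boundary (0 if aligned).
        padding = (-len(seg)) % block_size
        tailored.append(seg + pad * padding)
    return tailored


aligned = tailor_segments([b"abcde", b"12345678", b"xy"], block_size=4)
```

If a segment later grows or shrinks, only its own padding changes, so the following segments stay on the same block boundaries.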

Type: Grant

Filed: September 16, 2011

Date of Patent: December 30, 2014

Assignee: Symantec Corporation

Inventor: Graham Bromley
Data Item Deletion in a Database System

Publication number: 20140379670

Abstract: Example systems and methods of deleting data stored in a database system are presented. In one example, a plurality of data items is received from an application and stored at the database system. Also received from the application and stored at the database system is deletion timing information for each of the data items. The deletion timing information for a data item may indicate when the data item is to be deleted from the database system. At least one of the data items may be deleted at the database system at a time indicated by its corresponding deletion timing information without assistance from the application.
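
A small sketch of storing deletion timing information alongside each data item follows. The store purges items on its own when asked for the current time, with no further call from the application. The class name, the numeric timestamps, and the heap-based schedule are assumptions of the example.

```python
import heapq


class ExpiringStore:
    def __init__(self):
        self.items = {}     # key -> (value, delete_at)
        self.schedule = []  # min-heap of (delete_at, key)

    def put(self, key, value, delete_at):
        self.items[key] = (value, delete_at)
        heapq.heappush(self.schedule, (delete_at, key))

    def purge(self, now):
        """Delete every item whose deletion time has arrived."""
        deleted = []
        while self.schedule and self.schedule[0][0] <= now:
            delete_at, key = heapq.heappop(self.schedule)
            entry = self.items.get(key)
            # Skip stale schedule entries left by a later put() of the key.
            if entry is not None and entry[1] == delete_at:
                del self.items[key]
                deleted.append(key)
        return deleted
```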

Type: Application

Filed: June 19, 2013

Publication date: December 25, 2014

Applicant: SAP AG

Inventor: Gernot Kuhr
DATA SCRUBBING IN CLUSTER-BASED STORAGE SYSTEMS

Publication number: 20140379671

Abstract: Disclosed is the technology for data scrubbing in a cluster-based storage system. This technology allows protecting data against failures of storage devices by periodically reading data object replicas and data object hashes stored in a plurality of storage devices and rewriting those data object replicas that have errors. The present disclosure addresses aspects of writing data object replicas and hashes, checking validity of data object replicas, and performing data scrubbing based upon results of the checking.
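
The scrubbing pass described here can be sketched as: read each replica, compare its hash with the stored object hash, and rewrite any replica that fails the check from a valid one. The function name, the SHA-256 choice, and the in-memory list of replicas are assumptions for illustration.

```python
import hashlib


def scrub(replicas, expected_hash):
    """Rewrite corrupted replicas in place; return the repaired indices."""

    def is_valid(blob):
        return hashlib.sha256(blob).hexdigest() == expected_hash

    good = next((r for r in replicas if is_valid(r)), None)
    if good is None:
        raise ValueError("no valid replica available for repair")

    repaired = []
    for i, blob in enumerate(replicas):
        if not is_valid(blob):
            replicas[i] = good  # overwrite the bad copy with a valid one
            repaired.append(i)
    return repaired
```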

Type: Application

Filed: June 19, 2014

Publication date: December 25, 2014

Inventors: Frank E. Barrus, Tad Hunt
SINGLE INSTANTIATION METHOD USING FILE CLONE AND FILE STORAGE SYSTEM UTILIZING THE SAME

Publication number: 20140379672

Abstract: The file storage system includes a controller and a volume storing a plurality of files, the volume including a first directory storing a first file and a second file and a second directory storing a third file being created. The controller migrates actual data of the second file to the third file, sets up a management information of the second file so that the third file is referred to when the second file is read, and if the sizes of actual data of the first file and the actual data of the third file are identical and the binaries of the actual data of the first file and the actual data of the third file are identical, sets up a management information of the first file to refer to the third file when reading the first file.

Type: Application

Filed: September 11, 2014

Publication date: December 25, 2014

Inventors: Tomonori Esaka, Takaki Nakamura, Hitoshi Kamei, Masakuni Agetsuma
Content-aware distributed deduplicating storage system based on consistent hashing

Patent number: 8918372

Abstract: A set of metadata associated with backup data is obtained. A consistent hash key for the backup data is generated based at least in part on the set of metadata. The backup data is assigned to one of a plurality of deduplication nodes based at least in part on the consistent hash key.

Type: Grant

Filed: September 19, 2012

Date of Patent: December 23, 2014

Assignee: EMC Corporation

Inventors: Feng Guo, Qiyan Chen, Mandavilli Navneeth Rao, Lintao Wan, Dong Xiang
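
The abstract above assigns backup data to a deduplication node using a consistent hash key derived from metadata. The following is a generic consistent-hash ring, offered as a sketch rather than the patented method; the class name `Ring`, the virtual-node count, the SHA-256 hash, and the way metadata is flattened into a key are all assumptions.

```python
import bisect
import hashlib


def _h(text: str) -> int:
    # 64-bit position on the ring, taken from a SHA-256 digest.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at several virtual points.
        self._points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, metadata: dict) -> str:
        # Build the hash key from the metadata in a stable field order.
        key = "|".join(f"{k}={metadata[k]}" for k in sorted(metadata))
        idx = bisect.bisect_right(self._keys, _h(key)) % len(self._keys)
        return self._points[idx][1]
```

Because the key depends only on metadata, backups with the same metadata land on the same node, and removing a node only reassigns the keys that node owned.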
DETECTING WASTEFUL DATA COLLECTION

Publication number: 20140372386

Abstract: A method and system comprising a duplication identifier module to analyze data input information to automatically identify duplicate expected inputs associated with a process are shown. The system includes logical process model information defining a logically structured series of process activities and data input information representing a plurality of expected inputs associated with respective process activities, with each expected input being indicative of expected collection of a corresponding data element during execution of the associated process activity. Each duplicate expected input comprises one of the plurality of expected inputs for which there is at least one other expected input with respect to a common corresponding data element.

Type: Application

Filed: August 27, 2014

Publication date: December 18, 2014

Inventors: Vikram Duvvoori, Satish Venkatesan Srinivasan, Prasad A. Chodavarapu, Ravindra S. Gajulapalli, Rajesh Ramesh Agrawal
METHOD AND MECHANISM FOR REDUCING CLIENT-SIDE MEMORY FOOTPRINT OF TRANSMITTED DATA

Publication number: 20140372387

Abstract: The present invention is directed to a method and mechanism for reducing the expense of data transmissions between a client and a server. According to one aspect, data prefetching is utilized to predictably retrieve information between the client and server. Another aspect pertains to data redundancy management for reducing the expense of transmitting and storing redundant data between the client and server. Another aspect relates to moved data structures for tracking and managing data at a client in conjunction with data redundancy management.

Type: Application

Filed: August 29, 2014

Publication date: December 18, 2014

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Sreenivas GOLLAPUDI, Debashish CHATTERJEE
Computer-implemented system and method for identifying duplicate and near duplicate messages

Patent number: 8914331

Abstract: A computer-implemented system and method for identifying duplicate and near duplicate messages is provided. A set of messages is obtained. A body of one such message is compared with the body of each other message. Those messages having matching bodies are identified as exact duplicates. The exact duplicates are removed from the set. The remaining messages are sorted in order of message length and a shorter message is compared with a longer message. A determination is made that the body of the shorter message is included in the body of the longer message and the shorter message is marked as a near duplicate of the longer message.

Type: Grant

Filed: January 6, 2014

Date of Patent: December 16, 2014

Assignee: FTI Technology LLC

Inventors: Kenji Kawai, David T. McDonald
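
The steps in the abstract above (remove exact duplicates, sort the remaining messages by length, then test whether a shorter body is contained in a longer one) can be sketched as below. The function name `find_duplicates` and the dictionary-based return shape are hypothetical, and plain substring containment stands in for whatever comparison the patent actually claims.

```python
def find_duplicates(messages):
    """messages: dict mapping message id -> body text.

    Returns (exact, near):
      exact maps a removed duplicate id -> id of the message kept,
      near maps a shorter message id -> id of a longer message containing it.
    """
    first_seen = {}  # body -> id of the first message with that body
    exact = {}
    for mid, body in messages.items():
        if body in first_seen:
            exact[mid] = first_seen[body]
        else:
            first_seen[body] = mid

    # Remaining unique messages, shortest body first.
    remaining = sorted(first_seen.items(), key=lambda item: len(item[0]))
    near = {}
    for i, (short_body, short_id) in enumerate(remaining):
        for long_body, long_id in remaining[i + 1:]:
            if len(long_body) > len(short_body) and short_body in long_body:
                near[short_id] = long_id
                break
    return exact, near
```

The pairwise containment check is quadratic in the number of unique messages, which is acceptable for a sketch but would need indexing for large sets.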
Out-of-core similarity matching

Patent number: 8914338

Abstract: A method for storing data in a data storage system by partitioning the data into a plurality of data chunks and generating representative data for each of the plurality of chunks by applying a predetermined algorithm to each chunk of the plurality of chunks. Subsequently, the representative data is compared and sorted. Representative data for base data chunks and representative data for other data chunks that can be stored relative to the base data chunks are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as those that can be stored relative to a base data chunk is stored in the data storage system as the difference between the data chunk and a base data chunk.

Type: Grant

Filed: December 22, 2011

Date of Patent: December 16, 2014

Assignee: EMC Corporation

Inventors: Grant Wallace, Philip N. Shilane, Frederick Douglis
Aggregating keys of dependent objects for a given primary object

Patent number: 8914343

Abstract: Keys are obtained and aggregated by storing a primary object as an entry in a parent keys storage and a child keys storage, the entry identified as unvisited in each. An object evaluation process is then performed until all unique entries in the parent keys storage and all unique entries in the child keys storage have been visited and by committing the keys of at least one related object as an entry to the hierarchical database. The object evaluation process visits each unvisited object in the parent keys storage and child keys storage by selecting, for the unvisited object, objects in the parent direction that have not already been visited and objects in the child direction that have not already been visited and by inserting the keys of the selected related objects as entries in the parent keys storage or child keys storage.

Type: Grant

Filed: December 4, 2012

Date of Patent: December 16, 2014

Assignee: CA, Inc.

Inventors: B. V. K. Venu Gopala Rao, Muruganandam Somasundaram, James L. Broadhurst, Timothy J. Weltzer
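
The object evaluation process in the abstract above is essentially a traversal that keeps two stores of keys, each with a visited flag, and repeats until nothing is left unvisited. A simplified sketch follows, assuming the relationships are available as two in-memory dictionaries; `aggregate_keys`, `parents_of`, and `children_of` are hypothetical names, and the final commit to a hierarchical database is omitted.

```python
def aggregate_keys(primary, parents_of, children_of):
    """Collect the keys of all objects related to `primary`.

    parents_of / children_of: dicts mapping a key -> iterable of related keys.
    """
    # Each store maps key -> visited flag; the primary starts unvisited in both.
    parent_keys = {primary: False}
    child_keys = {primary: False}
    while True:
        pending = [
            (store, key)
            for store in (parent_keys, child_keys)
            for key, visited in store.items()
            if not visited
        ]
        if not pending:
            break  # every entry in both stores has been visited
        for store, key in pending:
            store[key] = True
            # Related objects not yet recorded are added as unvisited entries.
            for parent in parents_of.get(key, ()):
                parent_keys.setdefault(parent, False)
            for child in children_of.get(key, ()):
                child_keys.setdefault(child, False)
    return set(parent_keys) | set(child_keys)
```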
TRENDING SUGGESTIONS

Publication number: 20140365448

Abstract: Aspects of the subject matter described herein relate to trending suggestions. In aspects, trending data is collected and prepared for sending to one or more target machines. Upon receiving the trending data, a target machine installs the trending data locally and deletes previously installed trending data. After installation, the trending data may be used to suggest text in response to input from a user. If a user selects suggested text, the text may be added to a local dictionary of the target machine.

Type: Application

Filed: June 5, 2013

Publication date: December 11, 2014

Inventors: Daniel Ethan Keller, David A. Stevens, Bryan Douglas Scott, David Earl Washington
INLINE LEARNING-BASED SELECTIVE DEDUPLICATION FOR PRIMARY STORAGE SYSTEMS

Publication number: 20140365449

Abstract: A computing device receives a plurality of writes; each write is comprised of chunks of data. The computing device records metrics associated with the deduplication of the chunks of data from the plurality of writes. The computing device generates groups based on associating each group with a portion of a range of the metrics, such that each of the chunks of data is associated with one of the groups, and a similar number of chunks of data are associated with each group. The computing device determines a deduplication affinity for each of the groups based on the chunks of data that are duplicates and at least one metric. The computing device sets a threshold for the deduplication affinity and, in response to any of the groups exceeding the threshold, excludes the chunks of data associated with that group from deduplication.

Type: Application

Filed: June 6, 2013

Publication date: December 11, 2014

Inventors: David D. Chambliss, Bhushan P. Jain, Maohua Lu
SYSTEM AND METHOD FOR MULTI-SCALE NAVIGATION OF DATA

Publication number: 20140365450

Abstract: A system configured to generate a macro-fingerprint from at least one predefined set of summaries is provided. The system includes data storage storing a first predefined set of summaries associated with a first region of data, each member of the first predefined set of summaries characterizing data within the first region of data; and at least one processor coupled to the data storage and configured to: read the first predefined set of summaries; select at least one first member from the first predefined set of summaries based on a value of the at least one first member; and store the at least one first member within a first macro-fingerprint. The first region of data may have a first size indicative of a quantity of data included in the first region of data. The macro-fingerprints are created from previously created smaller (micro) fingerprints without having to reread the data.

Type: Application

Filed: June 6, 2013

Publication date: December 11, 2014

Inventors: Ronald Ray Trimble, Jon Christopher Kennedy
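
The abstract above builds a macro-fingerprint by selecting members of existing summary sets based on their values, without rereading the underlying data. One plausible selection rule, assumed here purely for illustration, is to keep the k numerically smallest summary values across the micro-fingerprints; the name `macro_fingerprint` and the parameter `k` are hypothetical.

```python
import heapq


def macro_fingerprint(micro_fingerprints, k=4):
    """Build a macro-fingerprint from existing micro-fingerprints.

    micro_fingerprints: iterable of iterables of integer summary values.
    Selection rule (an assumption): keep the k smallest distinct values.
    """
    merged = set().union(*micro_fingerprints)
    return heapq.nsmallest(k, merged)
```

Only the stored summaries are read, so a fingerprint covering a larger region costs no additional pass over the data itself.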
METHOD AND SYSTEM FOR CLEANING UP FILES ON A DEVICE

Publication number: 20140365451

Abstract: A method and system for cleaning up junk files on a mobile terminal are provided. The method comprises: scanning, by a mobile terminal client, a file system on a local mobile terminal to generate a list of file information; submitting, by the mobile terminal client, to a server side the list of file information; comparing, by the server side, the list of file information submitted by the client with an associated list of file information in a server side database and returning the comparison result; determining a request for cleaning up in the file system on the basis of the comparison result, and performing, by the mobile terminal client, an operation of cleaning up.

Type: Application

Filed: March 13, 2013

Publication date: December 11, 2014

Inventors: Yaowei Chen, Yu Lin, Shihong Zou
Context sensitive reusable inline data deduplication

Patent number: 8909607

Abstract: A computer identifies a relationship among a subset of a set of data blocks, a basis of the relationship forming a context shared by the subset of data blocks. The computer selects a code data structure from a set of code data structures using the context. The context is associated with the code data structure, and the code data structure includes a set of codes. The computer computes, for a first data block in the subset of data blocks, a first code corresponding to a content of the first data block. The computer determines whether the first code matches a stored code in the code data structure. The computer replaces, responsive to the first code matching the stored code, the first data block with a reference to an instance of the first data block. The computer causes the reference to be stored in a target data processing system.

Type: Grant

Filed: May 21, 2012

Date of Patent: December 9, 2014

Assignee: International Business Machines Corporation

Inventors: Vishal Chittranjan Aslot, Adekunle Bello, Brian W. Hart, Robert Wright Thompson
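
The abstract above keeps a separate code data structure per context and replaces a block with a reference when its code is already present in the structure for that context. A minimal sketch, assuming SHA-256 as the code function and a dictionary per context; `ContextDedup` and its `write` method are hypothetical names.

```python
import hashlib


class ContextDedup:
    def __init__(self):
        # context -> set of codes already stored for that context
        self._tables = {}

    def write(self, context, block: bytes):
        """Return ("data", block) for new content in this context,
        or ("ref", code) when the same content was already stored there."""
        table = self._tables.setdefault(context, set())
        code = hashlib.sha256(block).hexdigest()
        if code in table:
            return ("ref", code)
        table.add(code)
        return ("data", block)
```

Here the context is any hashable value shared by related blocks, such as a file type or an owning application; an identical block written under a different context is stored again rather than referenced.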
Redundant detection filtering

Patent number: 8908911

Abstract: Systems and methods are described herein for identifying and filtering redundant database entries associated with a visual search system. An example of a method of managing a database associated with a mobile device described herein includes identifying a captured image; obtaining an external database record from an external database corresponding to an object identified from the captured image; comparing the external database record to a locally stored database record; and locally discarding one of the external database record or the locally stored database record if the comparing indicates overlap between the external database record and the locally stored database record.

Type: Grant

Filed: September 30, 2011

Date of Patent: December 9, 2014

Assignee: QUALCOMM Incorporated

Inventors: Charles Wheeler Sweet, III, Prince Gupta
