Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)

Redundant detection filtering

Patent number: 8908911

Abstract: Systems and methods are described herein for identifying and filtering redundant database entries associated with a visual search system. An example of a method of managing a database associated with a mobile device described herein includes identifying a captured image; obtaining an external database record from an external database corresponding to an object identified from the captured image; comparing the external database record to a locally stored database record; and locally discarding one of the external database record or the locally stored database record if the comparing indicates overlap between the external database record and the locally stored database record.

Type: Grant

Filed: September 30, 2011

Date of Patent: December 9, 2014

Assignee: QUALCOMM Incorporated

Inventors: Charles Wheeler Sweet, III, Prince Gupta
LIFE CYCLE MANAGEMENT OF METADATA

Publication number: 20140358868

Abstract: The program code assigns a first record to a first object having a first life cycle and a second record to a second object having a second life cycle, wherein the first object is associated to the second object, and wherein the assigning is based on configurable predefined rules. In response to receiving a request to perform a delete action on at least one of the first object and the second object, performing the delete action when the at least one of the first object and the second object has a life cycle that is in a destroy phase.

Type: Application

Filed: June 4, 2013

Publication date: December 4, 2014

Inventors: Jean-Marc Costecalde, Kevin N. Trinh
Systems, Methods, and Computer Program Products for Scheduling Processing to Achieve Space Savings

Publication number: 20140358873

Abstract: A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.

Type: Application

Filed: August 14, 2014

Publication date: December 4, 2014

Inventors: Vinod Kumar Daga, Craig Anthony Johnston, Ling Zheng
DE-DUPLICATION DEPLOYMENT PLANNING

Publication number: 20140358870

Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content similarity prediction measurement is computed between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.

Type: Application

Filed: September 3, 2013

Publication date: December 4, 2014

Applicant: International Business Machines Corporation

Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
STORAGE SYSTEM AND METHOD FOR PERFORMING DEDUPLICATION IN CONJUNCTION WITH HOST DEVICE AND STORAGE DEVICE

Publication number: 20140358872

Abstract: Provided is a method for performing deduplication in conjunction with a host device and a storage device, and a storage system therefor. The host device includes a brief examination device which is configured to briefly examine whether data to be stored is duplicated or not based on a hash value of the data to be stored, and a data transmission device which is configured to transmit the data to be stored with an examination request or a data storage request to the at least one storage device according to a result of the examination.

Type: Application

Filed: May 29, 2014

Publication date: December 4, 2014

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Hyun-jung SHIN, Ju-Pyung LEE
DE-DUPLICATION DEPLOYMENT PLANNING

Publication number: 20140358867

Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content similarity prediction measurement is computed between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.

Type: Application

Filed: June 3, 2013

Publication date: December 4, 2014

Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
DEDUPLICATION FOR A STORAGE SYSTEM

Publication number: 20140358871

Abstract: A method and system for deduplication of data to be stored on a storage system. A deduplication system performs a method that includes the steps of: segmenting a storage object into a plurality of data segments; generating a content similarity key indicative of a content of a data segment as well as associating a physical position on the storage medium for the data segment with the generated content similarity key; storing the association in deduplication index information; and using the stored associations for optimizing the deduplication.

Type: Application

Filed: May 20, 2014

Publication date: December 4, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Roy D. Cideciyan, Jens Jelitto, Slavisa Sarafijanovic, Jan Stanek
SYSTEM AND METHOD FOR ACCELERATING MAPREDUCE OPERATION

Publication number: 20140358869

Abstract: Provided are a system and method for accelerating a mapreduce operation. The system for accelerating a mapreduce operation includes at least one map node configured to perform a map operation in response to a map operation request of a master node, and at least one reduce node configured to perform a reduce operation using result data of the map operation. The map node includes at least one map operation accelerator configured to generate a data stream by merging a plurality of data blocks generated as results of the map operation and establish a transmission channel for transmission of the data stream, and the reduce node includes at least one reduce operation accelerator configured to receive the data stream from the map operation accelerator through the transmission channel, recover the plurality of data blocks from the received data stream, and provide the recovered data blocks for the reduce operation.

Type: Application

Filed: August 28, 2013

Publication date: December 4, 2014

Applicant: SAMSUNG SDS CO., LTD.

Inventor: Jin Cheol KIM
DE-DUPLICATION WITH PARTITIONING ADVICE AND AUTOMATION

Publication number: 20140358857

Abstract: Migrating a sub-volume in data storage with at least two de-duplication domains, each of the domains having at least one sub-volume. A first sub-volume is assigned to a first de-duplication domain and a first content summary is computed for the first sub-volume. Similarly, a second sub-volume is assigned to a second de-duplication domain and a second content summary is computed for the second sub-volume. A first content affinity is calculated between the first sub-volume and a third sub-volume, and a second content affinity is calculated between the second sub-volume and the third sub-volume. A domain placement is selected for the third sub-volume based on comparison of the first content affinity and the second content affinity.

Type: Application

Filed: September 9, 2013

Publication date: December 4, 2014

Applicant: International Business Machines Corporation

Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Bhushan P. Jain, Maohua Lu
Enhanced reliability in deduplication technology over storage clouds

Patent number: 8903764

Abstract: Methods and systems for enhancing reliability in deduplication over storage clouds are provided. A method includes: determining a weight for each of a plurality of duplicate files based on parameters associated with a respective storage device of each of the plurality of duplicate files; and designating one of the plurality of duplicate files as a master copy based on the determined weight.

Type: Grant

Filed: April 25, 2012

Date of Patent: December 2, 2014

Assignee: International Business Machines Corporation

Inventors: Sandeep R. Patil, Sri Ramanathan, Riyazahamad M. Shiraguppi, Prashant Sodhiya, Matthew B. Trevathan
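
The master-copy selection described in the abstract of patent 8903764 can be illustrated with a minimal Python sketch. The abstract does not say which storage-device parameters feed the weight or how they are combined, so the `device_age_years` and `device_error_rate` fields and the scoring formula below are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    path: str
    device_age_years: float   # hypothetical device parameter
    device_error_rate: float  # hypothetical device parameter, in [0, 1]


def weight(replica: Replica) -> float:
    # Assumed scoring: newer devices with fewer errors get a higher weight.
    return (1.0 - replica.device_error_rate) / (1.0 + replica.device_age_years)


def choose_master(replicas: list[Replica]) -> Replica:
    # Designate the duplicate with the highest weight as the master copy.
    return max(replicas, key=weight)


replicas = [
    Replica("/vol-a/report.pdf", device_age_years=5.0, device_error_rate=0.02),
    Replica("/vol-b/report.pdf", device_age_years=1.0, device_error_rate=0.01),
    Replica("/vol-c/report.pdf", device_age_years=3.0, device_error_rate=0.10),
]
master = choose_master(replicas)
```

Here the copy on the youngest, least error-prone device is selected; a real system would substitute whatever reliability parameters it actually tracks.
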
Segmented fingerprint datastore and scaling a fingerprint datastore in de-duplication environments

Patent number: 8904120

Abstract: A storage server is coupled to a storage device that stores data blocks, and generates a fingerprint for each data block stored on the storage device. The storage server creates a master datastore and a plurality of datastore segments. The master datastore comprises an entry for each data block that is written to the storage device and a datastore segment comprises an entry for a new data block or a modified data block that is subsequently written to the storage device. The storage server merges the entries in the datastore segments with the entries in the master datastore in memory to free duplicate data blocks in the storage device. The storage server overwrites the master datastore with the entries in the plurality of datastore segments and the entries in the master datastore to create an updated master datastore in response to detecting that the number of datastore segments meets a threshold.

Type: Grant

Filed: December 15, 2010

Date of Patent: December 2, 2014

Assignee: NetApp, Inc.

Inventors: Praveen Killamsetti, Subramaniam V. Periyagaram, Satbir Singh, Bipul Raj
Processing a request to restore deduplicated data

Patent number: 8904128

Abstract: For a restore request, at least a portion of a recipe that refers to chunks is read. Based on the recipe portion, a container having plural chunks is retrieved. From the recipe portion, it is identified which of the plural chunks of the container to save, where some of the chunks identified do not, at a time of the identifying, have to be presently communicated to a requester. The identified chunks are stored in a memory area from which chunks are read for the restore operation.

Type: Grant

Filed: June 8, 2011

Date of Patent: December 2, 2014

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Mark David Lillibridge
Distributed Feature Collection and Correlation Engine

Publication number: 20140351226

Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.

Type: Application

Filed: May 22, 2013

Publication date: November 27, 2014

Applicant: International Business Machines Corporation

Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
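
The store-if-absent and query steps in the abstract of publication 20140351226 can be sketched in Python as follows. This is an in-memory illustration only; the `FeatureStore` class, the `"dns:"` key prefix, and the sample records are assumptions and do not come from the application.

```python
from collections import defaultdict


class FeatureStore:
    """Toy key/value feature store that skips pairs it already holds."""

    def __init__(self):
        self._values = defaultdict(set)  # key K -> set of values V

    def put(self, key: str, value: str) -> bool:
        # De-duplication: store the pair only if it is not already present.
        if value in self._values[key]:
            return False
        self._values[key].add(value)
        return True

    def query(self, key: str) -> list[tuple[str, str]]:
        # Return every stored key/value pair that matches the query key.
        return [(key, v) for v in sorted(self._values.get(key, ()))]


# Hypothetical data records: (domain name, resolved address).
records = [
    ("example.com", "192.0.2.1"),
    ("example.com", "192.0.2.1"),  # duplicate, will be skipped
    ("example.com", "192.0.2.7"),
]

store = FeatureStore()
# The key carries a feature identifier ("dns") plus the entity it describes.
stored = [store.put(f"dns:{name}", addr) for name, addr in records]
result = store.query("dns:example.com")
```
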
DIALOG SYSTEM, REDUNDANT MESSAGE REMOVAL METHOD AND REDUNDANT MESSAGE REMOVAL PROGRAM

Publication number: 20140351228

Abstract: A dialog system includes an answer evaluation means 501 and a query ranking means 502. Given a set of queries, which are character strings in question form that are candidate response messages to a user's comment, the answer evaluation means 501 finds, for each query, an answer content indicating how much of an expression that would answer the query is already contained in the series of the user's comments. The query ranking means 502 ranks the queries in ascending order of answer content, based on the answer content found for each query by the answer evaluation means 501.

Type: Application

Filed: August 14, 2012

Publication date: November 27, 2014

Inventor: Kosuke Yamamoto
Distributed Feature Collection and Correlation Engine

Publication number: 20140351227

Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.

Type: Application

Filed: August 15, 2013

Publication date: November 27, 2014

Applicant: International Business Machines Corporation

Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
Methods and systems to selectively scrub a system memory

Patent number: 8898412

Abstract: A computer system is provided, the computer system having a processor and a system memory coupled to the processor. The computer system also includes a Basic Input/Output System (BIOS) in communication with the processor. The BIOS selectively scrubs the system memory during a shutdown process of the computer system.

Type: Grant

Filed: March 21, 2007

Date of Patent: November 25, 2014

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Louis B. Hobson, Wael M. Ibrahim, Manuel Novoa
Merging entries in a deduplication index

Patent number: 8898121

Abstract: Provided are a computer program product, system, and method for merging entries in a deduplication index. An index has chunk signatures calculated from chunks of data in the data objects in the storage, wherein each index entry includes at least one of the chunk signatures and a reference to the chunk of data from which the signature was calculated. Entries in the index are selected to merge and a merge operation is performed on the chunk signatures in the selected entries to generate a merged signature. An entry is added to the index including the merged signature and a reference to the chunks in the storage referenced in the merged selected entries. The index of the signatures is used in deduplication operations when adding data objects to the storage.

Type: Grant

Filed: May 29, 2012

Date of Patent: November 25, 2014

Assignee: International Business Machines Corporation

Inventors: Jonathan Amit, Corneliu M. Constantinescu, Joseph S. Glider, Shai I. Tahar
Storage devices and methods of driving storage devices

Patent number: 8898414

Abstract: A storage device includes a data storage having first and second storage areas corresponding to different physical addresses. First data are stored in the first storage area. The storage device further includes a first memory that stores a reference count associated with the first data, and a controller that rearranges the first data from the first storage area to the second storage area in response to a change in the reference count of the first data.

Type: Grant

Filed: July 11, 2012

Date of Patent: November 25, 2014

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hyun-Chul Park, Kyung-Ho Kim, Sang-Mok Kim, O-Tae Bae, Dong-Gi Lee, Jeong-Hoon Jeong
Fingerprints datastore and stale fingerprint removal in de-duplication environments

Patent number: 8898119

Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.

Type: Grant

Filed: December 15, 2010

Date of Patent: November 25, 2014

Assignee: NetApp, Inc.

Inventors: Alok Sharma, Praveen Killamsetti, Satbir Singh
Systems and methods for distributed data deduplication

Patent number: 8898120

Abstract: A computer-implemented method for distributed data deduplication may include (1) identifying a deduplicated data system, the deduplicated data system including a plurality of nodes, wherein each node within the plurality of nodes is configured to deduplicate data stored on the node, (2) identifying a data object to store within the deduplicated data system, (3) generating a similarity hash of the data object, the similarity hash representing a probabilistic dimension-reduction of the data object, (4) selecting, based at least in part on the similarity hash, a target node from the plurality of nodes on which to store the data object, and then (5) routing the data object for storage on the target node based on the selection of the target node. Various other methods, systems, and computer-readable media are also disclosed.

Type: Grant

Filed: October 9, 2011

Date of Patent: November 25, 2014

Assignee: Symantec Corporation

Inventor: Petros Efstathopoulos
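
Steps (3) through (5) of the abstract of patent 8898120 can be sketched in Python. The abstract does not name a particular similarity hash; the sketch below swaps in MinHash over byte shingles as one well-known probabilistic dimension reduction, and the routing rule (first signature component modulo the node count) is an assumption made for illustration.

```python
import hashlib


def shingles(data: bytes, size: int = 4) -> set[bytes]:
    # Overlapping byte windows; short inputs yield a single shingle.
    return {data[i:i + size] for i in range(max(1, len(data) - size + 1))}


def similarity_hash(data: bytes, num_hashes: int = 8) -> tuple[int, ...]:
    # MinHash signature: for each salted hash function keep the minimum
    # hash value seen over all shingles of the object.
    parts = shingles(data)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(2, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=salt).digest(), "big")
            for s in parts
        ))
    return tuple(signature)


def select_node(data: bytes, num_nodes: int) -> int:
    # Assumed routing rule: objects that share the first MinHash component
    # land on the same node, where local deduplication can then find overlap.
    return similarity_hash(data)[0] % num_nodes


obj = b"the quick brown fox jumps over the lazy dog"
node = select_node(obj, num_nodes=4)
```

Because two objects with many shingles in common are likely to share a minimum hash value, similar objects tend to be routed to the same node, although this is probabilistic rather than guaranteed.
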
Streaming Content Storage

Publication number: 20140344227

Abstract: A computing system includes a plurality of dispersed storage (DS) processing units operable to receive a continuous data stream, simultaneously disperse storage error encode the continuous data stream to produce a plurality of encoded data slices and store the plurality of encoded data slices in a DS memory.

Type: Application

Filed: August 1, 2014

Publication date: November 20, 2014

Applicant: CLEVERSAFE, INC.

Inventors: Gary W. Grube, Timothy W. Markison, Jason K. Resch
SYSTEMS AND METHODS FOR DATA CHUNK DEDUPLICATION

Publication number: 20140344229

Abstract: A method includes receiving information about a plurality of data chunks and determining if one or more of a plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks where one of the plurality of back-end nodes is designated as a sticky node. The method further includes, responsive to determining that none of the plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks, deduplicating the plurality of data chunks against the back-end node designated as the sticky node. Finally, the method includes, responsive to an amount of data being processed, designating a different back-end node as the sticky node.

Type: Application

Filed: February 2, 2012

Publication date: November 20, 2014

Inventors: Mark D. Lillibridge, Kave Eshghi, Mark R. Watkins
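
The sticky-node routing in the abstract of publication 20140344229 can be sketched in Python as follows. The class name, the round-robin rotation of the sticky node, and the use of chunk hashes held in in-memory sets are all assumptions; the abstract specifies only the threshold test and that the sticky designation changes in response to the amount of data processed.

```python
class StickyRouter:
    """Toy router that picks a back-end node for each batch of chunk hashes."""

    def __init__(self, num_nodes: int, threshold: int, rotate_after: int):
        self.stored = [set() for _ in range(num_nodes)]  # hashes per node
        self.threshold = threshold        # matches needed to override sticky
        self.rotate_after = rotate_after  # chunks processed before rotating
        self.sticky = 0
        self.since_rotation = 0

    def route(self, batch: set[str]) -> int:
        # Count how many chunks of the batch each node already stores.
        counts = [len(node & batch) for node in self.stored]
        best = max(range(len(counts)), key=lambda i: counts[i])
        # Use the best-matching node only if it exceeds the threshold;
        # otherwise deduplicate against the sticky node.
        target = best if counts[best] > self.threshold else self.sticky
        self.stored[target] |= batch
        # Rotate the sticky designation after enough data (assumed policy).
        self.since_rotation += len(batch)
        if self.since_rotation >= self.rotate_after:
            self.sticky = (self.sticky + 1) % len(self.stored)
            self.since_rotation = 0
        return target


router = StickyRouter(num_nodes=3, threshold=1, rotate_after=4)
targets = [
    router.route({"a", "b"}),       # no matches anywhere: sticky node 0
    router.route({"c", "d"}),       # still sticky node 0, then rotate to 1
    router.route({"a", "b", "e"}),  # node 0 holds 2 matches: above threshold
    router.route({"x"}),            # no matches: new sticky node 1
]
```
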
Deduplication seeding

Patent number: 8892526

Abstract: Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.

Type: Grant

Filed: January 11, 2012

Date of Patent: November 18, 2014

Inventor: Timothy Stoakes
Managing redundant immutable files using deduplication in storage clouds

Patent number: 8892521

Abstract: A method includes receiving a request to save a first file as immutable. The method also includes searching for a second file that is saved and is redundant to the first file. The method further includes determining the second file is one of mutable and immutable. When the second file is mutable, the method includes saving the first file as a master copy, and replacing the second file with a soft link pointing to the master copy. When the second file is immutable, the method includes determining which of the first and second files has a later expiration date and an earlier expiration date, saving the one of the first and second files with the later expiration date as a master copy, and replacing the one of the first and second files with the earlier expiration date with a soft link pointing to the master copy.

Type: Grant

Filed: May 10, 2013

Date of Patent: November 18, 2014

Assignee: International Business Machines Corporation

Inventors: Gaurav Chhaunker, Bhushan P. Jain, Sandeep R. Patil, Sri Ramanathan, Matthew B. Trevathan
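
The decision logic in the abstract of patent 8892521 can be sketched in Python. The `StoredFile` record and its `link_to` field (standing in for a soft link) are hypothetical, and the tie-breaking rule for equal expiration dates is an assumption because the abstract does not address it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StoredFile:
    name: str
    immutable: bool
    expires: Optional[date] = None  # only meaningful for immutable files
    link_to: Optional[str] = None   # stands in for a soft link to the master


def save_immutable(first: StoredFile, second: StoredFile) -> StoredFile:
    """Save `first` as immutable given a redundant, already saved `second`."""
    first.immutable = True
    if not second.immutable:
        # Mutable duplicate: the new file becomes the master copy.
        master, other = first, second
    elif first.expires >= second.expires:
        # Both immutable: keep the later expiration (ties favor `first`,
        # which is an assumption).
        master, other = first, second
    else:
        master, other = second, first
    other.link_to = master.name
    return master


new_file = StoredFile("new.dat", immutable=True, expires=date(2030, 1, 1))
old_file = StoredFile("old.dat", immutable=True, expires=date(2035, 1, 1))
master = save_immutable(new_file, old_file)
```
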
Accelerated deduplication

Patent number: 8892528

Abstract: Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.

Type: Grant

Filed: August 26, 2013

Date of Patent: November 18, 2014

Assignee: Dell Products L.P.

Inventors: Goutham Rao, Vinod Jayaraman
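
The dictionary lookup and reference counting in the abstract of patent 8892528 can be sketched in software as follows. This is a Python illustration only: it uses SHA-256 as the fingerprint (an assumption) and omits the hardware accelerator, chunk boundary detection, suitcases, filemaps, and DMA transfers entirely.

```python
import hashlib


class DedupStore:
    """Toy deduplication dictionary keyed by chunk fingerprint."""

    def __init__(self):
        self.chunks = {}     # fingerprint -> stored chunk bytes
        self.refcounts = {}  # fingerprint -> number of references

    def write(self, chunk: bytes) -> str:
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint in self.chunks:
            # Already written: bump the reference count, do not store again.
            self.refcounts[fingerprint] += 1
        else:
            # First occurrence: store the chunk and start its count at one.
            self.chunks[fingerprint] = chunk
            self.refcounts[fingerprint] = 1
        return fingerprint


store = DedupStore()
fp_first = store.write(b"block of data")
fp_again = store.write(b"block of data")
fp_other = store.write(b"different block")
```
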
Use of predefined block pointers to reduce duplicate storage of certain data in a storage subsystem of a storage server

Patent number: 8892527

Abstract: A method and system for eliminating the redundant allocation and deallocation of special data on disk by providing an innovative technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type of document header is specially allocated.

Type: Grant

Filed: September 14, 2012

Date of Patent: November 18, 2014

Assignee: NetApp, Inc.

Inventors: Sandeep Yadav, Subramanian Periyagaram
Data processing method and apparatus in cluster system

Patent number: 8892529

Abstract: In embodiments of the present invention, when a duplicate data query is performed on a received data stream, a first physical node which corresponds to each first sketch value and is in a cluster system is identified according to a first sketch value representing the data stream, and then the first sketch value representing the data stream is sent to the identified physical node for the duplicate data query, and a procedure of the duplicate data query does not change with an increase of the number of nodes in the cluster system; therefore, a calculation amount of each node does not increase with an increase of the number of nodes in the cluster system.

Type: Grant

Filed: December 24, 2013

Date of Patent: November 18, 2014

Assignee: Huawei Technologies Co., Ltd.

Inventors: Qiang Liu, Quancheng Sun, Xiaobo Liu, Jun You, Huadi Yang, Dan Zhou, Yan Huang
Method And Apparatus For Content-Aware And Adaptive Deduplication

Publication number: 20140337299

Abstract: A method, a system, an apparatus, and a computer readable medium for transmission of data across a network are disclosed.

Type: Application

Filed: July 28, 2014

Publication date: November 13, 2014

Inventors: David G. Therrien, David Andrew Thompson
Prioritizing data deduplication

Patent number: 8886613

Abstract: An example method includes controlling a data de-duplication apparatus to arrange a de-duplication schedule based on the presence or absence of a replication indicator in an item to be de-duplicated. The method also includes selectively controlling the de-duplication schedule based on a replication priority. In one embodiment, the method includes, upon determining that a chunk of data is associated with a replication indicator, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks not associated with a replication indicator. In one embodiment, the method also includes, upon determining that the chunk is associated with a replication priority, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks of data not associated with a replication priority. The schedule location is based, at least in part, on the replication priority. The method also includes controlling de-duplication order based on the schedule.

Type: Grant

Filed: October 12, 2010

Date of Patent: November 11, 2014

Inventor: Don Doerner
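
The scheduling order described in the abstract of patent 8886613 can be sketched in Python with a sort key. The `Chunk` record is hypothetical, and treating a larger `priority` number as more urgent is an assumption; the abstract says only that schedule location depends on the replication priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    name: str
    replicate: bool = False         # replication indicator present?
    priority: Optional[int] = None  # replication priority, higher = sooner


def schedule(chunks: list[Chunk]) -> list[Chunk]:
    def key(chunk: Chunk):
        return (
            0 if chunk.replicate else 1,             # flagged chunks first
            0 if chunk.priority is not None else 1,  # then those with priority
            -(chunk.priority or 0),                  # then by priority value
        )
    # sorted() is stable, so chunks with equal keys keep arrival order.
    return sorted(chunks, key=key)


pending = [
    Chunk("a"),
    Chunk("b", replicate=True),
    Chunk("c", replicate=True, priority=5),
    Chunk("d", replicate=True, priority=9),
]
order = [chunk.name for chunk in schedule(pending)]
```
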
SYSTEM AND METHOD FOR CONTENT SCORING

Publication number: 20140330794

Abstract: The various implementations of the present invention are provided as a computer-based system for content scoring. Content from a variety of source feeds may be considered for inclusion in an aggregated feed, based on the content of the source feed. The content of the source feed may be “scored” according to a variety of user-configurable options, thereby identifying the most valuable content from the source feeds for inclusion in the aggregated feed. For example, certain content elements may be extracted from a variety of source feeds and then combined to create an aggregated feed that contains only the highest scoring elements, as determined by the feed creator, from the various source feeds.

Type: Application

Filed: July 17, 2014

Publication date: November 6, 2014

Applicant: PARLANT TECHNOLOGY, INC.

Inventors: Dane Dellenbach, Bruce Hassler, Jacob Hutchings, Carson Anderson
OPTIMIZING RESTORATION OF DEDUPLICATED DATA

Publication number: 20140330795

Abstract: A computer identifies a plurality of data retrieval requests that may be serviced using a plurality of unique data chunks. The computer services the data retrieval requests by utilizing at least one of the unique data chunks. At least one of the unique data chunks is utilized for servicing two or more of the data retrieval requests. The computer determines a servicing sequence for the plurality of data retrieval requests such that the two or more of the data retrieval requests that are serviced utilizing the at least one of the unique data chunks are serviced consecutively. The computer services the plurality of data retrieval requests according to the servicing sequence.

Type: Application

Filed: July 18, 2014

Publication date: November 6, 2014

Inventors: Kavita Chavda, Nagapramod S. Mandagere, Ramani R. Routray, Pin Zhou
CAPACITY FORECASTING FOR A DEDUPLICATING STORAGE SYSTEM

Publication number: 20140330793

Abstract: A system for managing a storage system comprises a processor and a memory. The processor is configured to receive storage system information from a deduplicating storage system. The processor is further configured to determine a capacity forecast based at least in part on the storage system information. The processor is further configured to provide a compression forecast. The memory is coupled to the processor and configured to provide the processor with instructions.

Type: Application

Filed: May 1, 2014

Publication date: November 6, 2014

Applicant: EMC CORPORATION

Inventor: Mark Chamness
Replication of deduplicated data

Patent number: 8880482

Abstract: Various embodiments for replicating deduplicated data using a processor device are provided. A block of the deduplicated data, created in a source repository, is assigned a global block identifier (ID) unique in a grid set inclusive of the source repository. The global block ID is generated using at least one unique identification value of the block, a containing grid of the grid set, and the source repository. The global block ID is transmitted from the source repository to a target repository. If the target repository determines the global block ID is associated with an existing block of the deduplicated data located within the target repository, the block is not transmitted to the target repository during a subsequent replication process.

Type: Grant

Filed: January 2, 2013

Date of Patent: November 4, 2014

Assignee: International Business Machines Corporation

Inventors: Shay H. Akirav, Lior Aronovich, Ron Asher, Yariv Bachar, Ariel J. Ish-Shalom, Ofer Leneman
Low-overhead enhancement of reliability of journaled file system using solid state storage and de-duplication

Patent number: 8880476

Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.

Type: Grant

Filed: June 28, 2012

Date of Patent: November 4, 2014

Assignee: International Business Machines Corporation

Inventors: Ranjit M. Noronha, Ajay K. Singh
Inverse distribution function operations in a parallel relational database

Patent number: 8880481

Abstract: Inverse distribution operations are performed on a large distributed parallel database comprising a plurality of distributed data segments to determine a data value at a predetermined percentile of a sorted dataset formed on one segment. Data elements from across the segments may be first grouped, either by partitioning keys or by hashing, the groups are sorted into a predetermined order, and data values corresponding to the desired percentile are picked up at a row location of the corresponding data element of each group. For a global dataset that is spread across the database segments, a local sort of data elements is performed on each segment, and the data elements from the local sorts are streamed in overall sorted order to one segment to form the sorted dataset.

Type: Grant

Filed: March 29, 2012

Date of Patent: November 4, 2014

Assignee: Pivotal Software, Inc.

Inventors: Hitoshi Harada, Caleb E. Welton, Gavin Sherry
Performance improvement of a capacity optimized storage system including a determiner

Patent number: 8880469

Abstract: A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether a requested data is stored in the performance storage unit. The determiner determines whether the requested data is stored in the performance segment storage unit in the event that the requested data is not stored in the performance storage unit.

Type: Grant

Filed: April 18, 2013

Date of Patent: November 4, 2014

Assignee: EMC Corporation

Inventor: R. Hugo Patterson
SYSTEM AND METHOD FOR EFFICIENTLY DUPLICATING DATA IN A STORAGE SYSTEM, ELIMINATING THE NEED TO READ THE SOURCE DATA OR WRITE THE TARGET DATA

Publication number: 20140324791

Abstract: A method for copying data efficiently within a deduplicating storage system eliminates the need to read or write the data per se within the storage system. The copying is accomplished by creating duplicates of the metadata block pointers only. The result is a process that creates an arbitrary number of copies using minimal time and bandwidth.
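
A minimal sketch of pointer-only copying, assuming a toy layout in which each file is a list of block identifiers. The class `PointerStore` and its method names are hypothetical and are not taken from the application.

```python
class PointerStore:
    """Hypothetical store in which files are lists of pointers to shared blocks."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes (the data itself)
        self.files = {}    # file name -> list of block ids (metadata pointers)
        self.next_id = 0

    def write_file(self, name, chunks):
        ids = []
        for chunk in chunks:
            self.blocks[self.next_id] = chunk
            ids.append(self.next_id)
            self.next_id += 1
        self.files[name] = ids

    def copy_file(self, src, dst):
        # Duplicate the pointer list only; no data block is read or written.
        self.files[dst] = list(self.files[src])

    def read_file(self, name):
        return b"".join(self.blocks[i] for i in self.files[name])
```

After `copy_file`, both names resolve to the same bytes while the number of stored blocks is unchanged.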

Type: Application

Filed: June 21, 2013

Publication date: October 30, 2014

Inventor: Robert Petrocelli
Method for Layered Storage of Enterprise Data

Publication number: 20140324793

Abstract: A computer-implemented method for layered storage of enterprise data comprises receiving from one or more virtual machines data blocks; de-duplicating the data blocks per hypervisor; storing de-duplicated data blocks in a local cache memory; time-based grouping the data blocks into data containers; dividing each data container in X fixed length mega-blocks; for each data container applying erasure encoding to the X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being larger than X; and distributed storing the Y fixed length mega-blocks across multiple backend storage systems.
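
The step that turns X fixed-length mega-blocks into Y blocks with redundancy can be illustrated with the simplest possible erasure code. The abstract does not name a code; single XOR parity (so Y = X + 1, tolerating one lost block) is swapped in here purely as an assumption, and the function names are hypothetical.

```python
def split_fixed(container: bytes, x: int):
    """Divide a container into x fixed-length blocks, zero-padding the tail."""
    size = -(-len(container) // x)  # ceiling division
    padded = container.ljust(size * x, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(x)]


def add_parity(blocks):
    """Return the x blocks plus one XOR parity block (y = x + 1)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return blocks + [bytes(parity)]


def recover(encoded, missing_index):
    """Rebuild one lost block by XOR-ing all the surviving blocks."""
    survivors = [b for i, b in enumerate(encoded) if i != missing_index]
    out = bytearray(len(survivors[0]))
    for block in survivors:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)
```

A production system would more likely use a code that tolerates several simultaneous losses; the sketch only shows the shape of the X-to-Y expansion.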

Type: Application

Filed: April 8, 2014

Publication date: October 30, 2014

Applicant: CLOUDFOUNDERS NV

Inventor: Kurt GLAZEMAKERS
STATE-BASED DIRECTING OF SEGMENTS IN A MULTINODE DEDUPLICATED STORAGE SYSTEM

Publication number: 20140324796

Abstract: A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criterion. The memory is coupled to the processor and configured to provide the processor with instructions.
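
Overlap-based node selection might look like the sketch below. The abstract leaves the selection criteria open; here the assumed rule is "highest overlap, ties broken in favor of the node holding the fewest segments", and `select_node` with its arguments is a hypothetical name.

```python
def select_node(incoming, nodes):
    """Pick a node for a batch of incoming segment fingerprints.

    nodes maps a node name to the set of fingerprints it already stores.
    """
    incoming = set(incoming)
    # Segment overlap for each node: how many incoming segments it already holds.
    overlaps = {name: len(incoming & stored) for name, stored in nodes.items()}
    # Assumed selection rule: most overlap first, then least-loaded node.
    return max(nodes, key=lambda name: (overlaps[name], -len(nodes[name])))
```

Routing a batch to the node that already holds most of its segments keeps duplicates together, which is the usual motivation for overlap-aware placement.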

Type: Application

Filed: May 1, 2014

Publication date: October 30, 2014

Applicant: EMC CORPORATION

Inventors: Frederick Douglis, Philip Shilane, R. Hugo Patterson
DATA MANAGEMENT

Publication number: 20140324795

Abstract: Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the first data can be converted, said characteristic value uniquely representing an arrangement characteristic of at least part of bits of data in a particular format. The method also includes storing one of the first data and the second data in response to one of the calculated characteristic values being the same as a stored characteristic value corresponding to a second data.

Type: Application

Filed: April 28, 2014

Publication date: October 30, 2014

Applicant: International Business Machines Corporation

Inventors: Peng Hui Jiang, Pi Jun Jiang, Xi Ning Wang, Liang Xue, Wen Yin
SYSTEM AND METHOD FOR SELECTIVE FILE ERASURE USING METADATA MODIFICATIONS

Publication number: 20140324798

Abstract: A process that ensures the virtual destruction of data files a user wishes to erase from a storage medium, such as a hard drive, flash drive, or removable disk. This approach is appropriate for managing custom distributions from large file sets as it is roughly linear in compute complexity to the number of files erased but is capped when many files are batch erased.

Type: Application

Filed: July 14, 2014

Publication date: October 30, 2014

Inventor: Alan Joshua Shapiro
CLEANER WITH BROWSER MONITORING

Publication number: 20140324788

Abstract: A cleaning application that can monitor one or more browser applications that are executed on a computer, and that can, for at least one browser application, clean at least one of one or more files or a registry associated with the at least one browser application is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more browser applications that are executed on a computer. The cleaning module can further detect a closing of at least one browser application. The cleaning module can further perform a pre-defined action in response to the closing of the at least one browser application. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the at least one browser application.

Type: Application

Filed: April 24, 2013

Publication date: October 30, 2014

Applicant: Piriform Ltd.

Inventor: Guy SANER
METHOD AND SYSTEM FOR MULTI-BLOCK OVERLAP-DETECTION IN A PARALLEL ENVIRONMENT WITHOUT INTER-PROCESS COMMUNICATION

Publication number: 20140324790

Abstract: Techniques for avoiding duplicate comparisons while comparing customer records to identify linked customer records pertaining to a single customer entity are provided. The techniques include the computer system comparing a first electronic customer record with a second electronic customer record to determine if the first electronic customer record and the second electronic customer record pertain to a single customer entity if the computer system identifies a common blocker key corresponding to a selected blocker from a data field in the first electronic customer record and from a data field in the second electronic customer record and if the computer system does not identify a common blocker key corresponding to an additional lower order blocker from another data field in the first electronic customer record and from a data field in the second electronic customer record.
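
The rule in this abstract (compare a pair under a blocker only if the pair shares that blocker's key and shares no key under any lower-order blocker) can be sketched as below. Treating "lower order" as "earlier in the list" is an assumption, as are the function name `pairs_to_compare` and the representation of blockers as functions that return a key or `None`.

```python
from itertools import combinations


def pairs_to_compare(records, blockers):
    """Return (blocker_order, id_a, id_b) for each pair that should be compared once.

    records maps a record id to a record; blockers is an ordered list of
    functions that map a record to a blocker key (or None if no key applies).
    """
    result = []
    for order, blocker in enumerate(blockers):
        # Group record ids by the key this blocker produces.
        groups = {}
        for rec_id, rec in records.items():
            key = blocker(rec)
            if key is not None:
                groups.setdefault(key, []).append(rec_id)
        for ids in groups.values():
            for a, b in combinations(ids, 2):
                # Skip the pair if a lower-order blocker already matched it.
                seen_earlier = any(
                    lower(records[a]) is not None
                    and lower(records[a]) == lower(records[b])
                    for lower in blockers[:order]
                )
                if not seen_earlier:
                    result.append((order, a, b))
    return result
```

Because the skip test depends only on the two records themselves, each blocker can be processed by a separate worker without any exchange of "already compared" state.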

Type: Application

Filed: April 26, 2013

Publication date: October 30, 2014

Applicant: Wal-Mart Stores, Inc.

Inventors: Andrew Benjamin Ray, Nathaniel Philip Troutman
Displaying Social Networking System User Information Via a Historical Newsfeed

Publication number: 20140324797

Abstract: The invention provides a display interface in a social networking system that enables the presentation of information related to a user in a timeline or map view. The system accesses information about a user of a social networking system, including both data about the user and social network activities related to the user. The system then selects one or more of these pieces of data and/or activities from a certain time period and gathers them into timeline units based on their relatedness and their relevance to users. These timeline units are ranked by relevance to the user, and are used to generate a timeline or map view for the user containing visual representations of the timeline units organized by location or time. The timeline or map view is then provided to other users of the social networking system that wish to view information about the user.

Type: Application

Filed: July 9, 2014

Publication date: October 30, 2014

Inventors: Raylene Kay Yung, Ryan Case, Jeff Huang, Samuel Lessin, Ryan David Mack, Paul M. McDonald, Serkan Piantino, Arun Vijayvergiya, Joshua Wiseman, Steven Young, Mark E. Zuckerberg
Methods for decomposing events from managed infrastructures

Publication number: 20140324794

Abstract: Methods are provided for clustering events. Data is received at an extraction engine from managed infrastructure. Events are converted into alerts and the alerts mapped to a matrix M. One or more common steps are determined from the events and clusters of events are produced relating to the alerts and/or events.

Type: Application

Filed: April 28, 2014

Publication date: October 30, 2014

Applicant: Moogsoft, Inc.

Inventors: Philip Tee, Robert Duncan Harper, Charles Mike Silvey
CLEANER WITH COMPUTER MONITORING

Publication number: 20140324789

Abstract: A cleaning application that can monitor one or more characteristics of a computer, and that can clean at least one of one or more files or a registry of the computer, is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more characteristics of the computer. The cleaning module can further detect an occurrence of pre-defined criteria involving the one or more characteristics. The cleaning module can further perform a pre-defined action in response to the pre-defined criteria. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the computer.

Type: Application

Filed: April 24, 2013

Publication date: October 30, 2014

Applicant: Piriform Ltd.

Inventor: Guy SANER
EXTRACTING A SOCIAL GRAPH FROM CONTACT INFORMATION ACROSS A CONFINED USER BASE

Publication number: 20140324792

Abstract: Embodiments of the present invention relate to extraction of a social graph from contact information across a confined user base. Users are typically subscribed to a service that backs up data from end-user devices to a cloud. The data includes contacts from mobile address books. The service is able to determine relationships of contacts in the cloud to build a social graph or map of these contacts. The social graph can be used to drive individual and group analytics to, for example, increase membership and provide value-added features to its service members.

Type: Application

Filed: March 24, 2014

Publication date: October 30, 2014

Applicant: Synchronoss Technologies, Inc.

Inventors: Omar Chaudhry, Andrew Fuller
Data replication system

Patent number: 8874863

Abstract: Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data.

Type: Grant

Filed: August 1, 2012

Date of Patent: October 28, 2014

Assignee: Actifio, Inc.

Inventors: Madhav Mutalik, Christopher A. Provenzano, Philip J. Abercrombie
Random number generator in a MPP database

Patent number: 8874602

Abstract: A random number generation process generates uncorrelated random numbers from identical random number sequences on parallel processing database segments of an MPP database without communications between the segments by establishing a different starting position in the sequence on each segment using an identifier that is unique to each segment, query slice information and the number of segments. A master node dispatches a seed value to initialize the random number sequence generation on all segments, and dispatches the query slice information and information as to the number of segments during a normal query plan dispatch process.
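
A minimal sketch of the idea, assuming each segment seeds the same generator and then skips ahead to its own starting position. The offset formula, the `stride` parameter, and the function name `segment_stream` are hypothetical; the abstract says only that the starting position is derived from the segment identifier, the query slice information, and the number of segments.

```python
import random


def segment_stream(seed, segment_id, num_segments, slice_id, count, stride=1000):
    """Draw `count` values on one segment without talking to other segments."""
    # Every segment initializes the identical sequence from the dispatched seed.
    rng = random.Random(seed)
    # Assumed offset rule: a distinct window per (slice, segment) pair.
    start = (slice_id * num_segments + segment_id) * stride
    for _ in range(start):
        rng.random()  # discard values to reach this segment's starting position
    return [rng.random() for _ in range(count)]
```

As long as no segment draws more than `stride` values per slice, the windows do not overlap, so the segments consume disjoint parts of one sequence.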

Type: Grant

Filed: September 29, 2012

Date of Patent: October 28, 2014

Assignee: Pivotal Software, Inc.

Inventors: Hitoshi Harada, Caleb Welton, Florian Schoppmann
Processes and methods for client-side fingerprint caching to improve deduplication system backup performance

Patent number: 8874520

Abstract: A system and method for caching fingerprints in a client cache is provided. A data object that comprises a set of data segments and describes a backup process is identified. Thereafter, a request referencing the data object is made to a deduplication server to request that a task identifier be added to the data object. If the deduplication server is able to successfully add the task identifier to the data object, then an active identifier is added to each data segment from the set of data segments in a cache that is within a client system.

Type: Grant

Filed: February 11, 2011

Date of Patent: October 28, 2014

Assignee: Symantec Corporation

Inventors: Xianbo Zhang, Thomas Hartnett, Weibao Wu
