Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 8908911
    Abstract: Systems and methods are described herein for identifying and filtering redundant database entries associated with a visual search system. An example of a method of managing a database associated with a mobile device described herein includes identifying a captured image; obtaining an external database record from an external database corresponding to an object identified from the captured image; comparing the external database record to a locally stored database record; and locally discarding one of the external database record or the locally stored database record if the comparing indicates overlap between the external database record and the locally stored database record.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: December 9, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Charles Wheeler Sweet, III, Prince Gupta
  • Publication number: 20140358868
    Abstract: The program code assigns a first record to a first object having a first life cycle and a second record to a second object having a second life cycle, wherein the first object is associated to the second object, and wherein the assigning is based on configurable predefined rules. In response to receiving a request to perform a delete action on at least one of the first object and the second object, performing the delete action when the at least one of the first object and the second object has a life cycle that is in a destroy phase.
    Type: Application
    Filed: June 4, 2013
    Publication date: December 4, 2014
    Inventors: Jean-Marc Costecalde, Kevin N. Trinh
  • Publication number: 20140358873
    Abstract: A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.
    Type: Application
    Filed: August 14, 2014
    Publication date: December 4, 2014
    Inventors: Vinod Kumar Daga, Craig Anthony Johnston, Ling Zheng
  • Publication number: 20140358870
    Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content feature summary prediction measurement is measured between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.
    Type: Application
    Filed: September 3, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
  • Publication number: 20140358872
    Abstract: Provided is a method for performing deduplication in conjunction with a host device and a storage device, and a storage system therefor. The host device includes a brief examination device which is configured to briefly examine whether data to be stored is duplicated or not based on a hash value of the data to be stored, and a data transmission device which is configured to transmit the data to be stored with an examination request or a data storage request to the at least one storage device according to a result of the examination.
    Type: Application
    Filed: May 29, 2014
    Publication date: December 4, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Hyun-jung SHIN, Ju-Pyung LEE
  • Publication number: 20140358867
    Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content feature summary prediction measurement is measured between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.
    Type: Application
    Filed: June 3, 2013
    Publication date: December 4, 2014
    Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
  • Publication number: 20140358871
    Abstract: A method and system for deduplication of data to be stored on a storage system. A deduplication system performs a method that includes the steps of: segmenting a storage object into a plurality of data segments; generating a content similarity key indicative of a content of a data segment as well as associating a physical position on the storage medium for the data segment with the generated content similarity key; storing the association in deduplication index information; and using the stored associations for optimizing the deduplication.
    Type: Application
    Filed: May 20, 2014
    Publication date: December 4, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roy D. Cideciyan, Jens Jelitto, Slavisa Sarafijanovic, Jan Stanek
  • Publication number: 20140358869
    Abstract: Provided are a system and method for accelerating a mapreduce operation. The system for accelerating a mapreduce operation includes at least one map node configured to perform a map operation in response to a map operation request of a master node, and at least one reduce node configured to perform a reduce operation using result data of the map operation. The map node includes at least one map operation accelerator configured to generate a data stream by merging a plurality of data blocks generated as results of the map operation and establish a transmission channel for transmission of the data stream, and the reduce node includes at least one reduce operation accelerator configured to receive the data stream from the map operation accelerator through the transmission channel, recover the plurality of data blocks from the received data stream, and provide the recovered data blocks for the reduce operation.
    Type: Application
    Filed: August 28, 2013
    Publication date: December 4, 2014
    Applicant: SAMSUNG SDS CO., LTD.
    Inventor: Jin Cheol KIM
  • Publication number: 20140358857
    Abstract: Migrating a sub-volume in data storage with at least two de-duplication domains, each of the domains having at least one sub-volume. A first sub-volume is assigned to a de-duplication domain and a first content summary is computed for the first sub-volume. Similarly, a second sub-volume is assigned to a second de-duplication domain and a second content summary is computed for the second sub-volume. A first content affinity is calculated between the first sub-volume and a third sub-volume, and a second content affinity is calculated between the second sub-volume and the third sub-volume. A domain placement is selected for the third sub-volume based on comparison of the first content affinity and the second content affinity.
    Type: Application
    Filed: September 9, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Bhushan P. Jain, Maohua Lu
  • Patent number: 8903764
    Abstract: Methods and systems for enhancing reliability in deduplication over storage clouds are provided. A method includes: determining a weight for each of a plurality of duplicate files based on parameters associated with a respective storage device of each of the plurality of duplicate files; and designating one of the plurality of duplicate files as a master copy based on the determined weight.
    Type: Grant
    Filed: April 25, 2012
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Sandeep R. Patil, Sri Ramanathan, Riyazahamad M. Shiraguppi, Prashant Sodhiya, Matthew B. Trevathan
  • Patent number: 8904120
    Abstract: A storage server is coupled to a storage device that stores data blocks, and generates a fingerprint for each data block stored on the storage device. The storage server creates a master datastore and a plurality of datastore segments. The master datastore comprises an entry for each data block that is written to the storage device and a datastore segment comprises an entry for a new data block or a modified data block that is subsequently written to the storage device. The storage server merges the entries in the datastore segments with the entries in the master datastore in memory to free duplicate data blocks in the storage device. The storage server overwrites the master datastore with the entries in the plurality of datastore segments and the entries in the master datastore to create an updated master datastore in response to detecting that the number of datastore segments meets a threshold.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: December 2, 2014
    Assignee: NetApp Inc.
    Inventors: Praveen Killamsetti, Subramaniam V. Periyagaram, Satbir Singh, Bipul Raj
  • Patent number: 8904128
    Abstract: For a restore request, at least a portion of a recipe that refers to chunks is read. Based on the recipe portion, a container having plural chunks is retrieved. From the recipe portion, it is identified which of the plural chunks of the container to save, where some of the chunks identified do not, at a time of the identifying, have to be presently communicated to a requester. The identified chunks are stored in a memory area from which chunks are read for the restore operation.
    Type: Grant
    Filed: June 8, 2011
    Date of Patent: December 2, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Mark David Lillibridge
  • Publication number: 20140351226
    Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
    Type: Application
    Filed: May 22, 2013
    Publication date: November 27, 2014
    Applicant: International Business Machines Corporation
    Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
  • Publication number: 20140351228
    Abstract: There are provided an answer evaluation means 501 that finds an answer content indicating how much an expression which would be an answer for a query is contained in a series of user's comments for the query contained in a set of queries which are response message candidates for a user's comment as character string information indicating user's comment contents and which are character string information in a question form, and a query ranking means 502 that ranks each query in ascending order of answer content based on the answer content of each query in a user's comment found by the answer evaluation means 501.
    Type: Application
    Filed: August 14, 2012
    Publication date: November 27, 2014
    Inventor: Kosuke Yamamoto
  • Publication number: 20140351227
    Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
    Type: Application
    Filed: August 15, 2013
    Publication date: November 27, 2014
    Applicant: International Business Machines Corporation
    Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
  • Patent number: 8898412
    Abstract: A computer system is provided, the computer system having a processor and a system memory coupled to the processor. The computer system also includes a Basic Input/Output System (BIOS) in communication with the processor. The BIOS selectively scrubs the system memory during a shutdown process of the computer system.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: November 25, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Louis B. Hobson, Wael M. Ibrahim, Manuel Novoa
  • Patent number: 8898121
    Abstract: Provided are a computer program product, system, and method for merging entries in a deduplication index. An index has chunk signatures calculated from chunks of data in the data objects in the storage, wherein each index entry includes at least one of the chunk signatures and a reference to the chunk of data from which the signature was calculated. Entries in the index are selected to merge and a merge operation is performed on the chunk signatures in the selected entries to generate a merged signature. An entry is added to the index including the merged signature and a reference to the chunks in the storage referenced in the merged selected entries. The index of the signatures is used in deduplication operations when adding data objects to the storage.
    Type: Grant
    Filed: May 29, 2012
    Date of Patent: November 25, 2014
    Assignee: International Business Machines Corporation
    Inventors: Jonathan Amit, Corneliu M. Constantinescu, Joseph S. Glider, Shai I. Tahar
  • Patent number: 8898414
    Abstract: A storage device includes a data storage having first and second storage areas corresponding to different physical addresses. First data are stored in the first storage area. The storage device further includes a first memory that stores a reference count associated with the first data, and a controller that rearranges the first data from the first storage area to the second storage area in response to a change in the reference count of the first data.
    Type: Grant
    Filed: July 11, 2012
    Date of Patent: November 25, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyun-Chul Park, Kyung-Ho Kim, Sang-Mok Kim, O-Tae Bae, Dong-Gi Lee, Jeong-Hoon Jeong
  • Patent number: 8898119
    Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: November 25, 2014
    Assignee: NetApp, Inc.
    Inventors: Alok Sharma, Praveen Killamsetti, Satbir Singh
  • Patent number: 8898120
    Abstract: A computer-implemented method for distributed data deduplication may include (1) identifying a deduplicated data system, the deduplicated data system including a plurality of nodes, wherein each node within the plurality of nodes is configured to deduplicate data stored on the node, (2) identifying a data object to store within the deduplicated data system, (3) generating a similarity hash of the data object, the similarity hash representing a probabilistic dimension-reduction of the data object, (4) selecting, based at least in part on the similarity hash, a target node from the plurality of nodes on which to store the data object, and then (5) routing the data object for storage on the target node based on the selection of the target node. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: October 9, 2011
    Date of Patent: November 25, 2014
    Assignee: Symantec Corporation
    Inventor: Petros Efstathopoulos
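The similarity-hash routing described in the abstract above can be illustrated with a minimal sketch. It assumes a min-hash over fixed-size chunks as the similarity hash and a simple modulo to pick the node; the abstract specifies neither, and `CHUNK_SIZE`, `chunk_hashes`, `similarity_hash`, and `select_node` are hypothetical names introduced only for illustration.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical fixed chunk size, chosen only for illustration


def chunk_hashes(data: bytes) -> list[int]:
    """Hash each fixed-size chunk of the object to a 64-bit integer."""
    # max(len(data), 1) makes an empty object yield one (empty) chunk.
    return [
        int.from_bytes(hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()[:8], "big")
        for i in range(0, max(len(data), 1), CHUNK_SIZE)
    ]


def similarity_hash(data: bytes) -> int:
    """Min-hash (an assumed technique): the smallest chunk hash summarizes the object."""
    return min(chunk_hashes(data))


def select_node(data: bytes, num_nodes: int) -> int:
    """Map the similarity hash onto one of num_nodes target nodes."""
    return similarity_hash(data) % num_nodes
```

Objects built from the same set of chunks share a minimum chunk hash, so under these assumptions they are routed to the same node, where node-local deduplication can find the overlap.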
  • Publication number: 20140344227
    Abstract: A computing system includes a plurality of dispersed storage (DS) processing units operable to receive a continuous data stream, simultaneously disperse storage error encode the continuous data stream to produce a plurality of encoded data slices and store the plurality of encoded data slices in a DS memory.
    Type: Application
    Filed: August 1, 2014
    Publication date: November 20, 2014
    Applicant: CLEVERSAFE, INC.
    Inventors: Gary W. Grube, Timothy W. Markison, Jason K. Resch
  • Publication number: 20140344229
    Abstract: A method includes receiving information about a plurality of data chunks and determining if one or more of a plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks where one of the plurality of back-end nodes is designated as a sticky node. The method further includes, responsive to determining that none of the plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks, deduplicating the plurality of data chunks against the back-end node designated as the sticky node. Finally, the method includes, responsive to an amount of data being processed, designating a different back-end node as the sticky node.
    Type: Application
    Filed: February 2, 2012
    Publication date: November 20, 2014
    Inventors: Mark D. Lillibridge, Kave Eshghi, Mark R. Watkins
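The sticky-node routing in the abstract above might be sketched as follows. This is an illustration under stated assumptions, not the claimed method: each back-end node is modeled as a set of chunk hashes, the best-overlapping node is preferred when it exceeds the threshold, and the sticky node rotates round-robin. The names `choose_node` and `next_sticky` are hypothetical.

```python
def choose_node(batch: set[str], nodes: list[set[str]], sticky: int, threshold: int) -> int:
    """Return the index of the node to deduplicate the batch against."""
    # Count how many chunks of the batch each back-end node already stores.
    overlaps = [len(batch & stored) for stored in nodes]
    best = max(range(len(nodes)), key=lambda i: overlaps[i])
    # Use the best node only if it exceeds the threshold; otherwise fall back
    # to the node currently designated as sticky.
    return best if overlaps[best] > threshold else sticky


def next_sticky(sticky: int, num_nodes: int) -> int:
    """Designate a different sticky node (round-robin rotation is an assumption)."""
    return (sticky + 1) % num_nodes
```

A caller would invoke `next_sticky` after some amount of data has been processed; the trigger amount is left to the caller because the abstract does not state it.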
  • Patent number: 8892526
    Abstract: Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.
    Type: Grant
    Filed: January 11, 2012
    Date of Patent: November 18, 2014
    Inventor: Timothy Stoakes
  • Patent number: 8892521
    Abstract: A method includes receiving a request to save a first file as immutable. The method also includes searching for a second file that is saved and is redundant to the first file. The method further includes determining the second file is one of mutable and immutable. When the second file is mutable, the method includes saving the first file as a master copy, and replacing the second file with a soft link pointing to the master copy. When the second file is immutable, the method includes determining which of the first and second files has a later expiration date and an earlier expiration date, saving the one of the first and second files with the later expiration date as a master copy, and replacing the one of the first and second files with the earlier expiration date with a soft link pointing to the master copy.
    Type: Grant
    Filed: May 10, 2013
    Date of Patent: November 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Gaurav Chhaunker, Bhushan P. Jain, Sandeep R. Patil, Sri Ramanathan, Matthew B. Trevathan
  • Patent number: 8892528
    Abstract: Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: November 18, 2014
    Assignee: Dell Products L.P.
    Inventors: Goutham Rao, Vinod Jayaraman
  • Patent number: 8892527
    Abstract: A method and system for eliminating the redundant allocation and deallocation of special data on disk, wherein the redundant allocation and deallocation of special data on disk is eliminated by providing an innovative technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type of document header is specially allocated.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: November 18, 2014
    Assignee: NetApp, Inc.
    Inventors: Sandeep Yadav, Subramanian Periyagaram
  • Patent number: 8892529
    Abstract: In embodiments of the present invention, when a duplicate data query is performed on a received data stream, a first physical node which corresponds to each first sketch value and is in a cluster system is identified according to a first sketch value representing the data stream, and then the first sketch value representing the data stream is sent to the identified physical node for the duplicate data query, and a procedure of the duplicate data query does not change with an increase of the number of nodes in the cluster system; therefore, a calculation amount of each node does not increase with an increase of the number of nodes in the cluster system.
    Type: Grant
    Filed: December 24, 2013
    Date of Patent: November 18, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Qiang Liu, Quancheng Sun, Xiaobo Liu, Jun You, Huadi Yang, Dan Zhou, Yan Huang
  • Publication number: 20140337299
    Abstract: A method, a system, an apparatus, and a computer readable medium for transmission of data across a network are disclosed.
    Type: Application
    Filed: July 28, 2014
    Publication date: November 13, 2014
    Inventors: David G. Therrien, David Andrew Thompson
  • Patent number: 8886613
    Abstract: An example method includes controlling a data de-duplication apparatus to arrange a de-duplication schedule based on the presence or absence of a replication indicator in an item to be de-duplicated. The method also includes selectively controlling the de-duplication schedule based on a replication priority. In one embodiment, the method includes, upon determining that a chunk of data is associated with a replication indicator, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks not associated with a replication indicator. In one embodiment, the method also includes, upon determining that the chunk is associated with a replication priority, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks of data not associated with a replication priority. The schedule location is based, at least in part, on the replication priority. The method also includes controlling de-duplication order based on the schedule.
    Type: Grant
    Filed: October 12, 2010
    Date of Patent: November 11, 2014
    Inventor: Don Doerner
  • Publication number: 20140330794
    Abstract: The various implementations of the present invention are provided as a computer-based system for content scoring. Content from a variety of source feeds may be considered for inclusion in an aggregated feed, based on the content of the source feed. The content of the source feed may be “scored” according to a variety of user-configurable options, thereby identifying the most valuable content from the source feeds for inclusion in the aggregated feed. For example, certain content elements may be extracted from a variety of source feeds and then combined to create an aggregated feed, where the aggregated feed contains only the highest scoring elements, as determined by the feed creator, from the various source feeds.
    Type: Application
    Filed: July 17, 2014
    Publication date: November 6, 2014
    Applicant: PARLANT TECHNOLOGY, INC.
    Inventors: Dane Dellenbach, Bruce Hassler, Jacob Hutchings, Carson Anderson
  • Publication number: 20140330795
    Abstract: A computer identifies a plurality of data retrieval requests that may be serviced using a plurality of unique data chunks. The computer services the data retrieval requests by utilizing at least one of the unique data chunks. At least one of the unique data chunks is utilized for servicing two or more of the data retrieval requests. The computer determines a servicing sequence for the plurality of data retrieval requests such that the two or more of the data retrieval requests that are serviced utilizing the at least one of the unique data chunks are serviced consecutively. The computer services the plurality of data retrieval requests according to the servicing sequence.
    Type: Application
    Filed: July 18, 2014
    Publication date: November 6, 2014
    Inventors: Kavita Chavda, Nagapramod S. Mandagere, Ramani R. Routray, Pin Zhou
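One way to picture the servicing sequence in the abstract above is a sort that places requests needing the same chunk next to each other. The sketch below is a simple heuristic under assumed data shapes (a mapping from request id to the set of chunk ids it needs); it does not guarantee adjacency for every shared chunk, and `servicing_sequence` is a hypothetical name.

```python
def servicing_sequence(requests: dict[str, set[str]]) -> list[str]:
    """Order request ids so that requests sharing their first chunk are adjacent."""
    # Sort by the smallest chunk id each request needs, then by request id
    # to keep the ordering deterministic.
    return sorted(requests, key=lambda r: (min(requests[r], default=""), r))


# Requests r1 and r3 both need chunk c1, so they are serviced consecutively.
order = servicing_sequence({"r1": {"c1"}, "r2": {"c2"}, "r3": {"c1"}})
```

Servicing `r1` and `r3` back to back means chunk `c1` only has to be fetched once while it is still at hand.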
  • Publication number: 20140330793
    Abstract: A system for managing a storage system comprises a processor and a memory. The processor is configured to receive storage system information from a deduplicating storage system. The processor is further configured to determine a capacity forecast based at least in part on the storage system information. The processor is further configured to provide a compression forecast. The memory is coupled to the processor and configured to provide the processor with instructions.
    Type: Application
    Filed: May 1, 2014
    Publication date: November 6, 2014
    Applicant: EMC CORPORATION
    Inventor: Mark Chamness
  • Patent number: 8880482
    Abstract: Various embodiments for replicating deduplicated data using a processor device are provided. A block of the deduplicated data, created in a source repository, is assigned a global block identifier (ID) unique in a grid set inclusive of the source repository. The global block ID is generated using at least one unique identification value of the block, a containing grid of the grid set, and the source repository. The global block ID is transmitted from the source repository to a target repository. If the target repository determines the global block ID is associated with an existing block of the deduplicated data located within the target repository, the block is not transmitted to the target repository during a subsequent replication process.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: November 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Shay H. Akirav, Lior Aronovich, Ron Asher, Yariv Bachar, Ariel J. Ish-Shalom, Ofer Leneman
  • Patent number: 8880476
    Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: November 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ranjit M. Noronha, Ajay K. Singh
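The hash-key lookup described in the abstract above can be sketched in a few lines. This is a simplified, sequential illustration: the abstract describes sending the write to the file system in parallel with hashing, which is not modeled here, and SHA-256, the in-memory lists, and the `DedupStore` class are assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy de-duplicating writer with a de-duplication table and a journal."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []    # stands in for the storage device
        self.table: dict[str, int] = {}  # hash key -> block index
        self.journal: list[str] = []     # journal transactions (hash keys)

    def write(self, data: bytes) -> int:
        """Store data unless its hash key is already known; return the block index."""
        key = hashlib.sha256(data).hexdigest()
        if key in self.table:
            # Duplicate: reference the existing block instead of storing again.
            return self.table[key]
        self.blocks.append(data)         # write the data to storage
        self.journal.append(key)         # record a journal transaction
        self.table[key] = len(self.blocks) - 1
        return self.table[key]
```

Writing the same payload twice returns the same block index and leaves a single stored block and a single journal entry.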
  • Patent number: 8880481
    Abstract: Inverse distribution operations are performed on a large distributed parallel database comprising a plurality of distributed data segments to determine a data value at a predetermined percentile of a sorted dataset formed on one segment. Data elements from across the segments may be first grouped, either by partitioning keys or by hashing, the groups are sorted into a predetermined order, and data values corresponding to the desired percentile are picked up at a row location of the corresponding data element of each group. For a global dataset that is spread across the database segments, a local sort of data elements is performed on each segment, and the data elements from the local sorts are streamed in overall sorted order to one segment to form the sorted dataset.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: November 4, 2014
    Assignee: Pivotal Software, Inc.
    Inventors: Hitoshi Harada, Caleb E. Welton, Gavin Sherry
  • Patent number: 8880469
    Abstract: A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether a requested data is stored in the performance storage unit. The determiner determines whether the requested data is stored in the performance segment storage unit in the event that the requested data is not stored in the performance storage unit.
    Type: Grant
    Filed: April 18, 2013
    Date of Patent: November 4, 2014
    Assignee: EMC Corporation
    Inventor: R. Hugo Patterson
  • Publication number: 20140324791
    Abstract: A method for copying data efficiently within a deduplicating storage system eliminates the need to read or write the data per se within the storage system. The copying is accomplished by creating duplicates of the metadata block pointers only. The result is a process that creates an arbitrary number of copies using minimal time and bandwidth.
    Type: Application
    Filed: June 21, 2013
    Publication date: October 30, 2014
    Inventor: Robert Petrocelli
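    Illustration (not from the application): a minimal sketch of copying by duplicating metadata block pointers only. The class `DedupVolume`, its method names, and the reference counts are hypothetical; the abstract does not describe how pointers are tracked.

```python
class DedupVolume:
    """Toy volume in which a file is just a list of pointers to stored blocks."""

    def __init__(self):
        self.blocks = {}    # block id -> bytes, each block stored once
        self.refcount = {}  # block id -> number of pointers to it (assumption)
        self.files = {}     # file name -> list of block ids (metadata pointers)
        self._next_id = 0

    def write_file(self, name, chunks):
        pointers = []
        for chunk in chunks:
            block_id = self._next_id
            self._next_id += 1
            self.blocks[block_id] = chunk
            self.refcount[block_id] = 1
            pointers.append(block_id)
        self.files[name] = pointers

    def copy_file(self, src, dst):
        # Duplicate the pointer list only; no block data is read or written.
        pointers = list(self.files[src])
        for block_id in pointers:
            self.refcount[block_id] += 1
        self.files[dst] = pointers

    def read_file(self, name):
        return b"".join(self.blocks[block_id] for block_id in self.files[name])
```

    The cost of `copy_file` depends on the number of pointers, not on the amount of data they reference.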
  • Publication number: 20140324793
    Abstract: A computer-implemented method for layered storage of enterprise data comprises receiving from one or more virtual machines data blocks; de-duplicating the data blocks per hypervisor; storing de-duplicated data blocks in a local cache memory; time-based grouping the data blocks into data containers; dividing each data container into X fixed length mega-blocks; for each data container applying erasure encoding to the X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being larger than X; and distributed storing the Y fixed length mega-blocks across multiple backend storage systems.
    Type: Application
    Filed: April 8, 2014
    Publication date: October 30, 2014
    Applicant: CLOUDFOUNDERS NV
    Inventor: Kurt GLAZEMAKERS
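    Illustration (not from the application): the abstract does not name an erasure code, so this sketch substitutes the simplest one, a single XOR parity block, giving Y = X + 1. All function names are hypothetical, and zero-padding the container to a multiple of X is an assumption.

```python
from functools import reduce


def split_fixed(container: bytes, x: int) -> list[bytes]:
    """Divide a container into x fixed-length mega-blocks, zero-padding the tail."""
    size = -(-len(container) // x)  # ceiling division
    padded = container.ljust(size * x, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(x)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(container: bytes, x: int) -> list[bytes]:
    """Return Y = x + 1 blocks: the x data blocks plus one XOR parity block."""
    data = split_fixed(container, x)
    return data + [xor_blocks(data)]


def recover(blocks: list[bytes], missing: int) -> bytes:
    """Rebuild the block at index `missing` from the other blocks."""
    present = [block for i, block in enumerate(blocks) if i != missing]
    return xor_blocks(present)
```

    A single parity block tolerates the loss of any one of the Y blocks; codes that tolerate more losses (for example Reed-Solomon) follow the same encode and recover shape.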
  • Publication number: 20140324796
    Abstract: A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criteria. The memory is coupled to the processor and configured to provide the processor with instructions.
    Type: Application
    Filed: May 1, 2014
    Publication date: October 30, 2014
    Applicant: EMC CORPORATION
    Inventors: Frederick Douglis, Philip Shilane, R. Hugo Patterson
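    Illustration (not from the application): a minimal sketch of choosing a storage node by segment overlap. Segments are represented by fingerprints held in Python sets; the function name `select_node` and the capacity cap `max_segments`, used here as the selection criterion, are hypothetical.

```python
def select_node(new_fingerprints, nodes, max_segments):
    """Pick the node whose stored segments overlap most with the incoming data.

    nodes maps a node name to the set of segment fingerprints it already holds.
    Nodes at or above max_segments are skipped (assumed selection criterion).
    Returns (name, overlap), or None if no node is eligible.
    """
    best = None
    for name in sorted(nodes):  # sorted for a deterministic tie-break
        stored = nodes[name]
        if len(stored) >= max_segments:
            continue
        overlap = len(new_fingerprints & stored)
        if best is None or overlap > best[1]:
            best = (name, overlap)
    return best
```

    Routing data to the node with the highest overlap means more incoming segments are already present there, so fewer need to be stored again.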
  • Publication number: 20140324795
    Abstract: Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the first data can be converted, said characteristic value uniquely representing an arrangement characteristic of at least part of bits of data in a particular format. The method also includes storing one of the first data and the second data in response to one of the calculated characteristic values being the same as a stored characteristic value corresponding to a second data.
    Type: Application
    Filed: April 28, 2014
    Publication date: October 30, 2014
    Applicant: International Business Machines Corporation
    Inventors: Peng Hui Jiang, Pi Jun Jiang, Xi Ning Wang, Liang Xue, Wen Yin
  • Publication number: 20140324798
    Abstract: A process that ensures the virtual destruction of data files a user wishes to erase from a storage medium, such as a hard drive, flash drive, or removable disk. This approach is appropriate for managing custom distributions from large file sets as it is roughly linear in compute complexity to the number of files erased but is capped when many files are batch erased.
    Type: Application
    Filed: July 14, 2014
    Publication date: October 30, 2014
    Inventor: Alan Joshua Shapiro
  • Publication number: 20140324788
    Abstract: A cleaning application that can monitor one or more browser applications that are executed on a computer, and that can, for at least one browser application, clean at least one of one or more files or a registry associated with the at least one browser application is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more browser applications that are executed on a computer. The cleaning module can further detect a closing of at least one browser application. The cleaning module can further perform a pre-defined action in response to the closing of the at least one browser application. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the at least one browser application.
    Type: Application
    Filed: April 24, 2013
    Publication date: October 30, 2014
    Applicant: Piriform Ltd.
    Inventor: Guy SANER
  • Publication number: 20140324790
    Abstract: Techniques for avoiding duplicate comparisons while comparing customer records to identify linked customer records pertaining to a single customer entity are provided. The techniques include the computer system comparing a first electronic customer record with a second electronic customer record to determine if the first electronic customer record and the second electronic customer record pertain to a single customer entity if the computer system identifies a common blocker key corresponding to a selected blocker from a data field in the first electronic customer record and from a data field in the second electronic customer record and if the computer system does not identify a common blocker key corresponding to an additional lower order blocker from another data field in the first electronic customer record and from a data field in the second electronic customer record.
    Type: Application
    Filed: April 26, 2013
    Publication date: October 30, 2014
    Applicant: Wal-Mart Stores, Inc.
    Inventors: Andrew Benjamin Ray, Nathaniel Philip Troutman
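    Illustration (not from the application): a minimal sketch of the duplicate-comparison rule in this abstract. A pair of records is scheduled under a blocker only if the two records share a key for that blocker and share no key under any lower-order blocker. Treating "lower order" as "earlier in the `blockers` list" is an assumption, as are the function name and the dict-based record layout.

```python
from collections import defaultdict
from itertools import combinations


def pairs_to_compare(records, blockers):
    """Return (index_a, index_b, blocker_order) triples, each pair at most once."""
    scheduled = []
    for order, blocker in enumerate(blockers):
        # Group record indexes by blocker key; empty keys are not blocked.
        buckets = defaultdict(list)
        for index, record in enumerate(records):
            key = blocker(record)
            if key:
                buckets[key].append(index)

        lower_order = blockers[:order]
        for members in buckets.values():
            for a, b in combinations(members, 2):
                # Skip pairs that a lower-order blocker has already matched.
                already_seen = any(
                    f(records[a]) and f(records[a]) == f(records[b])
                    for f in lower_order
                )
                if not already_seen:
                    scheduled.append((a, b, order))
    return scheduled
```

    With an email blocker followed by a phone blocker, two records sharing both fields are compared once under the email blocker and skipped under the phone blocker.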
  • Publication number: 20140324797
    Abstract: The invention provides a display interface in a social networking system that enables the presentation of information related to a user in a timeline or map view. The system accesses information about a user of a social networking system, including both data about the user and social network activities related to the user. The system then selects one or more of these pieces of data and/or activities from a certain time period and gathers them into timeline units based on their relatedness and their relevance to users. These timeline units are ranked by relevance to the user, and are used to generate a timeline or map view for the user containing visual representations of the timeline units organized by location or time. The timeline or map view is then provided to other users of the social networking system that wish to view information about the user.
    Type: Application
    Filed: July 9, 2014
    Publication date: October 30, 2014
    Inventors: Raylene Kay Yung, Ryan Case, Jeff Huang, Samuel Lessin, Ryan David Mack, Paul M. McDonald, Serkan Piantino, Arun Vijayvergiya, Joshua Wiseman, Steven Young, Mark E. Zuckerberg
  • Publication number: 20140324794
    Abstract: Methods are provided for clustering events. Data is received at an extraction engine from managed infrastructure. Events are converted into alerts and the alerts mapped to a matrix M. One or more common steps are determined from the events and clusters of events are produced relating to the alerts and/or events.
    Type: Application
    Filed: April 28, 2014
    Publication date: October 30, 2014
    Applicant: Moogsoft, Inc.
    Inventors: Philip Tee, Robert Duncan Harper, Charles Mike Silvey
  • Publication number: 20140324789
    Abstract: A cleaning application that can monitor one or more characteristics of a computer, and that can clean at least one of one or more files or a registry of the computer, is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more characteristics of the computer. The cleaning module can further detect an occurrence of pre-defined criteria involving the one or more characteristics. The cleaning module can further perform a pre-defined action in response to the pre-defined criteria. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the computer.
    Type: Application
    Filed: April 24, 2013
    Publication date: October 30, 2014
    Applicant: Piriform Ltd.
    Inventor: Guy SANER
  • Publication number: 20140324792
    Abstract: Embodiments of the present invention relate to extraction of a social graph from contact information across a confined user base. Users are typically subscribed to a service that backs up data from end-user devices to a cloud. The data includes contacts from mobile address books. The service is able to determine relationships of contacts in the cloud to build a social graph or map of these contacts. The social graph can be used to drive individual and group analytics to, for example, increase membership and provide value-added features to its service members.
    Type: Application
    Filed: March 24, 2014
    Publication date: October 30, 2014
    Applicant: Synchronoss Technologies, Inc.
    Inventors: Omar Chaudhry, Andrew Fuller
  • Patent number: 8874863
    Abstract: Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data.
    Type: Grant
    Filed: August 1, 2012
    Date of Patent: October 28, 2014
    Assignee: Actifio, Inc.
    Inventors: Madhav Mutalik, Christopher A. Provenzano, Philip J. Abercrombie
  • Patent number: 8874602
    Abstract: A random number generation process generates uncorrelated random numbers from identical random number sequences on parallel processing database segments of an MPP database without communications between the segments by establishing a different starting position in the sequence on each segment using an identifier that is unique to each segment, query slice information and the number of segments. A master node dispatches a seed value to initialize the random number sequence generation on all segments, and dispatches the query slice information and information as to the number of segments during a normal query plan dispatch process.
    Type: Grant
    Filed: September 29, 2012
    Date of Patent: October 28, 2014
    Assignee: Pivotal Software, Inc.
    Inventors: Hitoshi Harada, Caleb Welton, Florian Schoppmann
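    Illustration (not from the patent): a minimal sketch in which every segment seeds the same generator with the master's seed, starts at a position given by its segment identifier, and then steps through the sequence by the number of segments. The stride ("leapfrog") scheme and the function name `segment_stream` are assumptions, and query slice information is omitted; the abstract states only that the starting position differs per segment.

```python
import random
from itertools import islice


def segment_stream(seed, segment_id, num_segments):
    """Yield this segment's share of one shared pseudo-random sequence."""
    rng = random.Random(seed)  # identical sequence on every segment

    def sequence():
        while True:
            yield rng.random()

    # Start at position segment_id, then take every num_segments-th value.
    return islice(sequence(), segment_id, None, num_segments)
```

    Because each segment derives its positions from the seed, its own identifier, and the segment count, no segment needs to communicate with another to avoid reusing a value.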
  • Patent number: 8874520
    Abstract: A system and method for caching fingerprints in a client cache is provided. A data object that comprises a set of data segments and describes a backup process is identified. Thereafter, a request referencing the data object is made to a deduplication server to request that a task identifier be added to the data object. If the deduplication server is able to successfully add the task identifier to the data object, then an active identifier is added to each data segment from the set of data segments in a cache that is within a client system.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: October 28, 2014
    Assignee: Symantec Corporation
    Inventors: Xianbo Zhang, Thomas Hartnett, Weibao Wu