Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
-
Patent number: 8908911
Abstract: Systems and methods are described herein for identifying and filtering redundant database entries associated with a visual search system. An example of a method of managing a database associated with a mobile device described herein includes identifying a captured image; obtaining an external database record from an external database corresponding to an object identified from the captured image; comparing the external database record to a locally stored database record; and locally discarding one of the external database record or the locally stored database record if the comparing indicates overlap between the external database record and the locally stored database record.
Type: Grant
Filed: September 30, 2011
Date of Patent: December 9, 2014
Assignee: QUALCOMM Incorporated
Inventors: Charles Wheeler Sweet, III, Prince Gupta
-
Publication number: 20140358868
Abstract: The program code assigns a first record to a first object having a first life cycle and a second record to a second object having a second life cycle, wherein the first object is associated to the second object, and wherein the assigning is based on configurable predefined rules. In response to receiving a request to perform a delete action on at least one of the first object and the second object, performing the delete action when the at least one of the first object and the second object has a life cycle that is in a destroy phase.
Type: Application
Filed: June 4, 2013
Publication date: December 4, 2014
Inventors: Jean-Marc Costecalde, Kevin N. Trinh
-
Publication number: 20140358873
Abstract: A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.
Type: Application
Filed: August 14, 2014
Publication date: December 4, 2014
Inventors: Vinod Kumar Daga, Craig Anthony Johnston, Ling Zheng
-
Publication number: 20140358870
Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content feature summary prediction measurement is measured between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.
Type: Application
Filed: September 3, 2013
Publication date: December 4, 2014
Applicant: International Business Machines Corporation
Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
-
Publication number: 20140358872
Abstract: Provided is a method for performing deduplication in conjunction with a host device and a storage device, and a storage system therefor. The host device includes a brief examination device which is configured to briefly examine whether data to be stored is duplicated or not based on a hash value of the data to be stored, and a data transmission device which is configured to transmit the data to be stored with an examination request or a data storage request to the at least one storage device according to a result of the examination.
Type: Application
Filed: May 29, 2014
Publication date: December 4, 2014
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Hyun-jung SHIN, Ju-Pyung LEE
-
Publication number: 20140358867
Abstract: Assignment of files to a de-duplication domain. Address space of data files is divided into multiple containers. For each of the containers, a file metadata scan is performed to obtain file system metadata, which is aggregated and summarized in a content feature summary. A content feature summary prediction measurement is measured between containers from the generated content feature summary, and files from each container are assigned to a de-duplication domain based upon the content similarity prediction measurement.
Type: Application
Filed: June 3, 2013
Publication date: December 4, 2014
Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
-
Publication number: 20140358871
Abstract: A method and system for deduplication of data to be stored on a storage system. A deduplication system performs a method that includes the steps of: segmenting a storage object into a plurality of data segments; generating a content similarity key indicative of a content of a data segment as well as associating a physical position on the storage medium for the data segment with the generated content similarity key; storing the association in deduplication index information; and using the stored associations for optimizing the deduplication.
Type: Application
Filed: May 20, 2014
Publication date: December 4, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Roy D. Cideciyan, Jens Jelitto, Slavisa Sarafijanovic, Jan Stanek
-
Publication number: 20140358869
Abstract: Provided are a system and method for accelerating a mapreduce operation. The system for accelerating a mapreduce operation includes at least one map node configured to perform a map operation in response to a map operation request of a master node, and at least one reduce node configured to perform a reduce operation using result data of the map operation. The map node includes at least one map operation accelerator configured to generate a data stream by merging a plurality of data blocks generated as results of the map operation and establish a transmission channel for transmission of the data stream, and the reduce node includes at least one reduce operation accelerator configured to receive the data stream from the map operation accelerator through the transmission channel, recover the plurality of data blocks from the received data stream, and provide the recovered data blocks for the reduce operation.
Type: Application
Filed: August 28, 2013
Publication date: December 4, 2014
Applicant: SAMSUNG SDS CO., LTD.
Inventor: Jin Cheol KIM
-
Publication number: 20140358857
Abstract: Migrating a sub-volume in data storage with at least two de-duplication domains, each of the domains having at least one sub-volume. A first sub-volume is assigned to a de-duplication domain and a first content summary is computed for the first sub-volume. Similarly, a second sub-volume is assigned to a second de-duplication domain and a second content summary is computed for the second sub-volume. A first content affinity is calculated between the first sub-volume and a third sub-volume, and a second content affinity is calculated between the second sub-volume and the third sub-volume. A domain placement is selected for the third sub-volume based on comparison of the first content affinity and the second content affinity.
Type: Application
Filed: September 9, 2013
Publication date: December 4, 2014
Applicant: International Business Machines Corporation
Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Bhushan P. Jain, Maohua Lu
-
Patent number: 8903764
Abstract: Methods and systems for enhancing reliability in deduplication over storage clouds are provided. A method includes: determining a weight for each of a plurality of duplicate files based on parameters associated with a respective storage device of each of the plurality of duplicate files; and designating one of the plurality of duplicate files as a master copy based on the determined weight.
Type: Grant
Filed: April 25, 2012
Date of Patent: December 2, 2014
Assignee: International Business Machines Corporation
Inventors: Sandeep R. Patil, Sri Ramanathan, Riyazahamad M. Shiraguppi, Prashant Sodhiya, Matthew B. Trevathan
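As a rough illustration of the weighting idea in the abstract for 8903764, the sketch below scores each duplicate by properties of the device it lives on and picks the highest score as the master copy. The device parameters (`device_reliability`, `device_free_ratio`) and the linear weighting are assumptions made for this example; the abstract does not say which parameters or formula are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    path: str
    device_reliability: float  # hypothetical score in [0, 1]
    device_free_ratio: float   # hypothetical fraction of free space in [0, 1]


def weight(replica, w_reliability=0.7, w_free=0.3):
    """Combine device parameters into one weight (assumed linear formula)."""
    return (w_reliability * replica.device_reliability
            + w_free * replica.device_free_ratio)


def choose_master(replicas):
    """Designate the duplicate with the highest weight as the master copy."""
    if not replicas:
        raise ValueError("at least one duplicate file is required")
    return max(replicas, key=weight)


duplicates = [
    Replica("/cloud-a/report.pdf", device_reliability=0.9, device_free_ratio=0.1),
    Replica("/cloud-b/report.pdf", device_reliability=0.5, device_free_ratio=0.9),
]
master = choose_master(duplicates)
```

The remaining duplicates could then be replaced by references to `master`; that step is omitted here.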
-
Patent number: 8904120
Abstract: A storage server is coupled to a storage device that stores data blocks, and generates a fingerprint for each data block stored on the storage device. The storage server creates a master datastore and a plurality of datastore segments. The master datastore comprises an entry for each data block that is written to the storage device and a datastore segment comprises an entry for a new data block or a modified data block that is subsequently written to the storage device. The storage server merges the entries in the datastore segments with the entries in the master datastore in memory to free duplicate data blocks in the storage device. The storage server overwrites the master datastore with the entries in the plurality of datastore segments and the entries in the master datastore to create an updated master datastore in response to detecting that the number of datastore segments meets a threshold.
Type: Grant
Filed: December 15, 2010
Date of Patent: December 2, 2014
Assignee: NetApp Inc.
Inventors: Praveen Killamsetti, Subramaniam V. Periyagaram, Satbir Singh, Bipul Raj
-
Patent number: 8904128
Abstract: For a restore request, at least a portion of a recipe that refers to chunks is read. Based on the recipe portion, a container having plural chunks is retrieved. From the recipe portion, it is identified which of the plural chunks of the container to save, where some of the chunks identified do not, at a time of the identifying, have to be presently communicated to a requester. The identified chunks are stored in a memory area from which chunks are read for the restore operation.
Type: Grant
Filed: June 8, 2011
Date of Patent: December 2, 2014
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: Mark David Lillibridge
-
Publication number: 20140351226
Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
Type: Application
Filed: May 22, 2013
Publication date: November 27, 2014
Applicant: International Business Machines Corporation
Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
-
Publication number: 20140351228
Abstract: There are provided an answer evaluation means 501 that finds an answer content indicating how much an expression which would be an answer for a query is contained in a series of user's comments for the query contained in a set of queries which are response message candidates for a user's comment as character string information indicating user's comment contents and which are character string information in a question form, and a query ranking means 502 that ranks each query in ascending order of answer content based on the answer content of each query in a user's comment found by the answer evaluation means 501.
Type: Application
Filed: August 14, 2012
Publication date: November 27, 2014
Inventor: Kosuke Yamamoto
-
Publication number: 20140351227
Abstract: A distributed feature collection and correlation engine is provided. Feature extraction comprises obtaining one or more data records; extracting information from the one or more data records based on domain knowledge; transforming the extracted information into a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; and storing the key/value pair in a feature store database if the key/value pair does not already exist in the feature store database using a de-duplication mechanism. Features extracted from data records can be queried by obtaining a feature store database comprised of the extracted features stored as a key/value pair comprised of a key K and a value V, wherein the key comprises a feature identifier; receiving a query comprised of at least one query key; retrieving values from the feature store database that match the query key; and returning one or more retrieved key/value pairs.
Type: Application
Filed: August 15, 2013
Publication date: November 27, 2014
Applicant: International Business Machines Corporation
Inventors: Mihai Christodorescu, Xin Hu, Douglas Lee Schales, Reiner Sailer, Marc P. Stoecklin, Ting Wang
-
Patent number: 8898412
Abstract: A computer system is provided, the computer system having a processor and a system memory coupled to the processor. The computer system also includes a Basic Input/Output System (BIOS) in communication with the processor. The BIOS selectively scrubs the system memory during a shutdown process of the computer system.
Type: Grant
Filed: March 21, 2007
Date of Patent: November 25, 2014
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Louis B. Hobson, Wael M. Ibrahim, Manuel Novoa
-
Patent number: 8898121
Abstract: Provided are a computer program product, system, and method for merging entries in a deduplication index. An index has chunk signatures calculated from chunks of data in the data objects in the storage, wherein each index entry includes at least one of the chunk signatures and a reference to the chunk of data from which the signature was calculated. Entries in the index are selected to merge and a merge operation is performed on the chunk signatures in the selected entries to generate a merged signature. An entry is added to the index including the merged signature and a reference to the chunks in the storage referenced in the merged selected entries. The index of the signatures is used in deduplication operations when adding data objects to the storage.
Type: Grant
Filed: May 29, 2012
Date of Patent: November 25, 2014
Assignee: International Business Machines Corporation
Inventors: Jonathan Amit, Corneliu M. Constantinescu, Joseph S. Glider, Shai I. Tahar
-
Patent number: 8898414
Abstract: A storage device includes a data storage having first and second storage areas corresponding to different physical addresses. First data are stored in the first storage area. The storage device further includes a first memory that stores a reference count associated with the first data, and a controller that rearranges the first data from the first storage area to the second storage area in response to a change in the reference count of the first data.
Type: Grant
Filed: July 11, 2012
Date of Patent: November 25, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyun-Chul Park, Kyung-Ho Kim, Sang-Mok Kim, O-Tae Bae, Dong-Gi Lee, Jeong-Hoon Jeong
-
Patent number: 8898119
Abstract: A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
Type: Grant
Filed: December 15, 2010
Date of Patent: November 25, 2014
Assignee: NetApp, Inc.
Inventors: Alok Sharma, Praveen Killamsetti, Satbir Singh
-
Patent number: 8898120
Abstract: A computer-implemented method for distributed data deduplication may include (1) identifying a deduplicated data system, the deduplicated data system including a plurality of nodes, wherein each node within the plurality of nodes is configured to deduplicate data stored on the node, (2) identifying a data object to store within the deduplicated data system, (3) generating a similarity hash of the data object, the similarity hash representing a probabilistic dimension-reduction of the data object, (4) selecting, based at least in part on the similarity hash, a target node from the plurality of nodes on which to store the data object, and then (5) routing the data object for storage on the target node based on the selection of the target node. Various other methods, systems, and computer-readable media are also disclosed.
Type: Grant
Filed: October 9, 2011
Date of Patent: November 25, 2014
Assignee: Symantec Corporation
Inventor: Petros Efstathopoulos
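Steps (3) to (5) of the abstract for 8898120 can be sketched as follows. The sketch assumes a min-hash over fixed-width byte shingles as the similarity hash and a simple modulo mapping from hash to node; the abstract names neither, so both choices, and the function names, are illustrative only.

```python
import hashlib


def _hash64(data):
    """64-bit integer derived from SHA-1; used only as a mixing function."""
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def similarity_hash(data, shingle=8):
    """Min-hash over all byte shingles (assumed similarity hash).

    Objects that share most of their shingles are likely to share the
    minimum, so similar objects tend to get the same hash.
    """
    if len(data) <= shingle:
        return _hash64(data)
    return min(_hash64(data[i:i + shingle])
               for i in range(len(data) - shingle + 1))


def select_node(data, nodes):
    """Pick the target node from the similarity hash (assumed modulo mapping)."""
    return nodes[similarity_hash(data) % len(nodes)]


nodes = ["node-0", "node-1", "node-2"]
document = b"quarterly backup of the accounting share, revision 41"
target = select_node(document, nodes)
```

Routing similar objects to the same node matters because each node deduplicates only against its own contents.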
-
Publication number: 20140344227
Abstract: A computing system includes a plurality of dispersed storage (DS) processing units operable to receive a continuous data stream, simultaneously disperse storage error encode the continuous data stream to produce a plurality of encoded data slices and store the plurality of encoded data slices in a DS memory.
Type: Application
Filed: August 1, 2014
Publication date: November 20, 2014
Applicant: CLEVERSAFE, INC.
Inventors: Gary W. Grube, Timothy W. Markison, Jason K. Resch
-
Publication number: 20140344229
Abstract: A method includes receiving information about a plurality of data chunks and determining if one or more of a plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks where one of the plurality of back-end nodes is designated as a sticky node. The method further includes, responsive to determining that none of the plurality of back-end nodes already stores more than a threshold amount of the plurality of data chunks, deduplicating the plurality of data chunks against the back-end node designated as the sticky node. Finally, the method includes, responsive to an amount of data being processed, designating a different back-end node as the sticky node.
Type: Application
Filed: February 2, 2012
Publication date: November 20, 2014
Inventors: Mark D. Lillibridge, Kave Eshghi, Mark R. Watkins
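The routing rule in the abstract for 20140344229 might look like the sketch below. It assumes chunk information arrives as a list of fingerprints, that overlap is counted as the number of fingerprints a node already holds, and that the sticky node rotates round-robin after a fixed number of chunks. The class and parameter names are hypothetical.

```python
class StickyRouter:
    """Route batches of chunk fingerprints to back-end nodes (illustrative)."""

    def __init__(self, node_names, threshold, rotate_after):
        self.stored = {name: set() for name in node_names}
        self.order = list(node_names)
        self.sticky = 0                    # index of the current sticky node
        self.threshold = threshold         # overlap needed to override sticky
        self.rotate_after = rotate_after   # chunks processed before rotating
        self.since_rotation = 0

    def route(self, fingerprints):
        batch = set(fingerprints)
        overlap = {n: len(self.stored[n] & batch) for n in self.order}
        best = max(self.order, key=lambda n: overlap[n])
        # Use the best-matching node only if it exceeds the threshold;
        # otherwise fall back to the sticky node.
        if overlap[best] > self.threshold:
            target = best
        else:
            target = self.order[self.sticky]
        self.stored[target].update(batch)
        # Assumed rotation policy: round-robin after enough chunks.
        self.since_rotation += len(fingerprints)
        if self.since_rotation >= self.rotate_after:
            self.sticky = (self.sticky + 1) % len(self.order)
            self.since_rotation = 0
        return target


router = StickyRouter(["n0", "n1"], threshold=1, rotate_after=4)
first = router.route(["a", "b", "c", "d"])   # no overlap anywhere: sticky n0
second = router.route(["x", "y"])            # sticky has rotated to n1
third = router.route(["a", "b", "z"])        # n0 already holds two of these
```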
-
Patent number: 8892526
Abstract: Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.
Type: Grant
Filed: January 11, 2012
Date of Patent: November 18, 2014
Inventor: Timothy Stoakes
-
Patent number: 8892521
Abstract: A method includes receiving a request to save a first file as immutable. The method also includes searching for a second file that is saved and is redundant to the first file. The method further includes determining the second file is one of mutable and immutable. When the second file is mutable, the method includes saving the first file as a master copy, and replacing the second file with a soft link pointing to the master copy. When the second file is immutable, the method includes determining which of the first and second files has a later expiration date and an earlier expiration date, saving the one of the first and second files with the later expiration date as a master copy, and replacing the one of the first and second files with the earlier expiration date with a soft link pointing to the master copy.
Type: Grant
Filed: May 10, 2013
Date of Patent: November 18, 2014
Assignee: International Business Machines Corporation
Inventors: Gaurav Chhaunker, Bhushan P. Jain, Sandeep R. Patil, Sri Ramanathan, Matthew B. Trevathan
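The decision rule in the abstract for 8892521 reduces to a small function. The sketch below only plans which file becomes the master copy and which becomes the soft link; it does not touch a file system. The `StoredFile` type is hypothetical, and the tie-break for equal expiration dates (the first file wins) is an assumption, since the abstract does not cover that case.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StoredFile:
    name: str
    immutable: bool
    expires: Optional[date] = None  # only meaningful for immutable files


def plan_immutable_save(first, second):
    """Return (master_name, soft_link_name) for a redundant pair.

    `first` is the file being saved as immutable; `second` is the
    already-saved redundant file.
    """
    if not second.immutable:
        # Mutable duplicate: the new immutable file becomes the master.
        return first.name, second.name
    # Both immutable: keep the copy that expires later (assumed: ties go
    # to the first file).
    if first.expires >= second.expires:
        return first.name, second.name
    return second.name, first.name


new_file = StoredFile("contract-v2.pdf", immutable=True, expires=date(2020, 1, 1))
old_mutable = StoredFile("contract-draft.pdf", immutable=False)
old_immutable = StoredFile("contract-v1.pdf", immutable=True, expires=date(2025, 1, 1))

plan_a = plan_immutable_save(new_file, old_mutable)
plan_b = plan_immutable_save(new_file, old_immutable)
```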
-
Patent number: 8892528
Abstract: Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.
Type: Grant
Filed: August 26, 2013
Date of Patent: November 18, 2014
Assignee: Dell Products L.P.
Inventors: Goutham Rao, Vinod Jayaraman
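The dictionary lookup and reference counting described in the abstract for 8892528 can be illustrated in software as below. This is a simplified sketch: it uses fixed-size chunks in place of detected chunk boundaries, SHA-256 as the fingerprint, and in-memory dictionaries in place of datastore suitcases and persistent storage. The hardware accelerator and DMA transfer are not modeled.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for a deduplication dictionary plus chunk store."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size   # assumed fixed-size chunking
        self.dictionary = {}           # fingerprint -> chunk bytes
        self.refcount = {}             # fingerprint -> number of references

    def write_chunk(self, chunk):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint in self.dictionary:
            # Already stored: only the reference count changes.
            self.refcount[fingerprint] += 1
        else:
            self.dictionary[fingerprint] = chunk
            self.refcount[fingerprint] = 1
        return fingerprint

    def write_stream(self, data):
        """Chunk the stream and return its filemap (list of fingerprints)."""
        return [self.write_chunk(data[i:i + self.chunk_size])
                for i in range(0, len(data), self.chunk_size)]

    def read(self, filemap):
        return b"".join(self.dictionary[fp] for fp in filemap)


store = DedupStore(chunk_size=8)
payload = b"A" * 8 + b"B" * 8 + b"A" * 8
filemap = store.write_stream(payload)
```

Here the third chunk repeats the first, so only two chunks are stored while the filemap still lists three fingerprints.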
-
Patent number: 8892527
Abstract: A method and system for eliminating the redundant allocation and deallocation of special data on disk, wherein the redundant allocation and deallocation of special data on disk is eliminated by providing an innovative technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type of document header is specially allocated.
Type: Grant
Filed: September 14, 2012
Date of Patent: November 18, 2014
Assignee: NetApp, Inc.
Inventors: Sandeep Yadav, Subramanian Periyagaram
-
Patent number: 8892529
Abstract: In embodiments of the present invention, when a duplicate data query is performed on a received data stream, a first physical node which corresponds to each first sketch value and is in a cluster system is identified according to a first sketch value representing the data stream, and then the first sketch value representing the data stream is sent to the identified physical node for the duplicate data query, and a procedure of the duplicate data query does not change with an increase of the number of nodes in the cluster system; therefore, a calculation amount of each node does not increase with an increase of the number of nodes in the cluster system.
Type: Grant
Filed: December 24, 2013
Date of Patent: November 18, 2014
Assignee: Huawei Technologies Co., Ltd.
Inventors: Qiang Liu, Quancheng Sun, Xiaobo Liu, Jun You, Huadi Yang, Dan Zhou, Yan Huang
-
Publication number: 20140337299
Abstract: A method, a system, an apparatus, and a computer readable medium for transmission of data across a network are disclosed.
Type: Application
Filed: July 28, 2014
Publication date: November 13, 2014
Inventors: David G. Therrien, David Andrew Thompson
-
Patent number: 8886613
Abstract: An example method includes controlling a data de-duplication apparatus to arrange a de-duplication schedule based on the presence or absence of a replication indicator in an item to be de-duplicated. The method also includes selectively controlling the de-duplication schedule based on a replication priority. In one embodiment, the method includes, upon determining that a chunk of data is associated with a replication indicator, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks not associated with a replication indicator. In one embodiment, the method also includes, upon determining that the chunk is associated with a replication priority, controlling the data de-duplication apparatus to schedule the chunk for de-duplication ahead of chunks of data not associated with a replication priority. The schedule location is based, at least in part, on the replication priority. The method also includes controlling de-duplication order based on the schedule.
Type: Grant
Filed: October 12, 2010
Date of Patent: November 11, 2014
Inventor: Don Doerner
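The ordering rules in the abstract for 8886613 can be expressed as a sort key. The sketch assumes each chunk is a dictionary with an optional `replicate` flag and an optional numeric `priority`, and that a larger number means a higher priority; none of these representations come from the abstract.

```python
def dedup_schedule(chunks):
    """Order chunks for de-duplication.

    1. Chunks with a replication indicator come before those without.
    2. Among those, chunks with a replication priority come first.
    3. Higher priority (assumed: larger number) is scheduled earlier.
    Python's sort is stable, so arrival order breaks remaining ties.
    """
    def key(chunk):
        has_indicator = bool(chunk.get("replicate"))
        priority = chunk.get("priority")
        return (
            0 if has_indicator else 1,
            0 if priority is not None else 1,
            -(priority or 0),
        )
    return sorted(chunks, key=key)


pending = [
    {"id": "a"},
    {"id": "b", "replicate": True},
    {"id": "c", "replicate": True, "priority": 1},
    {"id": "d", "replicate": True, "priority": 5},
]
order = [chunk["id"] for chunk in dedup_schedule(pending)]
```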
-
Publication number: 20140330794
Abstract: The various implementations of the present invention are provided as a computer-based system for content scoring. Content from a variety of source feeds may be considered for inclusion in an aggregated feed, based on the content of the source feed. The content of the source feed may be “scored” according to a variety of user-configurable options, thereby identifying the most valuable content from the source feeds for inclusion in the aggregated feed. For example, certain content elements may be extracted from a variety of source feeds and then combined to create an aggregated feed, where the aggregated feed contains only the highest scoring elements, as determined by the feed creator, from the various source feeds.
Type: Application
Filed: July 17, 2014
Publication date: November 6, 2014
Applicant: PARLANT TECHNOLOGY, INC.
Inventors: Dane Dellenbach, Bruce Hassler, Jacob Hutchings, Carson Anderson
-
Publication number: 20140330795
Abstract: A computer identifies a plurality of data retrieval requests that may be serviced using a plurality of unique data chunks. The computer services the data retrieval requests by utilizing at least one of the unique data chunks. At least one of the unique data chunks is utilized for servicing two or more of the data retrieval requests. The computer determines a servicing sequence for the plurality of data retrieval requests such that the two or more of the data retrieval requests that are serviced utilizing the at least one of the unique data chunks are serviced consecutively. The computer services the plurality of data retrieval requests according to the servicing sequence.
Type: Application
Filed: July 18, 2014
Publication date: November 6, 2014
Inventors: Kavita Chavda, Nagapramod S. Mandagere, Ramani R. Routray, Pin Zhou
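A minimal sketch of the servicing sequence in the abstract for 20140330795 follows, under the simplifying assumption that each retrieval request needs exactly one unique chunk. Requests that need the same chunk are grouped so the chunk is fetched once and reused consecutively. The request and chunk identifiers are made up for the example.

```python
def servicing_sequence(requests):
    """Group requests by the chunk they need.

    `requests` is a list of (request_id, chunk_id) pairs. Groups appear in
    the order each chunk is first requested; dictionaries preserve
    insertion order in Python 3.7 and later.
    """
    by_chunk = {}
    for request_id, chunk_id in requests:
        by_chunk.setdefault(chunk_id, []).append(request_id)
    return [rid for group in by_chunk.values() for rid in group]


incoming = [("r1", "chunk-7"), ("r2", "chunk-3"), ("r3", "chunk-7")]
sequence = servicing_sequence(incoming)
```

Requests that span several chunks would need a more careful ordering, which this sketch does not attempt.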
-
Publication number: 20140330793
Abstract: A system for managing a storage system comprises a processor and a memory. The processor is configured to receive storage system information from a deduplicating storage system. The processor is further configured to determine a capacity forecast based at least in part on the storage system information. The processor is further configured to provide a compression forecast. The memory is coupled to the processor and configured to provide the processor with instructions.
Type: Application
Filed: May 1, 2014
Publication date: November 6, 2014
Applicant: EMC CORPORATION
Inventor: Mark Chamness
-
Patent number: 8880482
Abstract: Various embodiments for replicating deduplicated data using a processor device are provided. A block of the deduplicated data, created in a source repository, is assigned a global block identifier (ID) unique in a grid set inclusive of the source repository. The global block ID is generated using at least one unique identification value of the block, a containing grid of the grid set, and the source repository. The global block ID is transmitted from the source repository to a target repository. If the target repository determines the global block ID is associated with an existing block of the deduplicated data located within the target repository, the block is not transmitted to the target repository during a subsequent replication process.
Type: Grant
Filed: January 2, 2013
Date of Patent: November 4, 2014
Assignee: International Business Machines Corporation
Inventors: Shay H. Akirav, Lior Aronovich, Ron Asher, Yariv Bachar, Ariel J. Ish-Shalom, Ofer Leneman
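The exchange in the abstract for 8880482 can be sketched as below: a global block ID is composed from grid, repository, and block identifiers, and the target's existing IDs decide which blocks still need to be sent. The slash-separated string format and the function names are assumptions for illustration; the abstract does not define an encoding.

```python
def global_block_id(grid_id, repository_id, block_number):
    """Compose an ID unique across the grid set (assumed string format)."""
    return f"{grid_id}/{repository_id}/{block_number}"


def blocks_to_replicate(source_blocks, target_ids):
    """Keep only blocks whose global ID the target does not already hold.

    `source_blocks` maps global block ID to block data; `target_ids` is the
    set of IDs the target repository reports as present.
    """
    return {gid: data for gid, data in source_blocks.items()
            if gid not in target_ids}


id_1 = global_block_id("grid-1", "repo-A", 1)
id_2 = global_block_id("grid-1", "repo-A", 2)
source = {id_1: b"block one", id_2: b"block two"}
to_send = blocks_to_replicate(source, target_ids={id_1})
```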
-
Patent number: 8880476
Abstract: A mechanism is provided in a data processing system for reliable asynchronous solid-state device based de-duplication. Responsive to receiving a write request to write data to the file system, the mechanism sends the write request to the file system, and in parallel, computes a hash key for the write data. The mechanism looks up the hash key in a de-duplication table. The de-duplication table is stored in a memory or a solid-state storage device. Responsive to the hash key not existing in the de-duplication table, the mechanism writes the write data to a storage device, writes a journal transaction comprising the hash key, and updates the de-duplication table to reference the write data in the storage device.
Type: Grant
Filed: June 28, 2012
Date of Patent: November 4, 2014
Assignee: International Business Machines Corporation
Inventors: Ranjit M. Noronha, Ajay K. Singh
-
Patent number: 8880481
Abstract: Inverse distribution operations are performed on a large distributed parallel database comprising a plurality of distributed data segments to determine a data value at a predetermined percentile of a sorted dataset formed on one segment. Data elements from across the segments may be first grouped, either by partitioning keys or by hashing, the groups are sorted into a predetermined order, and data values corresponding to the desired percentile are picked up at a row location of the corresponding data element of each group. For a global dataset that is spread across the database segments, a local sort of data elements is performed on each segment, and the data elements from the local sorts are streamed in overall sorted order to one segment to form the sorted dataset.
Type: Grant
Filed: March 29, 2012
Date of Patent: November 4, 2014
Assignee: Pivotal Software, Inc.
Inventors: Hitoshi Harada, Caleb E. Welton, Gavin Sherry
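The global-dataset case in the abstract for 8880481 (local sort on each segment, then streaming in overall sorted order to one segment) can be imitated on a single machine with `heapq.merge`. The sketch assumes a discrete, nearest-rank percentile; the abstract does not specify the interpolation rule, and the segment lists here are plain Python lists rather than database segments.

```python
import heapq
import math


def percentile_disc(segments, fraction):
    """Value at the given percentile (0 <= fraction <= 1) across segments."""
    # Local sort on each segment.
    locally_sorted = [sorted(segment) for segment in segments]
    # Merge the sorted runs in overall sorted order onto "one segment".
    merged = list(heapq.merge(*locally_sorted))
    if not merged:
        raise ValueError("no data elements")
    # Nearest-rank row location (assumed rule): first row whose cumulative
    # share of rows is at least `fraction`.
    row = max(math.ceil(fraction * len(merged)), 1) - 1
    return merged[row]


segments = [[5, 1], [3, 9], [7]]
median = percentile_disc(segments, 0.5)
maximum = percentile_disc(segments, 1.0)
```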
-
Patent number: 8880469
Abstract: A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether a requested data is stored in the performance storage unit. The determiner determines whether the requested data is stored in the performance segment storage unit in the event that the requested data is not stored in the performance storage unit.
Type: Grant
Filed: April 18, 2013
Date of Patent: November 4, 2014
Assignee: EMC Corporation
Inventor: R. Hugo Patterson
-
Publication number: 20140324791
Abstract: A method for copying data efficiently within a deduplicating storage system eliminates the need to read or write the data per se within the storage system. The copying is accomplished by creating duplicates of the metadata block pointers only. The result is a process that creates an arbitrary number of copies using minimal time and bandwidth.
Type: Application
Filed: June 21, 2013
Publication date: October 30, 2014
Inventor: Robert Petrocelli
-
Publication number: 20140324793
Abstract: A computer-implemented method for layered storage of enterprise data comprises receiving from one or more virtual machines data blocks; de-duplicating the data blocks per hypervisor; storing de-duplicated data blocks in a local cache memory; time-based grouping the data blocks into data containers; dividing each data container in X fixed length mega-blocks; for each data container applying erasure encoding to the X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being larger than X; and distributed storing the Y fixed length mega-blocks across multiple backend storage systems.
Type: Application
Filed: April 8, 2014
Publication date: October 30, 2014
Applicant: CLOUDFOUNDERS NV
Inventor: Kurt GLAZEMAKERS
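The step from X to Y mega-blocks in the abstract for 20140324793 can be illustrated with the simplest possible erasure code: a single XOR parity block, so Y = X + 1 and any one lost block can be rebuilt. This is a stand-in chosen for brevity; the abstract does not name the erasure code, and production systems typically use codes that tolerate more than one loss.

```python
def split_fixed(container, x):
    """Divide a container into x fixed-length blocks, zero-padding the tail."""
    size = -(-len(container) // x)  # ceiling division
    padded = container.ljust(size * x, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(x)]


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(container, x):
    """Return Y = x + 1 blocks: the data blocks plus one parity block."""
    blocks = split_fixed(container, x)
    return blocks + [xor_blocks(blocks)]


def recover(blocks):
    """Rebuild the single missing block (marked None) from the others."""
    missing = blocks.index(None)
    present = [b for b in blocks if b is not None]
    restored = list(blocks)
    restored[missing] = xor_blocks(present)
    return restored


encoded = encode(b"abcdefgh", x=4)        # 4 data blocks + 1 parity block
damaged = list(encoded)
damaged[1] = None                         # simulate losing one backend
repaired = recover(damaged)
```

Each of the Y blocks would go to a different backend storage system, so that losing one backend leaves the container recoverable.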
-
Publication number: 20140324796Abstract: A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criteria. The memory is coupled to the processor and configured to provide the processor with instructions.Type: ApplicationFiled: May 1, 2014Publication date: October 30, 2014Applicant: EMC CORPORATIONInventors: Frederick Douglis, Philip Shilane, R. Hugo Patterson
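One way to picture overlap-based node selection is the sketch below. The abstract leaves the selection criteria open, so the rule used here (highest overlap wins, ties go to the node holding fewer segments) is purely an assumption, as are the function names and the use of plain strings as segment fingerprints.

```python
def segment_overlap(incoming, stored):
    """Count incoming segment fingerprints already present on a node."""
    return len(set(incoming) & stored)


def select_node(incoming, nodes):
    """Pick a node name from `nodes` (name -> set of stored fingerprints).

    Assumed criteria: maximize overlap, then prefer the emptier node.
    """
    overlaps = {name: segment_overlap(incoming, stored)
                for name, stored in nodes.items()}
    return max(nodes, key=lambda name: (overlaps[name], -len(nodes[name])))


nodes = {
    "n1": {"a", "b"},
    "n2": {"b", "c", "d"},
    "n3": {"x"},
}
target = select_node(["b", "c", "e"], nodes)
```

Routing data to the node with the most overlap means fewer new segments have to be stored there, which is the usual motivation for this kind of placement in a deduplicating cluster.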
-
Publication number: 20140324795Abstract: Methods and systems for data management are disclosed. With embodiments of the present disclosure, data files originating from the same source data can be de-duplicated. One such method comprises calculating one or more of a first characteristic value for first data in a first format, and one or more second characteristic values for one or more data in one or more second formats into which the first data can be converted, said characteristic value uniquely representing an arrangement characteristic of at least part of bits of data in a particular format. The method also includes storing one of the first data and the second data in response to one of the calculated characteristic values being the same as a stored characteristic value corresponding to a second data.Type: ApplicationFiled: April 28, 2014Publication date: October 30, 2014Applicant: International Business Machines CorporationInventors: Peng Hui Jiang, Pi Jun Jiang, Xi Ning Wang, Liang Xue, Wen Yin
-
Publication number: 20140324798Abstract: A process that ensures the virtual destruction of data files a user wishes to erase from a storage medium, such as a hard drive, flash drive, or removable disk. This approach is appropriate for managing custom distributions from large file sets, as it is roughly linear in compute complexity to the number of files erased but is capped when many files are batch erased.Type: ApplicationFiled: July 14, 2014Publication date: October 30, 2014Inventor: Alan Joshua Shapiro
-
Publication number: 20140324788Abstract: A cleaning application that can monitor one or more browser applications that are executed on a computer, and that can, for at least one browser application, clean at least one of one or more files or a registry associated with the at least one browser application is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more browser applications that are executed on a computer. The cleaning module can further detect a closing of at least one browser application. The cleaning module can further perform a pre-defined action in response to the closing of the at least one browser application. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the at least one browser application.Type: ApplicationFiled: April 24, 2013Publication date: October 30, 2014Applicant: Piriform Ltd.Inventor: Guy SANER
-
Publication number: 20140324790Abstract: Techniques for avoiding duplicate comparisons while comparing customer records to identify linked customer records pertaining to a single customer entity are provided. The techniques include the computer system comparing a first electronic customer record with a second electronic customer record to determine if the first electronic customer record and the second electronic customer record pertain to a single customer entity if the computer system identifies a common blocker key corresponding to a selected blocker from a data field in the first electronic customer record and from a data field in the second electronic customer record and if the computer system does not identify a common blocker key corresponding to an additional lower order blocker from another data field in the first electronic customer record and from a data field in the second electronic customer record.Type: ApplicationFiled: April 26, 2013Publication date: October 30, 2014Applicant: Wal-Mart Stores, Inc.Inventors: Andrew Benjamin Ray, Nathaniel Philip Troutman
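The rule in this abstract (compare two records under a selected blocker only if they share that blocker's key and do not already share a key under a lower-order blocker) can be sketched as follows. The record layout, the blocker functions, and the name `candidate_pairs` are hypothetical; blockers are assumed to be ordered from lowest to highest.

```python
from itertools import combinations


def candidate_pairs(records, blockers):
    """Yield each record pair once, under the first blocker that links it.

    records:  dict of record id -> record (any object)
    blockers: ordered list of functions record -> key, or None if no key
    """
    pairs = []
    for order, blocker in enumerate(blockers):
        # Group record ids by the key this blocker produces.
        groups = {}
        for rec_id, rec in records.items():
            key = blocker(rec)
            if key is not None:
                groups.setdefault(key, []).append(rec_id)

        for ids in groups.values():
            for a, b in combinations(sorted(ids), 2):
                # Skip the pair if a lower-order blocker already matched it.
                already_compared = any(
                    lower(records[a]) is not None
                    and lower(records[a]) == lower(records[b])
                    for lower in blockers[:order]
                )
                if not already_compared:
                    pairs.append((a, b, order))
    return pairs


records = {
    1: {"email": "a@x.com", "phone": "555"},
    2: {"email": "a@x.com", "phone": "555"},
    3: {"email": "b@x.com", "phone": "555"},
}
blockers = [lambda r: r.get("email"), lambda r: r.get("phone")]
pairs = candidate_pairs(records, blockers)
```

In this example records 1 and 2 share both an email and a phone key, but the pair is emitted only under the email blocker, so the expensive record comparison would run once rather than twice.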
-
Publication number: 20140324797Abstract: The invention provides a display interface in a social networking system that enables the presentation of information related to a user in a timeline or map view. The system accesses information about a user of a social networking system, including both data about the user and social network activities related to the user. The system then selects one or more of these pieces of data and/or activities from a certain time period and gathers them into timeline units based on their relatedness and their relevance to users. These timeline units are ranked by relevance to the user, and are used to generate a timeline or map view for the user containing visual representations of the timeline units organized by location or time. The timeline or map view is then provided to other users of the social networking system that wish to view information about the user.Type: ApplicationFiled: July 9, 2014Publication date: October 30, 2014Inventors: Raylene Kay Yung, Ryan Case, Jeff Huang, Samuel Lessin, Ryan David Mack, Paul M. McDonald, Serkan Piantino, Arun Vijayvergiya, Joshua Wiseman, Steven Young, Mark E. Zuckerberg
-
Publication number: 20140324794Abstract: Methods are provided for clustering events. Data is received at an extraction engine from managed infrastructure. Events are converted into alerts and the alerts mapped to a matrix M. One or more common steps are determined from the events and clusters of events are produced relating to the alerts and/or events.Type: ApplicationFiled: April 28, 2014Publication date: October 30, 2014Applicant: Moogsoft, Inc.Inventors: Philip Tee, Robert Duncan Harper, Charles Mike Silvey
-
Publication number: 20140324789Abstract: A cleaning application that can monitor one or more characteristics of a computer, and that can clean at least one of one or more files or a registry of the computer, is provided. The cleaning application can include a cleaning module. The cleaning module can monitor one or more characteristics of the computer. The cleaning module can further detect an occurrence of pre-defined criteria involving the one or more characteristics. The cleaning module can further perform a pre-defined action in response to the pre-defined criteria. The pre-defined action can include cleaning at least one of one or more files or a registry associated with the computer.Type: ApplicationFiled: April 24, 2013Publication date: October 30, 2014Applicant: Piriform Ltd.Inventor: Guy SANER
-
Publication number: 20140324792Abstract: Embodiments of the present invention relate to extraction of a social graph from contact information across a confined user base. Users are typically subscribed to a service that backs up data from end-user devices to a cloud. The data includes contacts from mobile address books. The service is able to determine relationships of contacts in the cloud to build a social graph or map of these contacts. The social graph can be used to drive individual and group analytics to, for example, increase membership and provide value-added features to its service members.Type: ApplicationFiled: March 24, 2014Publication date: October 30, 2014Applicant: Synchronoss Technologies, Inc.Inventors: Omar Chaudhry, Andrew Fuller
-
Patent number: 8874863Abstract: Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data.Type: GrantFiled: August 1, 2012Date of Patent: October 28, 2014Assignee: Actifio, Inc.Inventors: Madhav Mutalik, Christopher A. Provenzano, Philip J. Abercrombie
-
Patent number: 8874602Abstract: A random number generation process generates uncorrelated random numbers from identical random number sequences on parallel processing database segments of an MPP database without communications between the segments by establishing a different starting position in the sequence on each segment using an identifier that is unique to each segment, query slice information and the number of segments. A master node dispatches a seed value to initialize the random number sequence generation on all segments, and dispatches the query slice information and information as to the number of segments during a normal query plan dispatch process.Type: GrantFiled: September 29, 2012Date of Patent: October 28, 2014Assignee: Pivotal Software, Inc.Inventors: Hitoshi Harada, Caleb Welton, Florian Schoppmann
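A simplified way to see how segments can draw non-overlapping values from one shared sequence without talking to each other is the "leapfrog" sketch below. It is an assumption, not the patented scheme: each segment seeds the same generator, starts at a position equal to its segment identifier, and then advances by the number of segments. Query slice information, which the abstract also uses, is omitted here, and the function name is hypothetical.

```python
import random
from itertools import islice


def segment_stream(seed, segment_id, num_segments):
    """Yield this segment's share of the common random sequence."""
    rng = random.Random(seed)          # identical sequence on every segment

    # Move to a starting position unique to this segment.
    for _ in range(segment_id):
        rng.random()

    while True:
        yield rng.random()
        # Skip the values that belong to the other segments.
        for _ in range(num_segments - 1):
            rng.random()


# The master would dispatch the seed and the segment count; each segment
# then computes its values independently.
seed, num_segments = 42, 3
per_segment = [list(islice(segment_stream(seed, s, num_segments), 4))
               for s in range(num_segments)]
```

Since every segment consumes a disjoint set of positions from the same sequence, no value is used twice across segments, yet the only shared inputs are the seed and the segment count.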
-
Patent number: 8874520Abstract: A system and method for caching fingerprints in a client cache is provided. A data object that comprises a set of data segments and describes a backup process is identified. Thereafter, a request referencing the data object is made to a deduplication server to request that a task identifier be added to the data object. If the deduplication server is able to successfully add the task identifier to the data object, then an active identifier is added to each data segment from the set of data segments in a cache that is within a client system.Type: GrantFiled: February 11, 2011Date of Patent: October 28, 2014Assignee: Symantec CorporationInventors: Xianbo Zhang, Thomas Hartnett, Weibao Wu