Data Cleansing, Data Scrubbing, And Deleting Duplicates Patents (Class 707/692)
  • Patent number: 10733158
    Abstract: A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: August 4, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Sorin Faibish, Philip Shilane, Ivan Basov, Istvan Gonczi, Philippe Armangau, Vamsi Vankamamidi
  • Patent number: 10719906
    Abstract: A graph processing system may include at least one auxiliary memory configured to store graph data including phase data and attribute data, a main memory configured to store a portion of the graph data, a plurality of graphics processing units (GPUs) configured to process the graph data received from the main memory and perform synchronization and including cores and device memories, and a central processing unit (CPU) configured to manage query processing associated with the graph data performed by the GPUs and store, in the auxiliary memory, updatable attribute data of a result of the query processing.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: July 21, 2020
    Assignee: DAEGU GYEONGBUK INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Min Soo Kim, Kyu Hyeon An, Him Chan Park, Jin Wook Kim, Se Yeon Oh
  • Patent number: 10719516
    Abstract: A method and system for processing database queries containing aggregate functions. The query may specify fewer groups than there are processes available to process the queries. Further, the queries may target a set of rows and specify a sort-by key and a group-by key. The method and system further include determining that the queries specify application of the aggregate function to each of a plurality of groups that may correspond to a plurality of distinct values of the group-by key and determining that a plurality of processes are available to process the queries. The method and system also include determining a plurality of ranges of a composite key that may be formed by combining the group-by key and the sort-by key and assigning each range of the plurality of ranges to a corresponding process to calculate the aggregate function.
    Type: Grant
    Filed: August 27, 2018
    Date of Patent: July 21, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatesh Sakamuri, Huagang Li, Sankar Subramanian, Andrew Witkowski
  • Patent number: 10719252
    Abstract: A method is used in managing deduplication characteristics in a storage system. Deduplication entries stored in a deduplication cache are categorized into a set of deduplication groups based on a data deduplication probability associated with the deduplication entries. A machine learning system is used to dynamically adjust deduplication characteristics associated with the set of deduplication groups based on an I/O workload associated with the storage system.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: July 21, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Yubing Wang, Philippe Armangau, Ajay Karri
  • Patent number: 10691349
    Abstract: A method, executed by a computer, includes writing, to a storage device, a first instance of a data sequence and a corresponding first reference count, in response to determining that a subsequent data sequence is identical to the first instance of the data sequence, writing, to the storage device, a metadata reference referencing the subsequent data sequence and incrementing the first reference count, and writing, to a storage device, a second instance of the data sequence and a corresponding second reference count in response to determining that the first reference count is equal to a selected threshold. A computer system and computer program product corresponding to the above method are also disclosed herein.
    Type: Grant
    Filed: October 28, 2016
    Date of Patent: June 23, 2020
    Assignee: International Business Machines Corporation
    Inventors: Joseph W. Dain, Itzhack Goldberg, Gregory T. Kishi
  • Patent number: 10691653
    Abstract: Disclosed are various embodiments for intelligent backfill and data migration operations performed using an event processing architecture. A backfill system may identify backfill operations to migrate legacy data from a first system to a second system and generate events to provide to an event processor, where each of the events causes a backfill operation to be performed. Access to the events may be selectively controlled using an event processing queue such that the events are processed and the backfill operations are performed when a computing resource has available computing resources, regardless of a time of day.
    Type: Grant
    Filed: September 5, 2017
    Date of Patent: June 23, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Mark Aran Aiken, Raghunathan Kothandaraman, Sam L. Nelson
  • Patent number: 10678434
    Abstract: This storage system is designed to: divide data into a plurality of chunk data (pieces of data) in a deduplication process; select one or more chunk data from among the plurality of chunk data in accordance with a sampling period which indicates that, on average, one chunk data be selected from among each N chunk data; and calculate a fingerprint, such as a hash value, for each of one or more characteristic chunk data, which are the selected one or more chunk data, and determine whether data including the one or more characteristic chunk data is a duplication. The storage system changes the sampling period on the basis of the results of past deduplication processes.
    Type: Grant
    Filed: May 12, 2015
    Date of Patent: June 9, 2020
    Assignee: HITACHI, LTD.
    Inventors: Yoshihiro Yoshii, Yasuo Watanabe, Yoshinori Ohira
  • Patent number: 10664463
    Abstract: A database structure and a system that uses the structure to facilitate efficient context enrichment of low-level events occurring in a distributed computing system. In one aspect, the database structure comprises a table accessible to a distributed storage system. The table comprises a plurality of rows. Each row represents a corresponding process creation event of a particular process at a particular host at a particular time and assigned a particular event identifier. Each row comprises a row key identifying the particular host, the particular process, the particular time, and the particular event identifier of the process creation event corresponding to the row. The particular time and the particular event identifier are stored as part of the row key in a bitwise one's complement format. The row key structure facilitates efficient identification of a process creation event where only hostname and the process identifier of the process creation event are known.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: May 26, 2020
    Assignee: Dropbox, Inc.
    Inventor: Santosh Ananthakrishnan
  • Patent number: 10664448
    Abstract: Various embodiments for repository management in a data deduplication system, by a processor device, are provided. Metadata of an inode structure of an entire pre-allocated file system is captured, exported, and compressed from an existing deduplication appliance, the pre-allocated file system comprising a fully padded file system. The exported and compressed metadata of the pre-allocated file system is decompressed and imported into a data deduplication repository of a new deduplication appliance having an identical file system size as within the existing deduplication appliance, to initially configure or subsequently scale the inode structure of a file system of the data deduplication repository of the new deduplication appliance efficiently.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: May 26, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Oded Aviyam, Shira Ben-Dor, Joseph W. Dain, Gil E. Paz
  • Patent number: 10664461
    Abstract: A size associated with a content file is determined to be greater than a threshold size. In response to the determination, file metadata of the content file is split and stored across a plurality of component file metadata structures. The file metadata of the content file specifies a tree structure organizing data components of the content file and each component file metadata structure of the plurality of component file metadata structures stores a portion of the tree structure. A snapshot tree is updated to reference the plurality of component file metadata structures for the content file.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: May 26, 2020
    Assignee: Cohesity, Inc.
    Inventors: Zhihuan Qiu, Ganesha Shanmuganathan
  • Patent number: 10664449
    Abstract: A file deduplication processing system is provided. The system deduplicates raw files to generate deduplicated vault files and a descriptor indicating a storage location of each data chunk in the vault files corresponding to the raw files. When receiving a writing request of a write data, the system finds at least one data chunk including old data corresponding to the write data according to the descriptor, loads and recovers the data chunk whose boundary is not overlapped with a boundary of the write data in the vault file comprising the old data corresponding to the write data so as to generate an update data by incorporating the recovered data chunk and the write data, deduplicates the update data to generate a new vault file and stores the same in the chunk store, and updates a content corresponding to each data chunk in the descriptor.
    Type: Grant
    Filed: August 7, 2018
    Date of Patent: May 26, 2020
    Assignee: QNAP SYSTEMS, INC.
    Inventors: Chin-Tsung Cheng, Jing-Wei Su
  • Patent number: 10649676
    Abstract: Duplicates of immutable data objects are identified and deduplicated. This is performed by performing a bottom up deduplication, such that objects in hierarchically lower levels of a data structure are deduplicated first. Deduplication identifies duplicates of a particular object through value equality analysis and replaces pointers to duplicate objects and the duplicate objects themselves, with a reference to the particular object. This process is repeated for hierarchically higher data objects, but where the value equality analysis includes, among other things, evaluating the equality of references to hierarchically lower data objects.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: May 12, 2020
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventor: Bart Johan Fred De Smet
  • Patent number: 10635339
    Abstract: In some aspects, devices, systems, and methods are provided that relate to data deduplication performed in data storage devices, such as solid-state drives (SSD) or drives of any other type. In some aspects, devices, systems, and methods are provided that relate to hierarchical data deduplication at a local and system level, such as in a storage system built with one or more SSDs having built-in data deduplication functionality. The hierarchical data deduplication utilizes the IDs in the data storage devices to decide if the incoming data has to be stored or if a copy of the incoming data is already stored. In hierarchical data deduplication, no IDs (or signatures) are required to be stored at a system level. In some aspects, data steering is provided that enables data storing coordination in a system that consists of a set of data storage devices (e.g., SSDs) having built-in data deduplication.
    Type: Grant
    Filed: May 2, 2018
    Date of Patent: April 28, 2020
    Assignee: SMART IPOS, INC.
    Inventors: Manuel Antonio d'Abreu, Ashutosh Kumar Das
  • Patent number: 10621144
    Abstract: An approach for parallel deduplication using automatic chunk sizing. A dynamic chunk deduplicator receives a request to perform data deduplication where the request includes an identification of a dataset. The dynamic chunk deduplicator analyzes file level usage for one or more data files including the dataset to associate a deduplication chunk size with the one or more data files. The dynamic chunk deduplicator creates a collection of data segments from the dataset, based on the deduplication chunk size associated with the one or more data files. The dynamic chunk deduplicator creates a deduplication data chunk size plan where the deduplication data chunk size plan includes deduplication actions for the collection of data segments and outputs the deduplication data chunk size plan.
    Type: Grant
    Filed: March 23, 2017
    Date of Patent: April 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Debora A. Lowry, Jonathan Mendez, Jose D. Ramos, Blanca R. Navarro
  • Patent number: 10621496
    Abstract: A context profile is created. The context profile includes one or more types of context data to be captured, a frequency at which the one or more types of context data are to be captured, a data format in which the one or more types of context data are to be output, and one or more custom data types. The context profile corresponds to an application identifier. A mapping of context profiles to application identifiers is created. Based on the mapping, the context profile is sent to a corresponding context provider. Upon reception, the one or more types of context data are evaluated. Based on a number of data processing rules, the received context data can be processed or discarded.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: April 14, 2020
    Assignee: SAP SE
    Inventor: Nipun Dev
  • Patent number: 10613785
    Abstract: A very efficient computer system is presented to generate all pairs of records that have a certain similarity. Similarity is defined in terms of the textual similarity of the record attributes and/or absolute difference for numeric record attributes. Software assigns each record to a number of bins, and then compares pairs of records that belong to the same bin. This is more efficient than comparing all pairs of records since the number of records compared to each other is much smaller.
    Type: Grant
    Filed: October 11, 2017
    Date of Patent: April 7, 2020
    Assignee: Tamr, Inc.
    Inventors: George Beskales, Ihab F. Ilyas
  • Patent number: 10599533
    Abstract: Efficient cloud storage systems, methods, and media are provided herein. Exemplary methods may include locating a Merkle tree of a stored object on a deduplicating block store, comparing an object at a source location to the Merkle tree of the stored object, determining changed blocks for the object at a source location, and transmitting a message across a network to the deduplicating block store, the message including the changed blocks and Merkle nodes that correspond to the changed blocks.
    Type: Grant
    Filed: May 18, 2017
    Date of Patent: March 24, 2020
    Assignee: EFOLDER, INC.
    Inventors: Robert Petri, Nitin Parab
  • Patent number: 10599360
    Abstract: A system and method of data transmission are disclosed. In certain aspects, the method, performed by a target node, includes receiving a first plurality of hash values from the source node and comparing the first plurality of hash values with a second plurality of hash values. The method also includes determining a set of common hash values corresponding to an intersection of the first plurality of hash values and the second plurality of hash values. The method further includes reserving the set of common hash values by placing the set of common hash values in a first filter stored in a memory of the target node and committing the set of common hash values by placing them in a second filter stored in a storage of the target node.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: March 24, 2020
    Assignee: VMware, Inc.
    Inventors: Vijay Somasundaram, Sudarshan Madenur Sridhara
  • Patent number: 10585864
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10585865
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Grant
    Filed: December 5, 2017
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10572177
    Abstract: An apparatus comprises a memory device that stores first history data including a first portion associated with a first end location, and at least one processor configured to receive a closing request from a recipient device storing second history data, indicating that a second portion of the second history data is closing from accepting additional data, and comprising a second end location associated with the second portion; responsive to receiving the closing request, determine whether the first end location matches the second end location; responsive to a matching, enter a confirmation state where the first portion is closed from accepting additional data, and transmit a closing acknowledgement to the recipient device, indicating that the first portion has entered the confirmation state, and allowing the recipient device to close the second portion in response to receiving the closing acknowledgement.
    Type: Grant
    Filed: March 27, 2018
    Date of Patent: February 25, 2020
    Assignee: APPEX NETWORKS HOLDING LIMITED
    Inventors: Hao Zhuang, Yongdong Wang
  • Patent number: 10572172
    Abstract: Multi-granular deduplication is performed on I/O data received at a storage system as part of replicating the I/O data to another storage system. Deduplication may be performed in an iterative fashion, for example, on blocks and smaller and smaller sub-blocks of the I/O data. Deduplication may be performed on blocks and smaller sub-blocks by comparing each block or sub-block to preceding blocks or sub-blocks, respectively, in the I/O data to determine if there is a duplicate. If a duplicate block or sub-block is determined for a block or sub-block, the block or sub-block may be replaced in the I/O data with a reference to the duplicate block or sub-block in a deduplication header for the block. A metadata structure may indicate which blocks of the I/O data have had deduplication performed thereon. The replicating storage system may use the metadata structure and deduplication block headers to restore the I/O data.
    Type: Grant
    Filed: April 20, 2018
    Date of Patent: February 25, 2020
    Assignee: EMC IP Holding Company LLC
    Inventor: Venkata L R Ippatapu
  • Patent number: 10565230
    Abstract: A technique preserves efficiency for replication of data between a source node of a source cluster (“source”) and a destination node of a destination cluster (“destination”) of a clustered network. Replication in the clustered network may be effected by leveraging global in-line deduplication at the source to identify and avoid copying duplicate data from the source to the destination. To ensure that the copy of the data on the destination is synchronized with the data received at the source, the source creates a snapshot of the data for use as a baseline copy at the destination. Thereafter, new data received at the source that differs from the baseline snapshot are transmitted and copied to the destination. In addition, the source and destination nodes negotiate to establish a mapping of name-to-data when transferring data (i.e., an extent) between the clusters.
    Type: Grant
    Filed: October 6, 2015
    Date of Patent: February 18, 2020
    Assignee: NetApp, Inc.
    Inventors: Ling Zheng, Michael L. Federwisch, Blake H. Lewis
  • Patent number: 10564893
    Abstract: A multi-platform data storage system that facilitates sharing of containers including one or more virtual storage resources. The multi-platform data storage system can, for example, include a storage interface configured to enable access to a plurality of storage platforms that use different storage access and/or management protocols, the plurality of storage platforms storing data objects in physical data storage; and a storage mobility and management layer providing virtual management of virtual storage resources corresponding to one or more data objects stored in the plurality of storage platforms, the storage mobility and management layer including at least a transfer module coupled to at least one network and configured to transfer at least one of the data objects. The transfer module can transfer the at least one of the data objects between the multi-platform data storage system and another data storage system.
    Type: Grant
    Filed: February 23, 2018
    Date of Patent: February 18, 2020
    Assignee: Arrikto Inc.
    Inventors: Konstantinos Venetsanopoulos, Evangelos Koukis, Christos Stavrakakis, Ilias Tsitsimpis, Dimitrios Aragiorgis, Alexios Pyrgiotis
  • Patent number: 10545699
    Abstract: A method for execution by a computing device within a dispersed storage network (DSN). The method begins when data accesses occur for a data object of a storage container within the DSN. The method continues by updating, for at least some of the data accesses, an object value for the data object to produce an updated object value. The method continues by updating an object retention cost for the data object to produce an updated object retention cost. The method continues by updating a data object retention policy for the data object based on the updated object value and the updated object retention cost. When one of the data accesses is a deletion event, the method continues by utilizing a current updated data object retention policy to determine a deletion-retention option for the data object. The method continues by executing the deletion-retention option on the data object.
    Type: Grant
    Filed: April 11, 2017
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Andrew D. Baptist, Bart R. Cilfone, Greg R. Dhuse, Harsha Hegde, Wesley B. Leggette, Manish Motwani, Jason K. Resch, Ilya Volvovski, Ethan S. Wozniak
  • Patent number: 10545914
    Abstract: The disclosure provides a system, method and computer-readable storage device embodiments. Some embodiments can include an IPv6-centric distributed storage system. An example method includes receiving, at a computing device, a request to create metadata associated with an object from a client, creating the metadata based on the request and transmitting the metadata and an acknowledgment to the client, wherein the metadata contains an address in a storage system for each replica of the object and wherein the metadata can be used to write data to the storage system and read the data from the storage system. There is no file system layer between an application layer and a storage system layer.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: January 28, 2020
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Andre Surcouf, Guillaume Ruty, William Mark Townsley
  • Patent number: 10540423
    Abstract: A representation of a collection of content items is generated for display by a computing device. The representation includes a two-or-more-dimensional arrangement including representations of the content items. The representations of the content items are positioned relative to one another based, at least in part, on values of one or more attributes of the digital content items. The representation is dynamically adjusted based, at least in part, on a user interaction with a representation of one of the content items.
    Type: Grant
    Filed: June 22, 2017
    Date of Patent: January 21, 2020
    Assignee: Oath Inc.
    Inventors: Simon Kayode Osindero, Robert Jaros, Eric Willis, Clayton Mellina, Anastasia Svetlichnaya
  • Patent number: 10521369
    Abstract: An apparatus in one embodiment comprises a host device configured to communicate over a network with a storage system comprising a plurality of storage devices. The host device comprises a set of input-output queues and a multi-path input-output driver configured to select input-output operations from the set of input-output queues for delivery to the storage system over the network. The multi-path input-output driver is further configured to determine data reduction control indicators for the input-output operations, and to provide the data reduction control indicators to the storage system in association with the input-output operations. Different data reduction control indicators are associated with different ones of the input-output operations that are generated by different processes running on the host device. The storage system adapts its performance of data reduction for the different ones of the input-output operations based at least in part on their associated data reduction control indicators.
    Type: Grant
    Filed: July 13, 2018
    Date of Patent: December 31, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Sanjib Mallick, Ramesh Doddaiah, Arieh Don
  • Patent number: 10515223
    Abstract: Techniques to provide secure cloud-based storage of data shared across file system objects and clients are disclosed. In various embodiments, a primary encryption key is determined for an object associated with a plurality of component chunks of file system data. The primary encryption key is used to generate for each of said component chunks a corresponding chunk key, based at least in part on the primary encryption key and data comprising or otherwise associated with the chunk. The respective chunk keys are provided to a file system client configured to create and store the object at least in part by encrypting each chunk included in the plurality of component chunks using the chunk key provided for that chunk to generate encrypted chunk data, and combining the encrypted chunk data to create and store the object.
    Type: Grant
    Filed: March 21, 2019
    Date of Patent: December 24, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Thomas Manville, Julio Lopez, Rajiv Desai, Nathan Rosenblum
  • Patent number: 10496543
    Abstract: A method of deduplicating memory in a memory module includes identifying a hash table array including hash tables each corresponding to a hash function, and each including physical buckets, each physical bucket including ways and being configured to store data, identifying a plurality of virtual buckets each including some of the physical buckets, and each sharing at least one of the physical buckets with another of the virtual buckets, hashing a block of data according to a corresponding one of the hash functions to produce a hash value, determining whether an intended physical bucket has available space for the block of data according to the hash value, and determining whether a near-location physical bucket has available space for the block of data when the intended physical bucket does not have available space, the near-location physical bucket being in a same one of the virtual buckets as the intended physical bucket.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: December 3, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Frederic Sala, Chaohong Hu, Hongzhong Zheng, Dimin Niu, Mu-Tien Chang
  • Patent number: 10496313
    Abstract: Examples include application of a variable-sized content-defined chunking technique to a first data portion to identify a content-defined chunk boundary at least partially defining a remainder section, merging of the remainder section with a second data portion ordered before the first data portion to create a merged section, and application of the chunking technique to the merged section.
    Type: Grant
    Filed: September 22, 2014
    Date of Patent: December 3, 2019
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Richard Phillip Mayo
  • Patent number: 10482064
    Abstract: De-duplication of immutable data items at runtime may include identifying a set of potentially duplicate immutable data items in use by one or more applications. The applications may access the immutable data items through pointers of respective objects corresponding to the immutable data items. A de-duplication component executing distinctly from the applications may analyze the identified set of potentially duplicate immutable data items to determine two or more that have identical content and may then modify one or more pointers of the corresponding objects so that at least two of the pointers point to a single immutable data item.
    Type: Grant
    Filed: June 26, 2012
    Date of Patent: November 19, 2019
    Assignee: Oracle International Corporation
    Inventors: Mikhail A. Dmitriev, Nathan L. Reynolds, Oleksandr Otenko
  • Patent number: 10482079
    Abstract: A system, method, and computer program include a communications interface configured to receive a set of industry reports from multiple industry sources, and circuitry to compare one or more attributes of at least two trade lines to identify whether the at least two trade lines are duplicates. The circuitry characterizes as a binary indication whether the comparing indicates the one or more attributes are a match, displays a representation of the binary indication, and receives a user-identified indication whether the at least two trade lines are duplicates. The circuitry trains a classifier, records the indication whether the at least two trade lines are duplicates and removes at least one of the at least two trade lines from the set of industry reports, and runs the classifier. Subsequently, a supervised machine learning classifier is trained to fit the training data and is evaluated for accuracy on the testing data.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: November 19, 2019
    Assignee: CORELOGIC CREDCO, LLC
    Inventor: Parag Vijay Ahire
  • Patent number: 10452641
    Abstract: Performing snapshot conscious internal file modification for network-attached storage is presented herein. A file system can comprise a first component configured to modify, during a service request, storage for a subset of data blocks of a file—the service request not being recognized by an external entity as a change of content of the file. Further, the file system can comprise a second component configured to prevent, based on the service request, a copy of the storage from being created for servicing of a snapshot—the snapshot comprising a point-in-time copy of the file system.
    Type: Grant
    Filed: June 30, 2015
    Date of Patent: October 22, 2019
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventor: Ravi V. Batchu
  • Patent number: 10452318
    Abstract: Systems and methods for recording and playback of multiple data streams. One device includes a storage controller coupled to an electronic storage device, a first data buffer storing data received from a first data stream, a second data buffer storing data received from a second data stream, a fragment buffer storing fragment metadata, a storage buffer including a plurality of data fragments, and an electronic processor. The electronic processor receives information designating a data stream storage area of the electronic storage device. The electronic processor arbitrates between the first and second data buffers to select a data fragment for writing to the storage buffer. The electronic processor writes the data fragment to the storage buffer, and writes fragment metadata defining the data fragment to the fragment buffer. The electronic processor controls the storage controller to sequentially write from the plurality of data fragments to the data stream storage area.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: October 22, 2019
    Assignee: MOTOROLA SOLUTIONS, INC.
    Inventors: Adrian Guillen, Joel Hegberg, Chet A. Lampert
  • Patent number: 10430383
    Abstract: In one example, a method for processing data includes receiving information that identifies an ad hoc group of size ‘n’ of files F1 . . . Fn, each file F including a respective file sequence S that includes K data segments. Next, each file sequence S is sampled to obtain a sequence SS of data segments from the file sequence S, and a non-random sampling of data segments is sampled from each sequence SS to obtain a set SSU of the sequence SS. The data segments of each set SSU are then sampled to obtain a sample subset SSUS of the set SSU, and a compression ratio is determined for each data segment in each sample subset SSUS. Finally, an average data compression RF1 . . . Fn is estimated and output for the files F in the group of size ‘n’, based on the compression ratios.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: October 1, 2019
    Assignee: EMC IP HOLDING COMPANY LLC
    Inventors: Guilherme Menezes, Teng Xu, Abdullah Reza
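
    The idea of estimating compressibility from sampled segments can be sketched as follows. This is a simplification under stated assumptions: it uses a single strided (non-random) sampling stage instead of the several stages named in the abstract, it uses zlib as the compressor, and the function name, `segment_size`, and `step` are hypothetical.

```python
import zlib


def estimate_compression_ratio(files, segment_size=4096, step=4):
    """Estimate an average compression ratio from every `step`-th segment.

    `files` is a list of bytes objects. Returns 1.0 when nothing was sampled.
    """
    ratios = []
    for data in files:
        segments = [
            data[i:i + segment_size] for i in range(0, len(data), segment_size)
        ]
        for segment in segments[::step]:   # strided, non-random sample
            if segment:
                # Ratio of original size to compressed size for this segment.
                ratios.append(len(segment) / len(zlib.compress(segment)))
    return sum(ratios) / len(ratios) if ratios else 1.0
```

    Only the sampled segments are compressed, so the cost of the estimate grows with the sample size rather than with the total size of the file group.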
  • Patent number: 10430426
    Abstract: Answer effectiveness evaluations include providing, by a computing device, an answer to a search query received from a user, and in response to receiving a subsequent search query from the user, determining by the computing device a level of effectiveness of the answer to the search query with respect to the user. The determination includes comparing aspects of the search query to aspects of the subsequent search query, calculating, based on the comparing, a relevance score that indicates a measure of similarity between the aspects of the search query and the aspects of the subsequent search query, and determining that the answer effectively answers the search query when the relevance score exceeds a threshold value.
    Type: Grant
    Filed: May 3, 2016
    Date of Patent: October 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Donna K. Byron, Lakshminarayanan Krishnamurthy, Priscilla Santos Moraes, Niyati Parameswaran
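
    A relevance score between a query and its follow-up can be sketched with a simple token-overlap measure. The abstract does not name a similarity measure; Jaccard similarity, the function names, and the default threshold below are assumptions made only for illustration. The threshold test follows the direction stated in the abstract.

```python
def relevance_score(query, followup):
    """Jaccard similarity between the word sets of two queries (assumed measure)."""
    a = set(query.lower().split())
    b = set(followup.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer_was_effective(query, followup, threshold=0.5):
    """Apply the threshold test from the abstract to the relevance score."""
    return relevance_score(query, followup) > threshold
```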
  • Patent number: 10417202
    Abstract: An example storage system may include storage media and a storage controller. The storage controller may be to establish virtual volumes, private data stores, and a deduplication data store, each being a virtual storage space of the storage media, wherein each of the private data stores is associated with one of the virtual volumes and the deduplication data store is shared among the virtual volumes. The storage controller may, in response to receiving input data that is to be stored in a given one of the virtual volumes, determine a signature for the input data and select between storing the input data in the private data store associated with the given one of the virtual volumes and storing the input data in the deduplication data store.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: September 17, 2019
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Siamak Nazari, Jin Wang, Srinivasa D. Murthy, Roopesh Kumar Tamma
  • Patent number: 10395145
    Abstract: A computer-implemented method includes receiving a set of representative machine image regions for a computing environment wherein the set of representative machine image regions collectively comprise a set of representative image chunks. The method also includes generating a fingerprint for each representative image chunk within the set of representative image chunks to produce a set of representative fingerprints, generating a fingerprint for selected image chunks within a measured machine image region to produce a set of sampled fingerprints, and determining a deduplication metric for the measured machine image region based on the representative fingerprints and the sampled fingerprints. A corresponding computer program product and computer system are also disclosed herein.
    Type: Grant
    Filed: March 8, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Jonathan Amit, Danny Harnik, Ety Khaitzin, Sergey Marenkov
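
    One plausible reading of the deduplication metric is the fraction of sampled fingerprints that already appear among the representative fingerprints. The sketch below assumes fixed-size chunks, SHA-256 fingerprints, and strided sampling; none of these choices, nor the function names, come from the patent.

```python
import hashlib


def fingerprints(data, chunk_size=4096):
    """Fingerprint each fixed-size chunk of `data` with SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedup_metric(representative_regions, measured_region,
                 sample_every=2, chunk_size=4096):
    """Fraction of sampled chunks whose fingerprint is already known."""
    representative = set()
    for region in representative_regions:
        representative.update(fingerprints(region, chunk_size))
    sampled = fingerprints(measured_region, chunk_size)[::sample_every]
    if not sampled:
        return 0.0
    return sum(fp in representative for fp in sampled) / len(sampled)
```

    A value near 1.0 suggests that most of the measured region would deduplicate against the representative set, and a value near 0.0 suggests mostly new data.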
  • Patent number: 10387265
    Abstract: A method, computer program product, computing system, and system for preventive hash loading are described. The method may include receiving an indication at a storage server that a machine will be backed up. The method may further include loading fingerprints of blocks related to a previous backup of the machine to RAM of the storage server. The method may also include searching the storage server for fingerprints in the RAM that match fingerprints of incoming blocks from the machine being backed up. The method may additionally include, in response to determining that the fingerprints of the incoming blocks do not match fingerprints in the RAM, searching for the fingerprints in a database. Moreover, the method may include transferring only blocks from the machine being backed up that are not in the RAM or the database of the storage server to the storage server.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: August 20, 2019
    Assignee: ACRONIS INTERNATIONAL GMBH
    Inventors: Vitaly Pogosyan, Andrey Panin, Stanislav Protasov, Serguei M. Beloussov
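
    The two-level lookup in this abstract (fingerprints preloaded into RAM first, then the database) can be sketched as below. Plain Python sets stand in for the RAM cache and the database, SHA-256 is an assumed fingerprint function, and all names are hypothetical.

```python
import hashlib


def fingerprint(block):
    """Assumed fingerprint function: SHA-256 of the block contents."""
    return hashlib.sha256(block).hexdigest()


def blocks_to_transfer(incoming_blocks, ram_fingerprints, database_fingerprints):
    """Return only the blocks the storage server does not already hold."""
    missing = []
    for block in incoming_blocks:
        fp = fingerprint(block)
        if fp in ram_fingerprints:        # fast path: preloaded from the previous backup
            continue
        if fp in database_fingerprints:   # slower fallback lookup
            continue
        missing.append(block)
    return missing
```

    Because consecutive backups of the same machine tend to share most blocks, the preloaded set is expected to answer most lookups without touching the database.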
  • Patent number: 10387044
    Abstract: The presently disclosed subject matter includes various inventive aspects, which are directed for enabling execution of deduplication during data writes in a distributed storage-system.
    Type: Grant
    Filed: April 5, 2017
    Date of Patent: August 20, 2019
    Assignee: Kaminario Technologies Ltd.
    Inventors: Doron Tal, Eyal Gordon
  • Patent number: 10380074
    Abstract: A computer-implemented method for efficient backup deduplication may include (1) identifying a file to be divided into chunks for deduplication, (2) requesting, from a server, a chunk size to use when dividing the file for deduplication by submitting at least one attribute of the file to the server, the server selecting the chunk size based at least in part on a projected chunk reuse rate when the file is deduplicated according to the chunk size, (3) receiving from the server, in response to requesting the chunk size, the chunk size to use when dividing the file for deduplication, and (4) dividing the file for deduplication into a plurality of chunks according to the chunk size. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: January 11, 2016
    Date of Patent: August 13, 2019
    Assignee: Symantec Corporation
    Inventors: Lei Gu, Jason Holler, Nathan Rivers, Elton Inada, Riti Saxena, Kirill Levichev
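
    The client/server exchange in steps (2) through (4) can be sketched as two functions. The selection policy here is an assumption: the server keeps a hypothetical table of projected reuse rates per file extension and returns the chunk size with the highest projected rate, falling back to a default size for unknown file types.

```python
def select_chunk_size(extension, reuse_rates, default=8192):
    """Server side: pick the chunk size with the highest projected reuse rate.

    `reuse_rates` maps a file extension to {chunk_size: projected_reuse_rate}.
    """
    candidates = reuse_rates.get(extension, {})
    if not candidates:
        return default
    return max(candidates, key=candidates.get)


def split_into_chunks(data, chunk_size):
    """Client side: divide the file contents into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Hypothetical projected reuse rates for one file type.
rates = {".vmdk": {4096: 0.7, 16384: 0.4}}
chunks = split_into_chunks(b"x" * 10000, select_chunk_size(".vmdk", rates))
```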
  • Patent number: 10374807
    Abstract: Storing and retrieving ciphertext in data storage can include determining a first ciphertext value for a first data chunk to be saved to a client-server data storage system using an encrypted chunk hash value associated with the first data chunk as an initial value, and storing the first data chunk on a server in the client-server data storage system in response to determining that the first ciphertext value is a unique ciphertext value. Also, storing and retrieving ciphertext in data storage can include decrypting a ciphertext value for a second data chunk received from a client in the client-server data storage system and based on an encrypted chunk hash value associated with the second data chunk, and sending the second data chunk to the client in response to determining that the decrypted ciphertext value corresponds to an original data chunk saved to the server by the client.
    Type: Grant
    Filed: April 4, 2014
    Date of Patent: August 6, 2019
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Liqun Chen, Peter T. Camble, Jonathan P. Buckingham, Simon Pelly, Simon Kai-Ying Shiu, Joseph S. Ficara, Hendrik Radon
  • Patent number: 10366082
    Abstract: Techniques are described for parallel processing of database queries with an inverse distribution function by a database management system (DBMS). To improve the execution time of a query with an inverse distribution function, the data set referenced in the inverse distribution function is range distributed among parallel processes that are spawned and managed by a query execution coordinator process (QC), in an embodiment. The parallel executing processes sort each range of the data set in parallel, while the QC determines the location(s) of inverse distribution function values based on the count of values in each range of the data set. The QC requests the parallel processes to produce to the next stage of parallel processes the values at the location(s) in the sorted ranges. The next stage of parallel processes computes the inverse distribution function based on the produced values.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: July 30, 2019
    Assignee: Oracle International Corporation
    Inventors: Qingyuan Kong, Huagang Li, Sankar Subramanian
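
    The coordinator's use of per-range counts to locate a value can be sketched sequentially. The example assumes a discrete percentile defined as the smallest value whose 1-based rank is at least p times the row count; the `boundaries` parameter and the function name are hypothetical, and the per-range sorts that would run in parallel processes are done in a loop here.

```python
import math
from bisect import bisect_left


def percentile_disc(values, p, boundaries):
    """Locate a discrete percentile using range distribution and range counts."""
    # Range-distribute: values <= boundaries[0] go to range 0, and so on.
    ranges = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        ranges[bisect_left(boundaries, v)].append(v)

    # Each range would be sorted by its own parallel process.
    for r in ranges:
        r.sort()

    # Coordinator step: walk the range counts to find which range holds
    # the target rank, then read the value at the local offset.
    target = max(1, math.ceil(p * len(values)))   # 1-based global rank
    for r in ranges:
        if target <= len(r):
            return r[target - 1]
        target -= len(r)
    raise ValueError("no values to compute a percentile from")
```

    Only the counts are needed to find the location, so no process has to sort or hold the full data set.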
  • Patent number: 10359942
    Abstract: Systems and methods of deduplication aware scalable content placement are described. A method may include receiving data to be stored on one or more nodes of a storage array and calculating a plurality of hashes corresponding to the data. The method further includes determining a first subset of the plurality of hashes, determining a second subset of the plurality of hashes of the first subset, and generating a node candidate placement list. The method may further include sending the first subset to one or more nodes represented on the node candidate placement list and receiving, from the nodes represented on the node candidate placement list, characteristics corresponding to the nodes represented on the candidate placement list. The method may further include identifying one of the one or more nodes represented on the candidate placement list in view of the characteristic and sending the data to the identified node.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: July 23, 2019
    Assignee: Pure Storage, Inc.
    Inventors: Robert Lee, Christopher Lumb, Ethan L. Miller, Igor Ostrovsky
  • Patent number: 10359968
    Abstract: Virtual storage domains (VSD) are each associated with unique VSD domain ID associated with a first policy and tagged to a request to a storage system when an entity writes a data set to it. A first hash digest, based on data set content, is calculated and combined with first unique VSD domain ID into a second hash digest associated with data set. When first policy is changed to second policy associated with second VSD, a third hash digest of first data set is calculated, the third hash digest based on content of first data set and on second unique VSD domain ID. If third hash digest does not exist in second VSD, data set is copied to the second VSD; else, reference count of the third hash digest, associated with second VSD domain, is incremented, and reference count of second hash digest, associated with first VSD domain, is decremented.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: July 23, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Xiangping Chen, Anton Kucherov, Junping Zhao
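
    Combining a content digest with a domain ID can be sketched as a hash over both inputs. SHA-256, the byte layout, and the function names are assumptions; the sketch covers only the digest construction, not the policy change or the reference counting.

```python
import hashlib


def content_digest(data):
    """First digest: based on the data set content only."""
    return hashlib.sha256(data).digest()


def domain_digest(data, domain_id):
    """Second digest: content digest combined with the VSD domain ID."""
    combined = domain_id.encode("utf-8") + content_digest(data)
    return hashlib.sha256(combined).hexdigest()
```

    Identical content written under two different domain IDs yields different digests, so deduplication stays confined to a single domain.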
  • Patent number: 10346363
    Abstract: An apparatus and a method for maintaining a file system are described. A method may include receiving a request for allocating a first block of a file system to a file, the first block comprising a first data from the file. The method also includes computing a first hash value by hashing the first data with a first hashing procedure and computing a second hash value by hashing the first data with a second hashing procedure. The method also includes using the first and the second hash values to determine whether a tree structure among a plurality of tree structures has a matching hash value among a plurality of hash values. Each of the plurality of hash values in the tree structure corresponds to a block among a plurality of blocks stored in the file system. The method further includes, in response to determining that the tree structure has the matching hash value, allocating the corresponding block to the file and updating a reference count of the corresponding block in the tree structure.
    Type: Grant
    Filed: April 27, 2017
    Date of Patent: July 9, 2019
    Assignee: Red Hat, Inc.
    Inventor: James Paul Schneider
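
    Block allocation keyed on two hash values can be sketched as below. A Python dict stands in for the tree structures named in the abstract, SHA-256 and BLAKE2b are assumed choices for the two hashing procedures, and the `BlockStore` class is hypothetical.

```python
import hashlib


class BlockStore:
    """Toy block store that shares blocks with identical content."""

    def __init__(self):
        self.index = {}    # (hash1, hash2) -> [block_id, ref_count]
        self.blocks = []   # block_id -> data

    def allocate(self, data):
        """Return the ID of a block holding `data`, reusing a match if present."""
        key = (hashlib.sha256(data).digest(), hashlib.blake2b(data).digest())
        entry = self.index.get(key)
        if entry is not None:
            entry[1] += 1              # matching block found: bump its reference count
            return entry[0]
        self.blocks.append(data)       # no match: store a new block
        block_id = len(self.blocks) - 1
        self.index[key] = [block_id, 1]
        return block_id
```

    Requiring both hash values to match makes an accidental collision far less likely than with either hash alone.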
  • Patent number: 10346075
    Abstract: Regarding a distributed storage system including a plurality of nodes, a first node among the plurality of nodes judges whether the same data as first data, which is written to a first virtual partial area managed by the first node from among a plurality of virtual partial areas, exists in the virtual partial area managed by another node among the plurality of nodes; when the same data as the first data exists in the other node, the first node executes inter-node deduplication for changing allocation of either one of logical partial areas for the first virtual partial area or the virtual partial area of the other node to which the same data is written, to the other logical partial area; and when I/O load on the first node after execution of the inter-node deduplication of the first virtual partial area and the predicted value is less than a first threshold, the first node executes the inter-node deduplication of a second virtual partial area managed by the first node from among the plurality of virtual partia
    Type: Grant
    Filed: March 16, 2015
    Date of Patent: July 9, 2019
    Assignee: Hitachi, Ltd.
    Inventors: Yasuo Watanabe, Hiroaki Akutsu
  • Patent number: 10339011
    Abstract: A method and system for implementing data lossless synthetic full backups. Specifically, the method and system disclosed herein improves upon traditional synthetic full backup operations by considering all user-checkpoint branches, rather than just the active user-checkpoint branch, representing all chains of incremental changes to a virtual disk of a virtual machine. In considering all user-checkpoint branches, no data pertinent to users involved in the development of the non-active (or inactive) user-checkpoint branches is lost.
    Type: Grant
    Filed: October 27, 2017
    Date of Patent: July 2, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Aaditya Rakesh Bansal, Sunil Yadav, Suman Chandra Tokuri, Pradeep Anappa, Soumen Acharya, Sudha Vamanraj Hebsur
  • Patent number: 10341467
    Abstract: Methods and systems for data transfer include adding data chunks to a priority queue in an order based on utilization priority. A reducibility score for the data chunks is determined. A data reduction operation is performed on a data chunk having a highest reducibility in the priority queue using a processor if sufficient resources are available. The data chunk having the lowest reducibility score is moved from the priority queue to a transfer queue for transmission if the transfer queue is not full.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: July 2, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danny Harnik, Alexei Karve, Andrzej Kochut, Dmitry Sotnikov