Patents by Inventor Dmitry Sotnikov

Dmitry Sotnikov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

GAUGING ACCURACY OF SAMPLING-BASED DISTINCT ELEMENT ESTIMATION

Publication number: 20170199892

Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Applicant: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
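
The first two steps of the abstract above (a per-hash duplication count, then a count of how often each duplication count occurs) can be illustrated with a minimal sketch. It is not taken from the application: the function name `duplication_histograms`, the use of SHA-1, and the `seed` parameter are assumptions made for illustration, and the target function and the plausible-histogram search are not attempted here.

```python
import hashlib
import random
from collections import Counter


def duplication_histograms(units, sampling_ratio, seed=0):
    """Return (first, second) histograms for a random sample of `units`.

    Hypothetical helper: `units` is a list of bytes objects standing in
    for logical data units.
    """
    rng = random.Random(seed)
    # Sample size follows the sampling ratio, clamped to the data size.
    sample_size = min(len(units), max(1, round(len(units) * sampling_ratio)))
    sample = rng.sample(units, sample_size)
    # First histogram: hash value -> duplication count within the sample.
    first = Counter(hashlib.sha1(unit).hexdigest() for unit in sample)
    # Second histogram: duplication count -> number of hashes with that count.
    second = Counter(first.values())
    return first, second


units = [b"a", b"a", b"b", b"c", b"c", b"c"]
first, second = duplication_histograms(units, sampling_ratio=1.0)
```

With a sampling ratio of 1.0 every unit is sampled, so the second histogram records one hash seen once, one seen twice, and one seen three times.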
NETWORK UTILIZATION IMPROVEMENT BY DATA REDUCTION BASED MIGRATION PRIORITIZATION

Publication number: 20170201602

Abstract: Methods and systems for data transfer include adding data chunks to a priority queue in an order based on utilization priority. A reducibility score for the data chunks is determined. A data reduction operation is performed on a data chunk having a highest reducibility in the priority queue using a processor if sufficient resources are available. The data chunk having the lowest reducibility score is moved from the priority queue to a transfer queue for transmission if the transfer queue is not full.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Inventors: Danny Harnik, Alexei Karve, Andrzej Kochut, Dmitry Sotnikov
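
One scheduling step of the kind described in this abstract might look like the sketch below. It is only an illustration under stated assumptions: the reducibility score (bytes saved by a fast zlib pass over a prefix), the sorted list used in place of a real priority queue, and the names `reducibility` and `schedule_step` are all hypothetical, and the ordering is collapsed onto the reducibility score alone.

```python
import random
import zlib


def reducibility(chunk):
    """Hypothetical score: fraction of bytes saved by a fast zlib pass."""
    sample = chunk[:4096]
    if not sample:
        return 0.0
    return max(0.0, 1.0 - len(zlib.compress(sample, 1)) / len(sample))


def schedule_step(pending, transfer, transfer_capacity, cpu_available):
    """Run one step; `pending` is a list of (score, chunk) sorted ascending."""
    if cpu_available and pending:
        # Reduce the most reducible chunk while processor time is available.
        _, chunk = pending.pop()
        # A compressed chunk has nothing left to gain, so it goes to the front.
        pending.insert(0, (0.0, zlib.compress(chunk)))
    if pending and len(transfer) < transfer_capacity:
        # Send the least reducible chunk if the transfer queue has room.
        transfer.append(pending.pop(0)[1])


noise = bytes(random.Random(0).getrandbits(8) for _ in range(1000))
chunks = [b"a" * 1000, noise]
pending = sorted(((reducibility(c), c) for c in chunks), key=lambda item: item[0])
transfer = []
schedule_step(pending, transfer, transfer_capacity=1, cpu_available=True)
```

In this run the repetitive chunk is compressed and then transmitted, while the incompressible chunk stays queued for a later step.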
LOW MEMORY SAMPLING-BASED ESTIMATION OF DISTINCT ELEMENTS AND DEDUPLICATION

Publication number: 20170199904

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Applicant: International Business Machines Corporation

Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
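
The filtering idea in this abstract, counting only unit hashes that match a set of target hash values, can be sketched as follows. The sketch assumes that the target set is "all hash values below a cutoff" in a 32-bit hash range; that choice, the truncated SHA-1 hash, and the names `unit_hash` and `filtered_histogram` are illustrative assumptions, and the final estimation step is not shown.

```python
import hashlib
from collections import Counter

HASH_BITS = 32  # assumed size of the hash range for this sketch


def unit_hash(unit):
    """Hypothetical unit hash: the first 4 bytes of SHA-1 as an integer."""
    return int.from_bytes(hashlib.sha1(unit).digest()[:4], "big")


def filtered_histogram(sample, target_fraction):
    """Count duplicates only for hashes that fall inside the target set."""
    limit = int(target_fraction * 2**HASH_BITS)
    histogram = Counter()
    for unit in sample:
        h = unit_hash(unit)
        if h < limit:  # hashes outside the target set are never stored
            histogram[h] += 1
    return histogram


sample = [b"x", b"x", b"y"]
everything = filtered_histogram(sample, target_fraction=1.0)
nothing = filtered_histogram(sample, target_fraction=0.0)
```

A smaller `target_fraction` keeps fewer counters in memory, at the cost of observing fewer hashes.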
SAMPLING-BASED DEDUPLICATION ESTIMATION

Publication number: 20170199895

Abstract: A method, including partitioning a dataset into a first number of data units, and selecting, based on a sampling ratio, a second number of the data units. A hash value is calculated for each of the selected data units, and a first histogram is computed indicating a first duplication count for each of the calculated hash values. Based on respective frequencies of the calculated hash values, a second histogram is computed indicating an observed frequency for each of the first duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A third histogram that minimizes the target function is derived, the third histogram including, for the first number of the storage units, second duplication counts and a respective predicted frequency for each of the second duplication counts. Finally, a deduplication ratio is determined based on the third histogram.

Type: Application

Filed: January 13, 2016

Publication date: July 13, 2017

Inventors: Danny Harnik, David Chambliss, Oded Margalit, Dmitry Sotnikov

Durability and availability evaluation for distributed storage systems

Patent number: 9678824

Abstract: Embodiments include evaluating durability and availability of a distributed storage system. Aspects include receiving a configuration of the distributed storage system and identifying a failure model for each component of the distributed storage system. Aspects also include generating a series of failure events for each component of the distributed storage system based on the failure model and calculating a recovery time for each failed component based on a network recovery bandwidth, a disk recovery bandwidth, a total capacity of simultaneous failed storage devices and a resiliency scheme used by the distributed storage system. Aspects further include collecting data regarding the series of failures and the recovery times, calculating an observed distribution of component failures from the collected data and calculating the availability and durability of the distributed storage system based on the observed distribution of component failures and using probabilistic durability and availability models.

Type: Grant

Filed: November 5, 2015

Date of Patent: June 13, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Amir Epstein, Michael E. Factor, Elliot K. Kolodner, Dmitry Sotnikov

Fully distributed intelligent rebuild

Patent number: 9665446

Abstract: A globally distributed scan list is determined. A determination is made whether the first data replica in the first plurality of data stored on a first device is in sync with a second data replica in the second plurality of data on a second device. In response to determining that the first data replica is not in sync with the second data replica, the first data replica is added to an unsynced queue. The neighbor data of the first plurality of data is added to a suspect queue. The priority to check the neighbor data is increased if the neighbor data is already in the suspect queue. Unsynced neighbor data is added to the unsynced queue. The priority for recovery of the data in the unsynced queue is determined. The priority is based on the vulnerability of the data. A data replica in the unsynced queue is recovered.

Type: Grant

Filed: December 29, 2015

Date of Patent: May 30, 2017

Assignee: International Business Machines Corporation

Inventors: David Hadas, Dmitry Sotnikov, Paula K. Ta-Shma

Virtual Failure Domains for Storage Systems

Publication number: 20170147458

Abstract: A method for storage systems improvement includes collecting information that indicates one or more failure correlations for disks in a storage system. The disks are then separated into a plurality of virtual failure domains based on the indicated one or more failure correlations. The method then determines that all data objects of a set of redundant data objects are included in a first virtual failure domain. Responsive to determining that all data objects of the set of redundant data objects are included in the first virtual failure domain, the method then migrates at least one data object of the set of redundant data objects from a first disk in the first virtual failure domain to a second disk in a second virtual failure domain.

Type: Application

Filed: November 20, 2015

Publication date: May 25, 2017

Inventors: Amir Epstein, Michael E. Factor, Danny Harnik, Ronen I. Kat, Elliot K. Kolodner, Dmitry Sotnikov
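
The check-and-migrate step in this abstract can be sketched as below, assuming the virtual failure domains have already been derived. The mapping `domain_of`, the dictionary `replica_sets`, and the function `fix_placement` are hypothetical names, and choosing the first disk found in another domain is an arbitrary simplification.

```python
def fix_placement(replica_sets, domain_of):
    """Move one replica out of any set confined to a single failure domain.

    `replica_sets` maps an object name to the list of disks holding its
    redundant copies; `domain_of` maps each disk to its virtual failure
    domain. Both structures are assumptions of this sketch.
    """
    migrations = []
    for name, disks in replica_sets.items():
        domains = {domain_of[disk] for disk in disks}
        if len(domains) != 1:
            continue  # already spread across domains, or no replicas at all
        (only_domain,) = domains
        # Pick any disk that belongs to a different virtual failure domain.
        target = next(
            (d for d in sorted(domain_of) if domain_of[d] != only_domain), None
        )
        if target is None:
            continue  # no second domain exists to migrate into
        source = disks[0]
        disks[0] = target
        migrations.append((name, source, target))
    return migrations


domain_of = {"d1": "A", "d2": "A", "d3": "B"}
replica_sets = {"obj1": ["d1", "d2"], "obj2": ["d1", "d3"]}
migrations = fix_placement(replica_sets, domain_of)
```

Only `obj1` is touched here, because both of its copies started in domain A.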
DURABILITY AND AVAILABILITY EVALUATION FOR DISTRIBUTED STORAGE SYSTEMS

Publication number: 20170132056

Abstract: Embodiments include evaluating durability and availability of a distributed storage system. Aspects include receiving a configuration of the distributed storage system and identifying a failure model for each component of the distributed storage system. Aspects also include generating a series of failure events for each component of the distributed storage system based on the failure model and calculating a recovery time for each failed component based on a network recovery bandwidth, a disk recovery bandwidth, a total capacity of simultaneous failed storage devices and a resiliency scheme used by the distributed storage system. Aspects further include collecting data regarding the series of failures and the recovery times, calculating an observed distribution of component failures from the collected data and calculating the availability and durability of the distributed storage system based on the observed distribution of component failures and using probabilistic durability and availability models.

Type: Application

Filed: November 5, 2015

Publication date: May 11, 2017

Inventors: Amir Epstein, Michael E. Factor, Elliot K. Kolodner, Dmitry Sotnikov

REAL-TIME IDENTIFICATION OF DATA CANDIDATES FOR CLASSIFICATION BASED COMPRESSION

Publication number: 20170132273

Abstract: Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate, the heuristic computed by, for each one of the data classes, calculating an expected number of characters to be in a data class, calculating an expected number of characters that will not belong to a predefined set of the data classes, and calculating an actual number of the characters for each of the data classes and the non-classifiable data.

Type: Application

Filed: January 25, 2017

Publication date: May 11, 2017

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan Amit, Lilia Demidov, George Goldberg, Nir Halowani, Ronen I. Kat, Chaim Koifman, Sergey Marenkov, Dmitry Sotnikov

OPTIMIZATION OF DATA DEDUPLICATION

Publication number: 20170116229

Abstract: Various embodiments for optimizing deduplication in a computing storage environment by a processor are provided. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.

Type: Application

Filed: October 21, 2015

Publication date: April 27, 2017

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny Harnik, Ben Sasson, Yosef Shatsky, Dmitry Sotnikov

Enhancing reliability of a storage system by strategic replica placement and migration

Patent number: 9635109

Abstract: Machines, systems and methods for optimizing data replication in a distributed storage network, the method comprising determining a need to create a replica for a data item in a remote failure zone in a data storage network; creating a temporary replica of the data item in a local failure zone defined in the data storage network, in response to determining that it is beneficial to create the temporary replica in the local failure zone based on a cost versus reliability improvement analysis; attempting to create the replica in the remote failure zone; and removing the temporary replica from the local failure zone, in response to successfully creating the replica in the remote failure zone.

Type: Grant

Filed: January 2, 2014

Date of Patent: April 25, 2017

Assignee: International Business Machines Corporation

Inventors: Ilias Iliadis, Elliot K. Kolodner, Dmitry Sotnikov, Paula K. Ta-Shma, Vinodh Venkatesan

INTRA-RACK AND INTER-RACK ERASURE CODE DISTRIBUTION

Publication number: 20170068475

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include detecting multiple sets of storage objects stored in a data facility including multiple server racks, each of the server racks including a plurality of server computers, each of the storage objects in each set being stored in a separate one of the server racks and including one or more data objects and one or more protection objects. A specified number of the storage objects are identified in a given server rack, each of the identified storage objects being stored in a separate one of the server computers, and one or more server computers in the given server rack not storing any of the identified storage objects are identified. Finally, in the identified one or more server computers, an additional protection object is created and managed for the identified storage objects.

Type: Application

Filed: November 17, 2016

Publication date: March 9, 2017

Inventors: Danny Harnik, Michael Factor, Dmitry Sotnikov, Paula Ta-Shma, Lukas Kull, Thomas Morf

Real-time identification of data candidates for classification based compression

Patent number: 9588980

Abstract: Identification of data candidates for data processing is performed in real time by a processor device in a distributed computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate, the heuristic computed by, for each one of the data classes, calculating an expected number of characters to be in a data class, calculating an expected number of characters that will not belong to a predefined set of the data classes, and calculating an actual number of the characters for each of the data classes and the non-classifiable data.

Type: Grant

Filed: June 22, 2015

Date of Patent: March 7, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan Amit, Lilia Demidov, George Goldberg, Nir Halowani, Ronen I. Kat, Chaim Koifman, Sergey Marenkov, Dmitry Sotnikov

HETEROGENEOUS COMPRESSION IN REPLICATED STORAGE

Publication number: 20170060976

Abstract: Various embodiments for data management in a replicated storage environment, by a processor device, are provided. In one embodiment, a method comprises storing a plurality of data replicas under a plurality of heterogeneous compression algorithms, wherein one of the data replicas is optimized for a data operation.

Type: Application

Filed: August 25, 2015

Publication date: March 2, 2017

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

Real-time reduction of CPU overhead for data compression

Patent number: 9564918

Abstract: Real-time reduction of CPU overhead for data compression is performed by a processor device in a computing environment. Non-compressing heuristics are applied on a randomly selected data sample from data sequences for determining whether to compress the data sequences. A compression potential is calculated based on the non-compressing heuristics. The compression potential is compared to a threshold value. The data sequences are either compressed if the compression threshold is matched, compressed using Huffman coding if the Huffman coding threshold is matched, or stored without compression.

Type: Grant

Filed: January 10, 2013

Date of Patent: February 7, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ron Asher, Danny Harnik, Oded Margalit, Ronen I. Kat, Dmitry Sotnikov
Intra-rack and inter-rack erasure code distribution

Patent number: 9547458

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include detecting multiple sets of storage objects stored in a data facility including multiple server racks, each of the server racks including a plurality of server computers, each of the storage objects in each set being stored in a separate one of the server racks and including one or more data objects and one or more protection objects. A specified number of the storage objects are identified in a given server rack, each of the identified storage objects being stored in a separate one of the server computers, and one or more server computers in the given server rack not storing any of the identified storage objects are identified. Finally, in the identified one or more server computers, an additional protection object is created and managed for the identified storage objects.

Type: Grant

Filed: December 24, 2014

Date of Patent: January 17, 2017

Assignee: International Business Machines Corporation

Inventors: Danny Harnik, Michael Factor, Dmitry Sotnikov, Paula Ta-Shma

STORAGE DATA REDUCTION ANALYSIS AND FORECAST

Publication number: 20160364401

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include configuring a storage system to store multiple storage entities for access by one or more host computers in communication with the storage system, and specifying a compression condition including a minimum compression ratio. The storage system can then estimate an expected compression ratio for a given storage entity, compress the given storage entity upon the expected compression ratio meeting the compression condition, and provide, to a given host computer, access to the compressed given storage entity.

Type: Application

Filed: June 12, 2015

Publication date: December 15, 2016

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan Amit, Nir Friedman, Danny Harnik, Chaim Koifman, Sergey Marenkov, Lior Shlomov, Dmitry Sotnikov, Shai Taharlev
Container-based system administration

Publication number: 20160366104

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include defining a first multiple of software container configurations and a second multiple of permission sets, and receiving, by a first computer, a request to perform a service operation on a second computer having multiple resources. Upon identifying one or more of the resources that are required for the service operation, a given software container configuration and a given permission set are selected based on the identified one or more resources, and the given software container configuration and the given permission set are conveyed to the second computer. Upon the second computer receiving the given software container configuration and the given permission set, a software container is generated. The software container is opened on the host computer prior to performing the service operation, and closed upon completing the service operation.

Type: Application

Filed: June 11, 2015

Publication date: December 15, 2016

Inventors: George Goldberg, Yosef Moatti, Dmitry Sotnikov, Yaron Weinsberg

Adaptive data compression

Patent number: 9515679

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include accessing, from a sequence of multiple data segments including a first data segment at a first location in the sequence followed by additional data segments having respective additional locations in the sequence, a current given data segment in the sequence. In some embodiments, data to be compressed is received and partitioned into the multiple data segments. The current data segment is compressed using a first minimal match length, and a compression ratio is calculated for the compressed current data segment. Based on the compression ratio and the respective location of the current data segment, a second minimal match length is selected, a subsequent data segment that immediately follows the current data segment in the sequence is accessed, and the subsequent data segment is compressed using the second minimal match length.

Type: Grant

Filed: May 14, 2015

Date of Patent: December 6, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov

ADAPTIVE DATA COMPRESSION

Publication number: 20160336963

Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include accessing, from a sequence of multiple data segments including a first data segment at a first location in the sequence followed by additional data segments having respective additional locations in the sequence, a current given data segment in the sequence. In some embodiments, data to be compressed is received and partitioned into the multiple data segments. The current data segment is compressed using a first minimal match length, and a compression ratio is calculated for the compressed current data segment. Based on the compression ratio and the respective location of the current data segment, a second minimal match length is selected, a subsequent data segment that immediately follows the current data segment in the sequence is accessed, and the subsequent data segment is compressed using the second minimal match length.

Type: Application

Filed: May 14, 2015

Publication date: November 17, 2016

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
