Patents by Inventor Dmitry Sotnikov
Dmitry Sotnikov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20170199892
Abstract: A method, including identifying, using a sampling ratio, a random number of logical data units. A hash is calculated for each of the identified logical data units, and a first histogram is computed indicating a duplication count of each of the calculated hashes. Based on respective frequencies of the calculated hashes, a second histogram is computed indicating observed frequencies of each of the duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A range of acceptable results is derived for the target function, and based on the range of the acceptable results, a set of plausible duplication frequency histograms is defined. A first given plausible duplication frequency histogram having a highest number of distinct logical data units is identified, and a second given plausible duplication frequency histogram having a lowest number of distinct logical data units is identified.
Type: Application
Filed: January 13, 2016
Publication date: July 13, 2017
Applicant: International Business Machines Corporation
Inventors: Danny Harnik, Ety Khaitzin, Dmitry Sotnikov
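The first two steps of the abstract above (hashing a random sample, then building the two histograms) can be sketched as follows. This is a minimal illustration, not the patented method: the function names, the SHA-256 hash, and the fixed seed are assumptions, and the target function and the plausible-histogram search are not shown.

```python
import hashlib
import random
from collections import Counter


def sample_units(units, sampling_ratio, seed=0):
    """Pick a random subset of the units; its size follows the sampling ratio.

    The seed parameter is an assumption added only to make runs repeatable.
    """
    rng = random.Random(seed)
    k = min(len(units), max(1, round(len(units) * sampling_ratio)))
    return rng.sample(units, k)


def duplication_histograms(sample):
    """Return (first, second) histograms for a sample of byte strings.

    first:  hash value -> duplication count of that hash in the sample
    second: duplication count -> number of distinct hashes with that count
    """
    first = Counter(hashlib.sha256(unit).hexdigest() for unit in sample)
    second = Counter(first.values())
    return first, second


if __name__ == "__main__":
    units = [b"a", b"a", b"b", b"c", b"c", b"c"]
    first, second = duplication_histograms(units)
    # One hash seen once, one seen twice, one seen three times.
    print(sorted(second.items()))
```

The second histogram is the quantity that, together with the sampling ratio, feeds the target function mentioned in the abstract.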
-
Publication number: 20170199895
Abstract: A method, including partitioning a dataset into a first number of data units, and selecting, based on a sampling ratio, a second number of the data units. A hash value is calculated for each of the selected data units, and a first histogram is computed indicating a first duplication count for each of the calculated hash values. Based on respective frequencies of the calculated hash values, a second histogram is computed indicating an observed frequency for each of the first duplication counts in the first histogram, and based on the sampling ratio and the second histogram, a target function is derived. A third histogram that minimizes the target function is derived, the third histogram including, for the first number of the storage units, second duplication counts and a respective predicted frequency for each of the second duplication counts. Finally, a deduplication ratio is determined based on the third histogram.
Type: Application
Filed: January 13, 2016
Publication date: July 13, 2017
Inventors: Danny Harnik, David Chambliss, Oded Margalit, Dmitry Sotnikov
-
Publication number: 20170201602
Abstract: Methods and systems for data transfer include adding data chunks to a priority queue in an order based on utilization priority. A reducibility score for the data chunks is determined. A data reduction operation is performed on a data chunk having a highest reducibility in the priority queue using a processor if sufficient resources are available. The data chunk having the lowest reducibility score is moved from the priority queue to a transfer queue for transmission if the transfer queue is not full.
Type: Application
Filed: January 13, 2016
Publication date: July 13, 2017
Inventors: Danny Harnik, Alexei Karve, Andrzej Kochut, Dmitry Sotnikov
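One scheduling step of the scheme in the abstract above might look like the sketch below. It is illustrative only: the Chunk class, the step function, the use of zlib as the data reduction operation, and the choice to reset a chunk's score to 0.0 once it has been reduced are all assumptions. The utilization-priority ordering of the queue is not modeled.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    data: bytes
    score: float          # hypothetical reducibility score in [0, 1]
    reduced: bool = False


def step(pending, transfer, transfer_capacity, cpu_available):
    """Run one scheduling step over the pending (priority) queue.

    1. If resources are available, reduce the most reducible chunk.
    2. If the transfer queue has room, move the least reducible chunk to it.
    """
    if cpu_available:
        candidates = [c for c in pending if not c.reduced]
        if candidates:
            best = max(candidates, key=lambda c: c.score)
            best.data = zlib.compress(best.data)
            best.reduced = True
            best.score = 0.0  # assumption: nothing left to gain from this chunk
    if pending and len(transfer) < transfer_capacity:
        worst = min(pending, key=lambda c: c.score)
        pending.remove(worst)
        transfer.append(worst)


if __name__ == "__main__":
    pending = [Chunk("a", b"x" * 1000, 0.9), Chunk("b", bytes(range(256)), 0.1)]
    transfer = []
    step(pending, transfer, transfer_capacity=1, cpu_available=True)
    print([c.name for c in transfer], [c.name for c in pending])
```

The effect is that spare processor time is spent where reduction pays off most, while the link is kept busy with chunks that would gain little from further reduction.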
-
Publication number: 20170199904
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.
Type: Application
Filed: January 13, 2016
Publication date: July 13, 2017
Applicant: International Business Machines Corporation
Inventors: Danny Harnik, Ety KHAITZIN, Dmitry SOTNIKOV
-
Patent number: 9678824
Abstract: Embodiments include evaluating durability and availability of a distributed storage system. Aspects include receiving a configuration of the distributed storage system and identifying a failure model for each component of the distributed storage system. Aspects also include generating a series of failure events for each component of the distributed storage system based on the failure model and calculating a recovery time for each failed component based on a network recovery bandwidth, a disk recovery bandwidth, a total capacity of simultaneous failed storage devices and a resiliency scheme used by the distributed storage system. Aspects further include collecting data regarding the series of failures and the recovery times, calculating an observed distribution of component failures from the collected data and calculating the availability and durability of the distributed storage system based on the observed distribution of component failures and using probabilistic durability and availability models.
Type: Grant
Filed: November 5, 2015
Date of Patent: June 13, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Amir Epstein, Michael E. Factor, Elliot K. Kolodner, Dmitry Sotnikov
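Two of the building blocks named in the abstract above, generating failure events from a failure model and computing a recovery time from bandwidths and failed capacity, can be sketched as follows. Everything specific here is an assumption made for illustration: the exponential failure model, the function names, the units, and the repair_traffic_factor parameter standing in for the resiliency scheme.

```python
import random


def failure_times(mttf_hours, horizon_hours, rng):
    """Failure timestamps for one component over the simulated horizon.

    Assumes an exponential failure model with the given mean time to failure.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mttf_hours)
        if t > horizon_hours:
            return times
        times.append(t)


def recovery_time_hours(failed_capacity_gb, network_gb_per_hour,
                        disk_gb_per_hour, repair_traffic_factor=1.0):
    """Time to rebuild the failed capacity, limited by the slower resource.

    repair_traffic_factor is a hypothetical stand-in for the resiliency
    scheme: 1.0 for plain replication, larger when a rebuild must read
    several fragments (as with erasure coding).
    """
    bottleneck = min(network_gb_per_hour, disk_gb_per_hour)
    return failed_capacity_gb * repair_traffic_factor / bottleneck


if __name__ == "__main__":
    rng = random.Random(0)
    events = failure_times(mttf_hours=1000.0, horizon_hours=10000.0, rng=rng)
    print(len(events), recovery_time_hours(1000, 200, 500))
```

A full evaluation would run many such simulated histories, record overlapping failures and their recovery windows, and feed the observed distribution into the durability and availability models.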
-
Patent number: 9665446
Abstract: A globally distributed scan list is determined. A determination is made whether the first data replica in the first plurality of data stored on a first device is in sync with a second data replica in the second plurality of data on a second device. In response to determining that the first data replica is not in sync with the second data replica, the first data replica is added to an unsynced queue. The neighbor data of the first plurality of data is added to a suspect queue. The priority to check the neighbor data is increased if the neighbor data is already in the suspect queue. Unsynced neighbor data is added to the unsynced queue. The priority for recovery of the data in the unsynced queue is determined. The priority is based on the vulnerability of the data. A data replica in the unsynced queue is recovered.
Type: Grant
Filed: December 29, 2015
Date of Patent: May 30, 2017
Assignee: International Business Machines Corporation
Inventors: David Hadas, Dmitry Sotnikov, Paula K. Ta-Shma
-
Publication number: 20170147458
Abstract: A method for storage systems improvement includes collecting information that indicates one or more failure correlations for disks in a storage system. The disks are then separated into a plurality of virtual failure domains based on the indicated one or more failure correlations. The method then determines that all data objects of a set of redundant data objects are included in a first virtual failure domain. Responsive to determining that all data objects of the set of redundant data objects are included in the first virtual failure domain, the method then migrates at least one data object of the set of redundant data objects from a first disk in the first virtual failure domain to a second disk in a second virtual failure domain.
Type: Application
Filed: November 20, 2015
Publication date: May 25, 2017
Inventors: Amir Epstein, Michael E. Factor, Danny Harnik, Ronen I. Kat, Elliot K. Kolodner, Dmitry Sotnikov
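The check-and-migrate step in the abstract above can be sketched as below, assuming the disks have already been grouped into virtual failure domains. The function name and the two dictionaries (placement and disk_domain) are hypothetical, and the choice of target disk (the first disk found in a different domain) is a simplification.

```python
def migrate_if_colocated(redundancy_set, placement, disk_domain):
    """Move one object out if a whole redundancy set sits in one domain.

    redundancy_set: list of object ids that protect each other
    placement:      object id -> disk id (updated in place on migration)
    disk_domain:    disk id -> virtual failure domain id
    Returns (object, source_disk, target_disk), or None if nothing moved.
    """
    domains = {disk_domain[placement[obj]] for obj in redundancy_set}
    if len(domains) != 1:
        return None  # empty set, or already spread across domains
    (current,) = domains
    for disk, domain in disk_domain.items():
        if domain != current:
            obj = redundancy_set[0]
            source = placement[obj]
            placement[obj] = disk
            return obj, source, disk
    return None  # no disk exists outside the current domain


if __name__ == "__main__":
    disk_domain = {"d1": "A", "d2": "A", "d3": "B"}
    placement = {"r1": "d1", "r2": "d2"}
    print(migrate_if_colocated(["r1", "r2"], placement, disk_domain))
```

After the move, a correlated failure of every disk in domain A can no longer take out all copies in the set.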
-
Publication number: 20170132273
Abstract: Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate, the heuristic computed by, for each one of the data classes, calculating an expected number of characters to be in a data class, calculating an expected number of characters that will not belong to a predefined set of the data classes, and calculating an actual number of the characters for each of the data classes and the non-classifiable data.
Type: Application
Filed: January 25, 2017
Publication date: May 11, 2017
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan AMIT, Lilia DEMIDOV, George GOLDBERG, Nir HALOWANI, Ronen I. KAT, Chaim KOIFMAN, Sergey MARENKOV, Dmitry SOTNIKOV
-
Publication number: 20170132056
Abstract: Embodiments include evaluating durability and availability of a distributed storage system. Aspects include receiving a configuration of the distributed storage system and identifying a failure model for each component of the distributed storage system. Aspects also include generating a series of failure events for each component of the distributed storage system based on the failure model and calculating a recovery time for each failed component based on a network recovery bandwidth, a disk recovery bandwidth, a total capacity of simultaneous failed storage devices and a resiliency scheme used by the distributed storage system. Aspects further include collecting data regarding the series of failures and the recovery times, calculating an observed distribution of component failures from the collected data and calculating the availability and durability of the distributed storage system based on the observed distribution of component failures and using probabilistic durability and availability models.
Type: Application
Filed: November 5, 2015
Publication date: May 11, 2017
Inventors: AMIR EPSTEIN, MICHAEL E. FACTOR, ELLIOT K. KOLODNER, DMITRY SOTNIKOV
-
Publication number: 20170116229
Abstract: Various embodiments for optimizing deduplication in a computing storage environment by a processor. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.
Type: Application
Filed: October 21, 2015
Publication date: April 27, 2017
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Danny HARNIK, Ben SASSON, Yosef SHATSKY, Dmitry SOTNIKOV
-
Patent number: 9635109
Abstract: Machines, systems and methods for optimizing data replication in a distributed storage network, the method comprising determining a need to create a replica for a data item in a remote failure zone in a data storage network; creating a temporary replica of the data item in a local failure zone defined in the data storage network, in response to determining that it is beneficial to create the temporary replica in the local failure zone based on a cost versus reliability improvement analysis; attempting to create the replica in the remote failure zone; and removing the temporary replica from the local failure zone, in response to successfully creating the replica in the remote failure zone.
Type: Grant
Filed: January 2, 2014
Date of Patent: April 25, 2017
Assignee: International Business Machines Corporation
Inventors: Ilias Iliadis, Elliot K. Kolodner, Dmitry Sotnikov, Paula K Ta-Shma, Vinodh Venkatesan
-
Publication number: 20170068475
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include detecting multiple sets of storage objects stored in a data facility including multiple server racks, each of the server racks including a plurality of server computers, each of the storage objects in each set being stored in a separate one of the server racks and including one or more data objects and one or more protection objects. A specified number of the storage objects are identified in a given server rack, each of the identified storage objects being stored in a separate one of the server computers, and one or more server computers in the given server rack not storing any of the identified storage objects are identified. Finally, in the identified one or more server computers, an additional protection object is created and managed for the identified storage objects.
Type: Application
Filed: November 17, 2016
Publication date: March 9, 2017
Inventors: Danny Harnik, MICHAEL FACTOR, DMITRY SOTNIKOV, PAULA TA-SHMA, Lukas Kull, Thomas Morf
-
Patent number: 9588980
Abstract: Identification of data candidates for data processing is performed in real time by a processor device in a distributed computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate, the heuristic computed by, for each one of the data classes, calculating an expected number of characters to be in a data class, calculating an expected number of characters that will not belong to a predefined set of the data classes, and calculating an actual number of the characters for each of the data classes and the non-classifiable data.
Type: Grant
Filed: June 22, 2015
Date of Patent: March 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan Amit, Lilia Demidov, George Goldberg, Nir Halowani, Ronen I. Kat, Chaim Koifman, Sergey Marenkov, Dmitry Sotnikov
-
Publication number: 20170060976
Abstract: Various embodiments for data management in a replicated storage environment, by a processor device, are provided. In one embodiment, a method comprises storing a plurality of data replicas under a plurality of heterogeneous compression algorithms, wherein one of the data replicas is optimized for a data operation.
Type: Application
Filed: August 25, 2015
Publication date: March 2, 2017
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Danny HARNIK, Ety KHAITZIN, Sergey MARENKOV, Dmitry SOTNIKOV
-
Patent number: 9564918
Abstract: Real-time reduction of CPU overhead for data compression is performed by a processor device in a computing environment. Non-compressing heuristics are applied on a randomly selected data sample from data sequences for determining whether to compress the data sequences. A compression potential is calculated based on the non-compressing heuristics. The compression potential is compared to a threshold value. The data sequences are either compressed if the compress threshold is matched, compressed using Huffman coding if Huffman coding threshold is matched, or stored without compression.
Type: Grant
Filed: January 10, 2013
Date of Patent: February 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Ron Asher, Danny Harnik, Oded Margalit, Kat I. Ronen, Dmitry Sotnikov
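The three-way decision in the abstract above can be illustrated with a single non-compressing heuristic, the byte entropy of a random sample. This is a sketch under stated assumptions: the abstract does not name its heuristics or thresholds, so the entropy measure, the two threshold constants, and the function names are hypothetical. Byte entropy alone also ignores repeated substrings, which a fuller heuristic set would account for.

```python
import math
import random
from collections import Counter

COMPRESS_THRESHOLD = 0.5   # hypothetical: at or above this, use full compression
HUFFMAN_THRESHOLD = 0.1    # hypothetical: at or above this, use Huffman coding only


def compression_potential(sample):
    """Return 1 - (byte entropy / 8), a value in [0, 1]; higher is more compressible."""
    if not sample:
        return 0.0
    n = len(sample)
    counts = Counter(sample)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / 8.0


def decide(data, sample_size=1024, seed=0):
    """Classify data as 'compress', 'huffman', or 'store' from a random byte sample."""
    rng = random.Random(seed)
    k = min(sample_size, len(data))
    sample = bytes(data[i] for i in rng.sample(range(len(data)), k))
    potential = compression_potential(sample)
    if potential >= COMPRESS_THRESHOLD:
        return "compress"
    if potential >= HUFFMAN_THRESHOLD:
        return "huffman"
    return "store"


if __name__ == "__main__":
    print(decide(b"a" * 4096))
    print(decide(bytes(range(64)) * 64))
    print(decide(bytes(range(256)) * 16))
```

Because the heuristic only counts bytes in a small sample, it costs far less processor time than attempting compression and discarding the result.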
-
Patent number: 9547458
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include detecting multiple sets of storage objects stored in a data facility including multiple server racks, each of the server racks including a plurality of server computers, each of the storage objects in each set being stored in a separate one of the server racks and including one or more data objects and one or more protection objects. A specified number of the storage objects are identified in a given server rack, each of the identified storage objects being stored in a separate one of the server computers, and one or more server computers in the given server rack not storing any of the identified storage objects are identified. Finally, in the identified one or more server computers, an additional protection object is created and managed for the identified storage objects.
Type: Grant
Filed: December 24, 2014
Date of Patent: January 17, 2017
Assignee: International Business Machines Corporation
Inventors: Danny Harnik, Michael Factor, Dmitry Sotnikov, Paula Ta-Shma
-
Publication number: 20160364401
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include configuring a storage system to store multiple storage entities for access by one or more host computers in communication with the storage system, and specifying a compression condition including a minimum compression ratio. The storage system can then estimate an expected compression ratio for a given storage entity, compress the given storage entity upon the expected compression ratio meeting the compression condition, and provide, to a given host computer, access to the compressed given storage entity.
Type: Application
Filed: June 12, 2015
Publication date: December 15, 2016
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan AMIT, Nir FRIEDMAN, Danny HARNIK, Chaim KOIFMAN, Sergey MARENKOV, Lior SHLOMOV, Dmitry SOTNIKOV, Shai TAHARLEV
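The estimate-then-compress policy in the abstract above can be sketched as follows. The estimator shown here (compressing a leading sample with zlib) is an assumption chosen for brevity; the abstract does not say how the expected ratio is obtained, and the function names are hypothetical.

```python
import zlib


def expected_ratio(entity, sample_size=4096):
    """Estimate the compression ratio by compressing a leading sample.

    Ratio is original size divided by compressed size, so larger is better.
    """
    sample = entity[:sample_size]
    if not sample:
        return 1.0
    return len(sample) / len(zlib.compress(sample))


def store(entity, min_ratio):
    """Return (compressed?, payload) according to the compression condition."""
    if expected_ratio(entity) >= min_ratio:
        return True, zlib.compress(entity)
    return False, entity


if __name__ == "__main__":
    compressed, payload = store(b"abc" * 10000, min_ratio=2.0)
    print(compressed, len(payload))
```

A host reading the entity would decompress the payload only when the first element of the returned pair is true.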
-
Publication number: 20160366104
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include defining a first multiple of software container configurations and a second multiple of permission sets, and receiving, by a first computer, a request to perform a service operation on a second computer having multiple resources. Upon identifying one or more of the resources that are required for the service operation, a given software container configuration and a given permission set are selected based on the identified one or more resources, and the given software container configuration and the given permission set are conveyed to the second computer. Upon the second computer receiving the given software container configuration and the given permission set, a software container is generated. The software container is opened on the host computer prior to performing the service operation, and closed upon completing the service operation.
Type: Application
Filed: June 11, 2015
Publication date: December 15, 2016
Inventors: GEORGE GOLDBERG, YOSEF MOATTI, Dmitry Sotnikov, YARON WEINSBERG
-
Patent number: 9515679
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include accessing, from a sequence of multiple data segments including a first data segment at a first location in the sequence followed by additional data segments having respective additional locations in the sequence, a current given data segment in the sequence. In some embodiments, data to be compressed is received and partitioned into the multiple data segments. The current data segment is compressed using a first minimal match length, and a compression ratio is calculated for the compressed current data segment. Based on the compression ratio and the respective location of the current data segment, a second minimal match length is selected, a subsequent data segment that immediately follows the current data segment in the sequence is accessed, and the subsequent data segment is compressed using the second minimal match length.
Type: Grant
Filed: May 14, 2015
Date of Patent: December 6, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Danny Harnik, Ety Khaitzin, Sergey Marenkov, Dmitry Sotnikov
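The feedback loop in the abstract above, where each segment's compression ratio and location steer the minimal match length used for the next segment, can be sketched with a toy cost model. All specifics are assumptions: the greedy LZ77-style size estimate, the selection policy, the warm-up rule that stands in for the use of segment location, and every constant.

```python
def lz_cost(segment, min_match, window=255):
    """Estimated output size of a toy greedy LZ77 encoder.

    A literal costs 1 byte; a match of at least min_match bytes costs 3.
    """
    i, cost, n = 0, 0, len(segment)
    while i < n:
        best = 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < n and k < 255 and segment[j + k] == segment[i + k]:
                k += 1
            best = max(best, k)
        if best >= min_match:
            cost, i = cost + 3, i + best
        else:
            cost, i = cost + 1, i + 1
    return cost


def select_min_match(ratio, position, current, warmup=1, lo=3, hi=8):
    """Hypothetical policy: hold during warm-up, then track compressibility."""
    if position < warmup:
        return current
    if ratio < 1.1:                 # segment barely compressed: skip short matches
        return min(current + 1, hi)
    return max(current - 1, lo)     # segment compressed well: allow shorter matches


def compress_sequence(segments, initial_min_match=4):
    """Return [(min_match_used, ratio)] for each segment, in order."""
    results, min_match = [], initial_min_match
    for position, segment in enumerate(segments):
        ratio = len(segment) / max(1, lz_cost(segment, min_match))
        results.append((min_match, ratio))
        min_match = select_min_match(ratio, position, min_match)
    return results


if __name__ == "__main__":
    repetitive, unique = b"abcabcabc" * 10, bytes(range(200))
    for used, ratio in compress_sequence([repetitive, unique, unique, repetitive]):
        print(used, round(ratio, 2))
```

A larger minimal match length lets an encoder skip short matches that rarely pay for themselves on poorly compressible data, which is the trade-off this policy is meant to illustrate.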
-
Publication number: 20160336963
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include accessing, from a sequence of multiple data segments including a first data segment at a first location in the sequence followed by additional data segments having respective additional locations in the sequence, a current given data segment in the sequence. In some embodiments, data to be compressed is received and partitioned into the multiple data segments. The current data segment is compressed using a first minimal match length, and a compression ratio is calculated for the compressed current data segment. Based on the compression ratio and the respective location of the current data segment, a second minimal match length is selected, a subsequent data segment that immediately follows the current data segment in the sequence is accessed, and the subsequent data segment is compressed using the second minimal match length.
Type: Application
Filed: May 14, 2015
Publication date: November 17, 2016
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Danny HARNIK, Ety KHAITZIN, Sergey MARENKOV, Dmitry SOTNIKOV