DETECTING MALICIOUS SOFTWARE AND RECOVERING A STORAGE SYSTEM

Info

Publication number: 20230273999
Type: Application
Filed: Feb 28, 2023
Publication Date: Aug 31, 2023
Inventors: Siamak NAZARI (Mountain View, CA), Milad NOORI (Laguna Hills, CA), Shayan ASKARIAN NAMAGHI (San Jose, CA)
Application Number: 18/115,704

Abstract

A storage system sends information on input/output patterns and data blocks written to a cloud-based service for detection of suspected ransomware activity. Analysis of the I/O patterns and data may be performed by the storage system, the cloud-based service, or both. The cloud-based service can instruct the storage system to create or maintain snapshots that allow the storage system to continue operation and to roll back for recovery after a ransomware attack is confirmed.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent document claims benefit of the earlier filing date of U.S. Provisional Pat. App. No. 63/314,996, filed Feb. 28, 2022, U.S. Provisional Pat. App. No. 63/314,970, filed Feb. 28, 2022, U.S. Provisional Pat. App. No. 63/314,987, filed Feb. 28, 2022, and U.S. Provisional Pat. App. No. 63/316,081, filed Mar. 3, 2022, all of which are hereby incorporated by reference in their entirety.

BACKGROUND

Cloud-based services can be delivered on-demand to companies and other users over the Internet and are becoming more available and important. Many cloud-based services are data based or otherwise need to maintain the integrity and security of data, and these services suffer if their data is compromised. Accordingly, malicious attackers have created malware that targets user data. In particular, ransomware, which is one type of malware, may attempt to penetrate a system and encrypt stored files to prevent a user from accessing the files, or in some cases, the ransomware may attempt to lock a user’s system as a whole. Once an attacker controls access to the user data, the attacker demands payment (typically in a form of cryptocurrency) before the attacker will return control of the system or user data to the user.

Enterprises including providers of cloud-based services often employ cluster storage systems to meet data storage needs. Cluster storage systems may distribute data volumes to a set of storage nodes forming a cluster, and one technique that cluster storage can use to improve data security is to maintain backup or mirror copies of volumes. A storage node maintaining a backup or mirror copy of a base volume is generally not the storage node that maintains the base volume, so that the storage node maintaining the backup copy can be used if the storage node maintaining the base volume becomes unavailable. In a mirrored volume environment, the storage cluster synchronizes the base and backup volumes, so that if the ransomware encrypts base volumes, the backup volumes may also be encrypted. After ransomware has penetrated the system and encrypted files, the user may therefore be unable to access their data even from the backup. Some prior systems and methods for defending against ransomware try to detect and prevent ransomware when the ransomware tries to take control of a system or its data. However, detection and blocking of ransomware can be difficult and is subject to false positives that may disrupt or interfere with the efficiency or operation of a user’s system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an architecture for a storage system with cloud-based malware detection in accordance with one example of the present disclosure.

FIG. 2 is a flow diagram of a process in which a storage node or storage processing unit in accordance with an example of the present disclosure provides storage service and reports to a cloud-based service that detects malware.

FIG. 3 is a flow diagram of a process in which a cloud-based service in accordance with an example of the present disclosure analyzes reported storage information to detect anomalies suggesting a malware attack.

FIG. 4 is a flow diagram of a process in which a storage system in accordance with an example of the present disclosure can recover from a confirmed malware attack.

The drawings illustrate examples for the purpose of explanation and are not the invention itself. Use of the same reference symbols in different figures indicates similar or identical items.

DETAILED DESCRIPTION

A storage system in accordance with an example of the present disclosure may employ one or more storage nodes implemented in one or more servers, with each storage node including at least one storage or service processing unit (SPU). Each of the SPUs provides storage services to storage clients seeking use of virtual volumes that the SPU maintains. In maintaining the virtual volumes, the SPUs always write new data for the virtual volumes to empty physical storage locations, i.e., physical storage locations that do not currently store valid data. The SPUs do not overwrite old data of the virtual volumes, and the old data remains in physical storage until a garbage collection process determines that the old data is unneeded and therefore invalid. Each SPU has a communication link to a cloud-based service, and the SPUs when performing storage services can report I/O information and data analysis to the cloud-based service. Based on the I/O information and data analysis the SPUs report, the cloud-based service may employ methodologies to detect anomalies in storage activity suggesting a malware attack, and the cloud-based service may instruct the SPUs to retain the old data that anomalous activity was overwriting. If an attack is confirmed, e.g., ransomware or other malware is later determined to have encrypted or taken control of data, the SPUs still retain the old data, e.g., in snapshots of virtual volumes, and the virtual volumes may be rolled back to a state where the data is unencrypted and accessible.

FIG. 1 is a block diagram illustrating a storage architecture including a storage platform 100 in accordance with one example of the present disclosure. Storage platform 100 includes one or more host servers 110-1 to 110-N, which are sometimes generically referred to herein as host server(s) 110. Each host server 110 may be a conventional computer or other computing system including one or more central processing units (CPUs), memory, and interfaces for connections to internal or external devices. FIG. 1 shows host servers 110-1 to 110-N having respective service or storage processing units (SPUs) 120-1 to 120-N, which are sometimes generically referred to herein as SPU(s) 120. SPUs 120-1 to 120-N may be installed in respective host servers 110-1 to 110-N, e.g., as daughterboards attached to the motherboards of servers 110. More generally, storage platform 100 may include one or more host servers 110, with each server 110 hosting one or more SPUs 120. A minimum configuration of storage platform 100 may include a single host server 110 in which one SPU 120 resides. To improve redundancy, storage platform 100 may be a cluster storage system using at least two host servers 110 or at least two SPUs 120, but more generally, a limitless number of different configurations are possible containing any number of host servers 110 and any number of SPUs 120. Storage platform 100 may accordingly be scaled to larger storage capacities by adding more SPUs 120 with associated backend storage.

Each SPU 120 has hardware including a host interface 122, communication interfaces 124, a storage interface 128, and a processing system 130.

Host interface 122 provides communications between the SPU 120 and its host server 110. For example, each SPU 120 may be installed and fully resident in the chassis of an associated host server 110, and each SPU 120 may be a card, e.g., a PCI-e card, or printed circuit board with a connector or contacts that plug into a slot in a standard peripheral interface, e.g., a PCI bus in host server 110. Host interface 122 includes circuitry that complies with the protocols of the host server bus.

Communication interfaces 124 in an SPU 120 provide communications with other SPUs 120 and to other network connected devices. Multiple SPUs 120, e.g., SPUs 120-1 to 120-N in FIG. 1, may be interconnected using high speed data links 125, e.g., one or more parallel 10, 25, 50, 100 or more Gbps Ethernet links, that form a dedicated data network for a pod or cluster of SPUs 120 in storage platform 100. Data links 125 may particularly form a high-speed data network that is independent of a private network 160 of the enterprise or other user of storage platform 100. Communication interfaces 124 may also allow each SPU 120 to communicate through private network 160 with user devices 162 and 164 on private network 160 and through private network 160, a firewall 161, and a public network 170, e.g., the Internet, to communicate with a remote or cloud-based infrastructure 180.

Storage interfaces 128 in SPUs 120-1 to 120-N include circuitry and connectors for attachment to devices of respective backend storage 150-1 to 150-N, sometimes generically referred to herein as backend or persistent storage 150. Each SPU 120 may thereby control its backend storage 150. Backend storage 150 may employ, for example, hard disk drives, solid state drives, or other nonvolatile/persistent storage devices or media in which data may be physically stored, and backend storage 150 particularly may have a redundant array of independent disks (RAID) 5 or 6 configuration for performance and redundancy.

Processing system 130 in an SPU 120 includes one or more microprocessors, microcontrollers, or CPUs 132 with memory 134 that the SPU 120 employs to manage one or more physical storage devices of backend storage 150 and provide storage services to clients. In the illustrated example, processing system 130 particularly implements a set of modules including a management module 141, an I/O processor 142, a garbage collection module 143, and a data analysis module 144. In other examples, SPU 120 may additionally implement modules that provide other storage functions such as data deduplication, encryption and decryption, or compression and decompression. PCT Pub. No. WO 2021/150576 A1, entitled “Primary Storage with Deduplication,” which is hereby incorporated by reference in its entirety, describes some examples of storage systems with additional storage functions such as deduplication.

Management module 141 controls processes such as a setup or configuration process for an SPU 120 and communications with cloud-based infrastructure 180. I/O processor 142 processes storage service requests such as read and write requests from storage clients and performs storage operations to fulfill the storage service requests. In accordance with an aspect of the present disclosure, data analysis module 144 may perform analysis of data associated with the storage service requests to help detect encrypted data or the activities of malware such as ransomware. In one example of the current disclosure, data analysis module 144 may periodically sample data blocks (e.g., one 8 KB block of data for every 256th I/O per virtual volume) and may analyze or test the encryption status of the sampled data, i.e., determine whether the incoming block is encrypted or not. The SPU 120, e.g., management module 141, can communicate to cloud-based infrastructure 180 information regarding the storage services that I/O processor 142 handles and regarding the analysis results from data analysis module 144. Cloud-based infrastructure 180 may have its own analytics service 186 that analyzes the information regarding I/O processes and analysis results from data analysis module 144 to determine whether real-time I/O processes suggest a ransomware or other malware attack, and if an attack is suspected, a management service 182 provided by cloud-based infrastructure 180 may instruct management module 141 in the SPU 120 to preserve old data as described further below.

FIG. 1 illustrates storage platform 100 after each SPU 120 has been set up to provide storage services to storage clients via virtual volumes or logical unit numbers (LUNs). FIG. 1 particularly shows that SPU 120-1 provides storage services relating to a boot volume BT1 and one or more other virtual volumes V1, and SPU 120-N provides storage services relating to a boot volume BTN and one or more other virtual volumes VN. Each SPU 120 generally exports only one boot volume, and boot volumes BT1 to BTN may be “unshared” virtual volumes and only used by respective host servers 110-1 to 110-N, e.g., when powering up or rebooting. SPU 120-1 is sometimes referred to as “owning” virtual volumes BT1 and V1 in that SPU 120-1 is normally responsible for fulfilling I/O requests that are directed at any of volumes BT1 and V1. Similarly, SPU 120-N owns virtual volumes BTN and VN in that SPU 120-N is normally responsible for executing I/O requests that are directed at any of volumes BTN and VN. SPUs 120-1 to 120-N also maintain respective sets of backup volumes BK1 to BKN that mirror base virtual volumes owned by other SPUs 120 and maintain respective sets of snapshots S1 to SN that logically preserve the state that boot volumes BT1 to BTN, backup volumes BK1 to BKN, or other volumes V1 to VN had at a time of the snapshot.

I/O processors 142 of SPUs 120-1 to 120-N generally perform storage services in response to storage service requests targeting the virtual volumes that the SPUs 120-1 to 120-N own. In some implementations of storage platform 100, storage clients, e.g., applications 112 running on a host server 110 or a user device 162 or 164, may request storage service through an SPU 120 resident in the host server 110 associated with the storage client. The I/O processor 142 of the resident SPU 120 may receive the storage service request and provide the requested storage service if the SPU 120 owns the targeted virtual volume or may forward the storage service request through data network 125 to another SPU 120, e.g., to the SPU 120 that owns the virtual volume that the storage service request targeted.

In accordance with an example of the current disclosure, each I/O processor 142 maintains a set of generation numbers 136, each generation number corresponding to an associated virtual volume, and the I/O processor 142 uses a current value of the generation number for a virtual volume to tag and uniquely distinguish each I/O process that changes the content of that virtual volume. For example, for each write request that SPU 120-1 receives requesting writing data to an address or offset in a virtual volume V1, the I/O processor 142 of SPU 120-1 may increment the generation number 136 for the volume V1 and tag (or otherwise identify) the write request using the current value of generation number 136 for the virtual volume V1. The next write request to volume V1 will be tagged with the next value of generation number 136.

I/O processor 142, during each write operation, may record in a data index 138 an entry in which the generation number and volume/offset of the write operation are mapped to the physical storage locations where write data is stored in backend storage 150. Data index 138 may be any type of database, but in one example of the present disclosure, data index 138 is a key-value store where entries in data index 138 each include a key and a value. The key in each entry contains a generation number of a write operation and a volume ID and offset or address in the virtual volume for the write operation, and the value in the entry contains a pointer to the physical location in backend storage 150 containing the data pattern written. When reading from a base volume, I/O processor 142 may query data index 138 to find the entries that correspond to the volume/offset to be read, and of those entries, the I/O processor 142 uses the entry having the newest generation number to identify where the requested data is in backend storage 150. The entries having older generation numbers may be required for snapshots or may be garbage that garbage collection module 143 can identify and reclaim for storage of new data. When reading from a snapshot, I/O processor 142 may query data index 138 to find the entries that correspond to the volume/offset to be read and of those entries uses the entry having the newest generation number that is at least as old as the snapshot, newer entries being ignored. Garbage collection module 143 acts to preserve any entries in data index 138 that may be needed for reading any virtual volume or snapshot. Garbage collection module 143 can reclaim entries and identified data that are not needed for any virtual volume or snapshot.
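The base-volume and snapshot read rules described above can be illustrated with a minimal Python sketch. The class name `DataIndex`, the in-memory dictionary, and the counter used as a physical-location allocator are assumptions made for illustration only; they are not taken from the disclosure, which leaves the index implementation open.

```python
from dataclasses import dataclass, field


@dataclass
class DataIndex:
    """Toy key-value index: (volume, offset, generation) -> physical location."""
    entries: dict = field(default_factory=dict)
    generation: dict = field(default_factory=dict)  # per-volume generation numbers
    next_free: int = 0  # hypothetical allocator: next empty physical location

    def write(self, volume: str, offset: int) -> int:
        """Record a write at a fresh physical location; return its generation."""
        gen = self.generation.get(volume, 0) + 1
        self.generation[volume] = gen
        # Old entries for the same volume/offset are kept, never overwritten.
        self.entries[(volume, offset, gen)] = self.next_free
        self.next_free += 1
        return gen

    def read(self, volume: str, offset: int, max_gen=None):
        """Return the location of the newest entry no newer than max_gen.

        max_gen=None models a base-volume read; passing a snapshot's
        generation number models a snapshot read that ignores newer entries.
        """
        candidates = [
            (gen, loc)
            for (vol, off, gen), loc in self.entries.items()
            if vol == volume and off == offset and (max_gen is None or gen <= max_gen)
        ]
        return max(candidates)[1] if candidates else None


index = DataIndex()
index.write("V1", 0)                    # generation 1, physical location 0
snapshot_gen = index.generation["V1"]   # snapshot taken at generation 1
index.write("V1", 0)                    # generation 2, physical location 1
```

In this sketch, a base-volume read of offset 0 resolves to the location written at generation 2, while a read with `max_gen=snapshot_gen` still resolves to the location written at generation 1.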

Private network 160, as noted above, may provide a connection through firewall 161 to public network 170, so that user devices 162 and 164, servers 110, and SPUs 120 may communicate with remote devices and particularly with cloud-based infrastructure 180. Cloud-based infrastructure 180 may include a computer or server that is remotely located from host servers 110 and from user devices 162 and 164, and cloud-based infrastructure 180 may provide management service 182 for configuration and management of storage platform 100 to thereby reduce the burden of storage management on an enterprise using storage platform 100. Management service 182, for example, may use an image library 180 to provide SPUs 120 with operating system or software images, provisioning or configuration settings, and operating instructions and thus allows an enterprise to offload the burden of storage setup and management to an automated process that cloud-based infrastructure 180 and the SPUs 120 execute. Management service 182 may particularly be used to configure SPUs 120 in a pod or cluster in storage platform 100, to monitor the performance of storage platform 100, to provide analysis services 186, or to provide recovery services for storage platform 100. Management service 182, during a setup process, may determine an allocation of storage volumes to meet the needs of an enterprise or other users of storage platform 100, distribute the allocated volumes to SPUs 120-1 to 120-N, and create recipes for SPUs 120 to execute to bring storage platform 100 to the desired working configuration such as illustrated in FIG. 1. Management service 182 may also keep platform data 184 indicating information about storage platform 100, including what hardware, operating systems, and applications are included in or served by storage platform 100. During normal operation of storage platform 100, management service 182 may also employ an analytics module 186 to analyze I/O activity in storage platform 100 and data analytic information from SPUs 120 to detect malware activities such as encrypting data for ransom.

For detection of malicious activity, SPUs 120-1 to 120-N can collect I/O information and analyze data blocks. I/O information may identify a time series of I/O operations on a per-block level for all virtual volumes BT1 to BTN and V1 to VN. Data block analysis may include performing a subset of NIST-800-22 tests and using a multi-layer perceptron to predict whether a block of data is encrypted. For example, data analysis module 144 in an SPU 120 may only periodically analyze a byte distribution of an 8 KB page of write data or may implement additional analysis techniques such as convolutional neural networks (CNN) and other statistical tests to examine the entire 8 KB page. An SPU 120 may report the I/O information or results from analysis module 144 to management service 182 or analytics service 186 in cloud-based infrastructure 180.

FIG. 2 is a flow diagram of a process 200 in accordance with an example of the present disclosure in which an SPU may report I/O information or data analysis information to a cloud-based service. In an initial process block 210 of process 200, the SPU receives a storage service request such as a write request that requests a change in a virtual volume that the SPU owns. In a block 220, the SPU performs a storage operation corresponding to the storage service request. In accordance with an example of the present disclosure as described above, the performance of the storage operation in block 220 avoids overwriting any old data in backend storage and only stores the new or changed data in backend storage at a physical location that was not storing valid data, i.e., a location designated to be empty during initialization of the storage system or during a garbage collection process that reclaims storage locations that no longer store required data. As a result of the SPU performing block 220, old data that was logically overwritten in the virtual volume still remains accessible in backend storage until a garbage collection process may later reclaim the physical storage location containing the old data. Further, because of the way the SPU handles data, ransomware or other malware, which may be running on a host server, cannot change or delete old data in the backend storage.

The SPU, in a process block 230, may analyze the data associated with the storage operation performed. For example, an SPU may analyze every storage operation that writes to a page in a virtual volume or may only analyze a sampling of the pages written. The number of pages analyzed may be chosen to minimize the impact that the processing has on storage performance, and in one example, each SPU analyzes 1 in 256 of the pages written to each virtual volume the SPU owns.
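The 1-in-256 sampling policy can be sketched as a per-volume counter. The class name `PageSampler` and the choice to sample on every 256th write (rather than, say, at random) are illustrative assumptions; the disclosure states only the sampling ratio.

```python
from collections import defaultdict

SAMPLE_INTERVAL = 256  # from the text: one page analyzed per 256 pages written


class PageSampler:
    """Hypothetical helper that selects every Nth written page per volume."""

    def __init__(self, interval: int = SAMPLE_INTERVAL):
        self.interval = interval
        self.counts = defaultdict(int)  # pages written so far, keyed by volume ID

    def should_sample(self, volume_id: str) -> bool:
        """Count one written page and report whether it should be analyzed."""
        self.counts[volume_id] += 1
        return self.counts[volume_id] % self.interval == 0


sampler = PageSampler()
sampled = sum(sampler.should_sample("V1") for _ in range(1024))
```

Keeping a separate counter per volume means a busy volume does not starve a quiet one of samples, which matches the per-volume wording of the example above.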

A primary focus of the analysis that process block 230 performs may be to determine whether the data is encrypted, and any desired tests or techniques for identifying encrypted data may be employed. In one example, the SPU may analyze I/O blocks using a series of statistical randomness tests, including one or more of the Frequency (Monobit) Test, Index of Coincidence, Chi-Square Test, Chi-Squared Test on Binary Bit Distribution (such as NIST-800-22 discloses), and the SPU may employ a Multi-Layer Perceptron (MLP) on the sampled byte distribution. (In machine learning, a perceptron is an algorithm for supervised learning of binary classifiers, which are functions that can determine whether or not an input vector belongs to some specific class.)
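Two of the randomness tests named above can be sketched in a few lines of Python. The monobit p-value follows the Frequency (Monobit) Test as defined in NIST SP 800-22; the chi-square statistic over the 256 possible byte values is one plausible reading of a byte-distribution test. The function names and the use of `os.urandom` as a stand-in for ciphertext are assumptions for illustration, not the SPU's actual implementation.

```python
import math
import os
from collections import Counter


def monobit_p_value(page: bytes) -> float:
    """Frequency (Monobit) Test: p-value for the balance of ones and zeros."""
    n = len(page) * 8
    ones = sum(bin(b).count("1") for b in page)
    s_obs = abs(2 * ones - n) / math.sqrt(n)  # |#ones - #zeros| / sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def byte_chi_square(page: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    expected = len(page) / 256
    counts = Counter(page)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


text_page = (b"The quick brown fox jumps over the lazy dog. " * 200)[:8192]
random_page = os.urandom(8192)  # stand-in for an encrypted 8 KB page
```

A page of ordinary text yields a very large chi-square statistic because only a few byte values occur, while random-looking data stays near the 255 degrees of freedom of the test. The resulting statistics, or the byte histogram itself, could serve as input features to a classifier such as the MLP mentioned above.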

A reporting process 240 may follow analysis process 230. The storage platform or one or more of the SPUs in the storage platform may perform reporting process 240 to report information, e.g., I/O pattern information and encryption statistics, to the cloud-based service. In one example of reporting process 240, the SPU reports to a cloud-based service (e.g., management service 182 or analytics service 186 of FIG. 1) information concerning one or more storage operations that the SPU performed and concerning any analysis results. An SPU may report to the cloud-based service each time the SPU performs a storage operation that changes a virtual volume. In another example, the SPU reports only after a stream or series containing a specified number of changes to any storage volume. In one specific example, the host machine or hypervisor sends write requests to the SPU. These requests go through the Storage Performance Development Kit (SPDK), and the SPDK processes 8 KB data blocks. The raw 8 KB data pages at the SPDK layer may be sampled, analyzed, and combined. In another example, the SPU may collect I/O information from multiple storage operations and may perform process block 240 periodically to send the collected I/O information to the cloud-based service. For the I/O information associated with a volume, the information reported to the cloud-based service may include a volume ID along with a histogram of block compression ratios and the total number of blocks processed within a time span, e.g., within 30 seconds to a few minutes. The analysis module of the SPU may collect a histogram of the encrypted and non-encrypted block counts, along with the total number of blocks sampled for encryption analysis during the time span. All of this information may be accompanied by respective timestamps. The SPU can continue normal operation, e.g., processing of storage service requests, throughout process 200 unless the cloud-based service detects an anomaly and informs the SPU (or another component of the storage platform) of the anomaly as described further below.
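One possible shape for a per-volume report covering a single time span is sketched below. The field names, the ten-bucket histogram, the JSON encoding, and the use of `zlib` to estimate a block's compression ratio are all assumptions for illustration; the disclosure specifies only the kinds of information reported. Blocks are assumed to be non-empty.

```python
import json
import zlib
from dataclasses import dataclass, field, asdict

BUCKETS = 10  # hypothetical number of compression-ratio histogram buckets


@dataclass
class VolumeReport:
    """Hypothetical report for one volume over one time span."""
    volume_id: str
    window_start: float  # timestamps delimiting the time span, in seconds
    window_end: float
    compression_histogram: list = field(default_factory=lambda: [0] * BUCKETS)
    total_blocks: int = 0
    sampled_blocks: int = 0
    encrypted_blocks: int = 0

    def add_block(self, block: bytes) -> None:
        """Count a processed block in the compression-ratio histogram."""
        ratio = min(len(zlib.compress(block)) / len(block), 1.0)
        bucket = min(int(ratio * BUCKETS), BUCKETS - 1)
        self.compression_histogram[bucket] += 1
        self.total_blocks += 1

    def add_sample(self, looks_encrypted: bool) -> None:
        """Record the encryption verdict for one sampled block."""
        self.sampled_blocks += 1
        self.encrypted_blocks += int(looks_encrypted)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


report = VolumeReport("V1", window_start=0.0, window_end=30.0)
report.add_block(b"\x00" * 8192)  # highly compressible block lands in bucket 0
report.add_sample(looks_encrypted=False)
payload = report.to_json()
```

The non-encrypted count is not stored separately here because it can be derived as `sampled_blocks - encrypted_blocks`.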

FIG. 3 is a flow diagram of a process 300 in which a cloud-based service may analyze I/O information received from one or more SPUs in a storage platform. In a block 310 of process 300, the cloud-based service receives new information that may be relevant to one or more virtual volumes that the storage platform provides. The cloud-based service can add, in a block 320, the new information to an analysis pool. The analysis pool may, for example, include I/O information or analysis results that the storage platform generated during a running time window.

In a block 330, the cloud-based service analyzes the I/O data in the analysis pool. The cloud-based service could use many different analysis techniques to identify an anomaly that may suggest the activity of malware such as ransomware. The cloud-based service may, for example, implement encoder-decoder (AE) models that are trained using the information from the storage platform. The models may be trained during training periods, e.g., every five days, to recognize normal storage patterns for respective volumes in the storage platform. The models for the virtual volumes of the storage platform generally depend on the specifics of the storage platform and the storage client activity. In the event the cloud-based service does not have a model for a virtual volume, a model can be created using the previous I/O information spanning the required training period, e.g., five days. If there is an existing model, the existing model can be fine-tuned with the unseen data collected in a past period. In one specific example, the data used for training the model is the histogram of the compression ratio and total I/O size recorded at a suitable frequency, e.g., every two minutes. Each AE model analyzes a window of data points, e.g., 20 data points equivalent to 40 minutes at 2 minutes per data point. The primary objective of the encoder-decoder model is to learn the write patterns of storage clients and to raise an alert if any unusual or suspicious pattern is detected.

The cloud-based service may perform anomaly detection using the encryption statistic signal from the storage platform. The cloud-based service may employ CUSUM (Cumulative Sum) to detect a level shift in the encryption percentage signal. The level shift may indicate the activity of ransomware because when ransomware is encrypting data on a machine, the percentage of encrypted data being written increases and creates a noticeable shift in the encryption signal.
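A one-sided CUSUM detector for an upward level shift can be sketched as follows. The function name and the `baseline`, `slack`, and `threshold` parameters, along with the sample values, are illustrative assumptions; the disclosure does not state how the detector is tuned.

```python
def cusum_upward_shift(signal, baseline, slack, threshold):
    """Return the index at which an upward level shift is flagged, or None.

    signal:    sequence of encryption percentages, one per reporting interval
    baseline:  expected level of the signal under normal operation
    slack:     allowance subtracted per step so small fluctuations do not accumulate
    threshold: cumulative excess above which an alarm is raised
    """
    s = 0.0
    for i, x in enumerate(signal):
        # Accumulate only the excess above baseline + slack; never go below zero.
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            return i
    return None


encrypted_pct = [2, 3, 1, 2, 4, 2, 3, 40, 55, 60, 70]
alarm = cusum_upward_shift(encrypted_pct, baseline=2.5, slack=2.0, threshold=50.0)
```

With these hypothetical values, the cumulative sum stays at zero through the first seven intervals, reaches 35.5 at index 7, and crosses the threshold at index 8, so `alarm` is 8. A single moderately elevated reading does not trigger the alarm, which is the property that makes CUSUM suited to level shifts rather than isolated spikes.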

Another part of anomaly detection may look at the entire 8 KB sampled data from the SPDK and use a larger subset of NIST-800-22 tests along with a convolutional neural network (CNN) to distinguish between encrypted blocks and non-encrypted blocks. A CNN may be used since non-encrypted data has spatial dependencies while encrypted blocks should not have any patterns or spatial correlation.

A publicly available dataset may be used for training models and testing anomaly detection. For example, a public dataset containing a suitable quantity, e.g., approximately 100 GB, of compressed files, such as zip, gzip, tar, mkv, mp4, pdf, and more, could be parsed into a series of 8 KB pages. The byte distribution of the data can be calculated and used as the ground truth for non-encrypted data. Next, AES-256 can be used to encrypt the data, and the process can be repeated, storing the results as encrypted data.

In information theory, a string, e.g., an 8 KB page, is considered random if there is no shorter description of that string. However, when compression algorithms are applied to a data stream, the compressed version shares certain characteristics with the original uncompressed stream, which means that the compressed stream cannot be considered truly random, despite the fact that the difference in entropy between the compressed and encrypted streams may not be significant. This resemblance between the compressed and uncompressed data can be leveraged to differentiate between compressed and encrypted data. In contrast, encrypted data is truly random by definition because symmetric block encryption algorithms (such as AES-128, AES-256, etc.) use a random vector to XOR with the block of data, thereby producing a completely random output.

Entropy and compressibility are data metrics that may be calculated and used. Compression aims to minimize the number of bytes used to store information. The entropy of compressed data is higher than that of uncompressed data, as fewer bits are used to represent the same information. This means that each bit in compressed data carries more information, resulting in an increased entropy. A similar phenomenon occurs in encryption, where the number of bits representing the data does not decrease, but each bit carries the same amount of information since it is XORed with a random vector. Therefore, randomness tests may be used to differentiate between compressed and encrypted data.
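The two metrics can be computed per page with the Python standard library, as in the following sketch. Using `zlib` for compressibility and `os.urandom` as a stand-in for encrypted data are assumptions for illustration; the disclosure does not name a compressor.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; near or above 1.0 means incompressible."""
    return len(zlib.compress(data)) / len(data)


zero_page = b"\x00" * 8192            # trivially compressible page
uniform_page = bytes(range(256)) * 32  # every byte value equally frequent
random_page = os.urandom(8192)         # stand-in for an encrypted 8 KB page
```

Note that `uniform_page` reaches the maximum byte entropy of 8 bits per byte yet is highly regular and still compresses well, which illustrates why byte entropy alone may not separate data classes and why the additional randomness tests mentioned above are useful.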

The cloud-based service may also identify I/O access patterns, e.g., whether writes are directed to sequential or random addresses, as being an indicator of ransomware or other malware activity. In accordance with another example of the present disclosure, the cloud-based service can analyze I/O information using statistical analysis, artificial intelligence (AI) and machine learning techniques, and predictive modeling to detect anomalies in I/O patterns. For statistical analysis, the cloud-based service may, for example, analyze historic data activity or patterns and compare the historic data activity or patterns with real-time data activity or patterns represented in the analysis pool. Anomalies or discrepancies between historic and real-time activity may indicate the activity of ransomware. The anomalies or discrepancies may be detected using techniques including but not limited to using intersection over union (IoU) of incoming and historic I/O.
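The disclosure does not say how IoU is computed over I/O activity. One plausible interpretation, sketched below, treats incoming and historic activity as histograms over the same buckets (for example, address ranges or compression-ratio buckets) and takes the ratio of the bucket-wise minimum to the bucket-wise maximum. The function name and the sample histograms are assumptions for illustration.

```python
def histogram_iou(incoming, historic):
    """Intersection over union of two non-negative histograms with equal bucket counts."""
    if len(incoming) != len(historic):
        raise ValueError("histograms must have the same number of buckets")
    intersection = sum(min(a, b) for a, b in zip(incoming, historic))
    union = sum(max(a, b) for a, b in zip(incoming, historic))
    # Two empty histograms are treated as identical.
    return intersection / union if union else 1.0


historic = [40, 30, 20, 10]
similar = [38, 31, 22, 9]
shifted = [5, 5, 10, 80]
```

Under this reading, `histogram_iou(similar, historic)` is close to 1.0 and `histogram_iou(shifted, historic)` is much lower, so a drop below some chosen threshold could be treated as a discrepancy between real-time and historic activity.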

Another technique that a cloud-based service may use is cross entropy loss over incoming I/O and historic I/O compression histograms and entropy histograms. For cross entropy loss over incoming I/O access patterns and historic access patterns, artificial intelligence (AI), machine learning, and predictive modeling may take advantage of unsupervised learning and detect any anomalies in incoming I/O patterns as they occur. Alternatively, supervised learning may collect in-lab data by running ransomware in a sandbox and use the data to train multi-layer perceptron (MLP) networks to predict whether data is clean or encrypted by ransomware. Again, cross entropy loss is a loss function used in training a machine learning model such as an MLP. Cross entropy and a set of other loss functions may be used to optimize models.
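Cross entropy between a historic histogram and an incoming histogram can be sketched as follows, using the standard definition H(p, q) = -sum(p_i * log(q_i)) over normalized histograms. The additive smoothing constant `eps`, which avoids taking the logarithm of an empty bucket, and the sample histograms are assumptions for illustration.

```python
import math


def normalize(hist, eps=1e-9):
    """Convert counts to probabilities, with a small additive smoothing term."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]


def cross_entropy(historic, incoming):
    """Cross entropy H(p, q) in nats, with p from historic and q from incoming counts."""
    p = normalize(historic)
    q = normalize(incoming)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


historic_hist = [8, 1, 1]   # mostly highly compressible blocks
shifted_hist = [1, 1, 8]    # mostly incompressible blocks
```

The cross entropy is smallest when the incoming distribution matches the historic one, so `cross_entropy(historic_hist, shifted_hist)` exceeds `cross_entropy(historic_hist, historic_hist)`; the size of that gap is one possible anomaly score.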

Collected I/O information may be used to train support vector machines to group different file types together and detect ransomware-infected files. Furthermore, the kernel trick may be used to enhance accuracy. The kernel trick is a technique to transfer the data into a higher dimension without computing the coordinates of the data in that dimension. A kernel function may be used to calculate the similarity between pairs of instances.
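As a small illustration of a kernel function, the sketch below computes the radial basis function (RBF) kernel, a common choice for support vector machines, and the Gram matrix of pairwise similarities that a kernel-based learner consumes. The choice of the RBF kernel, the `gamma` value, and the sample feature vectors are assumptions; the disclosure does not name a specific kernel.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2), a similarity in (0, 1]."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(samples, gamma=0.5):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Hypothetical two-feature vectors, e.g., (byte entropy, compression ratio).
samples = [[4.2, 0.35], [4.4, 0.40], [7.99, 1.0]]
K = gram_matrix(samples)
```

The kernel value depends only on the distance between two instances, so the similarity in the implicit higher-dimensional space is obtained without ever computing coordinates in that space.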

In process 300, a decision block 340 determines whether an anomaly has been detected. If an anomaly suggesting ransomware or other malware activity is detected, the cloud-based service alerts the storage platform, e.g., one or more SPUs 120 in storage platform 100 of FIG. 1, and the cloud-based service may suggest a timestamp for snapshots that would provide a safe roll back of volume data, i.e., a roll back to a time before the suspect activity occurred. The storage platform may then take steps to ensure that snapshots corresponding to the timestamp are preserved. The storage platform does not need to do an immediate roll back but may instead perform a roll back to the suggested timestamp only after a malware attack is confirmed. A user may request that the storage platform 100 roll back to a healthy unencrypted state.

FIG. 4 is a flow diagram of a process 400 for addressing a ransomware attack. In a block 410, a user detects that a ransomware attack has occurred, for example, when the user data is found to be encrypted and therefore inaccessible. At that time, the user can instruct the storage platform to roll back one or more virtual volumes to snapshots corresponding to the timestamp that the cloud-based service provided. A rollback operation can promote snapshots created at or before the timestamp the cloud-based service provided. See FIG. 3, block 350. The rollback operation can simply control read operations so that data written between a generation number that was assigned to the snapshot and a generation number at the time of the rollback is ignored. After the rollback, the old and unencrypted data becomes accessible again.
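
The read-side behavior of such a rollback may be modeled as follows. This is a minimal sketch under stated assumptions: the Volume class, its in-memory version lists, and its method names are hypothetical and do not describe the actual data structures of the storage platform. Each write receives the next generation number, and a rollback records the range of generation numbers that later reads should ignore.

```python
class Volume:
    """Toy model of a virtual volume whose writes are tagged with generation numbers."""

    def __init__(self):
        self.generation = 0
        self.versions = {}       # block address -> list of (generation, data), oldest first
        self.ignored_ranges = [] # (snapshot_gen, rollback_gen) pairs hidden from reads

    def write(self, addr, data):
        self.generation += 1
        self.versions.setdefault(addr, []).append((self.generation, data))

    def snapshot(self):
        """A snapshot is identified by the generation number current when it is taken."""
        return self.generation

    def rollback(self, snapshot_gen):
        """Hide every write made after the snapshot and up to the present generation."""
        self.ignored_ranges.append((snapshot_gen, self.generation))

    def read(self, addr):
        """Return the newest version whose generation is not in an ignored range."""
        for gen, data in reversed(self.versions.get(addr, [])):
            if not any(lo < gen <= hi for lo, hi in self.ignored_ranges):
                return data
        return None

vol = Volume()
vol.write(0, b"plain text")
snap = vol.snapshot()
vol.write(0, b"\x8f\x02\xa7")  # stands in for data encrypted by ransomware
vol.rollback(snap)              # no data is copied or erased; reads simply skip the range
```

In this model, the rollback completes without moving data, and writes made after the rollback receive higher generation numbers and therefore remain visible.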

As disclosed herein, systems and methods can automatically detect a ransomware attack and suggest a timestamp to roll back the volume data to the latest point at which, with high probability, the system was unencrypted by ransomware. The use of a cloud-based service may solve problems by automating all the processes of creating, distributing, and managing virtual volumes in a storage platform, and the cloud-based service may eliminate the need for a dedicated storage administrator while also removing the guesswork and hours of experimentation otherwise needed to get the right setup for a storage platform. Further, the user of the storage platform does not require ransomware detection software on host systems because the storage architecture with cloud services already has that capability.

Examples of the systems and methods disclosed herein can avoid conventional ransomware detection techniques that have high false positive rates and that are CPU intensive. The cloud-based solution reduces the processing load on the devices performing the I/O operations, which may improve storage performance.

All or portions of some of the above-described systems and methods can be implemented in computer-readable media, e.g., non-transient media, such as an optical or magnetic disk, a memory card, or other solid state storage containing instructions that a computing device can execute to perform specific processes that are described herein. Such media may further be or be contained in a server or other device connected to a network such as the Internet that provides for the downloading of data and executable instructions.

Although particular implementations have been disclosed, these implementations are only examples and should not be taken as limitations. Various adaptations and combinations of features of the implementations disclosed are within the scope of the following claims.

Claims

1. A process for operating a storage system including a plurality of servers containing a plurality of service processing units (SPUs), the process comprising:

the SPUs receiving a series of storage service requests from clients;

the SPUs performing storage operations to fulfill the storage service requests;

the SPUs reporting information on the storage operations to a cloud-based service; and

the cloud-based service analyzing the information to detect an indicator of malware activity.

2. The process of claim 1, further comprising, in response to detecting the indicator of malware activity, the cloud-based service instructing the SPUs to maintain snapshots of data in a state before the indicator occurred.

3. The process of claim 1, further comprising:

training a model using past information on the storage operations; and

the cloud-based service detecting the indicator of malware activity based on a difference between the model and the information reported by the SPUs.

4. The process of claim 1, further comprising the SPUs analyzing blocks of data written by performance of the storage operations, the information reported to the cloud-based service indicating results from the SPUs analyzing the blocks of data.

5. A storage system comprising:

one or more storage nodes, each storage node including: a server; a storage device; and a storage processing unit connected to the storage device, the storage processing unit operating the storage device to physically store data of one or more virtual volumes and to provide storage services to clients using the data of the virtual volumes; and

a cloud-based infrastructure in communication with the storage processing units in the storage nodes, the cloud-based infrastructure being configured to analyze information that the storage processing units provide about the virtual volumes and based on the analysis, to direct the storage processing units to maintain snapshots that permit recovery of data after a ransomware attack.

6. The system of claim 5, wherein the information that the storage processing units provide indicates patterns of storage operations targeting the virtual volumes.

7. The system of claim 5, wherein the information that the storage processing units provide includes test results from analysis of data blocks written to the virtual volumes by the storage processing units.

8. The system of claim 7, wherein the test results indicate one or more of compressibility, entropy, and encryption of data.

9. The system of claim 7, wherein the information represents a histogram of the test results.

10. The system of claim 5, wherein analysis of the information by the cloud-based infrastructure includes detecting an anomaly by comparing the information to a model that results from a machine learning process that was trained using past information.