System, Method and Process for Protecting Data Backup from Cyberattack
System, method and process for securing and protecting data and data backups from cyberattack and implementing disaster recovery using machine learning and artificial intelligence. Embodiments learn and establish baseline parameters of routine, normal and non-compromised behavior and activity of virtual machines operative in a cloud ecosystem, detect and recognize anomalous events related to advanced persistent threats to the instance, such as ransomware, and automatically implement preconfigured actions as determined by a user with the primary objective of protecting data and data backups.
This application is the Non-Provisional Application of Provisional Application No. 62/662,491 (Confirmation No. 6989) filed on Apr. 25, 2018 for “Artificial Intelligence (AI) triggering Backup and Disaster Recovery and other security measures to protect from Cyberattacks (Security and Backup as Services)” by Joseph Merces, et al. This Non-Provisional Application claims priority to and the benefit of that Provisional Application, the contents and subject of which are incorporated herein by reference in their entirety, including all references cited and incorporated within the Provisional Application.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not Applicable
BRIEF DESCRIPTION OF THE INVENTION
Embodiments of the invention are directed towards systems, methods and processes for securing and protecting data and data backups from cyberattack and implementing disaster recovery, all using machine learning and artificial intelligence (“AI”) technology. Embodiments learn and establish baseline parameters of routine, normal and non-compromised behavior and activity of virtual machines operative in a cloud ecosystem, detect and recognize anomalous events related to advanced persistent threats to the instance, such as ransomware, and automatically implement preconfigured actions as determined by a user with the primary objective of protecting data and data backups.
BACKGROUND
Traditionally, datacenter server backups are scheduled to occur at specific configured times during the day and/or evening hours or are run manually on demand. With the advent of Internet-based cloud services and virtual machines (“VMs”), backups of servers and the data they house are generally still scheduled to occur at preconfigured times during the day/evening hours or run manually. In today's world of increasing cybersecurity threats such as ransomware, not only are the servers, VMs and data at risk, but even the data backups are in jeopardy of infection, and worse, of being encrypted and held for ransom or even destroyed. There are countless examples, occurring with greater frequency, of organizations within the public and private sectors worldwide that have had the very backups they rely on for recovery when disaster strikes either encrypted or totally erased in order to force ransom payment. Destruction, eradication or ransom of data backups inflicts incalculable harm. Government agencies, large enterprises, and small and medium-sized businesses are increasingly prospective targets. The recent ransomware attacks on the U.S. cities of Baltimore, Md. and Atlanta, Ga., among others, are prime examples.
The prior art of legacy/traditional data backups fails to conduct real time monitoring and protection from threats to data backups. Moreover, since most data backups are “simple backups,” wherein the backup service merely provides file and folder level backups and restores, the backed-up data may be compromised at any time by an “advanced persistent threat” (“APT”), such as, but not limited to, ransomware. To complicate matters, data backup software products are now being marketed and sold as “data protection.” “Data backup” is not the same as “data protection.” This has created confusion in the marketplace on the part of organizations thinking that their so-called “data protection” software will be able to restore critical systems after a ransomware attack, only to find that not only did the ransomware encrypt their servers, VMs and systems, but it also ruined their ability to recover from their data backups, and even destroyed the very data protection software used to create their backups. The recent ransomware attacks conducted against the City of Atlanta, Ga., the Erie County Medical Center in NY and the City of Baltimore, Md., among others, are prime examples. Labeling a “backup” product as data “protection” may have worked years ago, but in today's cyberattack-riddled world, this description no longer fits most backup products that continue to refer to themselves as “data protection.” When organizations attempt to recover following a ransomware attack and discover that the very backups they were relying upon to restore their systems have also been compromised and their backup software has been destroyed, many questions get directed at their “data protection” product representatives.
What is therefore needed for all cloud computing systems and operations, including public, private and government based data centers, is true “data protection”—a transformative system, method and/or process that incorporates real time cybersecurity countermeasures along with true data backup protection, since the backup is generally the last line of defense for an organization, particularly when an organization's cloud or private network system is under attack from an APT such as ransomware. Embodiments of the invention meet that need by utilizing machine learning and AI to establish baseline parameters of routine, normal and non-compromised machine behavior, recognizing anomalous events related to APTs, such as ransomware, and implementing a host of actions with the primary objective of protecting data backups wherever the backups exist. Embodiments of the invention operate or are otherwise implemented within a VM platform residing within a highly secure virtual cloud environment, such as that contained within, for example, Amazon Web Services (AWS), Microsoft Azure, Google Cloud or any such other public and private cloud environments.
While it is generally recognized that the AWS cloud is more secure in comparison to on-premises hosted environments and that government agencies can certainly reap the benefits of the Federal Risk and Authorization Management Program (FEDRAMP) certified AWS GovCloud, embodiments of the invention may be implemented in any number of cloud-based platforms. While the cyber-hacking community and nation state actors have remained focused on infiltrating local on-premises hosted data centers, such as those generally found with government agencies, migration to the public cloud continues to gain momentum. Cloud-based platforms and ecosystems will therefore increase as potential targets of infiltration and destruction for an APT like ransomware. While embodiments of the invention may be implemented to provide data backup protection within any public or private datacenter, embodiments are particularly suited for cloud-based systems, platforms and ecosystems such as AWS, Microsoft Azure, Google Cloud, or any other public and private cloud environments.
SUMMARY OF THE INVENTION
Embodiments of the invention are directed towards systems and methods for securing and protecting cloud-based data and data backups and implementing disaster recovery from cyberattack using machine learning and AI technology.
In embodiments, machine learning and AI (as used herein, AI comprises the various machine learning of embodiments, including statistical and vector algorithms, and deep learning, as well as anomaly detection) are utilized to continuously and in real time analyze the system and event logs of a VM and the host machine on which it operates, in addition to online data streams as may be available from or provided by the cloud ecosystem platform, to establish various baselines indicative of normal, routine and non-compromised (typical) operations, activity and behavior. Through machine learning, embodiments of the invention establish various baseline(s) in terms of normal machine behavior and through the incorporation of machine learning and AI logic, are able to detect through interpretation and analysis anomalous events related to APTs (e.g., ransomware) and take remedial action with a primary directive of protecting data backups. In particular, and as further disclosed herein, system logs and online data streams of various operating resources such as the host's or VM's CPU activity, memory, disk usage (read/write/etc.), network and bandwidth activity are closely monitored and analyzed to establish baselines that fall within normal or typical operating parameters. When within the virtual public cloud of the AWS ecosystem, for example, system logs may also include VPC flow logs, DNS logs, CPU, memory, disk activity, network traffic, as well as CloudTrail event log analysis. Through machine learning, embodiments continually learn from the monitored resources as to what constitutes normal, routine and non-compromised (typical) activity and what constitutes anomalous (atypical) activity indicative of a threat to the VM. Through machine learning and AI, patterns of behavior and activity are continually accessed, processed and monitored. 
Embodiments of the invention continue with such real time monitoring of system logs and online data streams, and when one or more such monitored operating parameters deviates from its established baseline for normal, non-compromised activity—thereby signifying an anomaly and detecting a potential threat to the data and operating resources of the VM and host machine—embodiments of the invention may initiate various predetermined, specific automatic actions in response thereto, including specific automatic actions from the standpoint of backup and disaster recovery protection.
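The baseline-and-deviation idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a baseline is summarized as the mean and standard deviation of a single metric and that a deviation beyond three standard deviations counts as an anomaly; the function names, the threshold and the sample CPU values are hypothetical and are not taken from the embodiments.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal-period samples of one metric as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a reading lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU utilization (%) observed during normal, non-compromised operation.
normal_cpu = [12.0, 15.0, 11.0, 14.0, 13.0, 12.5, 14.5, 13.5]
cpu_baseline = build_baseline(normal_cpu)

# Two new readings: one ordinary, one resembling a sudden encryption workload.
readings = {"cpu_ok": 14.0, "cpu_spike": 97.0}
flags = {name: is_anomalous(value, cpu_baseline) for name, value in readings.items()}
```

In practice each monitored resource (CPU, memory, disk, network) would carry its own baseline, and the baseline would be refreshed as new normal data arrives.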
When anomalies and threat events are detected by embodiments of the invention, various pre-determined actions (as determined by users thereof or by the machine learning/AI module thereof) may be automatically initiated. Primary objectives of embodiments of the invention are directed towards protecting data backups and replications that have been previously performed, catalogued, and stored in various networked locations as described herein. Such “actionable automation” includes configurable feature capabilities such as: 1) alerting, such as, for example, alerting authorized, pre-determined (designated) users of the invention or any such other individuals in the form of Simple Notification Service (“SNS”), email, voice call and text message, 2) backup, such as, for example, system and data backup to other regions (such as, for example, another Cloud Region under the AWS cloud computing platform, discussed below) or to other accounts by the user (such as, for example, other AWS cloud computing accounts, discussed below), 3) quarantine, such as, for example, quarantining and cataloguing prior backup(s) and replications performed by embodiments of the invention regardless of the cloud region or account in which they are stored, quarantining a specific machine exhibiting anomalous behavior, etc., 4) restore, such as, for example, restoring a last known good (non-compromised) backup of a machine, or selecting and restoring from an older archived or quarantined backup or replication, 5) replication, such as, for example, replication of previously created data backups, including creation of an Amazon Machine Image or “AMI” and snapshots of data backups to other regions of the cloud ecosystem, such as data centers, and replication to other accounts (e.g., cloud-based accounts containing different login credentials and passwords within different data centers for added security and access), and 6) shutdown, such as, for example, shutting down network connectivity to a machine exhibiting anomalous behavior, shutting down one or more ports to a machine, shutting down the machine, and in conjunction with any of the above, running various cybersecurity countermeasures as well as invoking a multitude of other actions, including copying or replicating VMs to an entirely different private cloud or public cloud service, such as, for example, Microsoft Azure, Google Cloud, etc., all for the objective of protecting enterprise backups and disaster recovery from cybersecurity compromise.
Embodiments of the invention utilize one or more constructive algorithms within various steps for processing and identifying events (both routine (typical) and anomalous (atypical) events) registered in system logs of VMs (a specific VM running within a server is also referred to herein as an “instance”, a term commonly utilized in the industry to refer to each such VM operating on a host digital processor or machine under the supervision of a hypervisor software module, such as, for example, Hyper-V® by Microsoft Corporation, Redmond, Wash., or VMware ESXi by VMware, Inc., Palo Alto, Calif., or any other hypervisor software modules or systems that allow for the creation of one or more VMs on a host machine). Conceptually, such process algorithms are generally based on a combination of three well-known approaches: fuzzy sets theory (whether an event is “typical” or “atypical” is determined by the value of its membership degree), the method of potential functions (the metric properties of events are determined with the help of a nonnegative symmetric kernel of one or another form), and deep learning algorithms. From a theoretical point of view, it is an adaptive learning process algorithm that allows for identification of evaluated events. From a practical point of view, it is also a process algorithm that provides the opportunity to simultaneously estimate the degree of “typicality,” calculate 3D coordinates, and provide computer visualization of these events. (The visualization is provided for human analysis, such as analysis after an attack and during additional remediation of a cyberattack, to better understand the anomalous behaviors detected and acted upon through any of the preconfigured actions and directives.)
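The combination of a fuzzy membership degree with a potential function can be sketched as follows. This is an illustrative assumption rather than the disclosed algorithm: a Gaussian kernel stands in for the “nonnegative symmetric kernel,” the membership degree is taken as the average kernel value against a set of known typical events, and the cutoff of 0.5, the kernel width and the two-dimensional feature vectors are all hypothetical.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Nonnegative symmetric kernel: 1.0 when x == y, decaying with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def membership_degree(event, typical_events, width=1.0):
    """Average potential of `event` against known typical events, in [0, 1]."""
    total = sum(gaussian_kernel(event, t, width) for t in typical_events)
    return total / len(typical_events)


def classify(event, typical_events, cutoff=0.5, width=1.0):
    """Return ("typical" | "atypical", membership degree) for one event."""
    degree = membership_degree(event, typical_events, width)
    label = "typical" if degree >= cutoff else "atypical"
    return label, degree


# Hypothetical feature vectors: (normalized CPU load, normalized disk writes).
typical = [(0.20, 0.10), (0.25, 0.12), (0.22, 0.09)]

label_near, degree_near = classify((0.23, 0.10), typical)
label_far, degree_far = classify((3.0, 4.0), typical)
```

An event close to the learned typical events receives a membership degree near 1, while a distant event receives a degree near 0 and is labeled atypical.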
As used herein, the term “host digital machine” or “host machine” refers to the actual physical machine upon which one or more VMs or instances may operate. The host machine is generally comprised of a digital processor or CPU that may have some associated volatile memory, generally in the form of RAM, a digital storage device generally in the form of one or more hard disk drives (including, but not limited to, solid state drives, or any such other storage devices that may evolve within the technology) that may serve as the main digital memory associated with the digital processor and where files and other associated data are generally stored, a network communications device, such as a network interface controller (NIC) or device, and other hardware commonly known and understood and upon which one or more operating systems and various software platforms or layers operate to comprise the entire host machine and upon which one or more VMs operate. The digital processor of the host machine is referred to herein as the host processor or host digital processor. Further, as used herein, the terms “digital memory,” “disk memory” and “memory” are used interchangeably and are generally intended as meaning the memory capability of the host disk drive(s), although without departing from the spirit and scope of the embodiments, additional forms of memory may be encompassed. It is also to be understood that host machines may employ multiple digital processors, digital storage devices, memory devices, etc. in various configurations commonly known.
As used herein, the term “instance” refers to a virtual machine or virtual server instance running on a cloud-based platform in a public or private cloud network. In the case of the virtualized cloud-based web hosting services offered by Amazon Web Services, Inc., a subsidiary of Amazon.com, Inc., Bellevue, Wash. (collectively, “Amazon”), also known as AWS (and the various permutations of the services offered by Amazon under AWS), an “EC2 instance” is a virtual server in Amazon's Elastic Compute Cloud (“EC2”) for running applications on the AWS infrastructure. AWS is a comprehensive, evolving cloud computing platform; EC2 is a service that allows business subscribers to run application programs in the computing environment. EC2 can serve as a practically unlimited set of VMs or EC2 instances. Users of AWS's EC2 services have at their disposal a virtual cluster of computers, available all the time through the Internet. The AWS EC2 platform of virtual computers, discussed in greater detail herein, emulates most of the attributes of a real computer including hardware (CPU(s) and GPU(s) for processing, local/RAM memory, hard-disk/SSD storage); a choice of operating systems; networking; and pre-loaded application software such as web servers, databases, CRM, etc. As used herein, the term “instance” also comprises an AWS EC2 instance.
As used herein, “AWS” shall generally refer to the virtual cloud computing platform services offered by Amazon, including Amazon's EC2 virtual servers (VMs or instances).
As used herein, “ransomware” shall generally refer to any type of malware that prevents or limits users from accessing their system, machine or instance, the data comprising same, and/or any backups of the instance and/or data of same, unless a ransom is paid. More modern ransomware families, collectively categorized as crypto ransomware, encrypt certain file types on infected systems, machines and instances and force users to pay the ransom through certain online payment methods to get a decrypt key. However, it is to be understood as used herein the term “ransomware” is also meant to comprise any type of malicious software or attack that presents a potential threat to the security and/or integrity of data backups.
These and other more detailed objects of the present invention will be disclosed when taken in conjunction with the following Detailed Description of the Invention in which like numerals represent like elements. The following is a listing of the reference numbers and the associated elements and features of embodiments as shown in the attached drawings:
COMPONENTS/ELEMENTS/FEATURES REFERENCE NUMBERS
- 00 AWS cloud
- 02 Digital communication network (internet)
- 04 AWS internet gateway
- 06 Router
- 10 AWS Cloud Region
- 12 AWS Virtual Private Cloud (VPC)
- 14 AWS Subnet
- 14A Public AWS subnets
- 14B Private AWS subnets
- 14A-1 Public AWS subnet of an instance 60 in Availability Zone 1
- 14A-2 Public AWS subnet of an instance 60 in Availability Zone 2
- 14B-1 Private AWS subnet of an instance 60 in Availability Zone 1
- 14B-2 Private AWS subnet of an instance 60 in Availability Zone 2
- 16 AWS Security Group
- 20 Remote management platform console/dashboard for System 100
- 22 HTTPS functional network connection between System 100 and remote management console 20
- 24 AWS EBS data backup storage system
- 26 AWS EFS data backup storage system
- 28 AWS instance storage system
- 30 AWS S3 data backup storage system
- 32 Bucket of snapshots 34, e.g., AMIs, in AWS S3 data backup storage system 30
- 34 Snapshots 34, e.g., AMIs, in AWS S3 data backup storage system 30
- 40 Data backup (instance) memory/disk storage device
- 60 VM or instance operative on a host machine 80 in a virtual computing cloud-based platform or environment 00, such as, for example, the AWS cloud computing platform.
- 62 Virtual disk/memory
- 64 VM operating system (OS)
- 66 VM software/resources/applications
- 80 Host machine operative in a virtual computing cloud-based platform or environment 00, such as, for example, AWS, in which one or more VMs or instances 60 operate.
- 82 Hardware portion
- 84 Memory/disk
- 86 CPU/microprocessor
- 88 Software portion (comprising VMs/instances operative thereon)
- 89 Hypervisor module
- 100 An operative system/process embodiment of the invention generally designated herein for illustrative purposes as “System.”
- 112 Interface/interaction between machine learning logic element/feature/process/module 200 and AI engine element/feature/process/module 300 of System 100
- 122 Interface/interaction between AI engine element/feature/process/module 300 and actionable logic element/feature/process/module 400 of System 100
- 200 Machine learning logic element/feature/process/module of artificial intelligence element/feature/process/module 300 of System 100
- 210 Access machine data step
- 220 Primary feature mapping step
- 230 Secondary feature mapping step
- 240 Clusterization step
- 250 Detection of cluster centers step
- 260 Data reduction step
- 270 Detection of scaling coefficients step
- 280 Construction of projections of quantized events step
- 300 Artificial intelligence element/feature/process/module of System 100, comprising inter-related machine learning logic process 200 and AI anomaly detection engine process 300A
- 300A AI anomaly detection engine of module 300
- 310 Quantizing input entries step
- 320 Clusterization of quantized events step
- 330 Visualization of quantized events step
- 400 Actionable logic element/feature/process/module of System 100
- 460 Various exemplary action items that may be preconfigured by users
The within description and illustrations of various embodiments of the invention are neither intended nor should be construed as being representative of the full extent and scope of the present invention. While particular embodiments of the invention are illustrated and described, singly and in combination, it will be apparent that various modifications and combinations of the invention detailed in the text and drawings can be made without departing from the spirit and scope of the invention. For example, references to materials of construction, methods of construction, specific dimensions, shapes, utilities or applications are also not intended to be limiting in any manner and other materials and dimensions could be substituted and remain within the spirit and scope of the invention. Accordingly, it is not intended that the invention be limited in any fashion. Rather, particular, detailed and exemplary embodiments are presented.
The images in the drawings are simplified for illustrative purposes and are not necessarily depicted to scale. To facilitate understanding, identical reference numerals are used, where possible, to designate substantially identical elements that are common to the figures, except that suffixes may be added, when appropriate, to differentiate such elements.
Although the invention herein has been described with reference to particular illustrative and exemplary physical embodiments thereof, as well as a methodology thereof, it is to be understood that the disclosed embodiments are merely illustrative of the principles and applications of the present invention. Therefore, numerous modifications may be made to the illustrative embodiments and other arrangements may be devised without departing from the spirit and scope of the present invention. It has been contemplated that features or steps of one embodiment may be incorporated in other embodiments of the invention without further recitation.
DETAILED DESCRIPTION OF THE INVENTION
A more detailed description of the invention now follows.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, the use of similar or the same symbols in different drawings typically indicates similar or identical items, unless context dictates otherwise.
The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.
One skilled in the art will recognize that the herein described components (e.g., operations), devices, objects, and the discussion accompanying them are used as examples for the sake of conceptual clarity and that various configuration modifications are contemplated. Consequently, as used herein, the specific exemplars set forth and the accompanying discussion are intended to be representative of the more general classes. In general, use of any specific exemplar is intended to be representative of its class, and the non-inclusion of specific components (e.g., operations), devices, and objects should not be taken as limiting.
The host machine 80 of
The hardware layer 82 may, for instance, be the physical components such as, but not limited to, a digital host processor or CPU 86 and an associated digital memory storage device 84, such as, for example, a digital hard disk drive (including, without limitation, solid state drives). These may, for instance, be any of the well-known digital computing processors and digital electronic memory/storage devices that are commercially available.
The software layer 88 may be an implementation of a virtual computing environment in which a hypervisor software module 89 may implement one or more instances or VMs 60. Each instance 60 may be further comprised of a guest operating system 64 that may be associated with a virtual digital memory (virtual disk) 62 and may run one or more guest software applications 66. The instances/VMs 60 of
Each instance 60 may appear to an end user to be functionally equivalent to a physical digital machine, allowing applications such as, but not limited to, word processors, spreadsheets and databases or other software applications and platforms, or some combination thereof, to be used. Each VM 60 may operate its own and separate operating system (OS) such as, but not limited to, Microsoft Windows®, Apple OS or Linux open source operating system, all of which may run or operate as a guest operating system 64 on the VM 60.
Translating the instructions issued by the guest software programs or applications 66 operating on each VM 60 into actions that can be performed by the digital host processor 86 may be accomplished by a hypervisor software module 89. The hypervisor software module 89 may, for instance, be one of the well-known virtualization platforms such as, but not limited to, one of the Hyper-V® family of software platforms provided by the Microsoft Corporation of Redmond, Wash., or VMware ESXi by VMware, Inc., Palo Alto, Calif. (or any such other hypervisor software modules or systems that allow for the creation of one or more VMs on a host machine). While the Hyper-V® family of hypervisor platforms and ESXi by VMware are considered herein as examples, it is expressly understood that the disclosed embodiments of the invention are not in any way limited to those specific hypervisor modules.
The hypervisor software module 89 may, for instance, translate requests by a VM 60 to access its virtual digital memory (virtual disk) 62 into a request to access the physical, digital memory storage device 84 associated with the host processor 86.
Continuing with reference to
AWS provides users the ability to place resources, such as instances 60, and data in multiple locations; resources are not generally replicated across AWS Regions unless users do so specifically. AWS Regions currently established in the United States are shown in Table 1, below:
Continuing with
Continuing with
Continuing with
Continuing with
AWS Security Groups 16 associated with a respective instance 60 provide security at the protocol and port access level. As such, each Security Group 16—working much the same way as a firewall—contains a set of rules that filter traffic coming into and out of an instance 60. Generally, there are no “deny” rules. Rather, if there is no rule that explicitly permits a particular data packet, it will be dropped.
The actual rule set that filters traffic is made up of two tables: “inbound” and “outbound.” Since Security Groups 16 are stateful, users need not establish the same rules for both outbound traffic and inbound. As a result, any established rule that allows traffic into an instance 60, will allow responses to pass back out without an explicit rule in the “outbound” rule set. Each rule is comprised of four fields: “type,” “protocol,” “port range,” and “source.” The fields apply for both inbound and outbound rules.
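The allow-only, stateful filtering described above can be modeled with a short sketch. It is a simplified illustration and not AWS's implementation: the class and method names are hypothetical, the “type” field is omitted because it only names a protocol and port preset, and the outbound table is left empty so that the stateful return path is the only way traffic leaves.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str   # e.g. "tcp"
    port_from: int  # start of the permitted port range
    port_to: int    # end of the permitted port range
    source: str     # CIDR block permitted to connect, e.g. "0.0.0.0/0"

    def matches(self, protocol, port, ip):
        return (
            self.protocol == protocol
            and self.port_from <= port <= self.port_to
            and ipaddress.ip_address(ip) in ipaddress.ip_network(self.source)
        )


class SecurityGroupModel:
    """Toy model: allow rules only, default drop, stateful responses."""

    def __init__(self, inbound_rules):
        self.inbound_rules = list(inbound_rules)
        self._tracked = set()  # connections already admitted inbound

    def inbound(self, protocol, port, source_ip):
        if any(r.matches(protocol, port, source_ip) for r in self.inbound_rules):
            self._tracked.add((protocol, port, source_ip))
            return "allow"
        return "drop"  # no explicit permit rule, so the packet is dropped

    def outbound_response(self, protocol, port, dest_ip):
        # Stateful: a reply to an admitted connection needs no outbound rule.
        return "allow" if (protocol, port, dest_ip) in self._tracked else "drop"


sg = SecurityGroupModel([Rule("tcp", 443, 443, "0.0.0.0/0")])
https_in = sg.inbound("tcp", 443, "198.51.100.9")
ssh_in = sg.inbound("tcp", 22, "198.51.100.9")
https_reply = sg.outbound_response("tcp", 443, "198.51.100.9")
ssh_reply = sg.outbound_response("tcp", 22, "198.51.100.9")
```

The HTTPS request and its reply pass, while the SSH attempt is dropped simply because no rule permits it.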
Continuing with
Continuing with
Continuing with
While
Instance (VM) 60 data backup is a critical means of prophylactic protection in the event an operative instance 60 is compromised, particularly through a security breach or APT, such as ransomware. Data backup is a process of duplicating data, or, as presented here, an instance (VM) 60, to allow retrieval of the duplicate set after a data loss event. Today, there are many kinds of data backup services that help enterprises and organizations ensure that data is secure, and that critical information is not lost in a natural disaster, theft situation or other kind of emergency.
Considering the AWS cloud computing ecosystem, AWS provides several backup resources which would be readily understood by users of the platform and those skilled in the art. In addition, many third-party providers also provide robust instance backup applications with various features allowing customization of the desired backup process which would also be readily understood by users thereof and those skilled in the art. While many such backup processes are available, a brief discussion of the available AWS backup options is helpful with the understanding that embodiments of the invention are not limited to any particular data backup system, method or process for cloud computing operations or operation of the systems, methods and process of the disclosed invention.
AWS provides various flexible, cost effective, and easy-to-use data storage options for instances. Each option has a unique combination of performance and durability, and the available storage options may be used independently or in combination to suit a user's requirements. Options and features available for instance backup in the AWS ecosystem include Amazon Elastic Block Store (“EBS”), Amazon EC2 Instance Storage, Amazon Elastic File System (“EFS”) storage and Amazon Simple Storage Service (“S3”).
Continuing with
The EBS storage volume operative on one or more data (instance) backup memory/disk storage devices 40 behaves like a raw, unformatted, external block storage device that may be attached to a single instance 60. The volume persists independently from the running life of an instance 60. After an EBS volume (defined and configured by a user) is attached to or associated with an instance 60, it operates like any other physical hard drive. Referring to
Continuing with
With regard to the AWS ecosystem, Amazon Machine Images or “AMIs” are also an important image backup, especially for Windows servers, because snapshots alone do not capture the content of the root volume of the Windows server, where the operating system resides. AMIs in conjunction with snapshots provide the ability for a complete restoration to be performed. Embodiments of the invention, as described in detail herein, provide the ability to maintain and catalogue, as well as replicate (copy elsewhere to a designated AWS region), quarantine (copy backups to an alternate, secure area) and restore, all AMIs and snapshots (collectively known as backups) of machines, which may be stored anywhere within the regional and global AWS ecosystem.
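Cross-region replication and cataloguing of snapshots and AMIs might be sketched as below. The sketch assumes a client object that behaves like a boto3 EC2 client created for the destination region, and it uses that client's `copy_snapshot` and `copy_image` operations; the function name `replicate_backups`, the catalogue layout, the identifiers and the `FakeEC2` stand-in (included only so the example runs without AWS credentials) are hypothetical.

```python
def replicate_backups(dest_ec2, source_region, snapshot_ids, image_ids, tag="replica"):
    """Copy snapshots and AMIs from `source_region` into the region `dest_ec2` targets.

    `dest_ec2` is assumed to behave like a boto3 EC2 client for the destination region.
    Returns a catalogue listing each source backup and the identifier of its copy.
    """
    catalogue = []
    for snap_id in snapshot_ids:
        resp = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snap_id,
            Description=f"{tag} of {snap_id}",
        )
        catalogue.append({"kind": "snapshot", "source": snap_id, "copy": resp["SnapshotId"]})
    for image_id in image_ids:
        resp = dest_ec2.copy_image(
            SourceRegion=source_region,
            SourceImageId=image_id,
            Name=f"{tag}-{image_id}",
        )
        catalogue.append({"kind": "ami", "source": image_id, "copy": resp["ImageId"]})
    return catalogue


class FakeEC2:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def copy_snapshot(self, **kwargs):
        self.calls.append(("copy_snapshot", kwargs))
        return {"SnapshotId": "snap-copy-of-" + kwargs["SourceSnapshotId"]}

    def copy_image(self, **kwargs):
        self.calls.append(("copy_image", kwargs))
        return {"ImageId": "ami-copy-of-" + kwargs["SourceImageId"]}


fake = FakeEC2()
catalogue = replicate_backups(fake, "us-east-1", ["snap-0001"], ["ami-0001"])
```

With real credentials, a client created with `boto3.client("ec2", region_name=...)` for the destination region could be passed in place of `FakeEC2`; replication to a different account would additionally require sharing or cross-account permissions, which this sketch does not cover.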
Continuing with
Lastly, continuing with
Referring to
Continuing with
Within the actionable logic module 400, based on the detected anomaly, various alerts and actions may be triggered such as those more passive in nature, e.g., notifications to system and security admins, or those more active in nature, e.g., pause or stop instance/VM, invoke backup of instance/VM, etc., or any combination of such available action items. The actionable automation is inclusive of performing backup to other regions, backup cataloguing of existing backup(s) (AWS worldwide cross-region and cross-account), quarantine of instance AMIs and snapshots regardless of region stored, quarantine of instance, firewalling the instance, replication, performing restore from existing backup, shutting down network ports and connectivity, running SIEM tools, etc.
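One way to picture the mapping from a detected anomaly to preconfigured passive and active actions is a small dispatch table. This is a sketch under stated assumptions: the registry, the severity levels, the playbook contents and the placeholder action functions (which only return descriptive strings) are hypothetical and do not represent the internal design of module 400.

```python
ACTION_REGISTRY = {}


def action(name):
    """Decorator that registers a function as a named, configurable action."""
    def register(fn):
        ACTION_REGISTRY[name] = fn
        return fn
    return register


@action("alert")
def alert(instance_id):
    return f"alert sent for {instance_id}"


@action("backup")
def backup(instance_id):
    return f"backup invoked for {instance_id}"


@action("quarantine")
def quarantine(instance_id):
    return f"backups quarantined for {instance_id}"


@action("shutdown_network")
def shutdown_network(instance_id):
    return f"network connectivity shut down for {instance_id}"


# Hypothetical user configuration: anomaly severity -> ordered list of actions.
PLAYBOOK = {
    "low": ["alert"],
    "high": ["alert", "backup", "quarantine", "shutdown_network"],
}


def respond(instance_id, severity, playbook=PLAYBOOK):
    """Run every action configured for `severity`, in order, and collect results."""
    return [ACTION_REGISTRY[name](instance_id) for name in playbook.get(severity, [])]


results = respond("i-0123example", "high")
```

Keeping the playbook as plain data lets a user change which actions fire for a given severity without touching the action code itself.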
In the context of computing, a log or log file is generally a text file or XML file used to register the automatically produced and time-stamped documentation of events, behaviors and conditions relevant to a particular system. Generally, the events recorded by the log file are predetermined by the operating or other system itself and may contain information about device changes, device drivers, system changes, events, operations and more. While log files are generally associated with an operating system or OS of a machine (virtual or host), they are used in many other environments, including, but not limited to, the following:
- On a web server: an access log can be useful to identify the number of visitors, the domains from which they are visiting, the number of requests for each page, and usage patterns according to the day of the week or even the hour of the day.
- In operating systems: syslog files register events, errors, user access, warnings, etc. By reviewing this data, an administrator can check whether all processes are loading successfully or find the root cause of a specific problem.
- In Microsoft Exchange: transaction logs are files used to convey information (email messages, new users, folders deleted, etc.) to the database of Exchange. Everything is sent first to the transaction log and then to the database when the system allows it.
- In network routers: log files register failing processes, connections and disconnections from WAN services and devices, VPN connection status, etc.
- In firewalls: log files register which network connections were allowed and dropped.
Logs have standard components that may vary depending on the OS. However, there are common components and information that are captured regardless of the OS. All entries are classified by type, such as error, information, warning, success audit and failure audit for Windows systems, and emergency, alert, critical, error, warning, notice, info and debug for Mac OS and Linux systems. Events are classified into System, Security, Application, Directory Service, DNS Server and DFS Replication categories. Directory Service, DNS Server and DFS Replication logs are applicable only for Active Directory. Events that are related to system or data security are called security events, and their log file is called the Security log.
Some of the events listed above include system errors, warnings, startup messages, system changes, abnormal shutdowns, etc. This list is applicable to most versions of the three common OSs (Windows, Linux and Mac OS).
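As a non-limiting illustration, entry types from different operating systems may be normalized to a single numeric scale before further processing. The sketch below uses the standard syslog severity numbering (0 being most severe); the mapping of Windows entry types onto that scale is an assumption made for this example only and is not part of the embodiments.

```python
# Syslog severity levels, numbered as in the syslog standard (0 is most severe).
SYSLOG_SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Assumed (hypothetical) mapping of Windows entry types onto the syslog scale.
WINDOWS_TO_SYSLOG = {
    "error": "error",
    "warning": "warning",
    "information": "info",
    "success audit": "notice",
    "failure audit": "warning",
}

def normalize_severity(os_family: str, entry_type: str) -> int:
    """Return a syslog-style severity number for a log entry type."""
    key = entry_type.strip().lower()
    if os_family.strip().lower() == "windows":
        key = WINDOWS_TO_SYSLOG[key]
    return SYSLOG_SEVERITY[key]
```

An unrecognized entry type raises a KeyError in this sketch, which makes unexpected log formats visible rather than silently assigning them a default severity.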
Continuing with
Algorithms/processes employed by System 100 further allow for the detection of intrusions, malicious activity, and other anomalies of network activity registered in firewalls, web application firewalls or other system/performance logs, including pattern matching logic. The estimation of the “typicality” of events may be considered as the quantitative analysis of the studied system logs entries, while the visualization may be considered as their qualitative analysis.
In embodiments, functionality of the machine learning 200 and anomaly detection engine 300A processes, while inter-related and overlapping, may be separated into two phases: learning and classification. In embodiments, machine learning module 200 is generally comprised of the learning phase, which, in embodiments, is further comprised of seven steps, the last of which may be considered as optional. Input data is primarily comprised of system log entries that describe events that occur to or within instances (VMs) operating in standard operating mode. In embodiments, AI anomaly detection engine 300A is generally comprised of the classification phase, which, in embodiments, is further comprised of three steps, the last of which may be considered as optional. Data sources for the classification phase are primarily, but not limited to, online data streams as described below.
Referring to
Continuing with
Continuing with
- the initial and final times of the quantizing interval (two dates),
- the number of unique addresses of incoming and outgoing requests (two numbers),
- the number of incoming and outgoing requests (two numbers),
- the number of incoming and outgoing packets (two numbers), and
- the number of incoming and outgoing bytes (two numbers).
In embodiments, other statistics may be calculated and step 220 is not limited to the foregoing (i.e., calculation of other statistics is optional). The values of the calculated statistics are recorded to a table whose entries are quantized events of the system log.
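The calculation of the foregoing statistics may be illustrated by the following minimal sketch, which groups log records into fixed-length intervals and produces one table row per interval. The record fields ("time", "direction", "src", "dst", "packets", "bytes"), the use of fixed-length intervals and the row keys are assumptions made for this example; the embodiments are not limited to this layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def quantize(records: Iterable[dict], interval: int) -> List[dict]:
    """Group records into intervals of `interval` seconds and compute
    per-interval statistics, one output row per non-empty interval."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for rec in records:
        buckets[rec["time"] // interval].append(rec)

    rows = []
    for index in sorted(buckets):
        recs = buckets[index]
        incoming = [r for r in recs if r["direction"] == "in"]
        outgoing = [r for r in recs if r["direction"] == "out"]
        rows.append({
            "start": index * interval,            # initial time of the interval
            "end": (index + 1) * interval,        # final time of the interval
            "unique_in_addrs": len({r["src"] for r in incoming}),
            "unique_out_addrs": len({r["dst"] for r in outgoing}),
            "in_requests": len(incoming),
            "out_requests": len(outgoing),
            "in_packets": sum(r["packets"] for r in incoming),
            "out_packets": sum(r["packets"] for r in outgoing),
            "in_bytes": sum(r["bytes"] for r in incoming),
            "out_bytes": sum(r["bytes"] for r in outgoing),
        })
    return rows
```

Each returned row corresponds to one quantized event of the table described above; intervals in which no records fall produce no row in this sketch.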
Continuing with
Continuing with
As a result, at step 240, a number of clusters is constructed. Each cluster models a behavior of a homogeneous group of “faithful” (i.e., “typical”) or “malicious” (i.e., “atypical”) users. Such clusterization is analogous to a reduction of the kernel matrix to a block-triangular form. In this sense, “faithful” means that the events should be “trusted”—meaning part of the baseline of events collected during this learning step and further assuming no malicious events have occurred. “Malicious,” on the other hand, means that the events data has deviated from the baseline. In this manner, baselines of routine, non-compromised operative behavior and activity may be constructed.
Continuing with
As a result, in step 250, a number of cluster centers is constructed. Each center is a linear combination that, generally speaking, comprises all the quantized events from the corresponding cluster.
Continuing with
As a result, in step 260, a number of reduced linear combinations is constructed, one combination per cluster. In an embodiment, each linear combination of step 260 comprises a small number of basis events and sufficiently approximates the geometrical center of the corresponding cluster.
Continuing with
As a result, in step 270, a number of scaling coefficients is constructed, one coefficient per cluster.
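For illustration only, the construction of a cluster center and of a per-cluster scaling coefficient may be sketched as follows. The sketch works directly on numeric event vectors rather than in a kernel-induced space, takes the center as the equal-weight linear combination (the mean) of the cluster's events, and assumes that the scaling coefficient is the mean distance of those events from the center. These are simplifying assumptions for the example and are not asserted to be the definitions used by the embodiments.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cluster_center(events: List[Vector]) -> List[float]:
    """Center as a linear combination of all events with equal weights 1/n."""
    n = len(events)
    return [sum(column) / n for column in zip(*events)]

def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two event vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scaling_coefficient(events: List[Vector], center: Vector) -> float:
    """Assumed definition: mean distance of the cluster's events from its center."""
    return sum(distance(e, center) for e in events) / len(events)
```

Under these assumptions, a tightly grouped cluster yields a small coefficient and a widely spread cluster yields a large one, so distances measured later can be judged relative to each cluster's own spread.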
Continuing with
As such, in embodiments and continuing with
Summarizing the learning stage/phase functional processes of machine learning logic module/process 200 of embodiments as depicted in
Continuing with
As used herein, “online streams” of data refers to the data streams made available by and through a cloud computing platform/ecosystem, such as, for example, AWS, to users of that service, and provided in real-time or near real-time and not stored locally in system log files within an instance. Examples of data from online streams include, but are not limited to, data derived or originating from instance firewalls, web application firewalls, VPC flow logs, DNS logs, network traffic logs, AWS CloudTrail logs, NetFlow logs, SNMP logs, and the like. As previously noted, from the perspective of the AWS cloud environment, for example, VPC flow logs and other log files are created and accessible via one or more AWS APIs made available by AWS. Using AWS APIs, flow logs may be created and retrieved via CreateFlowLogs (Amazon EC2 Query API) and GetLogEvents (CloudWatch API), respectively. Logs may also be created and retrieved via Microsoft PowerShell as well as through the AWS CloudWatch service, which collects monitoring and operational data in the form of logs, metrics, and events, providing a unified view of AWS resources, applications and services that run on AWS, and on-premises servers. All of the above represent examples of “online streams” of data used by embodiments that originate from the cloud ecosystem and not necessarily within the operative instance.
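As one hedged example of consuming such an online stream, the sketch below parses a single VPC flow log record once its text has been retrieved (for example, as the message of a log event). It assumes the default, space-separated, version 2 record format of fourteen fields; custom formats would require a different field list. The function and field names are chosen for this example only.

```python
from typing import Dict, Optional

# Field order of the default (version 2) VPC flow log record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}

def parse_flow_log_record(line: str) -> Optional[Dict[str, object]]:
    """Parse one default-format record; return None if it carries no traffic data."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record: Dict[str, object] = dict(zip(FIELDS, parts))
    if record["log_status"] != "OK":
        # NODATA and SKIPDATA records hold "-" placeholders in the traffic fields.
        return None
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record
```

Records parsed in this way expose the addresses, packet counts and byte counts from which per-interval statistics may subsequently be calculated.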
Referring to
Continuing with
Continuing with
To summarize, input data of AI anomaly detection engine 300A comprises online data streams describing sequences of events occurring to the instance 60. The output is estimates of the membership degrees of the quantized events to the corresponding clusters, step 320, with optional opportunity of their visualization, step 330.
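By way of example only, the estimation of a membership degree may be sketched with a Gaussian-style function of the distance between a quantized event and a cluster center, scaled by a per-cluster coefficient. The Gaussian form, the threshold value of 0.05 and all names below are assumptions introduced for this illustration; the embodiments do not prescribe them.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

def membership_degree(event: Sequence[float], center: Sequence[float], scale: float) -> float:
    """Assumed Gaussian-style membership: 1.0 at the center, decaying with distance."""
    squared_distance = sum((x - c) ** 2 for x, c in zip(event, center))
    return math.exp(-squared_distance / (2.0 * scale ** 2))

def classify(
    event: Sequence[float],
    clusters: Dict[str, Tuple[Sequence[float], float]],
    threshold: float = 0.05,  # hypothetical cut-off below which an event is atypical
) -> Tuple[Optional[str], float, bool]:
    """Return (closest cluster name, its membership degree, is_anomalous)."""
    best_name, best_degree = None, 0.0
    for name, (center, scale) in clusters.items():
        degree = membership_degree(event, center, scale)
        if degree > best_degree:
            best_name, best_degree = name, degree
    return best_name, best_degree, best_degree < threshold
```

In this sketch an event is flagged as anomalous when its highest membership degree across all baseline clusters falls below the chosen threshold, i.e., when it does not sufficiently resemble any learned cluster.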
Upon the determination or detection of anomalous or malicious activity as described above, i.e., atypical behavior or activity, System 100 proceeds to initiate one or more action items 460 in actionable logic module 400 as have been preconfigured by users. Such anomalous or atypical activity as determined by embodiments of System 100 may range across any number of malicious and destructive events at the instance 60, host machine 80 or cloud ecosystem level, and includes such items as rogue processes performing damaging actions on known system files, deleting or renaming system files, mass file encryption, mass file changes, or any number of general patterns indicative of a ransomware attack. Other such actions indicative of a ransomware attack that embodiments of the invention would detect in accordance with the above method and process, and therefore determine as anomalous or atypical activity, include, but are certainly not limited to, events such as: process-based modification of a file in a system folder, process-based modification of an executable file, excessive activity or spikes in computer, system or network activity, process-based deletion of a command or executable file, process-based creation of an executable file in a user folder, process-based modification of a file in a user directory, process-based modification of an autorun registry key value, process-based execution of a command or executable file, deletion of a shadow copy, process-based disabling of a proxy, process-based disabling of a firewall, etc.
The actionable logic module, as previously described and depicted in
Action items 460 of the actionable logic module 400 that may be automatically initiated by System 100 based on the pre-configuration thereof by a user include, but are not limited to, the following:
Catalogue Instance Backups.
Referring to
In an embodiment, when an anomaly indicative of malicious behavior is detected, the actionable logic module 400 recognizes the operative instance 60 affected thereby, reads the instance ID of the instance 60, and catalogues all of the backups and replications associated with and belonging to that instance ID, wherever in the world they may be stored throughout the regional AWS ecosystem, for potential copying and replication.
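The cataloguing step may be illustrated by the following minimal sketch, which assumes that backup records have already been retrieved for each region (for example, through the cloud provider's APIs) and that each record carries the ID of its source instance. The record fields ("instance_id", "backup_id", "kind") and the function name are hypothetical.

```python
from typing import Dict, List

def catalogue_backups(instance_id: str,
                      backups_by_region: Dict[str, List[dict]]) -> List[dict]:
    """Collect every backup belonging to `instance_id` across all regions."""
    catalogue = []
    for region in sorted(backups_by_region):
        for backup in backups_by_region[region]:
            if backup.get("instance_id") == instance_id:
                catalogue.append({
                    "region": region,
                    "backup_id": backup["backup_id"],
                    "kind": backup["kind"],  # e.g. "snapshot" or "ami"
                })
    return catalogue
```

The resulting list identifies each backup together with the region in which it resides, which is the information a subsequent copy, replication or quarantine action would need.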
Copy/Replicate Catalogued Instance Backups and Replications.
Continuing with
Quarantine.
Actionable logic module 400 may also be preconfigured to quarantine the instance 60 affected by the anomalous activity to prevent spread of an APT/ransomware infection to existing data backups and/or replications, regardless of where they are stored. In addition, actionable logic module 400 may also be preconfigured to quarantine one or more existing catalogued backups and/or replications, regardless of the cloud region or account in which they are stored, to prevent further incursion by the APT/ransomware attack. Quarantine may also be applied to a specific host machine 80, likewise to prevent further incursion by the APT/ransomware attack.
Alert.
Actionable logic module 400 may also be preconfigured to transmit one or more alerts to one or more users of System 100. Alerts may be transmitted to authorized, pre-determined (designated) users (such as an administrator) or any such other individuals in the form of Simple Notification Service (“SNS”) messages, emails, voice calls and text messages. All such alert methods are generally well known; others not specifically mentioned are intended for inclusion herein, and the available alerts are not limited to those specifically listed.
Backup.
The actionable logic module 400 may also be preconfigured to perform an immediate backup of the instance 60 affected by the anomaly. A backup of the instance 60 may be desirable as an attempt to preserve the last known data of the instance in the event it has not been fully infiltrated by the threat and is potentially recoverable. Referring to
Restore Instance.
Actionable logic module 400 may also be preconfigured to perform an immediate restoration of a backup of the instance 60 affected by the anomaly. Such backups or replications may be stored in any of the locations previously discussed with regard to the AWS ecosystem described in
Shutdown.
Actionable logic module 400 may also be preconfigured to shut down the instance 60 to prevent the threat from spreading further to data backups and/or existing replications of the instance. Shutdown may take many forms, including shutting down network connectivity to the instance 60, shutting down one or more ports to the instance, terminating operation of the instance, shutting down (turning off) the host machine 80, or any combination of the foregoing.
While many potential action items 460 have been described with respect to the actionable logic module 400 that users may select and preconfigure in the event of an anomaly, it is to be expressly understood that there are many variations of the items so described and other related items that module 400 may include for pre-configuration that are not listed here. One skilled in the art would readily appreciate any such other items that may be desirable as effective response means to an APT or cyberattack threat to an instance. Such other response items, and combinations thereof, are intended for inclusion and pre-configuration in actionable logic module 400.
While the invention has been disclosed in connection with embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention is not to be limited by the foregoing examples but is to be understood in the broadest sense allowable by law.
This disclosure of the various embodiments of the invention, with accompanying drawings, is neither intended nor should it be construed as being representative of the full extent and scope of the present invention. The images in the drawings are simplified for illustrative purposes and are not necessarily depicted to scale. To facilitate understanding, identical reference terms are used, where possible, to designate substantially identical elements that are common to the figures, except that suffixes may be added, when appropriate, to differentiate such elements.
Although the invention herein has been described with reference to particular illustrative embodiments thereof, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. Therefore, numerous modifications may be made to the illustrative embodiments and other arrangements may be devised without departing from the spirit and scope of the present invention. It has been contemplated that features or steps of one embodiment may be incorporated in other embodiments of the invention without further recitation.
Claims
1. A data backup and recovery system for a virtual machine operative on a host machine in a cloud ecosystem, comprising:
- a machine learning module;
- an anomaly detection engine; and
- an actionable logic module,
- wherein, the machine learning module continuously accesses and reads a data recorded by a one or more system logs of the virtual machine to create and continuously update a baseline parameter of the data recorded by the system logs that is indicative of a typical, non-compromised operation of the virtual machine, and
- wherein, the anomaly detection engine continuously monitors in real-time the data recorded by the system logs and compares said real-time data to the baseline parameter and determines whether said real-time data is a statistical anomaly with reference to the baseline parameter, and
- wherein, if said real-time data represents a statistical anomaly with reference to the baseline parameter, the actionable logic module initiates a one or more response actions pre-configured from a set of pre-configurable actionable responses directed towards protecting a one or more existing data backups of the virtual machine.
2. The data backup and recovery system of claim 1, wherein:
- the real-time data monitored by the anomaly detection engine comprises an online streaming data of the cloud ecosystem of the virtual machine.
3. The data backup and recovery system of claim 1, wherein:
- the set of pre-configurable actionable responses is comprised of one or more of the following response actions:
- catalogue the existing data backups;
- catalogue a one or more existing replications of the existing data backups;
- copy the existing data backups to a one or more data backup storage devices;
- copy the existing replications to the data backup storage devices;
- quarantine the virtual machine;
- quarantine the existing data backups;
- quarantine the existing replications;
- quarantine the host machine;
- issue an at least one alert to one or more users of the system;
- perform a current backup of the virtual machine;
- perform a current replication of the existing data backup to the data backup storage devices;
- restore the virtual machine from the existing data backups;
- restore the virtual machine from the existing replications;
- shutdown the virtual machine; and
- shutdown the host machine.
Type: Application
Filed: Jun 25, 2019
Publication Date: May 21, 2020
Applicant: Cloud Daddy, Inc. (Middletown, NJ)
Inventors: Konstantin Malkov (Holmdel, NJ), Joseph Merces (Middletown, NJ), Dmitry Tunitsky (Moscow)
Application Number: 16/451,497