Hybrid machine learning and knowledge graph approach for estimating and mitigating the spread of malicious software

Info

Patent number: 11914709
Type: Grant
Filed: Jul 20, 2021
Date of Patent: Feb 27, 2024
Patent Publication Number: 20230026135
Assignee: Bank of America Corporation (Charlotte, NC)
Inventors: George Anthony Albero (Charlotte, NC), Maharaj Mukherjee (Poughkeepsie, NY)
Primary Examiner: Aravind K Moorthy
Application Number: 17/380,450

Abstract

Aspects of the disclosure relate to predicting the spread of malicious software. The computing platform may identify malicious software at a computing device and may input characteristics of the malicious software into a machine learning model to produce time horizons for the malicious software. The computing platform may identify, using a knowledge graph and based on the time horizons, subsets of computing devices, each corresponding to a particular time horizon. The computing platform may perform, at a time within a first time horizon, a first security action for a first subset of computing devices within the first time horizon and a second security action for a second subset of computing devices located within a second time horizon, where the first time horizon and the second time horizon indicate that the first subset will be affected by the malicious software prior to the second subset.

Description

BACKGROUND

Aspects of the disclosure relate to preventing the spread of malicious software. Once introduced to a network, malicious software may spread to additional devices. In some instances, a pure graphical approach may be used to analyze all nodes connected to a current point of attack. However, such methods may be computationally expensive and time consuming (e.g., due to size of such graphs, which may include billions of devices if not more). Accordingly, application of such methods may result in processing inefficiencies and/or time delays, during which malicious software may continue to spread undetected.

SUMMARY

Aspects of the disclosure provide effective, efficient, scalable, and convenient technical solutions that address and overcome the technical problems associated with preventing spread of malicious software. In accordance with one or more embodiments of the disclosure, a computing platform comprising at least one processor, a communication interface, and memory storing computer-readable instructions may generate a knowledge graph that includes a plurality of computing devices and indicates relationships between the plurality of computing devices. The computing platform may train, using historical information, a machine learning model, which may configure the machine learning model to predict time horizons for spread of malicious software, where each of the time horizons may include a subset of the plurality of computing devices. The computing platform may identify malicious software at one device of the plurality of computing devices. The computing platform may input characteristics of the malicious software into the machine learning model to produce time horizons for the identified malicious software. The computing platform may identify, using the knowledge graph and based on the time horizons for the identified malicious software, subsets of the plurality of computing devices, each corresponding to a particular time horizon. The computing platform may perform, at a time within a first time horizon, a first security action for at least a first subset of the plurality of computing devices located within the first time horizon and a second security action for at least a second subset of the plurality of computing devices located within a second time horizon, where the first time horizon and the second time horizon indicate that the first subset of the plurality of computing devices may be affected by the identified malicious software prior to the second subset of the plurality of computing devices.

In one or more instances, the plurality of computing devices may include devices of different device types. In one or more instances, training the machine learning model may include training the machine learning model to predict the spread of malicious software based on velocity and acceleration of the malicious software.

In one or more examples, inputting the characteristics of the malicious software into the machine learning model may include: 1) identifying that the identified malicious software has not previously been identified, 2) predicting, based on similarities between the identified malicious software and previously identified malicious software, characteristics for the malicious software, and 3) inputting, into the machine learning model, the characteristics.

In one or more instances, inputting the characteristics of the malicious software into the machine learning model may include: 1) identifying that the identified malicious software was previously identified, 2) accessing historical characteristic information for the identified malicious software, and 3) inputting, into the machine learning model, the historical characteristic information. In one or more instances, the characteristics and the historical characteristic information may include one or more of: location information, network traffic, date of attack, time of attack, or holiday information.

In one or more examples, producing the time horizons for the malicious software may include: 1) identifying, using the machine learning model, a velocity and an acceleration of the malicious software, and 2) identifying, using the machine learning model and based on the velocity and the acceleration, the time horizons, where the time horizons indicate predicted boundaries of a network that will be affected by the malicious software at various future times. In one or more examples, identifying the subsets of the plurality of computing devices, each corresponding to a particular time horizon may include identifying, by applying the time horizons to the knowledge graph, computing devices that are predicted to be affected by the malicious software at different future times.

In one or more instances, performing the first security action may include quarantining the first subset of the plurality of computing devices based on identifying that the first subset of the plurality of computing devices is within the first time horizon, and performing the second security action may include sending a firewall update to the second subset of the plurality of computing devices based on identifying that the second subset of the plurality of computing devices is outside of the first time horizon and within the second time horizon, where the firewall update is configured to block the malicious software from affecting the second subset of the plurality of computing devices.

In one or more examples, the computing platform may identify that the malicious software has moved from the first time horizon to the second time horizon. Based on identifying that the malicious software has moved from the first time horizon to the second time horizon, the computing platform may identify whether or not the first subset of the plurality of computing devices remains affected by the malicious software. If the first subset of the plurality of computing devices is no longer affected by the malicious software, the computing platform may: 1) stop performance of the first security action for the first subset of the plurality of computing devices, and 2) replace performance of the second security action for the second subset of the plurality of computing devices with performance of the first security action. If the first subset of the plurality of computing devices is still affected by the malicious software, the computing platform may: 1) continue to perform the first security action for the first subset of the plurality of computing devices, and 2) replace performance of the second security action for the second subset of the plurality of computing devices with performance of the first security action.

In one or more instances, performing the first security actions and the second security actions may include steering the malicious software into a secure sandbox. In one or more instances, steering the malicious software into the secure sandbox may include: 1) identifying a type of device affected by the malicious software, and 2) causing the secure sandbox to resemble the type of device affected by the malicious software.

In one or more examples, the computing platform may analyze, at the secure sandbox, the malicious software to identify additional information about the malicious software, and may update the first security action and the second security action based on the additional information. In one or more examples, the computing platform may embed, at the secure sandbox and into the malicious software, tracking information, where the additional information about the malicious software may be identified using the tracking information.

These features, along with many others, are discussed in greater detail below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example and not limitation in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIGS. 1A-1B depict an illustrative computing environment for implementing a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments;

FIGS. 2A-2F depict an illustrative event sequence for implementing a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments;

FIG. 3 depicts an illustrative method for implementing a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments; and

FIG. 4 depicts illustrative graphical user interfaces for implementing a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments.

DETAILED DESCRIPTION

In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown, by way of illustration, various embodiments in which aspects of the disclosure may be practiced. In some instances, other embodiments may be utilized, and structural and functional modifications may be made, without departing from the scope of the present disclosure.

It is noted that various connections between elements are discussed in the following description. It is noted that these connections are general and, unless specified otherwise, may be direct or indirect, wired or wireless, and that the specification is not intended to be limiting in this respect.

As a brief introduction to the concepts described further herein, one or more aspects of the disclosure relate to using a hybrid machine learning and knowledge-based approach to predict the spread of malicious software. For example, the spread of malicious software may depend on a point of a current attack, and may be based on, for example, the speed of connectivity with other locations, network traffic, day and time of attack, local holidays and celebrations, demographics, human skills, and/or other factors. Current methods may use a graph or network-based connectivity analysis for all the points that are connected to current points of attack. However, there may be problems associated with this network-based analysis. For example, such methods may be computationally expensive because parsing large network graphs (which may include, e.g., billions of computing devices connected on a network) may be very costly (or otherwise impossible). Furthermore, such methods may take a conservative, brute-force approach, which may be time consuming. Accordingly, there is a need for a fast method that may predict, in advance, sections of a network that may be vulnerable to a next attack, so that mitigating actions may be performed, in advance, at these locations.

Accordingly, a hybrid approach is described here. A machine learning method may be used to predict a boundary using historic data and computation of speed and acceleration of the boundary based on locations, network traffic, day and time of attack, local holidays, and/or other information. Once a location boundary is established, a network or graph-based method may be used to find actual locations that may be potentially attacked. Based on the next potential locations, mitigating actions may be taken such as taking the location off the network, getting the latest anti-virus patch, and/or other actions.

To predict the spread of malicious software, both current velocity and acceleration may be predicted. The velocity and acceleration at any location of the infection may depend on factors such as speed of connectivity with other locations, network traffic, day/time of attack, local holidays, demographics, human skills, and/or other factors. A machine learning method may use historic data along with a historic time series spread of malicious software to predict the attack horizon along with the speed and acceleration at each of the points and their directions. The attack horizon may be computed iteratively based on the current attack horizon and the velocity and acceleration vectors at each of the points on the current attack horizon. Alternatively, the velocity and acceleration vectors may be known at a sufficient number of points, and may be used to interpolate the points to create the next attack horizon. Once an attack horizon is computed, a network analysis may be completed only within that horizon to determine the systems that may be impacted and may need protection.
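The iterative horizon computation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the horizon is represented as a list of two-dimensional points, that a velocity vector and an acceleration vector are already available for each point, and that acceleration is constant over one time step. The function name `advance_horizon` and the coordinate representation are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def advance_horizon(
    points: List[Vec],
    velocities: List[Vec],
    accelerations: List[Vec],
    dt: float,
) -> Tuple[List[Vec], List[Vec]]:
    """Move each horizon point forward by one time step of length dt.

    Assumes constant acceleration during the step (an assumption of this
    sketch), so each point moves by v*dt + 0.5*a*dt**2 and each velocity
    grows by a*dt.
    """
    next_points: List[Vec] = []
    next_velocities: List[Vec] = []
    for (x, y), (vx, vy), (ax, ay) in zip(points, velocities, accelerations):
        next_points.append(
            (x + vx * dt + 0.5 * ax * dt * dt, y + vy * dt + 0.5 * ay * dt * dt)
        )
        next_velocities.append((vx + ax * dt, vy + ay * dt))
    return next_points, next_velocities


# Calling the function repeatedly yields successive attack horizons.
horizon = [(0.0, 0.0)]
velocity = [(1.0, 0.0)]
acceleration = [(2.0, 0.0)]
horizon, velocity = advance_horizon(horizon, velocity, acceleration, dt=1.0)
```

Feeding the returned points and velocities back into the function produces the next horizon, which mirrors the iterative computation from the current attack horizon.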

In some instances, automated mitigations of vulnerable systems may include one or more of: disconnecting/quarantining a system to prevent spread, applying new and more secure firewalls, applying a latest patch or antivirus software, guiding the virus to virtual dead-ends or a sandbox to watch replication and providing a feedback loop to AI, allowing the virus to pick up a Trojan horse, and/or other actions.

These and other features are described in further detail below.

FIGS. 1A-1B depict an illustrative computing environment that implements a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments. Referring to FIG. 1A, computing environment 100 may include one or more computer systems. For example, computing environment 100 may include a software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, and third computing device 106.

As described further below, software spread prediction platform 102 may be a computer system that includes one or more computing devices (e.g., servers, server blades, or the like) and/or other computer components (e.g., processors, memories, communication interfaces) that may be used to train, host, and/or otherwise maintain a machine learning model and knowledge graph, which may subsequently be used to predict the spread of malicious software.

Administrator user device 103 may be a laptop computer, desktop computer, mobile device, tablet, smartphone, and/or other device that may be used by an individual (e.g., a network administrator, cyber security analyst, or other employee) to identify the spread of malicious software. In some instances, administrator user device 103 may be configured to display one or more user interfaces (e.g., interfaces depicting the spread of malicious software, or the like).

First computing device 104 may be a laptop computer, desktop computer, mobile device, tablet, smartphone, and/or other device that may be vulnerable to the spread of malicious software. In some instances, the software spread prediction platform 102 may predict that the first computing device 104 falls within a first time horizon.

Second computing device 105 may be a laptop computer, desktop computer, mobile device, tablet, smartphone, and/or other device that may be vulnerable to the spread of malicious software. In some instances, the software spread prediction platform 102 may predict that the second computing device 105 falls within a second time horizon, different than the first time horizon.

Third computing device 106 may be a laptop computer, desktop computer, mobile device, tablet, smartphone, and/or other device that may be vulnerable to the spread of malicious software. In some instances, the software spread prediction platform 102 may predict that the third computing device 106 falls within a third time horizon, different than the first time horizon and the second time horizon.

Computing environment 100 also may include one or more networks, which may interconnect software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, and third computing device 106. For example, computing environment 100 may include a network 101 (which may interconnect, e.g., software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, and third computing device 106).

In one or more arrangements, software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, and third computing device 106 may be any type of computing device capable of sending and/or receiving requests and processing the requests accordingly. For example, software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, third computing device 106, and/or the other systems included in computing environment 100 may, in some instances, be and/or include server computers, desktop computers, laptop computers, tablet computers, smart phones, or the like that may include one or more processors, memories, communication interfaces, storage devices, and/or other components. As noted above, and as illustrated in greater detail below, any and/or all of software spread prediction platform 102, administrator user device 103, first computing device 104, second computing device 105, and/or third computing device 106 may, in some instances, be special-purpose computing devices configured to perform specific functions.

Referring to FIG. 1B, software spread prediction platform 102 may include one or more processors 111, memory 112, and communication interface 113. A data bus may interconnect processor 111, memory 112, and communication interface 113. Communication interface 113 may be a network interface configured to support communication between software spread prediction platform 102 and one or more networks (e.g., network 101, or the like). Memory 112 may include one or more program modules having instructions that when executed by processor 111 cause software spread prediction platform 102 to perform one or more functions described herein and/or one or more databases that may store and/or otherwise maintain information which may be used by such program modules and/or processor 111. In some instances, the one or more program modules and/or databases may be stored by and/or maintained in different memory units of software spread prediction platform 102 and/or by different computing devices that may form and/or otherwise make up software spread prediction platform 102. For example, memory 112 may have, host, store, and/or include software spread prediction module 112a, software spread prediction database 112b, and/or machine learning engine 112c.

Software spread prediction module 112a may have instructions that direct and/or cause software spread prediction platform 102 to identify the spread of malicious software, as discussed in greater detail below. Software spread prediction database 112b may store information used by software spread prediction module 112a and/or software spread prediction platform 102 in application of advanced techniques to perform software spread prediction, and/or in performing other functions. Machine learning engine 112c may be used by software spread prediction platform 102 and/or software spread prediction module 112a to refine and/or otherwise update methods for predicting the spread of malicious software, and/or other methods described herein.

FIGS. 2A-2F depict an illustrative event sequence for implementing a hybrid machine learning and knowledge graph method to predict the spread of malicious software in accordance with one or more example embodiments. Referring to FIG. 2A, at step 201, the software spread prediction platform 102 may collect historical pattern information for the spread of malicious software across a plurality of computing devices (which may, e.g., include the first computing device 104, the second computing device 105, and the third computing device 106). For example, the software spread prediction platform 102 may collect information about previous spreads of malicious software across the plurality of computing devices (e.g., which devices were affected, velocity of the spread, acceleration of the spread, types of affected devices, location of the malicious software, network traffic, date/time of attack, holiday information, and/or other information).

At step 202, the software spread prediction platform 102 may train a machine learning model using the historical pattern information. For example, the software spread prediction platform 102 may train the machine learning model to identify and/or otherwise compute a velocity and acceleration of malicious software based on the other historical pattern information. Additionally, the software spread prediction platform 102 may train the machine learning model to identify, based on the velocity and the acceleration of the malicious software, time horizons corresponding to a predicted spread of the malicious software. For example, in identifying the time horizons, the software spread prediction platform 102 may identify a first distance, within which computing devices may be affected by the malicious software by a first time period, a second distance, within which computing devices may be affected by the malicious software by a second time period, and so on. For example, in training the machine learning model, the software spread prediction platform 102 may train the following algorithm: v*t=d, where v is the velocity of the malicious software, t is the elapsed time, and d is a distance traveled (which may, e.g., be measured in hops on the network, physical distance, and/or other metrics). In some instances, the velocity may increase by a constant acceleration (a), which may be factored into the algorithm as well. For example, v may be equal to a*t, and d may be equal to 0.5(a)(t)².
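As a worked illustration of the distance relationships above, the following sketch converts a list of time periods into horizon distances. It is an assumption of this sketch that the two cases in the text (constant velocity, and constant acceleration from rest) can be combined into the standard kinematic form d = v0*t + 0.5*a*t², where v0 is an initial velocity; the function names are hypothetical.

```python
from typing import List, Sequence


def spread_distance(t: float, v0: float = 0.0, a: float = 0.0) -> float:
    """Distance covered after elapsed time t.

    With a == 0 this reduces to d = v*t; with v0 == 0 it reduces to
    d = 0.5*a*t**2, the two cases given in the text.
    """
    return v0 * t + 0.5 * a * t ** 2


def horizon_radii(time_periods: Sequence[float], v0: float = 0.0, a: float = 0.0) -> List[float]:
    """One distance per time period, e.g. in hops or physical distance."""
    return [spread_distance(t, v0, a) for t in time_periods]


# Acceleration of 2 hops per hour squared, starting from rest:
print(horizon_radii([1, 2, 3], a=2.0))  # [1.0, 4.0, 9.0]
```

In this example the first, second, and third time horizons would sit 1, 4, and 9 hops from the point of attack.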

In addition, the software spread prediction platform 102 may generate a knowledge graph that includes the plurality of computing devices and relationships between the plurality of computing devices. For example, the knowledge graph may show a network path between two or more computing devices of the plurality of computing devices. In some instances, the plurality of computing devices may include one or more types of devices (e.g., servers, server blades, laptop computers, mobile devices, smartphones, tablets, desktop computers, database storage systems, and/or other devices).
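A knowledge graph of this kind can be sketched, under simplifying assumptions, as an undirected adjacency mapping in which each key is a device and each value is the set of devices it is directly connected to. The device labels and the helper names `build_graph` and `network_path` are hypothetical, and a breadth-first search stands in for whatever path computation an actual embodiment might use.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(links: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency mapping from pairs of connected devices."""
    graph: Graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def network_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Return a shortest path (fewest hops) between two devices, or None."""
    if source not in graph or target not in graph:
        return None
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph([("device_104", "device_105"), ("device_105", "device_106")])
print(network_path(graph, "device_104", "device_106"))
```

Device type information, where needed, could be kept in a separate mapping from device to type rather than in the adjacency structure itself.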

At step 203, the software spread prediction platform 102 may identify malicious software at a computing device (e.g., first computing device 104). For example, the software spread prediction platform 102 may identify the malicious software based on feedback from a user and/or automatically based on virus detection software running at the first computing device or other means.

At step 204, the software spread prediction platform 102 may identify characteristics of the malicious software (e.g., which devices are affected, types of affected devices, location of the malicious software, network traffic, date/time of attack, holiday information, and/or other information). In some instances, the software spread prediction platform 102 may identify, based on any identified characteristics, that the malicious software has previously been identified, and may identify additional characteristics of the malicious software based on the previous identification (e.g., access historical characteristic information, which may have been stored after a previous identification). In other instances, the software spread prediction platform 102 may identify, based on any identified characteristics, that the malicious software has not previously been identified (e.g., it is a new virus or the like), but may identify similarities between the identified characteristics and characteristics of other previously identified malicious software (e.g., a similar but different virus). In these instances, the software spread prediction platform 102 may use the similar characteristics to extrapolate additional characteristics for the identified malicious software based on the previously identified malicious software.
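The two branches described at step 204 (previously identified software versus new software that resembles earlier software) can be sketched as below. This is only an illustration: the text does not specify a similarity measure, so counting matching characteristic values is an assumption of this sketch, as are the `signature` key, the field names, and the function names.

```python
from typing import Any, Dict, Tuple

Characteristics = Dict[str, Any]


def similarity(a: Characteristics, b: Characteristics) -> int:
    """Number of characteristics present in both records with equal values."""
    return sum(1 for key in set(a) & set(b) if a[key] == b[key])


def resolve_characteristics(
    signature: str,
    observed: Characteristics,
    history: Dict[str, Characteristics],
) -> Tuple[Characteristics, str]:
    """Fill in characteristics for identified malicious software.

    Returns the merged characteristics and a label describing which branch
    was taken: "known", "extrapolated", or "new".
    """
    if signature in history:
        # Previously identified: start from stored history, prefer fresh observations.
        return {**history[signature], **observed}, "known"
    if history:
        best = max(history, key=lambda name: similarity(observed, history[name]))
        if similarity(observed, history[best]) > 0:
            # Not seen before, but similar to earlier software: extrapolate.
            return {**history[best], **observed}, "extrapolated"
    return dict(observed), "new"


history = {
    "worm_a": {"device_type": "server", "traffic": "high", "velocity_hint": 3.0},
    "worm_b": {"device_type": "laptop", "traffic": "low", "velocity_hint": 1.0},
}
print(resolve_characteristics("worm_c", {"device_type": "laptop", "traffic": "low"}, history))
```

The merged record is what would then be supplied to the machine learning model at step 205.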

Referring to FIG. 2B, at step 205, the software spread prediction platform 102 may input the characteristics of the malicious software into the machine learning model to identify predicted velocity and/or acceleration for the spread of the malicious software. In some instances, the machine learning model may predict the velocity and/or acceleration based on similarities between the identified malicious software and the historical pattern information. Additionally or alternatively, the software spread prediction platform 102 may identify a current velocity and/or acceleration for the spread of the malicious software (e.g., based on identified spread of the malicious software and a duration of time corresponding to the spread). After predicting the velocity and/or acceleration of the spread, the software spread prediction platform 102 may use the machine learning model to predict time horizons for the spread. For example, in predicting the time horizons, the software spread prediction platform 102 may predict a circumference within which the malicious software may spread to other devices within a first time period, a second, larger circumference within which the malicious software may spread to yet additional devices within a second time period, and so on (e.g., as illustrated by the concentric circles depicted on the administrator interface 405 in FIG. 4). In some instances, these time horizons may represent geographic regions, nodal/hop distances, and/or other distances. Although three time horizons are described herein, any number of time horizons may be identified without departing from the scope of the disclosure, and similarly, any number of devices may be located in each horizon (e.g., not limited to a single device).
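Identifying a current velocity and acceleration from an observed spread and its duration can be illustrated with finite differences over a short time series. This sketch assumes the observations are (time, distance) pairs in increasing time order and uses only the last three of them; the function name and the choice of a three-point estimate are assumptions, not part of the disclosure.

```python
from typing import Sequence, Tuple


def estimate_velocity_acceleration(
    samples: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """Estimate current velocity and acceleration from (time, distance) samples.

    Velocity is the average rate over the most recent interval; acceleration
    is the change between the two most recent interval velocities divided by
    the time between the interval midpoints.
    """
    if len(samples) < 3:
        raise ValueError("at least three (time, distance) samples are required")
    (t0, d0), (t1, d1), (t2, d2) = samples[-3:]
    v_prev = (d1 - d0) / (t1 - t0)
    v_curr = (d2 - d1) / (t2 - t1)
    acceleration = (v_curr - v_prev) / ((t2 - t0) / 2)
    return v_curr, acceleration


# Spread measured in hops at hours 0, 1, and 2:
print(estimate_velocity_acceleration([(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]))  # (3.0, 2.0)
```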

At step 206, software spread prediction platform 102 may identify, based on the predicted time horizons and using the knowledge graph, devices corresponding to each time horizon. For example, the software spread prediction platform 102 may use the knowledge graph to identify subsets of the plurality of computing devices that are located within each time horizon. For example, as shown on the administrator interface 405 in FIG. 4, a first subset of the plurality of computing devices may be located within the first time horizon (e.g., first time horizon 410), a second subset of the plurality of computing devices may be located outside the first time horizon, but within the second time horizon (e.g., second time horizon 415), a third subset of the plurality of computing devices may be located outside the second time horizon but within the third time horizon (e.g., third time horizon 420), and so on. For example, the software spread prediction platform 102 may identify that the first computing device 104 is located within the first time horizon, the second computing device 105 is located within the second time horizon, and the third computing device 106 is located within the third time horizon. In some instances, in identifying the devices for each time horizon, the software spread prediction platform 102 may identify all devices located in the corresponding time horizons. In other instances, the software spread prediction platform 102 may identify a type of device being affected by the malicious software, and identify only devices of that type located in each of the corresponding time horizons.
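Applying the time horizons to the knowledge graph, as described at step 206, can be sketched as a bounded breadth-first traversal. The sketch assumes the horizons are expressed as increasing hop distances from the infected device and that the graph is an adjacency mapping; both representations, and the function names, are hypothetical. The traversal stops at the outermost horizon, so devices beyond it are never visited.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set


def hop_distances(graph: Dict[str, Set[str]], source: str, max_hops: float) -> Dict[str, int]:
    """Breadth-first hop counts from source, never expanding past max_hops."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances and distances[node] + 1 <= max_hops:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def devices_by_horizon(
    graph: Dict[str, Set[str]],
    source: str,
    radii: Sequence[float],
    device_types: Optional[Dict[str, str]] = None,
    affected_type: Optional[str] = None,
) -> List[Set[str]]:
    """Return one set of devices per horizon (radii must be increasing).

    A device belongs to the innermost horizon whose radius covers it. If
    affected_type is given, only devices of that type are kept.
    """
    subsets: List[Set[str]] = [set() for _ in radii]
    if not radii:
        return subsets
    for device, hops in hop_distances(graph, source, radii[-1]).items():
        if affected_type is not None and (device_types or {}).get(device) != affected_type:
            continue
        for index, radius in enumerate(radii):
            if hops <= radius:
                subsets[index].add(device)
                break
    return subsets


chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(devices_by_horizon(chain, "a", [1, 2, 3]))
```

In the example, device `e` lies four hops away and is therefore outside all three horizons and is not returned.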

At step 207, the software spread prediction platform 102 may establish a connection with the administrator user device 103. For example, the software spread prediction platform 102 may establish a first wireless data connection with the administrator user device 103 to link the software spread prediction platform 102 to the administrator user device 103 (e.g., in preparation for sending parameters of the spread of the malicious software). In some instances, the software spread prediction platform 102 may identify whether or not a connection is established with the administrator user device 103. If a connection is already established with the administrator user device 103, the software spread prediction platform 102 might not re-establish the connection. If a connection is not yet established with the administrator user device 103, the software spread prediction platform 102 may establish the first wireless data connection as described herein.

At step 208, the software spread prediction platform 102 may send the parameters of the spread of the malicious software to the administrator user device 103. For example, the software spread prediction platform 102 may send the parameters via the communication interface 113 and while the first wireless data connection is established. In some instances, the software spread prediction platform 102 may also send one or more commands directing the administrator user device 103 to display the parameters.

At step 209, administrator user device 103 may receive the parameters. For example, the administrator user device 103 may receive the parameters while the first wireless data connection is established. In some instances, the administrator user device 103 may receive the one or more commands directing the administrator user device 103 to display the parameters.

Referring to FIG. 2C, at step 210, based on or in response to the one or more commands directing the administrator user device 103 to display the parameters, the administrator user device 103 may display the parameters. For example, the administrator user device 103 may display a graphical user interface that depicts the time horizons overlaid on the knowledge graph, and/or a velocity, acceleration, and/or other information about the malicious software and/or the spread thereof.

At step 211, the software spread prediction platform 102 may identify one or more protective actions to be performed at each computing device based on the corresponding time horizons. For example, the software spread prediction platform 102 may identify a first protective action to be performed at the first computing device 104 because the first computing device 104 is within the first time horizon. In addition, the software spread prediction platform 102 may identify a second protective action to be performed at the second computing device 105 because the second computing device 105 is within the second time horizon, but outside the first time horizon. Similarly, the software spread prediction platform 102 may identify a third protective action for the third computing device 106 because the third computing device 106 is within the third time horizon, but outside the second time horizon. In some instances, the software spread prediction platform 102 may identify a more severe protective action for computing devices within the first time horizon than those in the second or third time horizons because the computing devices within the first time horizon may be at the greatest risk of infection by the malicious software in the shortest amount of time. For example, the software spread prediction platform 102 may identify that the first computing device 104 should be taken offline and quarantined, whereas the second computing device 105 should receive a firewall update (which may, e.g., be created to block the malicious software), and the third computing device 106 should receive a notification about the malicious software.

In some instances, in identifying the protective actions, the software spread prediction platform 102 may select from actions such as disconnecting/quarantining a system, applying new/more secure firewalls, applying a most recent patch or anti-virus software, guiding the virus to a virtual dead-end or sandbox (which may, e.g., provide feedback to the machine learning model), allowing the virus to pick up a Trojan horse, and/or other protective actions.
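
The tiering described at step 211 may be illustrated with a minimal sketch. The names below (ProtectiveAction, ACTION_BY_HORIZON, identify_protective_actions) and the device identifiers are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular data structure. The fallback for devices beyond the last configured time horizon is likewise an assumption.

```python
from enum import Enum


class ProtectiveAction(Enum):
    QUARANTINE = "take offline and quarantine"
    FIREWALL_UPDATE = "send firewall update"
    NOTIFY = "send notification"


# Hypothetical mapping: a lower horizon index means sooner predicted infection,
# so it is paired with a more severe action.
ACTION_BY_HORIZON = {
    1: ProtectiveAction.QUARANTINE,
    2: ProtectiveAction.FIREWALL_UPDATE,
    3: ProtectiveAction.NOTIFY,
}


def identify_protective_actions(device_horizons):
    """Map each device to an action based on its innermost time horizon.

    device_horizons: dict of device id -> horizon index (1 = first horizon).
    Assumption: devices beyond the last configured horizon receive the
    least severe action.
    """
    least_severe = ACTION_BY_HORIZON[max(ACTION_BY_HORIZON)]
    return {
        device: ACTION_BY_HORIZON.get(horizon, least_severe)
        for device, horizon in device_horizons.items()
    }


actions = identify_protective_actions(
    {"device_104": 1, "device_105": 2, "device_106": 3}
)
```

In this sketch, the first computing device is mapped to quarantine, the second to a firewall update, and the third to a notification, mirroring the example given at step 211.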

At step 212, the software spread prediction platform 102 may perform the protective actions identified at step 211. For example, the software spread prediction platform 102 may quarantine the first computing device 104, send a firewall update to the second computing device 105, and send a notification to the third computing device 106.

At step 213, the software spread prediction platform 102 may identify that the malicious software has spread to computing devices in a second time horizon. In some instances, the software spread prediction platform 102 may identify that the malicious software has actually spread to computing devices in the second time horizon. In other instances, the software spread prediction platform 102 might not identify actual spread, but may determine that a predicted time of the spread (e.g., a time corresponding to the second time horizon) has been reached. In some instances, based on the spread of the malicious software, the software spread prediction platform 102 may update and/or otherwise refine the time horizons and/or the machine learning model so as to increase accuracy. For example, if the software spread prediction platform 102 identifies that the malicious software is spreading faster or slower than anticipated, predictions may be adjusted.
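
The two triggering conditions described at step 213 (observed spread, or arrival of the predicted time) may be sketched as a single check. The function name horizon_reached and its parameters are hypothetical, and times are assumed to be expressed as seconds since the malicious software was first identified.

```python
def horizon_reached(now, predicted_time, infected_devices, horizon_devices):
    """Return True if the horizon should be treated as reached.

    The horizon counts as reached when any of its devices is observed to be
    infected, or when the predicted time for the horizon has arrived.
    """
    actual_spread = bool(set(infected_devices) & set(horizon_devices))
    predicted_time_reached = now >= predicted_time
    return actual_spread or predicted_time_reached
```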

Referring to FIG. 2D, at step 214, the software spread prediction platform 102 may identify updated protective actions. For example, the software spread prediction platform 102 may identify whether or not the devices in the first time horizon are still affected by the malicious software. If the software spread prediction platform 102 identifies that devices in the first time horizon have been cleared of the malicious software, it may identify that these devices may be brought back online. If instead the software spread prediction platform 102 identifies that devices in the first time horizon have not been cleared of the malicious software, it may continue to perform a protective action for these devices (e.g., continue to quarantine these devices). However, now that the malicious software has spread to the devices in the second time horizon, the software spread prediction platform 102 may identify that the devices in the second time horizon should be quarantined (e.g., the protective action originally applied to the devices in the second time horizon should be replaced by the protective action originally applied to the devices in the first time horizon, or other more severe protective action), and the firewall update should be deployed to devices in the third time horizon (e.g., because they are now closer (physically and/or temporally) to being infected by the malicious software than when the initial protective actions were identified). In some instances, in identifying the updated protective actions, the software spread prediction platform 102 may identify that the malicious software should be steered to a secure sandbox (e.g., by quarantining/not quarantining particular devices so as to allow the malicious software to travel through a selected path of devices that ends in the sandbox). In doing so, the software spread prediction platform 102 may focus/concentrate protective computing resources on the most at risk/vulnerable sections of the network.
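
The escalation logic described at step 214 may be sketched as follows. The action names, the severity ordering, and the function update_protective_actions are hypothetical; the sketch assumes that each device belongs to exactly one time horizon and that a value of None means no protective action is currently applied.

```python
# Hypothetical ordering, from least to most severe.
SEVERITY_ORDER = ["notify", "firewall_update", "quarantine"]


def update_protective_actions(current_actions, horizon_of, reached_horizon, cleared):
    """Return updated actions once the malware reaches `reached_horizon`.

    current_actions: dict of device -> action name currently applied (or None).
    horizon_of: dict of device -> horizon index (1 = first horizon).
    cleared: set of devices confirmed to be free of the malicious software.
    """
    updated = {}
    for device, action in current_actions.items():
        horizon = horizon_of[device]
        if horizon < reached_horizon:
            # Earlier horizon: release if cleared, otherwise keep the action.
            updated[device] = None if device in cleared else action
        elif horizon == reached_horizon:
            # Newly reached horizon: apply the most severe action.
            updated[device] = SEVERITY_ORDER[-1]
        else:
            # Later horizon: step up one severity level.
            idx = SEVERITY_ORDER.index(action) if action in SEVERITY_ORDER else -1
            updated[device] = SEVERITY_ORDER[min(idx + 1, len(SEVERITY_ORDER) - 1)]
    return updated


updated = update_protective_actions(
    {"device_104": "quarantine", "device_105": "firewall_update", "device_106": "notify"},
    {"device_104": 1, "device_105": 2, "device_106": 3},
    reached_horizon=2,
    cleared={"device_104"},
)
```

With these inputs, the first computing device is released, the second is quarantined, and the third is stepped up to a firewall update, which corresponds to the example given at step 215.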

At step 215, the software spread prediction platform 102 may perform the updated protective actions identified at step 214. For example, the software spread prediction platform 102 may bring the first computing device 104 back online, quarantine the second computing device 105, and send a firewall update to the third computing device 106.

At step 216, the software spread prediction platform 102 may update the machine learning model based on the spread of the malicious software. For example, the software spread prediction platform 102 may compare the time of the detected spread to the time prediction of the second time horizon. The software spread prediction platform 102 may use any discrepancy between these times to refine the machine learning model (e.g., so as to output increasingly accurate time horizons). In some instances, the software spread prediction platform 102 may update future time horizons based on the accuracy of the second time horizon.
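
As one simple illustration of adjusting future time horizons based on a discrepancy between predicted and observed spread times, the sketch below rescales the not-yet-reached horizons by the ratio of observed to predicted time. This proportional rescaling is an assumption made for illustration and is not stated to be the refinement used by the machine learning model; the function name and parameters are hypothetical.

```python
def rescale_future_horizons(predicted_times, reached_index, observed_time):
    """Rescale later horizon times by the observed/predicted ratio.

    predicted_times: ascending list of predicted times (seconds since
        detection), one per time horizon.
    reached_index: 0-based index of the horizon that was just reached.
    observed_time: time (seconds since detection) at which spread to that
        horizon was actually observed.
    """
    predicted = predicted_times[reached_index]
    if predicted <= 0:
        raise ValueError("predicted time must be positive")
    ratio = observed_time / predicted
    adjusted = list(predicted_times)
    adjusted[reached_index] = observed_time
    for i in range(reached_index + 1, len(adjusted)):
        adjusted[i] = predicted_times[i] * ratio
    return adjusted


# Spread to the second horizon was predicted at 120 s but observed at 90 s,
# so the third horizon is pulled in proportionally.
adjusted = rescale_future_horizons([60.0, 120.0, 240.0], 1, 90.0)
```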

At step 217, the software spread prediction platform 102 may identify that the malicious software has spread to computing devices in a third time horizon. In some instances, the software spread prediction platform 102 may identify that the malicious software has actually spread to computing devices in the third time horizon, while in other instances, the software spread prediction platform 102 might not identify actual spread, but may determine that a predicted time of the spread (e.g., a time corresponding to the third time horizon) has been reached. In some instances, based on the spread of the malicious software, the software spread prediction platform 102 may update and/or otherwise refine the time horizons and/or the machine learning model so as to increase accuracy. For example, if the software spread prediction platform 102 identifies that the malicious software is spreading faster or slower than anticipated, predictions/time horizons may be adjusted.

Referring to FIG. 2E, at step 218, the software spread prediction platform 102 may identify further updated protective actions. For example, the software spread prediction platform 102 may identify that devices in the second time horizon have been cleared of the malicious software, and may be brought back online. However, now that the malicious software has spread to the devices in the third time horizon, the software spread prediction platform 102 may identify that the devices in the third time horizon should be quarantined. In some instances, in doing so, the software spread prediction platform 102 may perform similar actions as those described above at step 214. For example, the software spread prediction platform 102 may identify whether devices in the second time horizon have been cleared and, if so, stop the corresponding protective actions, and if not, continue to perform the corresponding protective actions. Similarly, the software spread prediction platform 102 may replace a previously performed protective action for the devices in the third time horizon with an updated, more severe protective action (e.g., from firewall updates to quarantine). In doing so, the software spread prediction platform 102 may focus/concentrate protective computing resources on the most at risk/vulnerable sections of the network.

At step 219, the software spread prediction platform 102 may perform the further updated protective actions identified at step 218. For example, the software spread prediction platform 102 may bring the second computing device 105 back online and quarantine the third computing device 106.

At step 220, the software spread prediction platform 102 may update the machine learning model based on the spread of the malicious software. For example, the software spread prediction platform 102 may compare the time of the detected spread to the time prediction of the third time horizon. The software spread prediction platform 102 may use any discrepancy between these times to refine the machine learning model (e.g., so as to output increasingly accurate time horizons). In some instances, the software spread prediction platform 102 may update future time horizons based on the accuracy of the third time horizon.

At step 221, the software spread prediction platform 102 may identify that the malicious software has spread to a secure sandbox. For example, as part of the above described protective actions, the software spread prediction platform 102 may implement network policies so as to steer the malicious software into a secure sandbox. In some instances, to do so, the software spread prediction platform 102 may intentionally quarantine/not quarantine certain devices so as to intentionally allow the malicious software to spread along a path that leads to the secure sandbox. In some instances, the software spread prediction platform 102 may identify a type of device to which the malicious software is spreading (e.g., only to database systems, or some other type of system). In these instances, the software spread prediction platform 102 may cause the secure sandbox to appear (from the perspective of the malicious software) to be the type of device to which the malicious software is spreading (e.g., to effectively lure the malicious software to the secure sandbox).
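
The selective quarantining described at step 221 may be sketched as a path selection over the knowledge graph. The sketch below assumes the graph is available as an adjacency mapping and uses a breadth-first search to pick a shortest path to the sandbox; the choice of breadth-first search, the function name steering_plan, and the device identifiers are assumptions made for illustration.

```python
from collections import deque


def steering_plan(graph, infected, sandbox):
    """Pick a path from the infected device to the sandbox.

    graph: dict mapping device id -> iterable of neighboring device ids.
    Returns (path, to_quarantine), where to_quarantine holds the off-path
    neighbors of devices on the path, or (None, set()) if the sandbox is
    unreachable from the infected device.
    """
    parents = {infected: None}
    queue = deque([infected])
    while queue:
        node = queue.popleft()
        if node == sandbox:
            break
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    if sandbox not in parents:
        return None, set()

    # Walk back from the sandbox to the infected device.
    path = []
    node = sandbox
    while node is not None:
        path.append(node)
        node = parents[node]
    path.reverse()

    on_path = set(path)
    # Devices adjacent to the path (but not on it) are quarantined so the
    # malicious software can only continue toward the sandbox.
    to_quarantine = {
        neighbor
        for node in path[:-1]
        for neighbor in graph.get(node, ())
        if neighbor not in on_path
    }
    return path, to_quarantine


graph = {"a": ["b", "c"], "b": ["sandbox", "d"], "c": [], "d": [], "sandbox": []}
path, to_quarantine = steering_plan(graph, "a", "sandbox")
```

Here, devices "c" and "d" would be quarantined while "a" and "b" are intentionally left reachable, so that the only remaining route leads to the sandbox.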

Referring to FIG. 2F, at step 222, the software spread prediction platform 102 may analyze the malicious software at the secure sandbox. For example, the software spread prediction platform 102 may allow the malicious software to play out in a manner that does not cause harm to the remaining devices on the network. Additionally or alternatively, the software spread prediction platform 102 may embed a Trojan horse (e.g., tracking information, generic customer information, and/or other information undetected by the malicious software) into the malicious software. In these instances, the software spread prediction platform 102 may use this information to track the malicious software back to its origin and/or identify additional information about the malicious software.

At step 223, the software spread prediction platform 102 may perform one or more additional protective actions based on the information identified at step 222. For example, the software spread prediction platform 102 may perform protective actions similar to those described above at steps 212, 215, and/or 219, and/or perform additional protective actions (e.g., notify an administrator of an entity identified as the source of the malicious software, and/or other actions). In some instances, the software spread prediction platform 102 may use this information to update the previously applied protective actions (e.g., based on identifying additional information about the spread). Additionally or alternatively, the software spread prediction platform 102 may use the information identified at the secure sandbox to update, refine, and/or otherwise provide feedback to the machine learning model.

Although the steering of the malicious software into a sandbox and the corresponding analysis are described at steps 221-223, these actions may be performed at other times during/throughout the above described event sequence without departing from the scope of the disclosure. For example, the software spread prediction platform 102 may steer the malicious software to a secure sandbox throughout the various time horizons and/or initially use the secure sandbox to identify information that may be fed into the machine learning model to identify the time horizons. Furthermore, although a single point of origination for the malicious software is described, any number of origination points (and thus time horizons for each origination point) may be identified without departing from the scope of the disclosure. Additionally, although three time horizons are described, any number of time horizons may be identified (and similarly, any number of devices within each time horizon) without departing from the scope of this disclosure.

FIG. 3 depicts an illustrative method for using a hybrid machine learning and knowledge graph approach to predict the spread of malicious software in accordance with one or more example embodiments. Referring to FIG. 3, at step 305, a computing platform having at least one processor, a communication interface, and memory may collect historical software spread pattern information. At step 310, the computing platform may use the historical software spread pattern information to train a machine learning model. At step 315, the computing platform may identify malicious software at a computing device. At step 320, the computing platform may identify software characteristics for the malicious software. At step 325, the computing platform may predict a plurality of time horizons based on the software characteristics and using the machine learning model. At step 330, the computing platform may identify computing devices in each of the plurality of time horizons. At step 335, the computing platform may send one or more parameters about the software spread to an administrator computing device for display. At step 340, the computing platform may identify one or more protective actions to perform for various computing devices across the plurality of time horizons. At step 345, the computing platform may perform the identified protective actions. At step 350, the computing platform may identify whether or not the malicious software has spread to a new time horizon. If the malicious software has not spread to a new time horizon, the computing platform may proceed to step 370. If the malicious software has spread to a new time horizon, the computing platform may proceed to step 355.

At step 355, the computing platform may identify one or more updated protective actions for the new time horizons. At step 360, the computing platform may perform the updated protective actions. At step 365, the computing platform may update the machine learning model based on the updated protective actions. At step 370, the computing platform may identify whether or not the malicious software has spread to a secure sandbox. If not, the computing platform may return to step 350. If so, the computing platform may proceed to step 375.

At step 375, the computing platform may identify additional characteristics of the malicious software by deploying tracking information into the malicious software. At step 380, the computing platform may perform one or more additional protective actions.
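
The branching among steps 350 through 380 of FIG. 3 may be summarized by the following sketch, which records the step numbers visited for a scripted sequence of observations. The function name trace_response_loop and the representation of each observation as a pair of booleans are hypothetical and serve only to make the control flow concrete.

```python
def trace_response_loop(observations, log):
    """Append the FIG. 3 step numbers visited for each observation to `log`.

    observations: iterable of (spread_to_new_horizon, spread_to_sandbox)
        boolean pairs, one per pass through the loop.
    Returns True once the sandbox branch (steps 375-380) has been taken,
    or False if the observations run out first.
    """
    for spread_to_new_horizon, spread_to_sandbox in observations:
        log.append(350)
        if spread_to_new_horizon:
            # Identify and perform updated actions, then update the model.
            log.extend([355, 360, 365])
        log.append(370)
        if spread_to_sandbox:
            # Analyze at the sandbox, then perform additional actions.
            log.extend([375, 380])
            return True
        # Otherwise return to step 350 on the next pass.
    return False


log = []
finished = trace_response_loop([(True, False), (False, False), (True, True)], log)
```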

One or more aspects of the disclosure may be embodied in computer-usable data or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices to perform the operations described herein. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types when executed by one or more processors in a computer or other data processing device. The computer-executable instructions may be stored as computer-readable instructions on a computer-readable medium such as a hard disk, optical disk, removable storage media, solid-state memory, RAM, and the like. The functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents, such as integrated circuits, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGA), and the like. Particular data structures may be used to more effectively implement one or more aspects of the disclosure, and such data structures are contemplated to be within the scope of computer executable instructions and computer-usable data described herein.

Various aspects described herein may be embodied as a method, an apparatus, or as one or more computer-readable media storing computer-executable instructions. Accordingly, those aspects may take the form of an entirely hardware embodiment, an entirely software embodiment, an entirely firmware embodiment, or an embodiment combining software, hardware, and firmware aspects in any combination. In addition, various signals representing data or events as described herein may be transferred between a source and a destination in the form of light or electromagnetic waves traveling through signal-conducting media such as metal wires, optical fibers, or wireless transmission media (e.g., air or space). In general, the one or more computer-readable media may be and/or include one or more non-transitory computer-readable media.

As described herein, the various methods and acts may be operative across one or more computing servers and one or more networks. The functionality may be distributed in any manner, or may be located in a single computing device (e.g., a server, a client computer, and the like). For example, in alternative embodiments, one or more of the computing platforms discussed above may be combined into a single computing platform, and the various functions of each computing platform may be performed by the single computing platform. In such arrangements, any and/or all of the above-discussed communications between computing platforms may correspond to data being accessed, moved, modified, updated, and/or otherwise used by the single computing platform. Additionally or alternatively, one or more of the computing platforms discussed above may be implemented in one or more virtual machines that are provided by one or more physical computing devices. In such arrangements, the various functions of each computing platform may be performed by the one or more virtual machines, and any and/or all of the above-discussed communications between computing platforms may correspond to data being accessed, moved, modified, updated, and/or otherwise used by the one or more virtual machines.

Aspects of the disclosure have been described in terms of illustrative embodiments thereof. Numerous other embodiments, modifications, and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure. For example, one or more of the steps depicted in the illustrative figures may be performed in other than the recited order, and one or more depicted steps may be optional in accordance with aspects of the disclosure.

Claims

1. A computing platform comprising:

at least one processor;

a communication interface communicatively coupled to the at least one processor; and

memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to: generate a knowledge graph that includes a plurality of computing devices and indicates relationships between the plurality of computing devices; train, using historical information, a machine learning model, wherein training the machine learning model configures the machine learning model to predict time horizons for spread of malicious software, wherein each of the time horizons includes a subset of the plurality of computing devices; identify malicious software at one computing device of the plurality of computing devices; input characteristics of the malicious software into the machine learning model to produce time horizons for the identified malicious software; identify, using the knowledge graph and based on the time horizons for the identified malicious software, subsets of the plurality of computing devices, each corresponding to a particular time horizon; and perform, at a time within a first time horizon, a first security action for at least a first subset of the plurality of computing devices located within the first time horizon and a second security action for at least a second subset of the plurality of computing devices located within a second time horizon, wherein the first time horizon and the second time horizon indicate that the first subset of the plurality of computing devices will be affected by the identified malicious software prior to the second subset of the plurality of computing devices.

2. The computing platform of claim 1, wherein the plurality of computing devices includes devices of different device types.

3. The computing platform of claim 1, wherein training the machine learning model comprises training the machine learning model to predict the spread of malicious software based on velocity and acceleration of the malicious software.

4. The computing platform of claim 1, wherein inputting the characteristics of the malicious software into the machine learning model comprises:

identifying that the identified malicious software has not previously been identified;

predicting, based on similarities between the identified malicious software and previously identified malicious software, characteristics for the malicious software; and

inputting, into the machine learning model, the characteristics.

5. The computing platform of claim 4, wherein inputting the characteristics of the malicious software into the machine learning model comprises:

identifying that the identified malicious software was previously identified;

accessing historical characteristic information for the identified malicious software; and

inputting, into the machine learning model, the historical characteristic information.

6. The computing platform of claim 5, wherein the characteristics and the historical characteristic information include one or more of: location information, network traffic, date of attack, time of attack, or holiday information.

7. The computing platform of claim 1, wherein producing the time horizons for the malicious software comprises:

identifying, using the machine learning model, a velocity and an acceleration of the malicious software; and

identifying, using the machine learning model and based on the velocity and the acceleration, the time horizons, wherein the time horizons indicate predicted boundaries of a network that will be affected by the malicious software at various future times.

8. The computing platform of claim 1, wherein identifying the subsets of the plurality of computing devices, each corresponding to a particular time horizon comprises:

identifying, by applying the time horizons to the knowledge graph, computing devices that are predicted to be affected by the malicious software at different future times.

9. The computing platform of claim 1, wherein performing the first security action comprises quarantining the first subset of the plurality of computing devices based on identifying that the first subset of the plurality of computing devices is within the first time horizon, and wherein performing the second security action comprises sending a firewall update to the second subset of the plurality of computing devices based on identifying that the second subset of the plurality of computing devices is outside of the first time horizon and within the second time horizon, wherein the firewall update is configured to block the malicious software from affecting the second subset of the plurality of computing devices.

10. The computing platform of claim 9, wherein the memory stores additional computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:

identify that the malicious software has moved from the first time horizon to the second time horizon;

based on identifying that the malicious software has moved from the first time horizon to the second time horizon, identifying whether or not the first subset of the plurality of computing devices remain affected by the malicious software;

if the first subset of the plurality of computing devices is no longer affected by the malicious software: stop performance of the first security action for the first subset of the plurality of computing devices, and replace performance of the second security action for the second subset of the plurality of computing devices with performance of the first security action; and

if the first subset of the plurality of computing devices is still affected by the malicious software: continue to perform the first security action for the first subset of the plurality of computing devices, and replace performance of the second security action for the second subset of the plurality of computing devices with performance of the first security action.

11. The computing platform of claim 10, wherein performing the first security actions and the second security actions further comprise steering the malicious software into a secure sandbox.

12. The computing platform of claim 11, wherein steering the malicious software into the secure sandbox comprises:

identifying a type of device affected by the malicious software, and

causing the secure sandbox to resemble the type of device affected by the malicious software.

13. The computing platform of claim 12, wherein the memory stores additional computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:

analyze, at the secure sandbox, the malicious software to identify additional information about the malicious software; and

update the first security action and the second security action based on the additional information.

14. The computing platform of claim 13, wherein the memory stores additional computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:

embed, at the secure sandbox and into the malicious software, tracking information, wherein the additional information about the malicious software is identified using the tracking information.

15. A method comprising:

at a computing platform comprising: at least one processor, a communication interface, and memory: generating a knowledge graph that includes a plurality of computing devices and indicates relationships between the plurality of computing devices; training, using historical information, a machine learning model, wherein training the machine learning model configures the machine learning model to predict time horizons for spread of malicious software, wherein each of the time horizons includes a subset of the plurality of computing devices; identifying malicious software at one computing device of the plurality of computing devices; inputting characteristics of the malicious software into the machine learning model to produce time horizons for the identified malicious software; identifying, using the knowledge graph and based on the time horizons for the identified malicious software, subsets of the plurality of computing devices, each corresponding to a particular time horizon; and performing, at a time within a first time horizon, a first security action for at least a first subset of the plurality of computing devices located within the first time horizon and a second security action for at least a second subset of the plurality of computing devices located within a second time horizon, wherein the first time horizon and the second time horizon indicate that the first subset of the plurality of computing devices will be affected by the identified malicious software prior to the second subset of the plurality of computing devices.

16. The method of claim 15, wherein the plurality of computing devices includes devices of different device types.

17. The method of claim 15, wherein training the machine learning model comprises training the machine learning model to predict the spread of malicious software based on velocity and acceleration of the malicious software.

18. The method of claim 15, wherein inputting the characteristics of the malicious software into the machine learning model comprises:

identifying that the identified malicious software has not previously been identified;

predicting, based on similarities between the identified malicious software and previously identified malicious software, characteristics for the malicious software; and

inputting, into the machine learning model, the characteristics.

19. The method of claim 18, wherein inputting the characteristics of the malicious software into the machine learning model comprises:

identifying that the identified malicious software was previously identified;

accessing historical characteristic information for the identified malicious software; and

inputting, into the machine learning model, the historical characteristic information.

20. One or more non-transitory computer-readable media storing instructions that, when executed by a computing platform comprising at least one processor, a communication interface, and memory, cause the computing platform to:

generate a knowledge graph that includes a plurality of computing devices and indicates relationships between the plurality of computing devices;

train, using historical information, a machine learning model, wherein training the machine learning model configures the machine learning model to predict time horizons for spread of malicious software, wherein each of the time horizons includes a subset of the plurality of computing devices;

identify malicious software at one computing device of the plurality of computing devices;

input characteristics of the malicious software into the machine learning model to produce time horizons for the identified malicious software;

identify, using the knowledge graph and based on the time horizons for the identified malicious software, subsets of the plurality of computing devices, each corresponding to a particular time horizon; and

perform, at a time within a first time horizon, a first security action for at least a first subset of the plurality of computing devices located within the first time horizon and a second security action for at least a second subset of the plurality of computing devices located within a second time horizon, wherein the first time horizon and the second time horizon indicate that the first subset of the plurality of computing devices will be affected by the identified malicious software prior to the second subset of the plurality of computing devices.