SECURITY POLICY ANALYSIS
Security policy analysis is disclosed. Configuration information, including at least one policy, associated with a live production security appliance, is received. The received configuration information is used to instantiate the policy in a sandbox environment. The sandbox environment is used to evaluate a proposed change to the configuration information, including by building a model using the received configuration information.
This application claims priority to U.S. Provisional Patent Application No. 63/459,494 entitled APPLICATION ACCESS ANALYZER filed Apr. 14, 2023, and claims priority to U.S. Provisional Patent Application No. 63/459,492 entitled SECURITY POLICY ANALYSIS-DEVOPS APPROACH filed Apr. 14, 2023, and claims priority to U.S. Provisional Patent Application No. 63/459,500 entitled TOPOLOGICAL CO-RELATION filed Apr. 14, 2023, each of which is incorporated herein by reference for all purposes.
BACKGROUND OF THE INVENTION
Malware is a general term commonly used to refer to malicious software (e.g., including a variety of hostile, intrusive, and/or otherwise unwanted software). Malware can be in the form of code, scripts, active content, and/or other software. Example uses of malware include disrupting computer and/or network operations, stealing proprietary information (e.g., confidential information, such as identity, financial, and/or intellectual property related information), and/or gaining access to private/proprietary computer systems and/or computer networks. Unfortunately, as techniques are developed to help detect and mitigate malware, nefarious authors find ways to circumvent such efforts. Accordingly, there is an ongoing need for improvements to techniques for identifying and mitigating malware.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
I. Introduction
A firewall generally protects networks from unauthorized access while permitting authorized communications to pass through the firewall. A firewall is typically a device, a set of devices, or software executed on a device that provides a firewall function for network access. For example, a firewall can be integrated into operating systems of devices (e.g., computers, smart phones, or other types of network communication capable devices). A firewall can also be integrated into or executed as one or more software applications on various types of devices, such as computer servers, gateways, network/routing devices (e.g., network routers), and data appliances (e.g., security appliances or other types of special purpose devices), and in various implementations, certain operations can be implemented in special purpose hardware, such as an ASIC or FPGA.
Firewalls typically deny or permit network transmission based on a set of rules. These sets of rules are often referred to as policies (e.g., network policies or network security policies). For example, a firewall can filter inbound traffic by applying a set of rules or policies to prevent unwanted outside traffic from reaching protected devices. A firewall can also filter outbound traffic by applying a set of rules or policies (e.g., allow, block, monitor, notify or log, and/or other actions can be specified in firewall rules or firewall policies, which can be triggered based on various criteria, such as are described herein). A firewall can also filter local network (e.g., intranet) traffic by similarly applying a set of rules or policies.
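As a non-limiting illustration of how a set of rules can permit or deny traffic, the following minimal Python sketch evaluates a packet's attributes against an ordered rule list. The `Rule` class, the `evaluate` function, the rule names, the first-match ordering, and the default-block behavior are all assumptions made for this example and are not taken from any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: match criteria plus the action to take on a match."""
    name: str
    action: str                      # "allow" or "block"
    src: str = "0.0.0.0/0"           # source network (CIDR)
    dst: str = "0.0.0.0/0"           # destination network (CIDR)
    dst_port: Optional[int] = None   # None matches any destination port

    def matches(self, src_ip: str, dst_ip: str, dst_port: int) -> bool:
        return (
            ip_address(src_ip) in ip_network(self.src)
            and ip_address(dst_ip) in ip_network(self.dst)
            and (self.dst_port is None or self.dst_port == dst_port)
        )


def evaluate(policy: Sequence[Rule], src_ip: str, dst_ip: str,
             dst_port: int, default: str = "block") -> Tuple[str, str]:
    """Return (action, rule name) for the first matching rule (assumed ordering)."""
    for rule in policy:
        if rule.matches(src_ip, dst_ip, dst_port):
            return rule.action, rule.name
    return default, "implicit-default"


# Example policy: internal hosts may reach HTTPS; Telnet is blocked everywhere.
POLICY = [
    Rule("allow-web-out", "allow", src="10.0.0.0/8", dst_port=443),
    Rule("block-telnet", "block", dst_port=23),
]

print(evaluate(POLICY, "10.1.2.3", "93.184.216.34", 443))
```

In this sketch, traffic that matches no rule falls through to the default action, which is one common (but not universal) design choice.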
Security devices (e.g., security appliances, security gateways, security services, and/or other security devices) can include various security functions (e.g., firewall, anti-malware, intrusion prevention/detection, Data Loss Prevention (DLP), and/or other security functions), networking functions (e.g., routing, Quality of Service (QoS), workload balancing of network related resources, and/or other networking functions), and/or other functions. For example, routing functions can be based on source information (e.g., IP address and port), destination information (e.g., IP address and port), and protocol information.
A basic packet filtering firewall filters network communication traffic by inspecting individual packets transmitted over a network (e.g., packet filtering firewalls or first generation firewalls, which are stateless packet filtering firewalls). Stateless packet filtering firewalls typically inspect the individual packets themselves and apply rules based on the inspected packets (e.g., using a combination of a packet's source and destination address information, protocol information, and a port number).
Application firewalls can also perform application layer filtering (e.g., application layer filtering firewalls or second generation firewalls, which work on the application level of the TCP/IP stack). Application layer filtering firewalls or application firewalls can generally identify certain applications and protocols (e.g., web browsing using HyperText Transfer Protocol (HTTP), a Domain Name System (DNS) request, a file transfer using File Transfer Protocol (FTP), and various other types of applications and other protocols, such as Telnet, DHCP, TCP, UDP, and TFTP (GSS)). For example, application firewalls can block unauthorized protocols that attempt to communicate over a standard port (e.g., an unauthorized/out of policy protocol attempting to sneak through by using a non-standard port for that protocol can generally be identified using application firewalls).
Stateful firewalls can also perform state-based packet inspection in which each packet is examined within the context of a series of packets associated with that network transmission's flow of packets. This firewall technique is generally referred to as a stateful packet inspection as it maintains records of all connections passing through the firewall and is able to determine whether a packet is the start of a new connection, a part of an existing connection, or is an invalid packet. For example, the state of a connection can itself be one of the criteria that triggers a rule within a policy.
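As a non-limiting illustration of stateful packet inspection, the following Python sketch keeps a table of known connections and classifies each packet as the start of a new connection, part of an existing connection, or invalid. The `ConnectionTable` class, the use of a five-element flow key, and the simplified treatment of TCP flags (a SYN without an ACK opens a connection) are assumptions made for this example; a real implementation tracks far more state.

```python
from typing import Dict, Tuple

# (source IP, source port, destination IP, destination port, protocol)
FlowKey = Tuple[str, int, str, int, str]


class ConnectionTable:
    """Hypothetical record of connections passing through the firewall."""

    def __init__(self) -> None:
        self._state: Dict[FlowKey, str] = {}

    @staticmethod
    def _reverse(key: FlowKey) -> FlowKey:
        src, sport, dst, dport, proto = key
        return (dst, dport, src, sport, proto)

    def classify(self, key: FlowKey, syn: bool = False, ack: bool = False) -> str:
        # A packet in either direction of a known flow belongs to that flow.
        if key in self._state or self._reverse(key) in self._state:
            return "existing"
        # Simplification: only a bare SYN may open a new connection.
        if syn and not ack:
            self._state[key] = "open"
            return "new"
        return "invalid"


table = ConnectionTable()
flow = ("10.0.0.5", 51000, "203.0.113.7", 443, "tcp")
print(table.classify(flow, syn=True))   # first packet of the flow
print(table.classify(flow, ack=True))   # later packet of the same flow
```

The returned classification could itself serve as one of the criteria that triggers a rule within a policy.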
Advanced or next generation firewalls can perform stateless and stateful packet filtering and application layer filtering as discussed above. Next generation firewalls can also perform additional firewall techniques. For example, certain newer firewalls, sometimes referred to as advanced or next generation firewalls, can also identify users and content. In particular, certain next generation firewalls are expanding the list of applications that these firewalls can automatically identify to thousands of applications. Examples of such next generation firewalls are commercially available from Palo Alto Networks, Inc. (e.g., Palo Alto Networks' PA Series firewalls). For example, Palo Alto Networks' next generation firewalls enable enterprises to identify and control applications, users, and content (not just ports, IP addresses, and packets) using various identification technologies, such as the following: APP-ID for accurate application identification, User-ID for user identification (e.g., by user or user group), and Content-ID for real-time content scanning (e.g., controlling web surfing and limiting data and file transfers). These identification technologies allow enterprises to securely enable application usage using business-relevant concepts, instead of following the traditional approach offered by traditional port-blocking firewalls. Also, special purpose hardware for next generation firewalls (implemented, for example, as dedicated appliances) generally provides higher performance levels for application inspection than software executed on general purpose hardware (e.g., such as security appliances provided by Palo Alto Networks, Inc., which use dedicated, function specific processing that is tightly integrated with a single-pass software engine to maximize network throughput while minimizing latency).
Advanced or next generation firewalls can also be implemented using virtualized firewalls. Examples of such next generation firewalls are commercially available from Palo Alto Networks, Inc. (e.g., Palo Alto Networks' VM Series firewalls, which support various commercial virtualized environments, including, for example, VMware® ESXi™ and NSX™, Citrix® Netscaler SDX™, KVM/OpenStack (CentOS/RHEL, Ubuntu®), and Amazon Web Services (AWS)) as well as CN Series container next generation firewalls. For example, virtualized firewalls can support similar or the exact same next-generation firewall and advanced threat prevention features available in physical form factor appliances, allowing enterprises to safely enable applications flowing into, and across, their private, public, and hybrid cloud computing environments. Automation features such as VM monitoring, dynamic address groups, and a REST-based API allow enterprises to proactively monitor VM changes, dynamically feeding that context into security policies and thereby eliminating the policy lag that may occur when VMs change.
Overview of Techniques for an Application Access Analyzer
Generally, existing Information Technology (IT) operations have to go through thousands to millions of logs and a multitude of devices in enterprise infrastructures to identify application connectivity issues for users or groups of users. Troubleshooting and debugging connectivity issues typically require domain knowledge expertise, such as network architecture, routing/switching, server configuration, understanding of complex network security policies, and vendor specific Operating System (OS) and Command Line Interface (CLI) knowledge. As such, this significantly increases the person hours and mean time to detect and resolve the application connectivity issues.
Specifically, identifying Software as a Service (SaaS)/Private application connectivity issues in a large network infrastructure is technically challenging due to the many domains in which thorough checks and analysis are generally required. This leads to a significant increase in the mean time to detect and remediate SaaS/Private application (App) connectivity issues, particularly in a large network infrastructure. As such, many Secure Access Service Edge (SASE) providers and enterprise organizations are attempting to solve this problem in different ways using artificial intelligence (AI) and/or Machine Learning (ML) technology. Automating detection and remediation of application connectivity issues can reduce the Mean Time To Recovery (MTTR) and operational costs to the organization. Further, providing a solution that facilitates an automated detection and remediation of application connectivity issues can help SASE providers to increase their customer base through product quality and customer satisfaction.
Accordingly, new and improved solutions that facilitate an application access analyzer are disclosed with respect to various embodiments.
Specifically, an Application Access Analyzer (AAA) is disclosed that provides an interface (e.g., a natural language (NL) query interface) to operators (e.g., IT/admin, such as for an IT help desk or other technology support personnel/users) to detect application reachability, connectivity, and access/permission issues. The disclosed AAA facilitates auto remediation. As an example, the AAA provides an actionable verdict for a query submitted by the operator with comprehensive details of analysis and checks performed in different categories (e.g., distinct domains, including user/endpoint analysis, networking analysis, and security policy analysis, such as further described below). Specifically, the AAA auto-discovers the network topology that a given user (e.g., the user(s) specified in the query) uses to access a given application (e.g., the SaaS/Private App specified in the query), and performs an analysis of the operational state of the underlying network infrastructure, a user authentication analysis, checks on the health and reachability of the Domain Name System (DNS) and Authentication (Auth) servers that the user reaches before accessing the application, and security policy reasoning specific to the user or user groups for any access/permission issues.
An actionable verdict, root cause analysis, and pinpointing of the problem significantly reduce the mean time to resolve application connectivity issues. They also save operators the hassle and time of following a runbook/playbook and debugging multiple devices, which generally requires domain knowledge expertise.
As an example, the disclosed AAA can be used for checking connectivity issues between one or more of the following: (1) a user, users, and/or a group of users to a SaaS application from mobile user gateways; (2) a user, users, and/or a group of users to a private application hosted on premise data centers or on a remote branch office; and (3) a user, users, and/or a group of users to remote site connectivity to a remote branch or data center.
In some embodiments, a system/process/computer program product for an application access analyzer (AAA) includes monitoring access to an application over a network; automatically determining a root cause of an issue (e.g., an anomaly in network connectivity, performance degradation, and/or a permission denial and/or policy blocking) associated with the access to the application over the network for a user using an application access analyzer; and performing an action in response to determining the root cause of the issue associated with the access to the application over the network for the user.
In one embodiment, the disclosed application access analyzer (AAA) can be used to determine a root cause of an application access issue by correlating a plurality of data sources across a plurality of domains (e.g., network, authentication, DNS, SaaS/Private App health, security policy configuration, etc.) using AI and ML as will be further described below.
In one embodiment, the disclosed AAA can be used to automatically detect an anomaly in network connectivity and/or a performance degradation (e.g., based on configurable thresholds for determining reachability and/or performance degradation to given apps for a user(s) based on their location/access point) as will be further described below.
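As a non-limiting illustration of threshold-based detection, the following Python sketch classifies a set of probe measurements as unreachable, degraded, or ok. The function name `classify_probe`, the choice of packet loss and median latency as inputs, and the default threshold values are assumptions made for this example; the actual measurements and thresholds used by the disclosed AAA are not specified here.

```python
from statistics import median
from typing import Sequence


def classify_probe(latencies_ms: Sequence[float], loss_ratio: float,
                   max_loss: float = 0.5, max_median_ms: float = 300.0) -> str:
    """Compare probe results against configurable (assumed) thresholds."""
    # No replies at all, or too many lost probes: treat the app as unreachable.
    if not latencies_ms or loss_ratio >= max_loss:
        return "unreachable"
    # Replies arrive, but slower than the configured limit.
    if median(latencies_ms) > max_median_ms:
        return "degraded"
    return "ok"


print(classify_probe([40.0, 55.0, 48.0], loss_ratio=0.0))
print(classify_probe([420.0, 390.0, 510.0], loss_ratio=0.1))
```

Thresholds could be set per location or access point by passing different `max_loss` and `max_median_ms` values for each.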
In one embodiment, the disclosed AAA can be used to generate human consumable/understandable and actionable verdict analysis that greatly reduces the mean time to detect and remediate application connectivity issues as will be further described below.
In one embodiment, the disclosed AAA can be used to perform an exhaustive analysis of various troubleshooting domains within a short period of time (e.g., a few minutes), which would otherwise typically require many hours to troubleshoot each domain, such as will be further described below.
In one embodiment, the disclosed AAA can be used to perform an analysis that includes identifying issues in a network infrastructure, customer network services, client connectivity issues, SaaS/private application (app) health, and reachability issues as will be further described below. For example, the disclosed AAA can provide an actionable summary of each troubleshooting domain, and the operator does not need to have domain knowledge expertise to detect and remediate the issue(s).
In one embodiment, the disclosed AAA can automatically discover (autodiscover) a network topology that would be used by a user to access the application and perform analysis for possible application access issues.
In one embodiment, the disclosed AAA can be used to provide a security posture evaluation by building a unified logical model of computation for security policies of the firewall.
In one embodiment, the disclosed AAA can be used for keeping track of network topology issues and configuration issues with networking, network services, and security policy, a task that can often be cumbersome and error prone, such as will be further described below. For example, the disclosed AAA can provide a comprehensive analysis of each of these domains with a convenient natural language (NL) query interface.
In one embodiment, the disclosed AAA incorporates domain knowledge in the form of playbooks and can perform playbook analysis through execution of Directed Acyclic Graphs (DAGs) (e.g., implemented as computational DAGs as further described below).
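As a non-limiting illustration of playbook analysis through execution of a DAG, the following Python sketch runs a set of checks in dependency order using the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The check names (`tunnel`, `dns`, `auth`, `policy`), the `run_playbook` function, and the rule that a check is skipped when any prerequisite did not pass are assumptions made for this example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_playbook(checks: Dict[str, Callable[[], bool]],
                 depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run each check after its prerequisites; skip it if a prerequisite did not pass."""
    results: Dict[str, str] = {}
    # static_order() yields every node only after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        prereqs = depends_on.get(name, set())
        if any(results[p] != "pass" for p in prereqs):
            results[name] = "skipped"
        else:
            results[name] = "pass" if checks[name]() else "fail"
    return results


# Hypothetical playbook: each entry maps a check to the checks it depends on.
DEPENDS_ON = {"tunnel": set(), "dns": {"tunnel"}, "auth": {"dns"}, "policy": {"auth"}}
CHECKS = {
    "tunnel": lambda: True,
    "dns": lambda: False,     # simulate an unreachable DNS server
    "auth": lambda: True,
    "policy": lambda: True,
}

print(run_playbook(CHECKS, DEPENDS_ON))
```

In this sketch, the first failing check (here, `dns`) is a natural candidate for the root cause, because every downstream check is skipped rather than reported as an independent failure.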
In one embodiment, the disclosed AAA can be used to significantly reduce operational and support costs for enterprises and their users for accessing their SaaS/Private Apps.
In an example implementation, the disclosed AAA is implemented as a Prisma AI Operations (AIOPs) platform that provides proactive service level management across customers globally and is designed for use by Network Operations Center (NOC) personnel supporting SASE customers, such as will be further described below. Specifically, the Prisma AIOPs platform provides proactive monitoring, alerting, problem isolation, and playbook-driven remediation to provide SLA (MTTK/I, MTTR) as desired/required by customers.
Accordingly, new and improved security solutions that facilitate an application access analyzer are disclosed in accordance with some embodiments.
These and other embodiments and examples for an application access analyzer (AAA) will be further described below.
Example System Environments for an Application Access Analyzer
Accordingly, in some embodiments, the disclosed techniques include providing a security platform (e.g., the security function(s)/platform(s) can be implemented using a firewall (FW)/Next Generation Firewall (NGFW), a network sensor acting on behalf of the firewall, or another (virtual) device/component that can implement security policies using the disclosed techniques, such as PANOS executing on a virtual/physical NGFW solution commercially available from Palo Alto Networks, Inc. or another security platform/NGFW, including, for example, Palo Alto Networks' PA Series next generation firewalls, Palo Alto Networks' VM Series virtualized next generation firewalls, and CN Series container next generation firewalls, and/or other commercially available virtual-based or container-based firewalls can similarly be implemented and configured to perform the disclosed techniques) configured to provide DPI capabilities (e.g., including stateful inspection), for example, which can be provided in part or in whole as a SASE security solution, in which the cloud-based security solution (e.g., SASE) can be monitored using the disclosed techniques for an application access analyzer, as further described below.
“Malware” as used herein refers to an application that engages in behaviors, whether clandestinely or not (and whether illegal or not), of which a user does not approve/would not approve if fully informed. Examples of malware include ransomware, Trojans, viruses, rootkits, spyware, hacking tools, etc. One example of malware is a desktop/mobile application that encrypts a user's stored data (e.g., ransomware). Another example of malware is C2 malware, such as similarly described above. Other forms of malware (e.g., keyloggers) can also be detected/thwarted using the disclosed techniques for sample traffic based self-learning malware detection as will be further described herein.
Techniques described herein can be used in conjunction with a variety of platforms (e.g., servers, computing appliances, virtual/container environments, desktops, mobile devices, gaming platforms, embedded systems, etc.) and/or for automated detection of a variety of forms of malware (e.g., new and/or variants of malware, such as C2 malware, etc.). In the example environment shown in
Data appliance 102 is configured to enforce policies regarding communications between client devices, such as client devices 104 and 106, and nodes outside of enterprise network 140 (e.g., reachable via external network 118). Examples of such policies include ones governing traffic shaping, quality of service, and routing of traffic. Other examples of policies include security policies such as ones requiring the scanning for threats in incoming (and/or outgoing) email attachments, web site content, files exchanged through instant messaging programs, and/or other file transfers. In some embodiments, data appliance 102 is also configured to enforce policies with respect to traffic that stays within enterprise network 140.
An embodiment of a data appliance is shown in
Functionality described herein as being performed by data appliance 102 can be provided/implemented in a variety of ways. For example, data appliance 102 can be a dedicated device or set of devices. The functionality provided by data appliance 102 can also be integrated into or executed as software on a general purpose computer, a computer server, a gateway, and/or a network/routing device. In some embodiments, at least some services described as being provided by data appliance 102 are instead (or in addition) provided to a client device (e.g., client device 104 or client device 110) by software executing on the client device.
Whenever data appliance 102 is described as performing a task, a single component, a subset of components, or all components of data appliance 102 may cooperate to perform the task. Similarly, whenever a component of data appliance 102 is described as performing a task, a subcomponent may perform the task and/or the component may perform the task in conjunction with other components. In various embodiments, portions of data appliance 102 are provided by one or more third parties. Depending on factors such as the amount of computing resources available to data appliance 102, various logical components and/or features of data appliance 102 may be omitted and the techniques described herein adapted accordingly. Similarly, additional logical components/features can be included in embodiments of data appliance 102 as applicable. One example of a component included in data appliance 102 in various embodiments is an application identification engine which is configured to identify an application (e.g., using various application signatures for identifying applications based on packet flow analysis). For example, the application identification engine can determine what type of traffic a session involves, such as Web Browsing-Social Networking; Web Browsing-News; SSH; and so on.
As shown, data appliance 102 comprises a firewall, and includes a management plane 232 and a data plane 234. The management plane is responsible for managing user interactions, such as by providing a user interface for configuring policies and viewing log data. The data plane is responsible for managing data, such as by performing packet processing and session handling.
Network processor 236 is configured to receive packets from client devices, such as client device 108, and provide them to data plane 234 for processing. Whenever flow module 238 identifies packets as being part of a new session, it creates a new session flow. Subsequent packets will be identified as belonging to the session based on a flow lookup. If applicable, SSL decryption is applied by SSL decryption engine 240. Otherwise, processing by SSL decryption engine 240 is omitted. Decryption engine 240 can help data appliance 102 inspect and control SSL/TLS and SSH encrypted traffic, and thus help to stop threats that might otherwise remain hidden in encrypted traffic. Decryption engine 240 can also help prevent sensitive content from leaving enterprise network 140. Decryption can be controlled (e.g., enabled or disabled) selectively based on parameters such as: URL category, traffic source, traffic destination, user, user group, and port. In addition to decryption policies (e.g., that specify which sessions to decrypt), decryption profiles can be assigned to control various options for sessions controlled by the policy. For example, the use of specific cipher suites and encryption protocol versions can be required.
Application identification (APP-ID) engine 242 is configured to determine what type of traffic a session involves. As one example, application identification engine 242 can recognize a GET request in received data and conclude that the session requires an HTTP decoder. In some cases, such as a web browsing session, the identified application can change, and such changes will be noted by data appliance 102. For example, a user may initially browse to a corporate Wiki (classified based on the URL visited as “Web Browsing-Productivity”) and then subsequently browse to a social networking site (classified based on the URL visited as “Web Browsing-Social Networking”). Distinct types of protocols have corresponding decoders.
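As a non-limiting illustration of selecting a decoder from received data, the following Python sketch inspects the first bytes of a payload. The function name `pick_decoder`, the list of HTTP methods checked, and the fallback value are assumptions made for this example; an actual application identification engine relies on far richer signatures than a prefix test.

```python
# Request-line prefixes for a few common HTTP methods.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def pick_decoder(payload: bytes) -> str:
    """Guess which decoder a session needs from the start of its first payload."""
    if payload.startswith(HTTP_METHODS):
        return "http"
    # SSH peers open with an identification string that begins with "SSH-".
    if payload.startswith(b"SSH-"):
        return "ssh"
    return "unknown"


print(pick_decoder(b"GET /wiki/Home HTTP/1.1\r\nHost: wiki.example.com\r\n\r\n"))
```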
Based on the determination made by application identification engine 242, the packets are sent, by threat engine 244, to an appropriate decoder configured to assemble packets (which may be received out of order) into the correct order, perform tokenization, and extract out information. Threat engine 244 also performs signature matching to determine what should happen to the packet. As needed, SSL encryption engine 246 can re-encrypt decrypted data. Packets are forwarded using a forward module 248 for transmission (e.g., to a destination).
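As a non-limiting illustration of the decoder steps described above, the following Python sketch places out-of-order segments into the correct order and then tokenizes the result. The `(offset, data)` segment representation, the `reassemble` and `tokenize` function names, and the whitespace-based tokenization are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Order (offset, data) segments and return the contiguous stream from offset 0."""
    stream = bytearray()
    for offset, data in sorted(segments):
        if offset > len(stream):
            break  # gap: an earlier segment has not arrived yet
        # Overlapping bytes are overwritten; new bytes extend the stream.
        stream[offset:offset + len(data)] = data
    return bytes(stream)


def tokenize(stream: bytes) -> List[bytes]:
    """Split the reassembled stream on whitespace (a deliberately simple tokenizer)."""
    return stream.split()


# Segments arrive out of order, as may happen on a real network.
segments = [(4, b"/index.html HTTP/1.1\r\n"), (0, b"GET ")]
print(tokenize(reassemble(segments)))
```

The extracted tokens (here, a method, a path, and a protocol version) are the kind of information that signature matching could then be applied to.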
As also shown in
Returning to
Suppose data appliance 102 has intercepted an email sent (e.g., by system 120) to a user, “Alice,” who operates client device 104. In this example, Alice receives the email and clicks on the link to a phishing/compromised site that could result in an attempted download of malware 130 by Alice's client device 104. However, in this example, data appliance 102 can perform the disclosed techniques for sample traffic based self-learning malware detection and block access from Alice's client device 104 to the packed malware content, thereby preempting and preventing any such download of malware 130 to Alice's client device 104. As will be further described below, data appliance 102 performs the disclosed techniques for sample traffic based self-learning malware detection to detect and block such malware 130 from harming Alice's client device 104.
In various embodiments, data appliance 102 is configured to work in cooperation with security platform 122. As one example, security platform 122 can provide to data appliance 102 a set of signatures of known-malicious files (e.g., as part of a subscription). If a signature for malware 130 is included in the set (e.g., an MD5 hash of malware 130), data appliance 102 can prevent the transmission of malware 130 to client device 104 accordingly (e.g., by detecting that an MD5 hash of the email attachment sent to client device 104 matches the MD5 hash of malware 130). Security platform 122 can also provide to data appliance 102 a list of known malicious domains and/or IP addresses, allowing data appliance 102 to block traffic between enterprise network 140 and C2 server 150 (e.g., where C2 server 150 is known to be malicious). The list of malicious domains (and/or IP addresses) can also help data appliance 102 determine when one of its nodes has been compromised. For example, if client device 104 attempts to contact C2 server 150, such attempt is a strong indicator that client device 104 has been compromised by malware (and remedial actions should be taken accordingly, such as quarantining client device 104 from communicating with other nodes within enterprise network 140).
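As a non-limiting illustration of matching an attachment against a set of signatures, the following Python sketch computes an MD5 hash with the standard library's `hashlib` and looks it up in a set of known-malicious hashes. The function names, the verdict strings, and the sample bytes are assumptions made for this example; the sample is a harmless stand-in and not real malware.

```python
import hashlib
from typing import AbstractSet


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a lowercase hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def verdict_for(attachment: bytes, known_bad: AbstractSet[str]) -> str:
    """Block the attachment if its hash matches a known-malicious signature."""
    return "block" if md5_hex(attachment) in known_bad else "no-signature"


# Stand-in for a signature set delivered as part of a subscription.
SAMPLE = b"stand-in bytes representing a known-malicious file"
KNOWN_BAD = {md5_hex(SAMPLE)}

print(verdict_for(SAMPLE, KNOWN_BAD))
print(verdict_for(b"quarterly report", KNOWN_BAD))
```

A hash lookup of this kind only recognizes exact copies of previously seen files, which is why a "no-signature" result calls for a separate handling decision.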
As will be described in more detail below, security platform 122 can also receive a copy of malware 130 from data appliance 102 to perform cloud-based security analysis for performing sample traffic based self-learning malware detection, and the malware verdict can be sent back to data appliance 102 for enforcing the security policy to thereby safeguard Alice's client device 104 from execution of malware 130 (e.g., to block malware 130 from access on client device 104).
A variety of actions can be taken by data appliance 102 if no signature for an attachment is found, in various embodiments. As a first example, data appliance 102 can fail-safe, by blocking transmission of any attachments not allow-listed as benign (e.g., not matching signatures of known good files). A drawback of this approach is that there may be many legitimate attachments unnecessarily blocked as potential malware when they are in fact benign. As a second example, data appliance 102 can fail-danger, by allowing transmission of any attachments not block-listed as malicious (e.g., not matching signatures of known bad files). A drawback of this approach is that newly created malware (previously unseen by platform 122) will not be prevented from causing harm. As a third example, data appliance 102 can be configured to provide the file (e.g., malware 130) to security platform 122 for static/dynamic analysis, to determine whether it is malicious and/or to otherwise classify it.
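As a non-limiting illustration of the three options described above, the following Python sketch selects an action for an attachment according to a configured mode. The `Mode` enumeration, the `decide` function, the action strings, and the use of hash digests as list entries are assumptions made for this example.

```python
from enum import Enum
from typing import AbstractSet


class Mode(Enum):
    FAIL_SAFE = "fail-safe"        # block anything not allow-listed
    FAIL_DANGER = "fail-danger"    # allow anything not block-listed
    SUBMIT = "submit"              # hand the file over for static/dynamic analysis


def decide(digest: str, mode: Mode,
           allow_list: AbstractSet[str] = frozenset(),
           block_list: AbstractSet[str] = frozenset()) -> str:
    """Choose how to handle an attachment under the configured mode."""
    if mode is Mode.FAIL_SAFE:
        return "allow" if digest in allow_list else "block"
    if mode is Mode.FAIL_DANGER:
        return "block" if digest in block_list else "allow"
    return "submit-for-analysis"


unknown = "0123456789abcdef0123456789abcdef"  # hypothetical digest on neither list
for mode in Mode:
    print(mode.value, "->", decide(unknown, mode))
```

Running the loop shows the trade-off in the text: the same unknown file is blocked under fail-safe, allowed under fail-danger, and deferred under the third option.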
Security platform 122 stores copies of received samples in storage 142 and analysis is commenced (or scheduled, as applicable). One example of storage 142 is an Apache Hadoop Cluster (HDFS). Results of analysis (and additional information pertaining to the applications) are stored in database 146. In the event an application is determined to be malicious, data appliances can be configured to automatically block the file download based on the analysis result. Further, a signature can be generated for the malware and distributed (e.g., to data appliances such as data appliances 102, 136, and 148) to automatically block future file transfer requests to download the file determined to be malicious.
In various embodiments, security platform 122 comprises one or more dedicated commercially available hardware servers (e.g., having multi-core processor(s), 32G+ of RAM, gigabit network interface adaptor(s), and hard drive(s)) running typical server-class operating systems (e.g., Linux). Security platform 122 can be implemented across a scalable infrastructure comprising multiple such servers, solid state drives, and/or other applicable high-performance hardware. Security platform 122 can comprise several distributed components, including components provided by one or more third parties. For example, portions or all of security platform 122 can be implemented using the Amazon Elastic Compute Cloud (EC2) and/or Amazon Simple Storage Service (S3). Further, as with data appliance 102, whenever security platform 122 is referred to as performing a task, such as storing data or processing data, it is to be understood that a sub-component or multiple sub-components of security platform 122 (whether individually or in cooperation with third party components) may cooperate to perform that task. As one example, security platform 122 can optionally perform static/dynamic analysis in cooperation with one or more virtual machine (VM) servers, such as VM server 124.
An example of a virtual machine server is a physical machine comprising commercially available server-class hardware (e.g., a multi-core processor, 32+ gigabytes of RAM, and one or more gigabit network interface adapters) that runs commercially available virtualization software, such as VMware ESXi, Citrix XenServer, or Microsoft Hyper-V. In some embodiments, the virtual machine server is omitted. Further, a virtual machine server may be under the control of the same entity that administers security platform 122, but may also be provided by a third party. As one example, the virtual machine server can rely on EC2, with the remaining portions of security platform 122 provided by dedicated hardware owned by and under the control of the operator of security platform 122. VM server 124 is configured to provide one or more virtual machines 126-128 for emulating client devices. The virtual machines can execute a variety of operating systems and/or versions thereof. Observed behaviors resulting from executing applications in the virtual machines are logged and analyzed (e.g., for indications that the application is malicious). In some embodiments, log analysis is performed by the VM server (e.g., VM server 124). In other embodiments, analysis is performed at least in part by other components of security platform 122, such as a coordinator 144.
In various embodiments, security platform 122 makes available results of its analysis of samples via a list of signatures (and/or other identifiers) to data appliance 102 as part of a subscription. For example, security platform 122 can periodically send a content package that identifies malware files, including for network traffic based heuristic IPS malware detection, etc. (e.g., daily, hourly, or some other interval, and/or based on an event configured by one or more policies). The subscription can cover the analysis of just those files intercepted by data appliance 102 and sent to security platform 122 by data appliance 102, and can also cover signatures of malware known to security platform 122.
In various embodiments, security platform 122 is configured to provide security services to a variety of entities in addition to (or, as applicable, instead of) an operator of data appliance 102. For example, other enterprises, having their own respective enterprise networks 114 and 116, and their own respective data appliances 136 and 148, can contract with the operator of security platform 122. Other types of entities can also make use of the services of security platform 122. For example, an Internet Service Provider (ISP) providing Internet service to client device 110 can contract with security platform 122 to analyze applications which client device 110 attempts to download. As another example, the owner of client device 110 can install software on client device 110 that communicates with security platform 122 (e.g., to receive content packages from security platform 122, use the received content packages to check attachments in accordance with techniques described herein, and transmit applications to security platform 122 for analysis).
In various embodiments, analysis system 300 makes use of lists, databases, or other collections of known safe content and/or known bad content (collectively shown in
In various embodiments, when a new sample is received for analysis (e.g., an existing signature associated with the sample is not present in analysis system 300), it is added to queue 302. As shown in
Coordinator 304 monitors queue 302, and as resources (e.g., a static analysis worker) become available, coordinator 304 fetches a sample from queue 302 for processing (e.g., fetches a copy of malware 130). In particular, coordinator 304 first provides the sample to static analysis engine 306 for static analysis. In some embodiments, one or more static analysis engines are included within analysis system 300, where analysis system 300 is a single device. In other embodiments, static analysis is performed by a separate static analysis server that includes a plurality of workers (i.e., a plurality of instances of static analysis engine 306).
The static analysis engine obtains general information about the sample, and includes it (along with heuristic and other information, as applicable) in a static analysis report 308. The report can be created by the static analysis engine, or by coordinator 304 (or by another appropriate component) which can be configured to receive the information from static analysis engine 306. As an example, static analysis of malware can include performing a signature-based analysis. In some embodiments, the collected information is stored in a database record for the sample (e.g., in database 316), instead of or in addition to a separate static analysis report 308 being created (i.e., portions of the database record form the report 308). In some embodiments, the static analysis engine also forms a verdict with respect to the application (e.g., “safe,” “suspicious,” or “malicious”). As one example, the verdict can be “malicious” if even one “malicious” static feature is present in the application (e.g., the application includes a hard link to a known malicious domain). As another example, points can be assigned to each of the features (e.g., based on severity if found; based on how reliable the feature is for predicting malice; etc.) and a verdict can be assigned by static analysis engine 306 (or coordinator 304, if applicable) based on the number of points associated with the static analysis results.
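The points-based verdict described above can be sketched as follows. This is a minimal illustration only: the feature names, point values, and thresholds are hypothetical and are not specified in this disclosure.

```python
# Hypothetical feature weights and thresholds (assumptions for illustration;
# the disclosure does not specify particular values).
FEATURE_POINTS = {
    "hardcoded_malicious_domain": 10,
    "requests_sms_permission": 3,
    "obfuscated_strings": 2,
}
SUSPICIOUS_THRESHOLD = 3
MALICIOUS_THRESHOLD = 10


def static_verdict(observed_features):
    """Sum the points of the observed features and map the total to a verdict."""
    score = sum(FEATURE_POINTS.get(name, 0) for name in observed_features)
    if score >= MALICIOUS_THRESHOLD:
        return "malicious"
    if score >= SUSPICIOUS_THRESHOLD:
        return "suspicious"
    return "safe"


print(static_verdict(["obfuscated_strings"]))
print(static_verdict(["requests_sms_permission", "obfuscated_strings"]))
print(static_verdict(["hardcoded_malicious_domain"]))
```

Assigning a single feature enough points to reach the "malicious" threshold on its own reproduces the first example above, in which one malicious feature is sufficient for a malicious verdict.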
Once static analysis is completed, coordinator 304 locates an available dynamic analysis engine 310 to perform dynamic analysis on the application. As with static analysis engine 306, analysis system 300 can include one or more dynamic analysis engines directly. In other embodiments, dynamic analysis is performed by a separate dynamic analysis server that includes a plurality of workers (i.e., a plurality of instances of dynamic analysis engine 310).
Each dynamic analysis worker manages a virtual machine instance (e.g., emulation/sandbox analysis of samples for malware detection, such as the above-described C2 malware detection based on monitored network traffic activity). In some embodiments, results of static analysis (e.g., performed by static analysis engine 306), whether in report form (308) and/or as stored in database 316, or otherwise stored, are provided as input to dynamic analysis engine 310. For example, the static report information can be used to help select/customize the virtual machine instance used by dynamic analysis engine 310 (e.g., Microsoft Windows 7 SP 2 vs. Microsoft Windows 10 Enterprise, or iOS 11.0 vs. iOS 12.0). Where multiple virtual machine instances are executed at the same time, a single dynamic analysis engine can manage all of the instances, or multiple dynamic analysis engines can be used (e.g., with each managing its own virtual machine instance), as applicable. As will be explained in more detail below, during the dynamic portion of the analysis, actions taken by the application (including network activity) are analyzed.
In various embodiments, static analysis of a sample is omitted or is performed by a separate entity, as applicable. As one example, traditional static and/or dynamic analysis may be performed on files by a first entity. Once it is determined (e.g., by the first entity) that a given file is malicious, the file can be provided to a second entity (e.g., the operator of security platform 122) specifically for additional analysis with respect to the malware's use of network activity (e.g., by a dynamic analysis engine 310).
The environment used by analysis system 300 is instrumented/hooked such that behaviors observed while the application is executing are logged as they occur (e.g., using a customized kernel that supports hooking and logcat). Network traffic associated with the emulator is also captured (e.g., using pcap). The log/network data can be stored as a temporary file on analysis system 300, and can also be stored more permanently (e.g., using HDFS or another appropriate storage technology or combinations of technology, such as MongoDB). The dynamic analysis engine (or another appropriate component) can compare the connections made by the sample to lists of domains, IP addresses, etc. (314) and determine whether the sample has communicated (or attempted to communicate) with malicious entities.
As with the static analysis engine, the dynamic analysis engine stores the results of its analysis in database 316 in the record associated with the application being tested (and/or includes the results in report 312 as applicable). In some embodiments, the dynamic analysis engine also forms a verdict with respect to the application (e.g., “safe,” “suspicious,” or “malicious”). As one example, the verdict can be “malicious” if even one “malicious” action is taken by the application (e.g., an attempt to contact a known malicious domain is made, or an attempt to exfiltrate sensitive information is observed). As another example, points can be assigned to actions taken (e.g., based on severity if found; based on how reliable the action is for predicting malice; etc.) and a verdict can be assigned by dynamic analysis engine 310 (or coordinator 304, if applicable) based on the number of points associated with the dynamic analysis results. In some embodiments, a final verdict associated with the sample is made based on a combination of report 308 and report 312 (e.g., by coordinator 304).
Application Access Visibility Using an Application Access Analyzer (AAA)
Multiple computing components/entities and the network connections between them generally make it technically challenging for a customer (e.g., a customer Network Operations Center (NOC) and/or IT/helpdesk personnel) to determine a root cause for any application connectivity issues. As a primary focus for SASE/Prisma Access 406, such as shown at 408, the disclosed techniques for the Application Access Analyzer provide an automated tool for the customer/customer NOC to analyze and detect potential access issues for a user(s)/group of users to access one or more applications (e.g., SaaS/Private Apps), such as will be further described below with respect to various embodiments.
Specifically, the disclosed techniques for the Application Access Analyzer (AAA) address various technical problems as will now be described. Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) for application (App) access issues are typically measured in hours, which can increase application downtime and adversely impact productivity of the customers/users and revenue for enterprises. Troubleshooting and debugging generally require domain expertise. Further, correlating and tracking multiple factors to perform root cause analysis (RCA) is often cumbersome and error prone when performed manually.
For example, an enterprise and/or cloud service provider having multiple hosted network services, large network infrastructure, and complex security policy configuration can encounter significant challenges to reduce MTTR of App access issues.
As another example, identifying a root cause in an enterprise organization can generally require comprehensive checks of various domains, such as network connectivity, infrastructure reachability, infrastructure availability, and security policy reasoning.
The disclosed Application Access Analyzer (AAA) provides an effective and efficient solution to the above-described problems, as will be further described below.
Referring to
In an example implementation, the AAA Service (608) provides an automated solution for isolating faults and reducing mean time to detect and remediate issues. Specifically, the AAA Service checks for issues with the following: User Authentication; App Access Topology; Network Services (e.g., DNS, Auth Servers, etc.); SASE Access Nodes (e.g., Prisma Access Nodes, such as Mobile Gateways (MUs), Portals, Remote Networks (RNs), Service Connections (SCs), etc.); Network Reachability/Connectivity (e.g., Routes, etc.); Security Policy Analysis (e.g., Formal Methods, such as for validating permissions/access to a network/services/resource, etc.); Logs from various different sources (e.g., SASE/PA nodes, VPN/GP logs, Traffic logs, etc.); and/or Known Incidents (e.g., known ISP outages, Cloud Provider outages, Internal SASE/PA issues including underlay connectivity problems, etc.) impacting the connectivity.
As similarly described above, the AAA Service can also automatically generate a human consumable and an actionable verdict (e.g., a summary report/alert). The analysis can cover the following: (1) Infrastructure Issues (e.g., SASE/PA internal tunnels, nodes, underlay routing, overlay routing, etc.); (2) Customer Network Services Issues (e.g., Reachability to a DNS server, LDAP, Radius, etc.); (3) Client Connectivity Issues, including VPN/GP Client Connectivity Issues (e.g., the AAA Service can utilize the ADEM (agent details and MTR) logs for analyzing client connectivity issues (e.g., ADEM is an endpoint agent-based solution that is commercially available from Palo Alto Networks, Inc. headquartered in Santa Clara, CA, or another commercially available or open source endpoint agent can be similarly used)); (4) SaaS Apps Connectivity Issues, including SaaS Apps Reachability Issues; and (5) Private Apps Connectivity Issues, including Private Apps Reachability Issues.
Referring to
AAA service 608 is in network communication with a Cloud Storage 604 (e.g., a cloud-based data store, such as commercially available cloud-based storage solutions from Google Cloud, AWS, or another vendor can be used). User Auth/Traffic Analysis component 612 is in network communication with BigQuery CDL Databases 618 (e.g., storing traffic logs). Network Access Analysis component 614 is in communication with Cosmos Databases 620. Cosmos Databases 620 include a BigQuery database, a Cloud SQL database, and a Graph database as shown in
AAA service 608 is also in network communication with a PA AIOPs Data Services component 602. Specifically, AAA Service 608 is in network communication with PA AIOPs Data Services component 602 via a publish/subscribe (PubSub) communication mechanism as shown at 606.
As also shown in
For example, the disclosed AAA Service can be used for checking connectivity between the following: (1) a User/Users/User Group to a SaaS application; (2) a User/Users/User Group to a Private Application hosted on premise data centers or on a remote branch office; (3) a User/Users/User Group to remote site connectivity (e.g., Remote branch (RN) or Data Center (SC)); (4) a Site to a Network; and (5) a Site to another Site.
At 631, the Data service receives a ‘user to app’ connectivity query string. The UI accepts NLQ queries as similarly described above with respect to
At 632, the Data service creates a folder (e.g., a Google Cloud Service (GCS) folder) for each request and creates an entry in the AAA Query BigQuery (BQ) table. The UI can get the query status from the App Access Analyzer Query table. The Data service then posts the query string along with the GCS folder info to the AAA Service through a PubSub message. The final results of analysis are updated in the GCS folder and BQ table.
At 633, the AAA Service parses the query string and invokes one or more playbooks to analyze the user/users to application connectivity.
At 634, the Playbook Engine/Authentication Analysis Playbook gathers user authentication information. The results are published in the GCS folder and BQ table, and the playbook status is updated in the BigQuery table.
At 635, the Playbook Engine/Network Connectivity Analysis utilizes the network service analysis to check for network connectivity between a requested source and destination, and the verdicts can be updated as shown at 635A and 635B. The network service analysis is run for the following: Analyzing Network Services endpoint (e.g., DNS server, Auth Server, etc.) connectivity; and App Connectivity (e.g., verifying user to app connectivity). The network connectivity analysis uses the following sources for analysis: Instance Status, Tunnel Status, Instance Metrics, etc. (e.g., available on the Cosmos Platform); Cortex Data Lake (CDL) logs (e.g., a data repository for storing user-app traffic logs); and Firewall routing information. The results from the analysis and the Playbook status are stored in the GCS folder and BQ table.
In an example implementation, the User to Application Connectivity analysis utilizes the following Playbooks: (1) User Authentication Analysis; (2) Network Service Connectivity Analysis; (3) Network Service Security Policy Analysis; (4) User Network Connectivity Analysis; and (5) User Security Policy Analysis, such as will now be further described below.
In this example implementation, the User Authentication Analysis Playbook analyzes firewall auth logs for user authentication status. The User Authentication Analysis Playbook utilizes the following input parameters:
The User Authentication Analysis Playbook returns the user auth status, device information, and gateway information for performing further analysis.
In this example implementation, the Network Service Connectivity Analysis Playbook utilizes the following input parameters:
The network service IP addresses for the Auth Server, DNS server, etc. are fetched from the user provided configuration.
In this example implementation, the User Network Connectivity Analysis Playbook utilizes the following input parameters:
For example, the DNS lookup on the firewall can map to multiple IP addresses. The connectivity check analysis is performed for all of the IP addresses. If all of the IP addresses are reachable, then the Connectivity check passes. If some, but not all, of the IP addresses are unreachable, then a Partial failure with associated analysis is returned in the result. If none of the IP addresses is reachable, then the Connectivity check fails.
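The pass/partial/fail logic for a set of resolved IP addresses can be sketched as follows. The function name, the result keys, and the handling of an empty input are assumptions made for illustration and are not part of the described AAA Service.

```python
def connectivity_verdict(reachability):
    """Classify a connectivity check.

    reachability: dict mapping each resolved IP address (str) to True if the
    address was reachable and False otherwise (hypothetical input shape).
    """
    if not reachability:
        # Assumption: an empty lookup result is treated as a caller error.
        raise ValueError("no IP addresses to check")
    unreachable = sorted(ip for ip, ok in reachability.items() if not ok)
    if not unreachable:
        status = "pass"
    elif len(unreachable) == len(reachability):
        status = "fail"
    else:
        status = "partial_failure"
    return {"status": status, "unreachable": unreachable}


print(connectivity_verdict({"203.0.113.10": True, "203.0.113.11": True}))
print(connectivity_verdict({"203.0.113.10": True, "203.0.113.11": False}))
print(connectivity_verdict({"203.0.113.10": False, "203.0.113.11": False}))
```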
In this example implementation, the AAA Service calls a Formal Security Policy Analysis component (e.g., implemented as a Formal Security Policy Analysis library) with the following input parameters:
The formal method security analysis function returns the following result.
The AAA Service uses the security policy summary results to determine whether the security policy allows or denies access.
In this example implementation, the AAA Service checks all user-configured DNS servers. The AAA Service depends on the ADEM probe (ping and curl) test results (e.g., active probing for performing health analysis of Apps, such as SaaS/Private Apps), collects the unique DNS servers from the test results, and performs the following: (1) checks connectivity of the DNS server from each ingress node (e.g., Mobile Gateway (MU) or Remote Network (RN)) instance; (2) traces the L3 forwarding path for the DNS server by running a test FIB lookup command for each unique DNS server IP discovered in the test result; (3) updates the topology for the DNS server connectivity based on the L3 forwarding path; and (4) queries each ingress firewall instance to look up match-rules for each DNS server. The DNS analysis result is returned in the result dictionary under key ‘DNS’ as follows: for each DNS server, the result includes match-rules highlighting which domain names are resolved by a particular DNS server, the L3 forwarding result, and the security policy (if any) that prevents connectivity to the DNS server.
In this example implementation, the AAA Service depends on the ADEM test result to check the Auth server connectivity. Specifically, the AAA Service queries the ADEM ping/curl test results for unique auth servers. The AAA Service summarizes the Auth server status for each ingress node. The auth server results are returned in the result dictionary under key “auth.”
At 636, the Security Policy Analysis performs a Formal Method analysis of security policies for the following: Network Service Endpoint connectivity; and App connectivity, and the verdicts can be updated as shown at 636A and 636B. The results from the analysis and the Playbook status are stored in the GCS folder and BQ table.
At 637, the AAA Service updates the status in the GCS folder and BQ table based on the results received from each playbook.
At 638, the AAA Service summarizes the connectivity analysis with the final results and updates the analysis status as completed.
III. Security Policy Analysis Introduction
Formal methods are techniques, often supported by tools, for developing software and hardware systems. Given a model of a system (often an abstract model of a system), and a language to specify desired properties of the system, proofs can be generated to exhaustively verify that the specified properties are satisfied. If the proof is carried out substantially by a machine, the verification can be referred to as automatic (also referred to as “automatic reasoning”). Formal verification is applicable to circuits, protocols, software, etc. As will be discussed variously below, it can also be applied to permissions and/or other security policy related information/systems, such as privileged access management (PAM) and identity access management (IAM). By coupling a model with a language, one is able to not only analyze a system, but also optimize it and reason about it. This allows for semantic modeling of intent, and building a model of computation for an entire security policy (and/or networking configuration), as will be described in more detail below.
DPV generally has concrete/compiled/structural/instantiated views. An example deployment is one performed in LUTs/security processing nodes (SPNs) and uses a poll-and-process architecture. CPV can answer questions, with guarantees about current and future state of the network. DPV can usually answer a subset for only the current state of the network. CPV can provide proactive guarantees, before the configuration is pushed/deployed, unlike DPV.
Referring to Figure
- Reachability: Can retail store branch (RSB) talk to credit card app (CCA)? ASSERT (RSB can talk to CCA). Is the DNS server globally reachable?
- Connectivity: ASSERT (RSB can talk to CCA on TCP port 443).
- Isolation/Security: Segmentation: Are two subsets, tenants, or application groups isolated from each other with respect to all traffic? Is all communication within specified boundaries secure? ASSERT (RSB can talk to CCA ONLY on TCP port 443). ASSERT (HVAC network cannot talk to CCA).
- Security/Audit/Compliance: Can (or has) an unencrypted packet go (gone) between two branch offices now (in the past six months)?
- Fairness: Do spine routers treat all destinations identically?
- Robustness: Will any interface failure lead to connectivity loss?
- Reliability: What is the impact of an external event on internal connectivity?
Examining an individual rule, P2, within the security policy: where the traffic originates from the RSB, and the destination is any of: RSB, CCA, or D1 (another specified destination), and also the traffic is over port 443, the traffic should be allowed.
Firewalls are provided with security policies such as are shown in
The policy shown in
{HVAC, RSB, CCA, S1, S2, S3}×{HVAC, RSB, CCA, D1}×{443, p1, p2}→{allow, deny}
In this example, the source field can take one of six values (i.e., has cardinality six), the destination can take one of four values, and the port can take one of three values. Encoded in binary, the source requires three bits, the destination two bits, and the port two bits. The result is a three-dimensional Boolean space, a two-dimensional Boolean space, and a second two-dimensional Boolean space mapping into a single action (B^3×B^2×B^2→B).
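The size of this space can be checked with a short sketch. The value lists are taken from the mapping above; the helper name `bits_needed` is hypothetical, and the binary encoding is one reasonable reading of the B3×B2×B2→B notation rather than a statement of how any particular solver encodes the fields.

```python
from itertools import product

SOURCES = ["HVAC", "RSB", "CCA", "S1", "S2", "S3"]
DESTINATIONS = ["HVAC", "RSB", "CCA", "D1"]
PORTS = ["443", "p1", "p2"]


def bits_needed(values):
    """Number of Boolean variables needed to encode one of len(values) choices."""
    return max(1, (len(values) - 1).bit_length())


widths = [bits_needed(field) for field in (SOURCES, DESTINATIONS, PORTS)]
combinations = list(product(SOURCES, DESTINATIONS, PORTS))

print(widths)             # bit width per field: source, destination, port
print(sum(widths))        # total Boolean inputs to the policy function
print(len(combinations))  # concrete (source, destination, port) triples
```

With six sources, four destinations, and three ports there are 72 concrete triples, each of which the policy maps to allow or deny.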
The sets can be enumerated and converted into an if-else like clause which describes how the firewall is operating. An example is shown in
As will be described in more detail below, a policy analyzer tool (authored in an appropriate scripting language in some embodiments, such as Python) can consume pre-, post-, and default security policies, along with various metadata, and build a model. It can handle addresses and address groups (nested). It can handle applications and application groups (nested). It can handle services and service groups. It can resolve DNS queries. The policy analyzer can be used for a variety of purposes, including post change continuous analysis and a variety of pre change simulations. In an example implementation, the policy analyzer tool makes use of two components: a frontend normalizer and a backend solver. The frontend normalizer (also referred to as a canonicalization layer) consumes behavioral/realized specifications and builds models of computation for a system based on different input formats. Policies are normalized as propositions/propositional formulae. Three examples of sources of information for frontend normalization include RDS XML, exports from Panorama (e.g., of dynamic lists), and exports from a firewall.
The backend solver models the specification as a predicate (e.g., first-order/predicate logic). Hierarchical aggregation of individual policies is expressed in propositional logic. True/false answers are returned as a function of symbolic inputs, and various logical operations are available (e.g., ==>, <==>, ∀, ∃). A variety of technologies can be used for a backend solver. One example is the CVC4 open source public domain theorem prover for satisfiability modulo theories (SMT) problems. Another example is the Z3 Theorem Prover. The backend solver is often dependent on domain specific data structures, which allows for algorithms and structured query languages to be layered on top.
Policies are written, in some embodiments, in terms of strings. An example way of expressing various aspects of the policy shown in
- src=HVAC is sterm=mkTerm (EQUAL, src, “HVAC”)
- dst=HVAC,D1 is dterm=mkTerm (OR, mkTerm (EQUAL, dst, “HVAC”), mkTerm (EQUAL, dst, “D1”))
- P1=mkTerm (AND, sterm, dterm)
- Policy=mkTerm (OR, P1, mkTerm (AND, mkTerm (NOT, P1), P2))
- etc.
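The term construction above can be mimicked with a small, self-contained sketch. This is not the CVC4 or Z3 API: `mk_term` and `evaluate` are hypothetical helpers that build the same nested structure and evaluate it against one concrete packet, whereas a real SMT backend would reason over symbolic inputs. The encoding of P2 follows the rule described earlier (source RSB; destination RSB, CCA, or D1; port 443) and is likewise an assumption.

```python
EQUAL, AND, OR, NOT = "EQUAL", "AND", "OR", "NOT"


def mk_term(op, *args):
    """Build a term as a nested tuple (hypothetical stand-in for a solver term)."""
    return (op, *args)


def evaluate(term, packet):
    """Evaluate a term against a concrete packet (dict of field -> value)."""
    op, *args = term
    if op == EQUAL:
        field, value = args
        return packet[field] == value
    if op == AND:
        return all(evaluate(arg, packet) for arg in args)
    if op == OR:
        return any(evaluate(arg, packet) for arg in args)
    if op == NOT:
        return not evaluate(args[0], packet)
    raise ValueError(f"unknown operator: {op}")


sterm = mk_term(EQUAL, "src", "HVAC")
dterm = mk_term(OR, mk_term(EQUAL, "dst", "HVAC"), mk_term(EQUAL, "dst", "D1"))
p1 = mk_term(AND, sterm, dterm)

# Assumed encoding of rule P2: RSB to RSB/CCA/D1 on port 443.
p2 = mk_term(
    AND,
    mk_term(EQUAL, "src", "RSB"),
    mk_term(OR, *(mk_term(EQUAL, "dst", d) for d in ("RSB", "CCA", "D1"))),
    mk_term(EQUAL, "port", "443"),
)

# First-match composition: P1 applies, or P1 does not apply and P2 does.
policy = mk_term(OR, p1, mk_term(AND, mk_term(NOT, p1), p2))

print(evaluate(policy, {"src": "RSB", "dst": "CCA", "port": "443"}))
print(evaluate(policy, {"src": "RSB", "dst": "CCA", "port": "p1"}))
```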
These approaches can help reduce operational costs, both for a provider of security services (e.g., operating security platform 122) and a consumer of those services (e.g., an enterprise customer, such as one operating network 140). The techniques described herein can enhance troubleshooting through proactive detection and remediation, including by looking at the policy and configuration space. The policies involved can be security policies, plain network configurations, and/or policy at a very behavioral abstract level (e.g., as seen in an RDS store or state as deployed within firewalls). A way of expressing this is as the Cartesian product: (Security+Network)×(Policy/Config+State). These approaches can also be used to determine enforcement status (e.g., whether an intent has been deployed/realized as expected). They can also be used to reduce change risk (e.g., determining the net repercussions of large changes, or identifying whether undiscovered contradictions within the proposed change will actually increase the attack surface). The approaches can also be used in the compliance/audit context, allowing for automated reasoning about policies and providing strong guarantees about correctness. Further, the approaches can be used to help with historical troubleshooting (e.g., a configuration from three weeks ago appeared correct, but the current configuration has a problem, raising the question of how to perform a differential analysis). Yet another way the approaches described herein can be used is to examine multi-domain/transitive analysis scenarios.
In some embodiments, system 122 includes a core formal engine that supports a structured query language against a complete policy object model. Various example demonstrations described herein can be executed using embodiments of the core formal engine.
Example: Shadowing Analysis: Probable Vs. Definitive
Suppose there exists a Policy 100, defined as follows:
- Policy 100: user= {a,b,c}, action=Allow; Logs: hits=0
This policy states that if traffic from any of users A, B, or C, is seen, their traffic should be allowed to pass through. However, in this scenario, suppose that according to logs, there are zero hits for this policy. One reason this could happen is because one of the policies in the range 1-99 shadows Policy 100 (i.e., the higher up policy is more permissive/expressive than Policy 100, making Policy 100 redundant). Examples of policies where this could occur are as follows:
- Policy 35: user= {a,b}, action=Allow; Logs: hits=54
- Policy 47: user= {b,c}, action=Allow; Logs: hits=5
- Policy 100: user= {a,b,c}, action=Allow; Logs: hits=0
In the above scenario, Policy 35 covers users A and B, while Policy 47 covers users B and C. Collectively, Policies 35 and 47 cover users A, B, and C, meaning that Policy 100 will never be triggered.
A second reason that a rule could receive zero hits is that traffic to date has not triggered the policy. It could be because the rule was recently added, and/or that the traffic is rare (e.g., applying to a user that rarely logs in, an application that is rarely used, etc.).
By using formal methods, it can be provably and exhaustively established whether Policy 100 is shadowed. And, if so, a recommendation can be made (e.g., by security platform 122 to an administrator of data appliance 102) to eliminate the policy. Further, those policies that are collectively responsible for the shadowing can be presented (e.g., surfacing that Policy 100 is shadowed by the combination of Policy 35 and Policy 47). In this example, all three policies have an allow action (the actions are aligned). It could also be the case, however, that Policy 35 and 47 are deny actions, while Policy 100 is an allow (or some other combination). Formal methods can identify this situation as well.
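For the finite user sets in this example, the shadowing check reduces to a set-coverage test, sketched below. This is an illustration over enumerated sets only, not the formal-methods engine itself; the function name and the tuple layout are hypothetical.

```python
def find_shadowing(policies, target_id):
    """Return the earlier policies that jointly cover the target, or None.

    policies: list of (policy_id, users, action) in evaluation order
    (hypothetical layout). A policy is shadowed when every user it names
    is already matched by some higher-priority policy.
    """
    ids = [pid for pid, _, _ in policies]
    index = ids.index(target_id)
    remaining = set(policies[index][1])
    contributors = []
    for pid, users, _action in policies[:index]:
        claimed = remaining & users
        if claimed:
            contributors.append(pid)
            remaining -= claimed
    return contributors if not remaining else None


policies = [
    (35, {"a", "b"}, "Allow"),
    (47, {"b", "c"}, "Allow"),
    (100, {"a", "b", "c"}, "Allow"),
]
print(find_shadowing(policies, 100))
```

The check ignores the action field, so it reports shadowing whether the earlier policies allow or deny, consistent with the observation above that misaligned actions can also be identified.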
Example: Contra-Shadowing
Policies are typically evaluated from top to bottom, with the highest priority rule being examined first. In some situations, however, low priority rules may be more permissive.
Suppose there exists a set of policies as follows:
- Policy 1: user= {a}, action=Allow; Stats: user=a, hits=10
- Policy 2: user= {a,b}, action=Allow; Stats: user=b, hits=5
- Policy 3: user= {a,b,c}, action=Deny; Stats: user=c, hits=3
Policy 1 is very specific to user A. Policy 2 covers users A and B, but since Policy 1 covers user A, Policy 2 will only trigger with respect to user B. Similarly, while Policy 3 covers users A, B, and C, it will only trigger with respect to user C. This is a potential case of three progressively coarser policies. The second policy could have explicitly been written just for B and the third policy could have been written explicitly just for C. Here, there is a contradiction in intent because the first two policies recommend that users A and B be allowed to access the system, while the policy for C is set to deny. Contra-shadowing analysis, which can be performed by embodiments of security platform 122, can establish that Policy 2 contra-shadows Policy 1 (is an intent over-specification) and that Policy 3 contra-shadows Policies 1 and 2 (is an intent contradiction). The analysis can be performed in a live system (e.g., making recommendations to an administrator and guiding the administrator to make any desired changes) and can also be performed on a non-interactive, offline/periodic basis.
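One way to sketch this analysis over enumerated user sets is shown below. The classification rule used here (a later policy whose user set contains an earlier policy's user set is an over-specification when the actions match and a contradiction when they differ) is an interpretation of the example above, and the function names are hypothetical.

```python
def effective_matches(policies):
    """Users each policy can actually match under top-to-bottom evaluation."""
    seen = set()
    result = {}
    for pid, users, _action in policies:
        result[pid] = users - seen
        seen |= users
    return result


def contra_shadowing(policies):
    """List (later_id, earlier_id, kind) where the later policy is coarser."""
    findings = []
    for i, (pid, users, action) in enumerate(policies):
        for earlier_pid, earlier_users, earlier_action in policies[:i]:
            if earlier_users <= users:
                kind = (
                    "over-specification"
                    if action == earlier_action
                    else "contradiction"
                )
                findings.append((pid, earlier_pid, kind))
    return findings


policies = [
    (1, {"a"}, "Allow"),
    (2, {"a", "b"}, "Allow"),
    (3, {"a", "b", "c"}, "Deny"),
]
print(effective_matches(policies))
print(contra_shadowing(policies))
```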
In the interface shown in
As the analysis continues, another problem is discovered, as illustrated in
Another example of a shadowed policy is shown in
In addition to identifying instances of shadowing and contra-shadowing in existing policies, another feature provided in various embodiments is the ability to perform queries. One example of query usage is to perform connectivity analysis (e.g., given a policy specification, which if any existing policies would be contradicted). The following is an example of a query that can be used to determine how a user accesses Instagram. The query lists the corporate private network as the source (10.0.0.0/8, representing 2^24 addresses) and Instagram's IP address as the destination. Other information, such as FQDN information, can also be used. The action is allow, the type of traffic is any type, and some additional arguments are included (e.g., source zone, destination zone). APP-ID can be used, if desired, to specify particular applications or types of applications (e.g., “social-networking”). String fields can be regular expressions (e.g., source_user==*alice*). As needed, pre-processing and object/active directory/LDAP resolution are performed:
-
- --query source==10.0.0.0/8 destination==157.240.229.174 action==Allow type==all --query-args source_zone==trust destination_zone==untrust application==instagram service==application-default
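As an illustrative sketch only, a query of this shape can be split into its fields with a few lines of Python. The parser `parse_query` is hypothetical (the actual query grammar is not specified here), and the standard `ipaddress` module is used solely to confirm the size of the source network.

```python
import ipaddress


def parse_query(text):
    """Split a '--section key==value ...' string into {section: {key: value}}."""
    parsed, section = {}, None
    for token in text.split():
        if token.startswith("--"):
            section = token[2:]
            parsed[section] = {}
        elif "==" in token and section is not None:
            key, value = token.split("==", 1)
            parsed[section][key] = value
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return parsed


QUERY = (
    "--query source==10.0.0.0/8 destination==157.240.229.174 "
    "action==Allow type==all "
    "--query-args source_zone==trust destination_zone==untrust "
    "application==instagram service==application-default"
)

parsed = parse_query(QUERY)
source = ipaddress.ip_network(parsed["query"]["source"])
destination = ipaddress.ip_address(parsed["query"]["destination"])

print(parsed["query"])
print(parsed["query-args"])
print(source.num_addresses)   # size of the 10.0.0.0/8 source network
print(destination in source)  # the destination lies outside the source network
```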
One way of executing the query is by clicking on the “Query” button of
Features described above, such as shadowing analysis and connectivity analysis, along with a structured query language can be used to evaluate a proposed change as a query against, or in the context of, a current policy. Example use cases for policy change management include performing a check before a rule is created/updated/deleted, performing a check after a rule is created/updated/deleted, and performing “what if” analysis (e.g., trying out a rule before applying it).
Suppose a new employee (e.g., in the marketing department), Nancy Ram, would like to be able to access Instagram from the corporate network. Interface 2000 can be used to determine whether a new policy that an administrator intends to add is redundant (i.e., Nancy is already able to access Instagram without any new rules being made). In the example shown in
-
- --policy source==10.0.0.0/8 destination==157.240.229.174 action==Allow --policy-args source_user==nram@paloaltonetworks.com source_zone==trust destination_zone==untrust application==instagram service==application-default
As with the example shown in
As illustrated in
Suppose an enterprise has a set of internal Solarwinds servers and desires traffic control between other internal servers (on the 10.0.0.0/8 subnet) and the Solarwinds servers. The following is an example of a query that can be run:
-
- --query source==Solarwinds_servers_alias destination== (10.0.0.0/8-Solarwinds_servers_alias) action==Deny type==all
In this example, “Solarwinds_servers_alias” is an alias that expands to approximately 400 source IP addresses. The destination is all other servers (10.0.0.0/8 less those in the alias). The goal of the query is to determine which (if any) rules are implicated by such traffic. Suppose the following policies are returned:
-
- Query partial block: Policy A Inter FW Rule
- Query partial block: Policy B Solarwinds_Monitor_lowrisk_Custom
- Query partial block: Policy C Solarwinds_monitoring_to_Internal_low-risk
- Query partial block: Policy D Solarwind-to-Internal_SSL_Any
- Query success: Policy E SolarWinds_To_Internal_Block
A partial block means that some traffic is allowed between the Solarwinds servers and other internal servers. Policies A-D correspond to various low risk/monitoring activities. Policy E is a catchall that blocks the remainder of traffic. From an audit or security posture standpoint, the purpose behind each of Policies A-D can be defended (i.e., allowing traffic for limited purposes and in limited contexts), and the “query success” result for Policy E indicates that all other traffic is blocked. This scenario is also a contra-shadow (also referred to as a reverse shadow) scenario in that Policy E blocks everything (after the four previous allows). However, this is an intentional policy choice which can be confirmed during an audit.
In various embodiments, security platform 122 includes a repository of invariants which administrators (e.g., of network 140) can modify/augment from current policies, queries, standards of practice, etc. The invariants can be used to periodically check policy (e.g., every morning at 4 am) to make sure that policy drift has not occurred. Examples of such invariants include: block all Tor traffic except for members of the research group on research nodes, block all Whatsapp file transfers, guest WiFi in retail stores cannot access the data center, etc. If any checks fail, an alert can be generated, a report can be provided, etc.
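As a minimal sketch of such a periodic check, each invariant can be stored as a sample flow together with the action it is expected to receive. The rule and invariant layouts below, and the functions `first_match_action` and `check_invariants`, are assumptions made for illustration. Evaluating single sample flows is much weaker than the exhaustive analysis a formal model provides.

```python
def first_match_action(rules, flow, default="deny"):
    """Return the action of the first rule whose match fields all equal the flow's."""
    for rule in rules:
        if all(flow.get(field) == value for field, value in rule["match"].items()):
            return rule["action"]
    return default


def check_invariants(rules, invariants):
    """Return the names of invariants whose sample flow gets an unexpected action."""
    return [inv["name"] for inv in invariants
            if first_match_action(rules, inv["flow"]) != inv["expected"]]


# Hypothetical invariant repository entries.
INVARIANTS = [
    {"name": "guest WiFi cannot reach the data center",
     "flow": {"source_zone": "guest-wifi", "destination_zone": "datacenter"},
     "expected": "deny"},
    {"name": "block WhatsApp file transfers",
     "flow": {"application": "whatsapp-file-transfer"},
     "expected": "deny"},
]

good_rules = [
    {"match": {"application": "whatsapp-file-transfer"}, "action": "deny"},
    {"match": {"source_zone": "guest-wifi", "destination_zone": "datacenter"},
     "action": "deny"},
    {"match": {"source_zone": "guest-wifi"}, "action": "allow"},
]
# Simulated policy drift: the guest-to-datacenter deny rule was removed.
drifted_rules = [good_rules[0], good_rules[2]]

print(check_invariants(good_rules, INVARIANTS))
print(check_invariants(drifted_rules, INVARIANTS))
```

Any non-empty result could then be turned into an alert or a report.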
At 2202, configuration information is received. As mentioned above, examples of such configuration include security rules/policies (e.g., extracted live from executing firewalls, copies of historical configuration information, etc.) and other configuration information (e.g., address objects, filters, service resolution information, active directory information, LDAP information, etc.). At 2204, the received configuration information is used to build a model. As described above, an example way of building a model is by using an SMT solver. At 2206, the model is used to perform a policy analysis. Various types of analysis are described above (e.g., shadow analysis, contra-shadow analysis, pre- and post-change management analysis, query analysis, on-demand policy simulation, sandbox analysis, etc.) and additional information about performing the various types of analysis is described throughout this Specification. Finally, at 2208 a result of performing the policy analysis is provided. An example of such a result is shown in
Enterprise customers deploy firewalls to protect their network infrastructure and applications. They specify firewall security policies that determine what traffic will be allowed and what traffic will not be allowed. An example firewall policy comprises multiple rules. Typically, customers provide different rules based on types of traffic. In the following discussion, suppose the enterprise is a clothing brand, ACME Clothes. One example policy would allow fashion designers to access applications meant for the purpose of apparel design. Another example policy prevents those fashion designers from accessing financial documents or data center resources. The enterprise provides, explicitly, rules which allow/deny access to various sources and destinations. Sources/destinations can be specified by network identifiers (e.g., source and destination IP subnets). Sources/destinations can also be specified using group or other dynamic identifiers (e.g., members of the fashion designer group vs. members of the finance department; Windows 10 computers; bring-your-own-device devices, etc.).
One common problem is that, over time, an enterprise might have hundreds or thousands of rules in their security policy, particularly accumulating rules as network administrators join and then leave the enterprise. A significant number of stale rules can build up, some of which may provide broader access than what is needed/desired, but with no efficient way to locate such rules. Some rules are redundant. As an example, suppose Becky is a fashion designer at ACME. While there is a rule allowing members of the fashion designer group to access apparel related applications, the security policy may also have a rule explicitly allowing Becky to access those applications. One reason for this could be that Becky joined the company early on, before there was an explicit fashion designer group. Later, when a rule applicable to all fashion designers was added, a cleanup to remove the line item for Becky did not occur. As a related example, suppose Becky changes roles from fashion designer to sales. While her group membership would change, the vestigial line item would allow her to continue to have access to apparel applications even though such access might be inappropriate for her new role. As yet another example, suppose Becky remains a member of the fashion design team, but also gets promoted to a management role. In that case she might be a member of both the fashion designer group, and also a management group. There could exist two conflicting rules governing Becky's ability to access sensitive financial information. There could be one rule that blocks access to financial servers/applications to fashion designers and another rule that allows such access to managers. In such a scenario, what is the intent? Should Becky be allowed to access the financial servers/applications or not?
An approach described variously throughout this Specification is to perform automated, verifiable analysis on policies to help enterprises ensure that their policies are current and relevant, and do not have conflicting intents.
For any rule change (adding a rule, deleting a rule, changing a priority order of a rule, etc.) there is a specific point in time at which the change occurs. Relative to that change, there is a “pre-world”: before the policy change is rolled out to production, it can be modeled/simulated to help determine whether making the change will create problems. This is also referred to as “shifting left.” In spite of various tools, bugs inevitably will be introduced (e.g., due to human error). Further, because enterprise networks are dynamic environments, it can be the case that a rule/policy that was previously operating as expected is now causing a problem (e.g., where a member of one department joins another department but corresponding updates to group membership aren't made). “Shifting right” can help address these situations. After a policy is rolled out, ongoing monitoring and analysis of the policy can help detect issues.
Use Case: Shift Right - Formal Modeling Based Analysis (Post Change Continuous Analysis)
Suppose an inadvertent change has occurred to ACME's security policy posture. The change could be due to human error, or due to a change in the environment. Continuous monitoring/policy analysis can be performed, e.g., any time a change is pushed, or on a recurring basis (e.g., once every 24 hours). In an example embodiment, using a tool such as Prisma Access, information can be collected from all of the firewalls (or other data appliances) deployed by the enterprise (i.e., a collection of firewalls). The policies are analyzed for errors (e.g., using techniques described throughout the Specification). For each detected error (also referred to as a policy anomaly), an incident can be generated. As applicable, additional contextual information can also be provided, such as how much traffic was associated with the anomalous rule or set of rules. One example of an anomaly is a policy having priority 10 that has its intent completely covered by a policy having priority 5 (e.g., a rule allowing fashion designers to access a resource at position 5 and a rule allowing Alice (who is a fashion designer) access to the same resource at position 10). Another example of an anomaly is a pair of conflicting policies (e.g., both allowing and denying access to a particular resource to Alice). An incident can be automatically generated that provides, e.g., a name for the anomaly type, shows both rules, shows any traffic hitting such rules, enumerates any implicated users/groups, implicated address objects, etc. The incident can be integrated into a ticketing system so that an appropriate member of a security operations center, network operations center, IT support, etc. can investigate the incident and attempt to fix the problem. An example way of fixing the problem would be to identify that, for example, rule 10 is redundant over rule 5 and should be removed.
The administrator can manually make the change (e.g., deleting rule 10) and can also use a guided tool to automate the change (e.g., clicking on a suggested remedy—to delete the rule). Once the change is made, the ticket can be updated as “resolved,” e.g., after the operator confirms that the change successfully resolved the problem.
Another type of anomaly is a hit count anomaly. In this scenario, a particular rule is not in use—there is no traffic matching the rule for an extended period of time (e.g., 30 days, 60 days, or 120 days of traffic). In this case, the rule can also be flagged as anomalous/an incident generated for an operator to investigate.
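A hit count check of this kind can be sketched as follows. The function `unused_rules`, the rule names, and the dates are hypothetical; the sketch assumes the last-hit date of each rule is available (with `None` meaning the rule has never been hit).

```python
from datetime import date, timedelta


def unused_rules(last_hit, today, window_days=30):
    """Return rules with no matching traffic within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sorted(name for name, hit in last_hit.items()
                  if hit is None or hit < cutoff)


# Hypothetical last-hit dates per rule.
last_hit = {
    "rule-5": date(2024, 6, 1),
    "rule-10": date(2024, 3, 1),
    "rule-33": None,
}
today = date(2024, 6, 15)

print(unused_rules(last_hit, today, window_days=30))
print(unused_rules(last_hit, today, window_days=120))
```

Widening the window from 30 to 120 days removes rule-10 from the result in this example, which illustrates why the window length is a policy decision rather than a fixed constant.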
Use Case: Shift Left - on Demand Policy Simulation (New Rule Intent Satisfaction Analysis)
Suppose there exists a policy for ACME branch users. In the policy is a rule that states that fashion designers should have access to apparel design applications. When a new employee, Hank, is hired into that role, part of the new hire checklist IT follows includes granting him access to the application. It is possible that the operator responsible for granting access is unaware that a rule already exists to grant members of the fashion design group access. So, instead, the operator creates a new rule explicitly granting Hank access. As additional employees join, the operator continues adding rules granting them access individually (continuing to be unaware that a group-applicable rule already exists). In another example, suppose an employee, Ed, states that he needs access to a sensitive application (e.g., a financial application). Ed is a member of the engineering team. The IT operator assigned to Ed's request reviews the request, decides it seems legitimate, and adds a rule granting Ed access to the application. As with the previous scenario, suppose there is an existing rule that the operator is not aware of that blocks access to the financial application to members of the engineering team. Now there is an intent conflict.
With continuous monitoring, such issues (e.g., newly created duplicate rules and/or newly created conflicting rules) will be caught (e.g., within 24 hours) as anomalies and incidents/tickets can be opened to correct them. An alternate approach to addressing these types of situations is for the operator to use a “New Intent Analysis” feature (e.g., provided in an interface by security platform 122). By using this tool, before committing a new rule to production, the operator can identify mistakes before they are made. With the tool, the operator proposes a change (e.g., explicitly grant access to Hank as a new Policy 202). The system then evaluates the policy to see if the proposed change is redundant (i.e., access is already granted by existing policy), using modeling/analysis techniques described herein. If so, the system responds that the rule is not needed and provides a reason (e.g., Hank is already granted access by rule 33). The report could also indicate that the proposed rule contradicts an existing rule. As an example, suppose that while there exists a rule granting access to fashion designers (rule 33), there is also a rule denying access to new hires (rule 20). Before the operator is able to determine whether or not Hank should be granted access (due to the seeming conflict), he can perform further research. As an example, the rule denying access to new hires may have been implemented at the request of the legal department, to ensure that all new hire paperwork has been received/signed before access should be allowed to any system (or various systems/applications). In this scenario, the contradiction may be intentional and serve a purpose. Or, the contradiction could be an error (e.g., the priority order of the rules may be incorrect, and a rule granting access to fashion designers should take higher priority over a rule blocking access to new hires).
Either way, the operator can be alerted that inserting a rule for Hank (without further investigation) is potentially not desirable.
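The Hank example can be sketched with set containment as a simplified stand-in for the formal model. The `Rule` type, the function `analyze_new_intent`, the application name, and the group memberships are hypothetical. The sketch assumes that groups have already been expanded into user sets and that lower rule numbers have higher priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: int      # assumed: lower number = higher priority
    users: frozenset  # group membership already expanded to users
    app: str
    action: str       # "allow" or "deny"


def analyze_new_intent(proposed, existing):
    """Compare a proposed rule against every existing rule covering its users."""
    findings = []
    for rule in sorted(existing, key=lambda r: r.rule_id):
        if rule.app == proposed.app and proposed.users <= rule.users:
            kind = "redundant" if rule.action == proposed.action else "contradicts"
            findings.append((kind, rule.rule_id))
    return findings


# Hypothetical memberships: Hank is both a new hire and a fashion designer.
new_hires = frozenset({"hank"})
fashion_designers = frozenset({"becky", "hank"})

existing = [
    Rule(20, new_hires, "apparel-design", "deny"),
    Rule(33, fashion_designers, "apparel-design", "allow"),
]
proposed = Rule(202, frozenset({"hank"}), "apparel-design", "allow")

print(analyze_new_intent(proposed, existing))
```

The result flags both the contradiction with rule 20 and the redundancy with rule 33, leaving the decision about intent to the operator.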
Use Case: Shift Left - on Demand Policy Simulation (Production Anomaly Policy Analysis and Hit Count Analysis)
One approach to keeping security policies current/correct is to automatically perform continuous/periodic analysis (shift right). In an example implementation, production policies are pulled whenever changes are committed and/or at regular intervals (e.g., every six, twelve, or twenty-four hours), analyzed, and any detected anomalies can be automatically inserted into a ticketing system.
Some enterprises (e.g., international banks) prefer to rely on a dedicated team of security policy analysts to handle policy management/audits. Instead of continuous monitoring, the team evaluates policies every three or six months (i.e., whatever cadence they use when evaluating their security policies). In between reviews, there may be many stale, redundant, conflicting, or otherwise problematic rules that are created.
Suppose the enterprise has one set of policies for branches (e.g., individual bank locations) and another set of policies for mobile users (e.g., employees who work from home or frequently travel). Each policy has an associated set of rules. One feature provided by embodiments of security platform 122 is the ability to do an on-demand batch analysis of a policy (e.g., against historical information). As an example, the security policy team could run a report determining, over the last three months, which policies had zero hit counts, which policies have conflicts, etc. Instead of creating individual incidents (e.g., via integration with a ticketing system) to be addressed by an operator, the security policy team can use the information included in the report to inform which actions they should take (e.g., make changes to the production policy or ignore certain identified issues). After the changes are made, the security policy team can re-run the on-demand analysis and determine whether they have successfully resolved the issues that they intended to resolve (and determine whether new problems have arisen as a result of their changes).
Use Case: Shift Left - on Demand Policy Simulation (Security Policy Sandbox Anomaly Analysis)
Suppose an administrator instantiates a sandbox on Monday using a branch user policy. Over the course of a week, the administrator makes various changes to the sandboxed policy (e.g., granting different branches access to different resources based on jurisdiction, such as a GDPR-compliant version for European branches). The administrator would like to confirm that their changes in the sandbox have not created new anomalies (e.g., do not create redundancies, do not create conflicts, etc.) before pushing the sandbox modifications to production. In various embodiments, security platform 122 provides the ability for the operator to perform policy anomaly analysis against the sandbox (e.g., by providing an identification of the sandbox and requesting analysis). Once analysis is complete, the operator is provided with a report of the anomalies identified in the sandbox. The operator can then make further changes in the sandbox and run analysis again, to confirm whether the identified anomalies are now resolved (and/or whether the fixes have surfaced or introduced new problems). The operator can iteratively request analysis and make adjustments until the operator is satisfied with the sandboxed policy, at which point the operator can push the policy to production during an appropriate change window.
Use Case: Shift Left - on Demand Policy Simulation (Security Policy Anomaly Incident Resolution Using Sandbox)
When an anomaly is detected in production (e.g., through post change continuous analysis described above), one option is for the incident to be directly resolved in production. For example, if a determination is made (e.g., as part of a nightly job) that a redundant rule was added to a production system, an incident can be created and automatically added to a ticketing system for an operator to resolve (and, for example, assigned an incident ID number such as incident #10382). The operator, reviewing the information, can then choose to delete the redundant rule (e.g., based on a suggested recommendation provided by the policy analysis system or based on the operator's own judgment) during a change window. The operator might be lucky, and the change might fix the problem (which will be confirmed, e.g., during the next routine policy analysis). Unfortunately, the operator may also be unlucky. It could be the case that instead of deleting the redundant rule, the operator mistypes the rule number, deleting an adjacent rule. Now the production system has two problems: the originally identified redundant rule remains (i.e., the anomaly identified as incident #10382), and also a rule that should not have been deleted was deleted, in production.
Changes made in production that do not fix the problem (and potentially introduce new problems) can be very expensive. An alternate approach is for the operator to create a sandbox (using the production rules) and make the change(s) the operator believes will address the identified anomalies. The operator can then submit the sandbox policy for analysis (e.g., using an incident resolution analysis feature provided by security platform 122). In an example embodiment, the operator provides an identification of the sandbox, and a list of incident(s) (e.g., incident #10382) that the operator believes are resolved. Security platform 122 performs policy analysis on the sandbox and generates a report (e.g., indicating whether incident #10382 and/or any other enumerated incidents are resolved by the changes made in the sandbox, whether the problem(s) remain, and/or whether new incidents are detected, as well as reasons for the determinations). The operator can iterate (making changes in the sandbox and re-running incident resolution analysis) until the operator is satisfied. At that point, the operator can push the sandbox changes to production.
Use Case: User-Group Normalization and Formal Modeling
As mentioned above, when building a formal model, various information is used as input, including security policies and other information (e.g., address objects, filters, etc.). Rules can generally be thought of as having one of two types. The first type is a network-style rule, e.g., enumerating source/destination information using network constructs such as IP addresses, subnets, etc. The second type uses information such as user/device information and application information. Unfortunately, building a model using the second type of information can be challenging. Suppose a first rule specifies that employees are allowed to access a fashion design application. A second rule specifies that fashion designers are allowed to access the fashion design application. A third rule specifies that new hires are not allowed to access the fashion design application. Each of these rules is a group based rule. In this scenario, three user groups exist: employees, fashion designers, and new hires. How can security platform 122 determine whether there is a redundancy across these three rules? The source column for the first rule contains a string, “employees.” The source column for the second rule contains another string, “fashion-designers.” The source column for the third rule contains a string, “new-hires.” Naïvely, comparing the three source values would seem to indicate that no redundancy exists, because the strings are different. Just examining the string values is insufficient to identify redundancies. Instead, each group needs to be broken down into a normalized list of its respective membership. Similarly, individual users can be included in rules in a variety of ways (e.g., email address, active directory name, wildcards such as “FirstName=Jeff, LastName=*” etc.). Those names also need to be normalized/canonicalized. An example approach is as follows.
First, any individually specified users are normalized. Second, groups are broken down into user lists (similarly normalized). Third, overlaps between groups are determined. Finally, the model can be built, using the normalized names and any identified overlaps as applicable.
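The first three steps can be sketched as follows. The canonical form chosen here (a lowercase account name with any domain prefix or email suffix removed), the function names, and the sample memberships are assumptions made for illustration; a deployment would canonicalize against its own directory data.

```python
def normalize_user(identifier):
    """Canonicalize an email-style or DOMAIN\\user-style name to a lowercase account name."""
    name = identifier.strip().lower()
    if "\\" in name:
        name = name.split("\\", 1)[1]  # drop a directory domain prefix
    if "@" in name:
        name = name.split("@", 1)[0]   # drop an email domain suffix
    return name


def expand_groups(groups):
    """Break each group down into a normalized set of its members."""
    return {group: frozenset(normalize_user(member) for member in members)
            for group, members in groups.items()}


def group_overlaps(expanded):
    """Return the shared members for every pair of groups that overlap."""
    names = sorted(expanded)
    overlaps = {}
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            common = expanded[first] & expanded[second]
            if common:
                overlaps[(first, second)] = common
    return overlaps


# Hypothetical directory data with inconsistently written user names.
groups = {
    "employees": ["Becky@acme.example", "ACME\\hank", "ed@acme.example"],
    "fashion-designers": ["becky@acme.example", "Hank@acme.example"],
    "new-hires": ["acme\\Hank"],
}

expanded = expand_groups(groups)
overlaps = group_overlaps(expanded)
for pair, members in overlaps.items():
    print(pair, sorted(members))
```

Although the three group names are different strings, the normalized memberships show that every pair of groups overlaps, which is the information the model needs in order to detect redundancy or conflict across the three rules.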
Use Case: Policy Sandbox (Multiple Sandbox Per Operator with Edit, Production Refresh, Annotation, and Push to Production)
In the following discussion, suppose a retail enterprise (e.g., a home improvement chain) would like to deploy a new suite of applications for use by employees at retail locations that provide functionality such as inventory tracking, timekeeping, return processing, etc. Instead of deploying the new application suite company-wide (e.g., across 3,000 stores), a handful of stores in various locations are selected (e.g., ten stores on the West Coast, ten stores on the East Coast, ten stores in Canada, etc.) for a pilot.
Corporate IT would like to gain information about how the application suite is performing (e.g., are sales improving, are employees adopting the tools provided by the suite, etc.). One task corporate IT will perform as part of the project is to specify a set of restricted branches (i.e., those in the pilot) for which access to the application suite should be granted. One approach that corporate IT might take is to define a “pilot” group that includes the various pilot locations, and to grant access to the new applications to the networks/devices at those locations. Corporate IT might also explicitly block access to the suite for the other 2,970 locations. Adding these rules can be particularly complex—as an example, where within existing corporate policy (which may comprise several hundred rules or more) should new allow/block rules be inserted (i.e., at what priority). A variety of anomalies are likely to arise, particularly given the complexity likely to be involved in implementing a pilot.
In various embodiments, security platform 122 provides a policy sandbox feature. With the feature, an operator (e.g., in IT) with permissions to update security policy can ask security platform 122 to create a policy sandbox (e.g., instantiated using a copy of the branch security policy currently executing in production) or multiple policy sandboxes (e.g., one for branch policy and one for mobile policy). The operator can then modify the policy in the sandbox, adding new rules and/or moving/deleting/editing existing rules. Further, multiple operators can each have access to sandboxes, whether as a shared resource (such as a team sandbox) or as individual sandboxes (e.g., with two operators each having three sandboxes).
Suppose the operator working on the branch pilot has made various changes in a sandbox and is ready to push the changes to production. The operator requests a change ticket and gets approval (e.g., from 1 am to 3 am on Sunday). One situation that can occur is that, in the time between when the operator is satisfied with the sandbox version of the branch policy, and the change window, other modifications are made to the production security policy. For example, at midnight, another operator could have modified the production security policy. That production change will not be present in the sandbox because the sandbox was instantiated based on the production environment as it existed at the time of the instantiation request. If the operator proceeds with pushing the sandbox version of the policy to production during the change window, one thing that can occur is that the production change made at midnight will be overridden. In some embodiments, security platform 122 provides protection against this scenario. When the operator is ready to push changes made in a sandbox to production, the operator can ask that the underlying instantiation be refreshed (i.e., the change made at midnight will be refreshed into the sandbox). Any changes made by the operator in the sandbox since it was initially instantiated can be replicated to the refreshed sandbox. Four example scenarios can occur. First, new rules may have been added to production that were not present in the original sandbox instantiation. Those new rules will be added to the sandbox. Second, rules present in production at the time of sandbox instantiation may have been deleted. Those rules will similarly be deleted from the sandbox. Third, it is possible that a rule present in production at the time of sandbox instantiation was changed, but the change does not implicate any changes made in the sandbox. That rule will be updated in the sandbox.
The final scenario is the most complex—in which a rule was modified in production and also modified in the sandbox. Now, conflicting versions of a rule exist.
In various embodiments, when an operator refreshes a sandbox, differences between the current production version and the sandbox version will be highlighted, e.g., indicating which rules were added, which were deleted, which were updated, and which represent conflicts. For any conflicts, the operator can determine which version of the rule should be used—the production version or the sandbox version. Once the operator is satisfied with the sandbox version, the operator can push the sandbox version to production.
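The four refresh scenarios resemble a three-way comparison between the production policy at instantiation time, the current production policy, and the sandbox. The following is a minimal sketch under that assumption; the function `refresh_sandbox`, the rule identifiers, and the string rule bodies are hypothetical. As a further assumption, a rule deleted in production but edited in the sandbox is reported as a conflict rather than silently deleted.

```python
def refresh_sandbox(base, production, sandbox):
    """Replay production changes made since instantiation into the sandbox.

    base: production rules at sandbox instantiation, as {rule_id: rule}
    production: current production rules
    sandbox: sandbox rules, including the operator's edits
    """
    refreshed = dict(sandbox)
    report = {"added": [], "deleted": [], "updated": [], "conflict": []}
    for rule_id in sorted(set(base) | set(production)):
        old, new, mine = base.get(rule_id), production.get(rule_id), sandbox.get(rule_id)
        if new == old:
            continue  # production did not touch this rule
        if mine == old:
            # Only production changed the rule, so take production's version.
            if new is None:
                refreshed.pop(rule_id, None)
                report["deleted"].append(rule_id)
            else:
                refreshed[rule_id] = new
                report["added" if old is None else "updated"].append(rule_id)
        elif mine != new:
            # Both sides changed the rule differently; the operator must choose.
            report["conflict"].append(rule_id)
    return refreshed, report


base = {"r1": "allow branches->erp", "r2": "deny guests->dc",
        "r3": "allow hr->payroll", "r4": "allow pilot->old-suite"}
production = {"r1": "allow branches->erp", "r3": "allow hr->payroll-v2",
              "r4": "allow pilot->suite-a", "r5": "deny tor"}
sandbox = {"r1": "allow branches->erp", "r2": "deny guests->dc",
           "r3": "allow hr->payroll", "r4": "allow pilot->suite-b",
           "r9": "allow pilot->new-suite"}

refreshed, report = refresh_sandbox(base, production, sandbox)
print(report)
print(refreshed)
```

In this sketch the conflicting rule keeps its sandbox version until the operator decides which version to use, and rules that exist only in the sandbox (such as `r9`) are carried over unchanged.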
V. Appendix
The following sections provide additional detail regarding example implementations/embodiments of policy analysis techniques described herein.
Operational Use Cases
-
- 1. Business Disruption: Policy is disallowing access that should be permitted, resulting in an operations ticket to fix the access issue.
- 2. Security Issue: Policy is overly permissive and is allowing access that should be blocked. The majority of data breaches occur over traffic that policy allows, and if policy is too permissive, a breach can reach even more data than would have been possible with the right policy.
- 3. Policy Sprawl: Rule's intent is covered by another broader rule or set of rules.
- 4. Customer is looking to tighten posture, added granular policies, but forgot to remove the coarser ones.
- 5. Customer is looking to clean up policies, combined several granular policies into a coarser one, but forgot to remove the granular ones.
- 6. Customer was satisfying new business intent.
- a. Decided to add a new policy and did not realize that it would end up making other policies redundant.
- b. Their new policy is already being shadowed by a coarser one at higher order.
- c. Their new policy is already covered by a lower order policy making their new policy redundant (Reverse Shadow).
- 7. Policy Drift
- a. Customer wants to make a policy change but makes a copy of the policy and moves it lower down to keep as a backup.
- b. Customer makes changes and tests the new copy (at higher order) over time. Meanwhile other operators have added new policies between the two.
-
- 1. Alert/Incident Name Format
The alerts are prefixed by a single adjective which is one of “REDUNDANT,” “SHADOWED,” “GENERALIZED,” or “CORRELATED”.
Grammar is as follows:
-
- Policy Analysis Alert/Incident Code Format
- Code: AL_<INDUSTRY_TERM_ADJECTIVE>_<POLICY1_ACTION>_<POLICY_TYPE>_COVERED_BY_<HIGHER|LOWER>_ORDER_<POLICY2_ACTION>_RULE
- Example Code: “AL_REDUNDANT_ALLOW_SECURITY_RULE_COVERED_BY_HIGHER_ORDER_ALLOW_RULE”
Display Name: “<INDUSTRY_TERM_ADJECTIVE> Policy: <POLICY1_ACTION> <POLICY_TYPE> is covered by a <higher|lower> order <POLICY2_ACTION> <POLICY_TYPE>”
-
- Example Display Name: “Redundant Policy: Allow security rule is covered by a higher order allow security rule”
- INDUSTRY_TERM_ADJECTIVE: “Redundant”, “Shadowed”, “Generalized”, “Correlated”
- POLICY_TYPE: “Security Rule”, “Decryption Rule”, “Authentication Rule”, “Application Override Rule”, “DLP Rule”, “URL Filtering Rule” etc.
- POLICY1_ACTION, POLICY2_ACTION: This is Policy Type specific.
- For Security Rule action is “Allow” or “Block”
- For Decryption Policy action is “Decrypt” or “No Decrypt”
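As an illustrative sketch, codes and display names following this grammar can be generated mechanically. The function names `alert_code` and `display_name` are hypothetical, and the capitalization of the display name is inferred from the example display name given above.

```python
ADJECTIVES = {"Redundant", "Shadowed", "Generalized", "Correlated"}
ORDERS = {"higher", "lower"}


def _check(adjective, order):
    if adjective not in ADJECTIVES:
        raise ValueError(f"unknown adjective: {adjective!r}")
    if order not in ORDERS:
        raise ValueError(f"order must be 'higher' or 'lower', got {order!r}")


def alert_code(adjective, policy1_action, policy_type, order, policy2_action):
    """Build the alert/incident code; spaces become underscores, all uppercase."""
    _check(adjective, order)
    parts = ["AL", adjective, policy1_action, policy_type,
             "COVERED_BY", order, "ORDER", policy2_action, "RULE"]
    return "_".join(part.upper().replace(" ", "_") for part in parts)


def display_name(adjective, policy1_action, policy_type, order, policy2_action):
    """Build the human-readable display name for the same alert."""
    _check(adjective, order)
    kind = policy_type.lower()
    return (f"{adjective} Policy: {policy1_action} {kind} is covered by "
            f"a {order} order {policy2_action.lower()} {kind}")


args = ("Redundant", "Allow", "Security Rule", "higher", "Allow")
print(alert_code(*args))
print(display_name(*args))
```

Validating the adjective and the order up front keeps malformed codes out of any downstream ticketing integration.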
Example Alert/Incident List with Code/Display Names
Examples are shown in
Formal methods are approaches to exhaustively and provably model, analyze, and optimize the behavior of hardware and software systems. When applied to policy/configuration modeling/analysis/optimization in fields such as security, networking, and modern identity and access management (IAM), formal methods enable semantic modeling to realize a functionally accurate “model of computation” of the intent/behavior of the system.
Core Formal Modeling Library
Embodiments of the Core Formal Modeling library accept:
-
- 1. Security policy specification (e.g., user/customer intent) for different device groups (MU, RN, GW, SC) alongside:
- 2. Data to resolve all internal (objects, lists, etc.) and external (runtime firewall data, LDAP/AD data, predefined data, etc.) dependencies within the security policy specification.
It uses this data to build a single unified logical model of computation for the security policy of the firewall, which is also referred to herein as the formal model. This formal model forms the basis for supporting multiple analyses as well as security posture evaluation use cases within the AIOps platform, including but not limited to shadowing/contradiction analysis, policy change management, and application access analysis.
The Core Formal Modeling library uses the Security Policy, Firewall Configuration, Firewall Operational State (FQDN files, External Dynamic Lists, etc.) collected, for example, by the Prisma Access Artificial Intelligence for IT Operations (AIOps) platform to build the comprehensive logical model of computation using the CVC4 library. CVC4 is an automatic theorem prover for satisfiability modulo theories (SMT) problems. The formal model uses a mixture of Boolean/integer/enumeration/string-to-enumeration/string representations to resolve/flatten/normalize the security model by resolving all internal and external object dependencies.
The library supports, as examples, the following three use cases, each of which can be realized as a standalone service or collection of services in the Prisma Access AIOps platform:
- 1. Shadowing/Contradiction Analysis
- 2. Policy Change Management
- 3. Security Policy Query/Analysis within the Application Access Analyzer
A microservice can be built on top of, for example, Google Kubernetes Engine (GKE) to:
- 1. Retrieve a JSON configuration from Google Cloud Storage (GCS) (e.g., triggered via a configuration parser service or periodic trigger service),
- 2. Retrieve firewall command output using the firewall data fetch lib, and
- 3. Invoke the core formal modeling lib for the above use cases.
-
- 1. Security Policy from a configuration parser service (XMLtoJSON) microservice. In an example embodiment, this is JSON that is in the unresolved form, i.e., the security policies may refer to symbolic addresses, address lists, services, service lists, etc. The JSON will carry/embed additional information by way of dictionaries of key-value pairs, where the keys are the symbolic addresses, address lists, etc. referred to within the security policy and the values are the necessary information that can be used to fully resolve the security policy for formal modeling.
- 2. Firewall Operational State
- A. FQDN file
- B. External Dynamic Lists
- C. Predefined Object Information extracted from XML found in the firewall
- D. Exhaustive user-to-group mapping information (necessary for full fidelity formal modeling of the security policy)
- E. Exhaustive user-to-persona mapping information (necessary for full fidelity formal modeling of the security policy)
- F. One-off on-demand user-to-group mapping information (to support application access queries)
- G. One-off on-demand user-to-persona mapping information (to support application access queries)
-
- 1. Formal modeling can be triggered as a result of a new security policy commit.
- 2. Formal modeling can be triggered as a routine/periodic refresh, when only operational data from the firewall is used to refresh the formal model.
- 3. Formal modeling can be requested as part of the Policy Change Management workflow through AIOps data service APIs.
-
- 1. Fully resolved security policies, modulo exceptions.
- 2. Any exceptions encountered during modeling due to incomplete or malformed data.
- A. Security policy issues like cut-and-paste or formatting errors, references to as-yet-undefined objects or lists, etc.
- B. Operational state data issues like missing FQDN entry, incomplete address object, etc.
- 3. Parsed firewall operational state that will be persisted in a structured form.
- 4. Fully resolved formal model of the security policy, modulo exceptions.
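The resolution step described above (unresolved JSON plus embedded dictionaries in, fully resolved policies plus exceptions out) can be illustrated with a minimal sketch. The JSON layout (`objects`, `rules`, `source`), the function names, and the `ResolutionError` class are hypothetical and chosen only for illustration; they are not the format used by the configuration parser service.

```python
import ipaddress


class ResolutionError(Exception):
    """Raised when a symbolic name cannot be resolved (hypothetical helper)."""


def expand(name, objects, seen=()):
    """Expand one symbolic name into a flat list of literal CIDR strings."""
    try:
        ipaddress.ip_network(name, strict=False)
        return [name]  # already a literal address or subnet
    except ValueError:
        pass
    if name in seen:
        raise ResolutionError(f"cyclic reference to {name}")
    if name not in objects:
        raise ResolutionError(f"undefined object {name}")
    literals = []
    for member in objects[name]:
        literals.extend(expand(member, objects, seen + (name,)))
    return literals


def resolve_policies(config):
    """Return (resolved_rules, exceptions) for an unresolved policy document."""
    resolved, exceptions = [], []
    for rule in config["rules"]:
        try:
            sources = [lit for n in rule["source"]
                       for lit in expand(n, config["objects"])]
        except ResolutionError as exc:
            # Record the problem and keep going ("modulo exceptions").
            exceptions.append({"rule": rule["name"], "error": str(exc)})
            continue
        resolved.append({**rule, "source": sources})
    return resolved, exceptions


# Assumed example input: one rule resolves, one refers to an undefined list.
config = {
    "objects": {
        "hr-subnets": ["10.1.0.0/16", "hr-lab"],
        "hr-lab": ["10.9.8.0/24"],
    },
    "rules": [
        {"name": "allow-hr", "source": ["hr-subnets"], "action": "allow"},
        {"name": "allow-eng", "source": ["eng-subnets"], "action": "allow"},
    ],
}
resolved, exceptions = resolve_policies(config)
```

In this sketch a rule that cannot be resolved is reported as an exception and omitted from the resolved output, so that modeling of the remaining rules can proceed.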
One purpose of shadowing and contradiction analysis based on formal modeling is to flag and root-cause over-specification or contradiction redundancies in intent, reducing policy/permissions/privilege sprawl and fixing potential security holes/vulnerabilities. A redundancy can be caused by one (or a set) of higher priority policies, and root-causing incorporates forward as well as backward traversal to identify shadowing, generalization, and partial conflicts through an interactive model building and blocking framework, or a framework (as realized in AIOps) that generates incidents using the incident generation framework for eventual consumption by the user/customer.
Inputs
-
- 1. Fully-resolved security policy JSON with embedded policy model.
-
- 1. Inline during security policy formal model construction as a result of a new security policy commit.
- 2. Triggered as a result of a change in firewall operational state (frequency and list of changes that can trigger this analysis is to be finalized).
- 3. Can be requested as part of the Policy Change Management workflow, in one of three use cases:
- A. Shadow/Contradiction/Hit-count analysis.
- B. Incident Resolution Analysis.
- C. Shadow/Contradiction Security Policy Analysis.
-
- 1. A list of shadowee-shadower(s) objects, one per principal (shadowee). Each shadower and shadowee (or multiple shadowees) will contain raw (unresolved) as well as fully resolved information necessary for UI display. Depending on the mode of invocation of this analysis, the caller will process the output and forward it to either the Alerts/Incidents Engine or to the UI.
- 2. An Alert/Incident code will be populated per shadowee-shadower(s) object and will be one of 16 as described in the Security Policy Incidents documentation above when the results are forwarded to the Alerts/Incidents Engine.
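The shape of the shadowee-shadower(s) output can be illustrated with a simplified sketch. It is not the SMT-based analysis described above: it only checks whether a single higher priority rule fully covers a later rule, whereas a solver-backed model can also detect coverage by a combination of rules. The `Rule` fields, the function names, and the labels "shadow" (same action) and "contradiction" (different action) are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    source: str        # source subnet in CIDR notation
    ports: frozenset   # destination ports matched by the rule
    action: str        # "allow" or "deny"


def covers(earlier, later):
    """True if every flow matched by `later` is also matched by `earlier`."""
    outer = ipaddress.ip_network(earlier.source)
    inner = ipaddress.ip_network(later.source)
    return inner.subnet_of(outer) and later.ports <= earlier.ports


def find_shadows(rules):
    """Return one finding per shadowee; `rules` is ordered by priority."""
    findings = []
    for index, shadowee in enumerate(rules):
        shadowers = [r for r in rules[:index] if covers(r, shadowee)]
        if not shadowers:
            continue
        # The first covering rule is the one that takes effect.
        same_action = shadowers[0].action == shadowee.action
        findings.append({
            "shadowee": shadowee.name,
            "shadowers": [r.name for r in shadowers],
            "kind": "shadow" if same_action else "contradiction",
        })
    return findings


rules = [
    Rule("r1", "10.0.0.0/8", frozenset({80, 443}), "allow"),
    Rule("r2", "10.1.0.0/16", frozenset({443}), "allow"),
    Rule("r3", "10.2.0.0/16", frozenset({80}), "deny"),
    Rule("r4", "192.168.0.0/16", frozenset({22}), "allow"),
]
findings = find_shadows(rules)
```

Here `r2` is reported as shadowed by `r1`, and `r3` is reported as contradicted by `r1`, because `r1` allows traffic that `r3` intends to deny.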
Formal modeling can be coupled with a structured query language to express policy constructs (the fields in these constructs can be fully or partially specified, and a field itself can be partially or fully specified). For example, a simple query can support a set/list of source IP addresses, and Boolean operations on such sets (for example, one can query connectivity on sets such as 10.0.0.0/8 minus [‘10.0.1.24’, ‘10.4.55.4’, . . . ]). This exposes a powerful interface that allows varied use cases such as policy change management (e.g., sandbox testing and validation of proposed changes before final commit) and connectivity analysis (e.g., which users/subnets can access what apps/servers/services/resources).
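As a rough illustration of the set operations mentioned above, the following sketch computes a subnet minus a list of excluded addresses using Python's standard `ipaddress` module. The helper names `subtract` and `contains` are hypothetical; a real query language would express this symbolically rather than by enumerating networks.

```python
import ipaddress


def subtract(base, excluded):
    """Return the networks that cover `base` minus every entry in `excluded`."""
    remaining = [ipaddress.ip_network(base)]
    for item in excluded:
        hole = ipaddress.ip_network(item)  # a bare address becomes a /32
        next_remaining = []
        for net in remaining:
            if hole.subnet_of(net):
                # Split `net` into the pieces that do not contain `hole`.
                next_remaining.extend(net.address_exclude(hole))
            elif net.subnet_of(hole):
                continue  # `net` is removed entirely
            else:
                next_remaining.append(net)
        remaining = next_remaining
    return sorted(remaining)


def contains(networks, address):
    """True if `address` falls inside any of `networks`."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in networks)


# The query set from the text: 10.0.0.0/8 minus two individual addresses.
query_sources = subtract("10.0.0.0/8", ["10.0.1.24", "10.4.55.4"])
```

The resulting list of networks can then be used as the source field of a query, so that connectivity is evaluated for every address in 10.0.0.0/8 except the two that were excluded.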
The following are five example workflows that can be supported under policy change management:
- 1. Against the currently committed security policy:
- A. Intent Satisfaction Analysis.
- B. Shadow/Contradiction/Hit-Count Analysis.
- 2. Against an uploaded security policy that captures all proposed changes to security policy:
- A. Incident Resolution Analysis.
- B. Shadow/Contradiction Security Policy Analysis.
- C. Application Access Queries.
In an Intent Satisfaction Analysis workflow, user input is XML containing proposed policy additions. For each new policy captured in this uploaded XML, the analyzer will report back if this intent is not/partially/fully met/blocked and provide a corresponding list of security policy matches. The reference security policy model used to perform this analysis is, in various embodiments, the formal model of the currently deployed security policy.
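A minimal sketch of this classification follows, assuming a toy policy model in which each rule matches explicit sets of users and applications, rules are evaluated first-match in priority order, and unmatched traffic is denied. The rule layout, the names, and the three status strings are hypothetical and stand in for the formal-model query.

```python
from itertools import product


def first_match(policy, user, app):
    """Return the first rule matching (user, app), or an implicit default deny."""
    for rule in policy:
        if user in rule["users"] and app in rule["apps"]:
            return rule
    return {"name": "default-deny", "action": "deny"}


def analyze_intent(policy, intent):
    """Classify a proposed rule as fully, partially, or not satisfied today."""
    flows = list(product(sorted(intent["users"]), sorted(intent["apps"])))
    hits = [first_match(policy, user, app) for user, app in flows]
    satisfied = sum(rule["action"] == intent["action"] for rule in hits)
    if satisfied == len(flows):
        status = "fully"
    elif satisfied == 0:
        status = "not"
    else:
        status = "partially"
    return {"status": status, "matches": sorted({r["name"] for r in hits})}


# Assumed reference policy (stands in for the currently deployed policy).
policy = [
    {"name": "allow-tom-git", "users": {"tom"}, "apps": {"git"}, "action": "allow"},
    {"name": "deny-marie", "users": {"marie"}, "apps": {"git", "wiki"}, "action": "deny"},
]
# Proposed new rule: allow both users to reach git.
intent = {"users": {"tom", "marie"}, "apps": {"git"}, "action": "allow"}
report = analyze_intent(policy, intent)
```

In this example the intent is only partially met: the existing policy already allows one user but denies the other, and both matching rules are reported back.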
In a Shadow/Contradiction/Hit-Count analysis workflow, the reference security policy model used to perform analysis is the formal model of the currently deployed security policy. A comprehensive analysis of all shadows and contradictions (regardless of dashboard/configuration preferences/customizations that suppress certain Alerts/Incidents) is returned. The Hit-Count analysis extracts and aggregates the hit count of each security policy from the currently deployed security policy from all production firewalls for the specified device group. It reports those rules that have not seen any hits since the last successful commit (modulo certain limitations/assumptions described in more detail below).
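The hit-count aggregation can be sketched as follows, assuming each production firewall reports a mapping from rule name to hit count since the last successful commit. The data layout and the function name are hypothetical.

```python
from collections import Counter


def unused_rules(rule_names, per_firewall_hits):
    """Return rules with zero total hits across all firewalls in a device group."""
    totals = Counter()
    for hits in per_firewall_hits.values():
        totals.update(hits)  # adds the per-rule counts from one firewall
    # A rule that no firewall reported counts as zero hits.
    return [name for name in rule_names if totals[name] == 0]


# Assumed example: two firewalls in the same device group.
rule_names = ["r1", "r2", "r3"]
per_firewall_hits = {
    "fw-a": {"r1": 12, "r2": 0},
    "fw-b": {"r1": 3},
}
stale = unused_rules(rule_names, per_firewall_hits)
```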
Policy Change Management Against an Uploaded Security Policy that Captures Proposed Changes to Security Policy
In workflows under this category, the user models their proposed changes to the security policy (e.g., in Panorama) and exports XML of the proposed configuration, which is then uploaded as part of each use case (three of which are listed below). A Config Parser service is used to build a fully resolved policy model of this XML using information (operational data) retrieved from the latest fully resolved policy model for the device group.
- 1. In an Incident Resolution Analysis workflow, the user provides one or more incidents that are queried against the newly built formal model for the security policy XML that was uploaded by the user. For each incident, Shadow/Contradiction Analysis is performed to determine if the changes proposed in the uploaded policy XML will resolve that incident. The full list of incidents that are generated based upon the last analysis run are retrieved from the alerts that are maintained and tracked by the policy change management microservice.
- 2. Shadow/Contradiction Security Policy Analysis uses the formal model of the security policy XML that was uploaded with the request, and results are returned for consumption via the UI.
- 3. Application Access Queries use the newly built security policy model to answer well-formed queries with the structure (in an example): “Can user X access application Y?”
In various embodiments, Data Services provides an API (e.g., using the Quarkus framework) for triggering policy change management. In some embodiments, the UI is served through the following example endpoints:
A query that is either a fully-specified or a partially-specified security policy JSON.
Modes of Invocation/Operation
-
- 1. Upon receipt of the query, a formal query microservice retrieves the last fully resolved policy JSON that contains the embedded formal model for the security policy.
- 2. The firewall command data fetch library is used to retrieve user-to-group and user-to-persona information using the following commands:
- A. show user user-attributes user <username>
- B. show user user-ids match-user <username>
- 3. The query is extended to ensure that connectivity satisfiability checks will include the results of the firewall data that has been fetched.
A list of policy objects that semantically match the received query is provided as output. For each policy object that is a match, raw (unresolved) as well as fully resolved information necessary for UI display is provided, as applicable.
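A query of the form "Can user X access application Y?" can be sketched as follows. The sketch assumes the user-to-group mapping has already been fetched from the firewall, and it extends the query with the user's groups before matching, in the spirit of the query extension step described above. The rule layout, group names, and function name are hypothetical.

```python
def can_access(policy, user, app, user_groups):
    """Return (allowed, matching_rule_name) under first-match semantics."""
    # Extend the query: a rule matches if it names the user or any of their groups.
    principals = {user} | set(user_groups.get(user, ()))
    for rule in policy:
        if principals & rule["who"] and app in rule["apps"]:
            return rule["action"] == "allow", rule["name"]
    return False, None  # assumed implicit default deny


# Assumed mapping, as would be fetched on demand for the queried users.
user_groups = {"tom": ["group-barbara"], "marie": ["group-satish"]}
policy = [
    {"name": "deny-satish-crm", "who": {"group-satish"}, "apps": {"crm"}, "action": "deny"},
    {"name": "allow-all-crm", "who": {"group-barbara", "group-satish"}, "apps": {"crm"}, "action": "allow"},
]
tom_result = can_access(policy, "tom", "crm", user_groups)
marie_result = can_access(policy, "marie", "crm", user_groups)
```

Returning the name of the matching rule alongside the verdict mirrors the output described above, in which each matching policy object is reported back for display.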
Description (Core Formal Modeling Microservice)
A microservice is provided for formal-modeling-based services. Based on the event type and parameters, it can call other libraries, such as the config parser library, the firewall data fetch library, or the formal modeling core library. These libraries can be called directly or in a separate thread, as needed. The microservice monitors the output of the libraries, updates status, and puts the final result in GCS. It raises/clears alerts via an incident generation workflow.
Config Parser Library
For various policy change use cases, XML is provided for analysis. As an example, the config parser library is used to convert XML to JSON. The config parser library defines a class for converting provided XML (in file or text format) to JSON based on a provided schema.
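A minimal sketch of such a converter class is shown below, using only the Python standard library. The class name, the `list_tags` schema key (tags that should always become JSON arrays), and the decision to ignore XML attributes are all assumptions made for illustration; the schema format used by the actual library is not specified here.

```python
import json
import xml.etree.ElementTree as ET


class XmlToJsonConverter:
    """Convert XML text to JSON, guided by a small (hypothetical) schema."""

    def __init__(self, schema):
        # Tags listed here are collected into arrays even if they occur once.
        self.list_tags = set(schema.get("list_tags", ()))

    def _convert(self, element):
        children = list(element)
        if not children:
            return (element.text or "").strip()  # leaf: keep its text
        node = {}
        for child in children:
            value = self._convert(child)
            if child.tag in self.list_tags:
                node.setdefault(child.tag, []).append(value)
            else:
                node[child.tag] = value
        return node

    def to_json(self, xml_text):
        root = ET.fromstring(xml_text)
        return json.dumps({root.tag: self._convert(root)}, sort_keys=True)


xml_text = (
    "<rules>"
    "<entry><name>r1</name><action>allow</action></entry>"
    "<entry><name>r2</name><action>deny</action></entry>"
    "</rules>"
)
converter = XmlToJsonConverter({"list_tags": ["entry"]})
policy_json = converter.to_json(xml_text)
```

File input could be supported by reading the file contents and passing them to `to_json`.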
Firewall Data Fetch Library
The firewall data fetch library uses the command framework library, which in turn calls a firewall (e.g., PA) command framework to get firewall output. The firewall data fetch library can store output in GCS. It can convert XML/text to JSON based on a provided schema. The firewall data fetch library can be used for periodic data pulls from the firewall.
Examples of Commands Supported
- show dns-proxy fqdn all
- request system external-list show type IP name <edl name>
- show user group list
- show user group name “cn=it_operations, cn=users,dc=al,dc=com”
- show predefined xpath/predefined
- show user user-attributes user <username>
- show user user-ids match-user <username>
Example output comprises the formal modeling output, the resolved config JSON, and any shadow/contradiction policies raised as a result of analysis.
Incident Generation/Clear
Incidents are raised for any shadow/contradiction found during analysis. In addition, the results are compared against the open alerts for these tenants, and any open alert that is no longer present in the results is cleared by sending a message with the alert state set to clear to the incident generation workflow. Current open incidents can be extracted from GCS.
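The raise/clear comparison can be sketched as a set difference between the open alerts and the findings of the latest analysis run. The alert identifiers, the message layout, and the choice to skip re-raising alerts that are already open are assumptions of this sketch.

```python
def reconcile(open_alerts, current_findings):
    """Build raise/clear messages for the incident generation workflow."""
    open_ids = set(open_alerts)
    current_ids = set(current_findings)
    # New findings that are not yet open are raised (assumption: no re-raise).
    messages = [{"alert": a, "state": "raise"} for a in sorted(current_ids - open_ids)]
    # Open alerts that the latest analysis no longer reports are cleared.
    messages += [{"alert": a, "state": "clear"} for a in sorted(open_ids - current_ids)]
    return messages


# Assumed identifiers of the form "<shadowee>:<shadower>".
open_alerts = ["r2:r1", "r5:r4"]
current_findings = ["r2:r1", "r3:r1"]
messages = reconcile(open_alerts, current_findings)
```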
Example Format
Security and Networking teams face several challenges in maintaining policy sets. The following are various examples:
Every security policy rule in an enterprise needs careful management to ensure the right balance between a tight security and compliance posture and the required application connectivity and performance. A large number of rules increases the operational overhead of maintaining them.
Policy sprawl eventually happens, and policies only grow. This makes policy analysis very difficult in the case of a disruption in connectivity or a breach. One example is “policy sprawl and drift”: as business and security needs change, new policies are added to allow new application/user/network connectivity or to segment/deny existing allowed connectivity. At times, however, an existing policy, or an edit to one, may be sufficient to meet this intent. Much of the time, it is hard for operators to analyze hundreds to thousands of policies to understand whether they really need to add or delete a policy.
Security/Network teams embark on periodic policy cleaning exercises but it is not always easy to find out what can be cleaned. Business intent changes and policies become redundant. Or, some policy intent is covered by one or more other policies (shadowing). Operators need to clean up policy while knowing that there is no change in policy posture in terms of allowed connectivity or required segmentation.
Another situation involves reducing change risk when meeting new business intent. When meeting new connectivity or segmentation intent, operators want to be sure they have not broken any previous intent. They need a way of analyzing the total expansion in connectivity or segmentation relative to the prior policy and confirming that it is limited to what was intended.
Yet another situation includes reducing change risk in terms of continuing to meet regulatory compliance and important business connectivity mandates. To ensure successful audits and no regulatory fines, at the time of making a policy change, IT/Infosec executives and Legal/Finance want to be sure that crown jewel segmentation continues to stay in place following changes. Certain application connectivity is mandatory for the business to operate. IT/Infosec executives and Business Unit executives also want to be sure this is not broken; otherwise, business revenue and employee productivity will be disrupted. For allowed connectivity, operators want to be sure that Networking and Security Operator teams are only allowing a restricted amount of traffic (e.g., only ports 443 and 8080 or web). Operators need a way of providing rules to specify crown jewel mandates on required segmentation and connectivity.
Example Workflows
In the examples below, suppose “Tom” belongs to “group-barbara” and “Marie” belongs to “group-satish.” Example choices for the Policy Layer include: (1) Prisma Access Shared Pre-Rule, (2) Prisma Access Shared Post-Rule, (3) Mobile Users Pre-Rule, (4) Mobile Users Post-Rule, (5) Remote Workforce (GlobalProtect), (6) Explicit Proxy, and (7) Remote Networks. Various example report formats (and/or excerpts thereof) are provided below. Example choices for the Security Policy include: (1) Mobile Users Remote Workforce (GlobalProtect), (2) Mobile Users Explicit Proxy, (3) Remote Networks.
New Rule Intent Satisfaction Analysis (New Rule with Allow Action)—Report
New Rule Intent Satisfaction Analysis (New Rule with Deny Action)—Report Excerpt
Policy Anomalies—User Group Based Incidents with Examples
An example security policy analyzer service powered by Formal modeling needs Firewall Operational State information including: (1) Exhaustive user-to-group mapping information (necessary for full fidelity formal modeling of the security policy) and (2) Exhaustive user-to-persona mapping information (necessary for full fidelity formal modeling of the security policy). An example implementation is as follows:
A GKE microservice, “user to Groups mapping collector service,” collects the user-to-group mapping information.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Claims
1. A system, comprising:
- a processor configured to: receive configuration information, including at least one policy, associated with a live production security appliance; use the received configuration information to instantiate the policy in a sandbox environment; use the sandbox environment to evaluate a proposed change to the configuration information, including by building a model using the received configuration information; and provide a result of the evaluation as output; and
- a memory coupled to the processor and configured to provide the processor with instructions.
2. The system of claim 1, wherein at least some of the configuration information comprises live state information extracted from the live production security appliance.
3. The system of claim 1, wherein evaluating the proposed change includes determining whether a new anomaly is created as a result of implementing the proposed change.
4. The system of claim 1, wherein the processor is further configured to implement the proposed change and perform a re-evaluation.
5. The system of claim 1, wherein the policy is instantiated in the sandbox environment in response to an incident resolution analysis.
6. The system of claim 1, wherein the processor is further configured to receive a list comprising one or more incidents.
7. The system of claim 6, wherein the processor is further configured to determine whether one or more items on the list are resolved within the sandbox environment.
8. The system of claim 1, wherein the processor is further configured to receive a refresh request with respect to the live production security appliance.
9. The system of claim 8, wherein the processor is further configured to identify a difference between a current policy set deployed on the live production security appliance received as a result of the refresh request, and a current policy set deployed in the sandbox environment.
10. The system of claim 9, wherein the difference is an addition of a rule to the live production security appliance, and wherein the processor is configured to add a copy of the rule to the sandbox environment.
11. The system of claim 9, wherein the difference is deletion of a rule in the live production security appliance, and wherein the processor is configured to delete a copy of the rule in the sandbox environment.
12. The system of claim 9, wherein the difference is a change to a rule on the live production security appliance, and wherein the processor is further configured to determine whether making the change in the sandbox environment would result in a rule conflict.
13. The system of claim 12, wherein the processor is configured to prompt a user for a resolution to a determined rule conflict.
14. A method, comprising:
- receiving configuration information, including at least one policy, associated with a live production security appliance;
- using the received configuration information to instantiate the policy in a sandbox environment;
- using the sandbox environment to evaluate a proposed change to the configuration information, including by building a model using the received configuration information; and
- providing a result of the evaluation as output.
15. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for:
- receiving configuration information, including at least one policy, associated with a live production security appliance;
- using the received configuration information to instantiate the policy in a sandbox environment;
- using the sandbox environment to evaluate a proposed change to the configuration information, including by building a model using the received configuration information; and
- providing a result of the evaluation as output.
Type: Application
Filed: Jan 31, 2024
Publication Date: Oct 17, 2024
Inventors: Kartik Mohanram (Pittsburgh, PA), Navneet Yadav (Saratoga, CA)
Application Number: 18/429,208