Managing Networks Using Dependency Analysis
In a network management system, dependency relationships of network clients and network elements are computed. In an implementation, a dependency graph is generated based on the relationships, and the probabilities of problems associated with the network client and network element are determined based on the dependency graph.
The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 60/807,574 filed Jul. 17, 2006, the disclosure of which is incorporated herein.
BACKGROUND
Users in a distributed network often encounter service disruptions, such as unavailability or poor performance. In such networks, apart from clients and servers, a number of other components, such as routers, switches, and links, and services (e.g., Domain Name Service (DNS) and authentication services such as Active Directory and Kerberos) may be the cause of a disruption. When such problems arise, users may have to rely on network administrators or a helpdesk to resolve them. Existing automated systems for countering these problems may either present only various types of raw data or focus on network-layer problems while overlooking problems experienced by applications.
Existing systems may employ designer-generated rules that spell out an application's dependencies. This approach has several problems: for example, the system may evolve faster than the rules are updated, and an application's dependencies may vary with the deployment of various middle boxes (e.g., firewalls and proxies). Similarly, analyzing configuration files to determine dependencies may be insufficient, because many dependencies among network components are constructed dynamically. For example, web browsers in enterprise networks are often configured to communicate through a proxy, which is sometimes named in the browser preferences but is frequently located through automatic proxy discovery protocols that themselves rely on the resolution of well-known names.
In other approaches, systems have been proposed that expose dependencies by having applications run on a middleware platform instrumented to track dependencies at run time. In general, however, networks run a plethora of platforms, operating systems, and applications, often from different vendors. While a single vendor might instrument its software, it is unlikely that all vendors will do so in a common fashion, so building all distributed applications over a single middleware platform may be infeasible. Furthermore, many underlying services on which other services depend (e.g., Domain Name Service) may be legacy services that cannot easily be instrumented or ported to run over a middleware platform instrumented to track dependencies at run time.
SUMMARY
This summary is provided to introduce simplified concepts of managing networks using dependency analysis, which are further described below in the Detailed Description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
In an embodiment, dependency analysis is performed on a managed network by receiving dependency relationships of network elements related to network clients and generating a dependency graph based on these dependency relationships. The dependency graph is then used to aid management of the network, which may include (1) establishing probabilities of occurrence of problems correlated to network elements and network clients, and (2) determining which network elements depend on which other network elements.
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference number in different figures indicates similar or identical items.
The following disclosure describes systems and methods for managing networks using dependency analysis. While aspects of the described systems and methods for managing networks using dependency analysis can be implemented in any number of different computing systems, environments, and/or configurations, embodiments are described in the context of the following exemplary system architectures.
Exemplary Management System
In an exemplary implementation, one or more of the network elements 104-1, 104-2, 104-3, 104-4, . . . , 104-N employ dependency agents 106-1, 106-2, 106-3, 106-4, . . . , 106-N, respectively, to automatically identify interactions and uncover dependency relationships between the network elements 104 and various resources in the network 102. In alternate embodiments, the network elements 104 may include one or more of PDAs, desktops, workstations, servers, routers, switches, hubs, services, etc. Dependency agents may also be attached via taps or sniffers to passive, non-electrical, or non-processing components of the network (e.g., optical fibers, Ethernet cables, links).
An enterprise network is defined as the hardware, software, and media connecting the information technology resources of an organization. A typical enterprise network is formed by connecting network clients, servers, and a number of other components, such as routers and switches, through a communication medium. A network element 104 may be considered a “network client”, that is, the part of the enterprise network that serves as the interface with an end user. The user may run an application or program on the network client, and to support that application the network client may depend on other components in the enterprise network, such as servers, routers, switches, services, and links. For purposes of illustration with regard to an enterprise network, a network element 104 is referred to in this description as a “network client” in the sense described above, and the other components of the enterprise network on which the network client may depend are referred to as “other network elements”.
In an exemplary implementation, the network element 104 employs a distributed approach that approximates dependency relationships using low-level packet correlations. This approach is explained in detail in the section titled “Exemplary Dependency Agent”. The network element 104 discovers its dependency relationships with other network elements 104, and these dependency relationships are represented as dependency graphs.
In one implementation, the discovered dependency relationships are received at the centralized computing device 108, which employs an inference engine 110 to generate dependency graphs. In alternate embodiments, the centralized computing device 108 may include a cluster of servers, workstations, and the like. The centralized computing device 108 may be configured to assemble dependency relationships and generate a dependency graph for the network 102 spanning all the network elements 104 and the sub-network 112. In this implementation, the generated dependency graphs are utilized to determine the probability of occurrence of problems and to localize faults in the network 102, and thus for the management of distributed networks, for example, an enterprise network.
In yet another implementation, the dependency graphs include relationships representing network topology. The manner in which the centralized computing device 108 generates the dependency graph and network topology is explained in the section titled “Exemplary Inference Engine”.
Exemplary Dependency Agent
In an exemplary implementation, the memory 204 stores an operating system 206 providing a platform for executing applications on the network element 104. The memory further stores a dependency agent 106 capable of identifying interactions and discovering dependency relationships of the network element 104. To this end, the dependency agent 106 includes a network monitor 210, an application monitor 212, a dependency graph analyzer 214, an agent service 216, and a health summarizer 218. The dependency relationships thus generated are stored in dependency data 208 for drawing future inferences. A network interface 220 enables the network element 104 to interface with the network 102 or other network elements 104. The dependency agent 106 takes a passive approach to generating a dependency graph for a network element, whereas the inference engine may proactively or periodically instruct a dependency agent to generate a dependency graph.
In an exemplary implementation, the dependency agent 106 determines the dependency relationships of the network element 104 as follows. Local traffic correlations are inferred by passively monitoring packets and applying statistical learning techniques. The basic premise is that accomplishing a given task is associated with a typical pattern of messages. Therefore, the dependency relationships may be approximated by taking the transitive closure of strongly correlated network elements, and a fault can be detected by observing the absence of expected messages.
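The transitive-closure step can be sketched as follows. This is a minimal illustration, assuming that strongly correlated pairs have already been identified and are given as directed (dependent, dependency) edges; the function name and node labels are hypothetical.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return, for each node with outgoing edges, every node reachable from it.

    `edges` is an iterable of (dependent, dependency) pairs that were judged
    strongly correlated. Reachability is found with an iterative depth-first walk.
    """
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    closure = {}
    for start in list(graph):
        seen = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                # .get() does not insert missing keys into the defaultdict.
                stack.extend(graph.get(node, ()))
        closure[start] = seen
    return closure
```

For example, if a browser correlates strongly with a proxy and the proxy with a DNS server, the closure reports that the browser indirectly depends on the DNS server as well.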
In this embodiment, the network monitor 210 builds an “activity model” of the network element's own traffic, correlating the input and output of the network element 104. The activity model is based on an “activity pattern” of that input and output. Inputs and outputs are channels along which data packets flow, and thus between which an edge may exist in the dependency graph. For example, all packets sharing the same source and destination address might be designated as belonging to a single channel; the application protocol may additionally be used to identify a channel. Channels are described as input or output channels according to whether they carry messages received at or transmitted by the network element 104. Over a fixed time window, each channel is assigned a value of either active or inactive. The set of such assignments for the channels at a network element 104 is the “activity pattern” for that network element, indicating whether or not a packet was observed on each channel during the observation window. The activity pattern for the network element 104 is stored in dependency data 208.
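A minimal sketch of computing activity patterns from a packet trace follows. It assumes a channel is identified by its direction, the remote address, and the application protocol, and it represents each activity pattern as the set of active channels (every other channel is inactive). The `Packet` record and function names are hypothetical; windows in which no packet was seen are simply omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: float  # seconds
    src: str
    dst: str
    protocol: str     # application protocol, e.g. inferred from the port


def channel_of(pkt, local_addr):
    """Map a packet to an (direction, remote address, protocol) channel."""
    if pkt.dst == local_addr:
        return ("in", pkt.src, pkt.protocol)
    return ("out", pkt.dst, pkt.protocol)


def activity_patterns(packets, local_addr, window):
    """Return {window index: set of channels active in that window}."""
    patterns = defaultdict(set)
    for pkt in packets:
        index = int(pkt.timestamp // window)
        patterns[index].add(channel_of(pkt, local_addr))
    return dict(patterns)
```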
In this embodiment, the activity model represents a matrix of correlation coefficients between the inputs and outputs of the network element 104. These correlation coefficients encode the confidence level for a dependency between two network elements.
The “activity model” for a network element is a function mapping the activity pattern of the input channels to a vector of probabilities that each output channel is active. Since activity patterns discard all packet timings and counts within the observation window, choosing a suitable window duration is critical: over a very long window all channels will appear related, whereas a window that is too short will cause correlations to be missed. The network monitor 210, in one embodiment, can be configured to develop models for a given range of window sizes and combine the resulting models. According to this embodiment, the network monitor 210 may apply statistical learning techniques to passively monitored packets for the purpose of modeling. In particular, the learning technique is based on the likelihood of the outputs (i.e., the transmitted packets) given the observed inputs (i.e., the received packets) over a fixed time window.
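As a rough illustration of an activity model, the sketch below uses a simple frequency estimate of P(output active | input active) for each input/output pair, and combines models built at different window sizes by averaging. Both the estimator and the averaging rule are assumptions chosen for clarity, not the statistical learning technique described above; function names are hypothetical. Channels are tuples whose first element is "in" or "out".

```python
from collections import Counter
from itertools import product


def conditional_activity(patterns):
    """Estimate P(output channel active | input channel active).

    `patterns` is an iterable of sets of active channels, one set per window.
    """
    in_count = Counter()
    pair_count = Counter()
    for active in patterns:
        inputs = [c for c in active if c[0] == "in"]
        outputs = [c for c in active if c[0] == "out"]
        in_count.update(inputs)
        pair_count.update(product(inputs, outputs))
    return {(i, o): n / in_count[i] for (i, o), n in pair_count.items()}


def combine_window_sizes(models):
    """Average per-pair estimates from models built with different window sizes.

    A pair missing from one model counts as probability 0 for that window size.
    """
    pairs = set().union(*models)
    return {p: sum(m.get(p, 0.0) for m in models) / len(models) for p in pairs}
```

Averaging is only one way to combine windows; a real agent might instead weight window sizes by how well each predicts held-out traffic.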
The network monitor 210 extracts standard packet header information, such as the timestamp, protocol, and source and destination IP addresses, and identifies the packet's application or service, for example, by using well-known IP port numbers. In alternate embodiments, the network monitor 210 collects network data by, for example, sniffing packets or tracing the route of packets. An exemplary data packet monitored by the network monitor 210 is described in the section titled “Exemplary Data Packet”. In an embodiment, the network monitor 210 is implemented by invoking functionality in the operating system 206 that makes available to the dependency agent 106 and the network monitor 210 a copy of part or all of each packet sent or received by the network element 104. Exemplary mechanisms providing such functionality are PCAP and NetMon. Alternate embodiments may obtain information about the packets in other ways or forms, such as at layer 4 (e.g., socket-layer information from LSP).
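Port-based service identification can be sketched as below. The port table is a small illustrative subset of well-known IANA assignments, and the `HeaderRecord` type and function name are hypothetical. The destination port is checked first so that requests are classified by the server's port, and the source port second so that replies are classified the same way.

```python
from typing import NamedTuple, Optional

# Illustrative subset of well-known port assignments.
WELL_KNOWN_PORTS = {
    53: "dns",
    80: "http",
    88: "kerberos",
    389: "ldap",
    443: "https",
    445: "smb",
}


class HeaderRecord(NamedTuple):
    timestamp: float
    protocol: str  # transport protocol, e.g. "tcp" or "udp"
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def identify_service(rec: HeaderRecord) -> Optional[str]:
    """Guess the application or service from whichever port is well known."""
    for port in (rec.dst_port, rec.src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return None  # unknown: would need deeper inspection or other hints
```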
In another implementation, the dependency graph analyzer 214 may be configured to set an appropriate threshold for deciding whether a correlation is strong enough to be included in the dependency graph. The dependency graphs thus generated may be utilized for the management of distributed networks, for example, an enterprise network.
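Applying such a threshold can be as simple as the filter below, assuming correlations are stored as a mapping from (input channel, output channel) to a coefficient in [0, 1]. Kept coefficients remain attached to the edges as confidence weights. The function name is hypothetical, and how the threshold itself is chosen is left open, as in the text.

```python
def dependency_edges(correlations, threshold):
    """Keep only the channel pairs whose correlation meets the threshold.

    `correlations` maps (input_channel, output_channel) -> coefficient.
    The returned dict maps each surviving edge to its coefficient.
    """
    return {pair: c for pair, c in correlations.items() if c >= threshold}
```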
In an implementation, the health summarizer 218 reports the condition and health probability of the network elements 104 in the network. The health summarizer 218 in the dependency agent 106 computes the probability of occurrence of a problem in the network elements 104, for example, by assigning a probability of sickness to each network element. One embodiment of a health summarizer compares the response time of a request sent to another network element with a historical record of response times, and assigns a probability of health or sickness to that network element based on how far the response time deviates above the historical median. Alternate embodiments of a health summarizer (1) process system log files to identify error codes indicating potential sickness on the network element, or (2) process responses from network elements to identify response codes, strings, or patterns that indicate potential sickness on the network element.
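The response-time embodiment can be sketched as follows. The mapping from deviation to probability, excess / (excess + scale), and the default scale (the median absolute deviation of the history) are hypothetical choices; the text only specifies that the probability depends on the deviation above the historical median.

```python
import statistics


def sickness_probability(history, response_time, scale=None):
    """Map a response time's excess over the historical median to [0, 1).

    Times at or below the median yield 0; larger excesses approach 1.
    `scale` controls how fast the probability rises; by default it is the
    median absolute deviation of `history` (a hypothetical choice).
    """
    median = statistics.median(history)
    if scale is None:
        # Guard against a zero spread when all historical times are equal.
        scale = statistics.median(abs(x - median) for x in history) or 1e-9
    excess = max(0.0, response_time - median)
    return excess / (excess + scale)
```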
The application monitor 212 enables the dependency agent 106 to determine the dependency relationships for an application or a service being provided to the network elements 104 by a particular network element. In an alternate embodiment, the application monitor 212 detects an application failure and generates a symptom report, which is stored with the dependency data 208.
In an alternative embodiment, this invention may be implemented by a network-based system that requires neither the deployment of dependency agents to clients or servers nor changes to them. Such a system could, for example, deploy packet capture devices such as packet sniffers at various locations in the enterprise network and infer the dependency relationships of each network client 104 from the resulting traces. In this embodiment, the packet traces collected by each sniffer are processed to identify all packets sent or received by each network element. These virtual packet traces are then processed using the mechanisms taught in this application as if they had been collected by a dependency agent running on each client. Collection, processing, and distribution of packet traces may be performed by methods known in the art.
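Building per-host virtual traces from several sniffer traces might look like the sketch below. It assumes each packet is a (timestamp, src, dst) tuple and that a packet captured by two sniffers appears as an identical tuple in both, which in practice would require synchronized clocks or an additional de-duplication key. The function name is hypothetical.

```python
from collections import defaultdict


def virtual_traces(sniffer_traces, hosts):
    """Regroup packets from several sniffers into one trace per host.

    Each packet is a (timestamp, src, dst) tuple; identical tuples seen by
    more than one sniffer are kept once. Only addresses in `hosts` get a trace.
    """
    per_host = defaultdict(set)
    for trace in sniffer_traces:
        for pkt in trace:
            _, src, dst = pkt
            for addr in (src, dst):
                if addr in hosts:
                    per_host[addr].add(pkt)
    # Sort by timestamp so each virtual trace reads like a locally captured one.
    return {host: sorted(pkts) for host, pkts in per_host.items()}
```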
Exemplary Inference Engine
In an exemplary implementation, the memory 304 stores an operating system 306 providing a platform for executing applications on the centralized computing device 108. The memory further stores an inference engine 110 capable of aggregating and coordinating the dependency data 208 from one or more of the network elements 104 in the system 100. To this end, the inference engine 110 includes a dependency analyzer 310, a dependency graph generator 312, a probing agent 314, and a topology view generator 316. Any data required for the execution of the inference engine 110, as well as dependency data received from the network elements 104, is stored in the program data 308 for future use. A network interface 318 enables the centralized computing device 108 to interface with the network 102 or other network elements 104. In alternate embodiments, the inference engine 110 may be a part of one or more network elements 104. In yet another embodiment, the inference engine 110 may be distributed over multiple network elements 104. The inference engine 110 takes a proactive approach to generating a dependency graph for the whole network or a part thereof.
In an exemplary implementation, the inference engine 110 incorporates an “Analysis of Network Dependencies” (“AND”) approach to determine the dependency relationships of the network elements 104 in the network 102. In this approach, the centralized inference engine 110 and the set of dependency agents 106 coordinate to assemble dependency data from one or more network elements 104. Each dependency agent 106 performs temporal correlation of the packets sent and received by the corresponding network element 104 and makes summarized information, in the form of dependency data, available to the inference engine 110. The inference engine 110 therefore serves as an aggregation and coordination point for the dependency data received: it assembles the dependency graph for applications by combining information from the dependency agents 106, orders the dependency agents to conduct active probing as needed to flesh out the dependency graph or to localize faults, and interfaces with human network managers.
In this embodiment, the dependency analyzer 310 may invoke the probing agent 314 to send a request for dependency data to one or more of the network elements 104. Upon receipt of such a request, the dependency agent 106 sends the local dependency data of the corresponding network element 104. The dependency data received from the dependency agents 106 is stored in the program data 308. In an alternate embodiment, instead of sending the whole dependency data, the dependency agents 106 may send only the changes, if any, in the dependency data. The dependency analyzer 310 retrieves the dependency data from the program data 308 to assemble the dependency graph for the applications or services. In another embodiment, the dependency analyzer 310 computes the dependencies of the network elements from a report of deltas, that is, the changes in the dependency data since the dependency data last received.
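The delta-based reporting can be sketched as a pair of functions: one run by the agent to compute what changed since its last report, and one run by the inference engine to rebuild the agent's current data. Dependency data is assumed here to be a dict mapping dependency edges to weights; the names and representation are hypothetical.

```python
def compute_delta(previous, current):
    """Agent side: return (changed, removed) relative to the last report.

    `changed` holds entries that are new or whose value differs;
    `removed` holds keys present before but absent now.
    """
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = set(previous) - set(current)
    return changed, removed


def apply_delta(stored, changed, removed):
    """Inference-engine side: rebuild the agent's current dependency data."""
    updated = {k: v for k, v in stored.items() if k not in removed}
    updated.update(changed)
    return updated
```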
In an embodiment, the dependency graph generator 312 generates a combined dependency graph based on the dependency data assembled by the dependency analyzer 310.
The centralized computing device 108 is capable of being interfaced to an administrator or a human network manager to provide a statistical performance report of the network 102 and the network elements 104.
Fault Localization Using Dependency Graphs
In an exemplary implementation, each dependency agent 106 observes the experiences of its network element 104, for example, by measuring the response time between requests and replies. When a user on the network element flags an experience as bad, for example, by restarting the browser or pressing a button meaning “I'm unhappy now”, or when automated parsing discovers too many “invalid page” HTTP return codes, the dependency agent 106 sends a triggered experience report to the inference engine 110. A small number of randomly selected positive experiences, for example, the time to load a web page when the user did not complain, may also be sent to the inference engine periodically. The dependency analyzer 310 keeps updating the dependency data and experience reports and, within a given time window, batches experience reports from multiple agents. It applies Bayesian inference to find the most plausible explanation for the experience reports, for example, the minimum set of faulty physical components that would afflict all the network elements 104, routers, and links with poor performance while leaving unaffected the network elements 104 experiencing acceptable performance.
In another embodiment, to accomplish efficient fault localization, when the application monitor 212 in the network client 104 detects application failures, it sends failure symptom reports to the inference engine 110. The symptom reports include the network elements, such as routers, links, and other applications, that are affected by the detected failures. Since a single failure (e.g., a server going down or link congestion) often affects many network clients or hosts (i.e., network elements 104), the inference engine 110 will receive multiple symptom reports in a short period of time. The dependency analyzer 310 aggregates a burst of reports and uses a Bayesian inference algorithm to find the most plausible explanation for all of these symptom reports (e.g., the minimum set of faulty physical components that can affect all the hosts, routers, and links in the symptom reports).
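The "minimum set of faulty components" idea can be illustrated with an exhaustive search. This is a deterministic simplification swapped in for clarity, not the Bayesian inference described above. It assumes each report has already been expanded, via the dependency graph, into the set of components the reporting element depends on, and it treats any component used by a report with acceptable performance as healthy. Function and component names are hypothetical.

```python
from itertools import combinations


def minimal_explanation(bad_reports, good_reports, max_size=3):
    """Smallest component set that touches every bad report and no good one.

    Each report is a set of component names (hosts, routers, links, services).
    Returns None if no explanation of at most `max_size` components exists.
    """
    healthy = set().union(*good_reports)
    candidates = sorted(set().union(*bad_reports) - healthy)
    # Size 0 only succeeds when there are no bad reports to explain.
    for size in range(0, max_size + 1):
        for combo in combinations(candidates, size):
            faulty = set(combo)
            if all(faulty & report for report in bad_reports):
                return faulty
    return None
```

In the test scenario below, two clients with poor performance share only router1 among components not also used by a satisfied client, so router1 is reported as the likely fault. The search grows combinatorially with `max_size`, which is one reason a probabilistic, approximate method suits large networks.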
In yet another implementation, the dependency graph is utilized to localize link congestion faults. To this end, the layer-2 topology is mapped by having the dependency agents 106 send and listen for MAC broadcast packets, and the layer-3 topology is mapped using traceroutes. This may also be accomplished, for example, by extracting dependency data from SNMP data. The accuracy with which congestion faults are localized may increase as more accurate topology information becomes available.
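Turning traceroute output into layer-3 links can be sketched as below, assuming each traceroute has already been parsed into an ordered list of hop addresses with "*" marking a hop that did not respond. Links adjacent to an unresponsive hop are skipped because their endpoints cannot be inferred. The function name is hypothetical.

```python
def links_from_traceroutes(paths):
    """Collect undirected router-level links from traceroute hop lists."""
    links = set()
    for hops in paths:
        for a, b in zip(hops, hops[1:]):
            if "*" not in (a, b) and a != b:
                links.add(frozenset((a, b)))  # undirected link
    return links
```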
The inference engine 110 therefore builds the dependency graphs by continuously accumulating the dependency data that it receives from the dependency agents 106. Since important applications are typically hosted on servers with high fan-in, the inference engine 110 identifies these servers and automatically builds a dependency graph for each one. The same node may appear in multiple local dependency graphs generated by different network elements; for example, a DNS server may be shared by multiple applications and network clients. In an implementation, the dependency graph generator 312 leverages this overlap by collapsing shared nodes into one, aggregating the local graphs into a complete dependency graph of the enterprise network.
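Collapsing shared nodes and spotting high fan-in servers can be sketched as follows. It assumes each local graph maps a node name to the set of nodes it depends on, and that the same physical node carries the same name in every local graph, so merging reduces to a union by key. Function names and the `min_clients` cutoff are hypothetical.

```python
from collections import defaultdict


def merge_local_graphs(local_graphs):
    """Union per-element dependency graphs; same-named nodes collapse into one."""
    merged = defaultdict(set)
    for graph in local_graphs:
        for node, deps in graph.items():
            merged[node] |= deps
    return dict(merged)


def high_fan_in(merged, min_clients):
    """Nodes that at least `min_clients` distinct nodes depend on directly."""
    fan_in = defaultdict(set)
    for node, deps in merged.items():
        for dep in deps:
            fan_in[dep].add(node)
    return {n for n, users in fan_in.items() if len(users) >= min_clients}
```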
An exemplary data packet structure 500, as monitored by the network monitor 210, is illustrated in
In an implementation, the inference engine 110 can utilize the dependency data received from network elements 104 to generate a network topology.
A dependency graph represents the dependencies between network elements, with sub-graphs representing the dependencies pertaining to a particular application or activity. In an implementation, the dependency graph includes nodes and directed edges connecting the nodes. In such an implementation, each node represents a network element 104, and each directed edge represents a dependency between the nodes it connects. In an alternate embodiment, the dependency graph may depict the dependencies of a network element 104 for an activity or a service.
The dependency graph that is generated may be stored in the dependency data 208. When the dependency graph is large, the agent service 216 can search it for the “most likely path”. Dependency graphs may be generated on demand and give a snapshot of recent history at each network element 104.
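One way to search for a "most likely path" is sketched below, under the assumption that each edge carries the probability that the dependency is exercised and that path likelihood is the product of edge probabilities. Maximizing that product is equivalent to minimizing the sum of -log(p), which is non-negative for p in (0, 1], so Dijkstra's algorithm applies. This interpretation and the function name are assumptions, not the agent service's specified algorithm.

```python
import heapq
import math


def most_likely_path(graph, source, target):
    """Return (path, probability) maximizing the product of edge probabilities.

    `graph` maps node -> {neighbor: probability in (0, 1]}.
    Returns (None, 0.0) if `target` is unreachable.
    """
    best = {source: 0.0}   # lowest accumulated -log(p) seen per node
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1], math.exp(-cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, p in graph.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None, 0.0
```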
Each dependency agent 106 continuously updates a correlation matrix of the frequency with which two channels are active within a time window, for example, 100 ms. In an embodiment, the inference engine 110 polls the dependency agents 106 for their correlation matrices.
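A running co-activity matrix of this kind might be maintained as in the sketch below. It assumes packets arrive with non-decreasing timestamps, counts unordered channel pairs that are active in the same window, and reports frequencies over completed, non-empty windows only; the 0.1 s default mirrors the 100 ms example. The class and method names are hypothetical.

```python
from collections import Counter
from itertools import combinations


class CoActivityMatrix:
    """Running counts of how often pairs of channels are active in one window."""

    def __init__(self, window=0.1):
        self.window = window
        self.pair_counts = Counter()
        self.windows_seen = 0          # completed, non-empty windows
        self._current_index = None
        self._active = set()

    def observe(self, timestamp, channel):
        """Record a packet on `channel`; timestamps must not decrease."""
        index = int(timestamp // self.window)
        if index != self._current_index:
            self._flush()
            self._current_index = index
        self._active.add(channel)

    def _flush(self):
        """Close the current window and count its co-active channel pairs."""
        if self._current_index is not None:
            self.windows_seen += 1
            for a, b in combinations(sorted(self._active), 2):
                self.pair_counts[(a, b)] += 1
        self._active = set()

    def snapshot(self):
        """Pair frequencies an inference engine could poll for."""
        if self.windows_seen == 0:
            return {}
        return {pair: n / self.windows_seen for pair, n in self.pair_counts.items()}
```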
In another embodiment, each edge in the dependency graph also has a weight, which is the probability with which it actually occurs in a transaction. For example, in
In one implementation, in addition to the hosts 702, the application services 704, and the application server 706, the AND approach extends the dependency graph by populating it with other network elements 104, which may include, for example, routers, switches, physical links, PDAs, servers, and services.
Referring back to
Exemplary methods for managing networks using dependency analysis are described with reference to
At block 802, dependency relationships of network elements are computed by the dependency agent 106, which is configured to identify interactions of the network client 104 with other network elements. In an embodiment, this may be done by sending a probe request to the network client 104. Upon receipt of such a request, the dependency agent 106 of the corresponding network client 104 gathers dependency relationships and creates a correlation matrix depicting the correlation between the input and output of the network client. The matrix is stored in dependency data 208. In another implementation, the dependency relationships are received based on applications provided to the network client 104. In yet another embodiment, the dependency relationships are received based on applications provided to the network elements.
At block 804, dependency graphs are created based on the received dependency relationships stored in dependency data 208. In an implementation, the dependency graphs may be generated by the dependency agent 106. In another embodiment, multiple dependency graphs are received at a centralized computing device 108. The inference engine 110 in the centralized computing device 108 acts as a coordination and aggregation point for all such dependency data from multiple dependency agents 106. Upon receipt of dependency data from the network clients 104 in the network, the inference engine 110 assembles a comprehensive dependency graph for the whole network. In one embodiment, the inference engine 110 creates a network topology view of the network by aggregating the network topology views generated by the dependency agents 106 at the corresponding network elements.
At block 806, probabilities of problems associated with the network elements and network clients 104 are determined. This determination is based on the dependency graph generated at block 804. In an embodiment, this may be accomplished by the dependency agent 106, which assigns a probability of sickness to the network elements on which the network client 104 depends. In yet another embodiment, the probabilities assigned by the dependency agent 106 are received as dependency data by the inference engine 110, which keeps updating the probabilities upon receipt of further such dependency data from the corresponding network client 104. In an alternate embodiment, the creation of dependency graphs and the determination of probabilities are included as part of managing the network, in which multiple dependency graphs and network topology views are generated. In one embodiment, Bayesian inference is incorporated in a diagnosis algorithm for determining problems associated with the network client 104 and the network elements.
Accordingly, at block 902, a model for representing the network elements 104 and their dependencies is developed. In this model, the network elements 104 are represented by nodes, and the dependency between any two nodes is represented by an edge connecting them.
At block 904, a dependency graph is generated based on the model developed at block 902. In an embodiment, the creation of the dependency graph may take into account the application provided to a network element by another.
At block 906, the observations from the dependency relationships, as depicted by the dependency graph created at block 904, are interpreted. In an implementation, this may be done by a network administrator. Interpretation further includes turning raw observations into events signifying health or sickness. In one embodiment, each edge in the dependency graph is assigned a weight, which may be a probability of sickness or health.
At block 908, a mathematical framework is developed to account for changes in the probabilities assigned to the edges at block 906. This may further include updating the probabilities when an event occurs.
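One possible framework for updating an edge's probability when an event occurs is a Beta-Bernoulli model, sketched below. This is a swapped-in technique chosen for illustration, not the framework specified here: it starts from a uniform Beta(1, 1) prior and adds one pseudo-count per observed health or sickness event. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeHealth:
    """Beta-distributed belief about whether an edge (dependency) is sick."""

    healthy: float = 1.0  # pseudo-count of healthy events (Beta alpha)
    sick: float = 1.0     # pseudo-count of sickness events (Beta beta)

    def update(self, event_is_healthy: bool) -> None:
        """Fold one observed event into the counts."""
        if event_is_healthy:
            self.healthy += 1
        else:
            self.sick += 1

    @property
    def p_sick(self) -> float:
        """Posterior mean probability that the next event signals sickness."""
        return self.sick / (self.healthy + self.sick)
```

Because the counts only grow, older evidence never fades in this sketch; a deployment that tracks changing conditions would likely decay the counts over the administrator's time window.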
At block 910, the observations from multiple network elements 104 are assembled and a comprehensive observation for the whole network is obtained. This observation may be updated based on a time window set by the administrator. The overall observation can be statistically processed to produce experience reports and performance analysis reports. In yet another embodiment, the processed report may be presented to an administrator.
At block 912, an appropriate action, if required, can be taken based on the report presented at block 910.
Exemplary Computer Environment
Computer environment 1000 includes a general-purpose computing-based device in the form of a computer 1002. Computer 1002 can be, for example, a desktop computer, a handheld computer, a notebook or laptop computer, a server computer, a game console, and so on. The components of computer 1002 can include, but are not limited to, one or more processors or processing units 1004, a system memory 1006, and a system bus 1008 that couples various system components including the processor 1004 to the system memory 1006.
The system bus 1008 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a Mezzanine bus.
Computer 1002 typically includes a variety of computer readable media. Such media can be any available media that is accessible by computer 1002 and includes both volatile and non-volatile media, removable and non-removable media.
The system memory 1006 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 1010, and/or non-volatile memory, such as read only memory (ROM) 1012. A basic input/output system (BIOS) 1014, containing the basic routines that help to transfer information between elements within computer 1002, such as during start-up, is stored in ROM 1012. RAM 1010 typically contains data and/or program modules that are immediately accessible to and/or presently operated on by the processing unit 1004.
Computer 1002 may also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example,
The disk drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for computer 1002. Although the example illustrates a hard disk 1016, a removable magnetic disk 1020, and a removable optical disk 1024, it is to be appreciated that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like, can also be utilized to implement the exemplary computing system and environment.
Any number of program modules can be stored on the hard disk 1016, magnetic disk 1020, optical disk 1024, ROM 1012, and/or RAM 1010, including by way of example, an operating system 1027, one or more application programs 1028, other program modules 1030, and program data 1032. Each of such operating system 1027, one or more application programs 1028, other program modules 1030, and program data 1032 (or some combination thereof) may implement all or part of the resident components that support the distributed file system.
A user can enter commands and information into computer 1002 via input devices such as a keyboard 1034 and a pointing device 1036 (e.g., a “mouse”). Other input devices 1038 (not shown specifically) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are connected to the processing unit 1004 via input/output interfaces 1040 that are coupled to the system bus 1008, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
A monitor 1042 or other type of display device can also be connected to the system bus 1008 via an interface, such as a video adapter 1044. In addition to the monitor 1042, other output peripheral devices can include components such as speakers (not shown) and a printer 1046 which can be connected to computer 1002 via the input/output interfaces 1040.
Computer 1002 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computing-based device 1048. By way of example, the remote computing-based device 1048 can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and the like. The remote computing-based device 1048 is illustrated as a portable computer that can include many or all of the elements and features described herein relative to computer 1002.
Logical connections between computer 1002 and the remote computer 1048 are depicted as a local area network (LAN) 1050 and a general wide area network (WAN) 1052. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
When implemented in a LAN networking environment, the computer 1002 is connected to a local network 1050 via a network interface or adapter 1054. When implemented in a WAN networking environment, the computer 1002 typically includes a modem 1056 or other means for establishing communications over the wide area network 1052. The modem 1056, which can be internal or external to computer 1002, can be connected to the system bus 1008 via the input/output interfaces 1040 or other appropriate mechanisms. It is to be appreciated that the illustrated network connections are exemplary and that other means of establishing communication link(s) between the computers 1002 and 1048 can be employed.
In a networked environment, such as that illustrated with computing environment 1000, program modules depicted relative to the computer 1002, or portions thereof, may be stored in a remote memory storage device. By way of example, remote application programs 1058 reside on a memory device of remote computer 1048. For purposes of illustration, application programs and other executable program components, such as the operating system, are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing-based device 1002, and are executed by the data processor(s) of the computer.
Various modules and techniques may be described herein in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
An implementation of these modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise “computer storage media” and “communications media.”
“Computer storage media” includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
Alternately, portions of the framework may be implemented in hardware or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) or programmable logic devices (PLDs) could be designed or programmed to implement one or more portions of the framework.
CONCLUSION
The above-described methods and system describe managing networks using dependency analysis. Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed invention.
Claims
1. A method comprising:
- computing dependency relationships of network elements related to one another; and
- creating a dependency graph based on the dependency relationships.
2. The method of claim 1, wherein the network elements gather the dependency relationships.
3. The method of claim 1, wherein the computing of the dependency relationships is performed by code implementing dependency agents provided to the network elements.
4. The method of claim 1, wherein the creating comprises creating a network topology view of a network, the network comprising multiple network elements.
5. The method of claim 1, wherein the creating and determining are included as part of managing a network, wherein the managing comprises creating multiple dependency graphs and multiple network topology views.
6. The method of claim 1, wherein the dependency graph is used in determining probabilities of problems associated with the network elements.
7. The method of claim 6, wherein the determining is performed by a diagnosis algorithm incorporating a Bayesian inference.
8. A network element comprising:
- a processor;
- a memory accessed by the processor;
- a dependency agent configured as part of the memory or separate from the memory, and controlled by the processor, wherein the dependency agent is configured to collect dependency data from a network, the network comprising multiple network elements.
9. The network element of claim 8, wherein the dependency agent comprises a network monitor to collect the dependency data, the network monitor comprising a packet sniffing component to inspect packets transmitted and received at the network element and identify potential causalities between packets or co-occurrences between packets.
10. The network element of claim 8, wherein the dependency agent comprises an application monitor to collect dependency data for applications provided by other network elements in the network.
11. The network element of claim 8, wherein the dependency agent comprises a dependency graph analyzer that computes dependencies of the network elements in the network and reports deltas back to the network elements.
12. The network element of claim 8, wherein the dependency agent comprises an agent service that receives requests for collected dependency data and commands to probe the network.
13. The network element of claim 8, wherein the dependency agent comprises a health summarizer that reports the condition and health probability or sickness probability of the network elements in the network.
14. The network element of claim 8, wherein the dependency agent provides the dependency data to a centralized computing device comprising an inference engine.
15. An inference engine comprising:
- an aggregation and coordination point to receive dependency data from one or more network elements in a network;
- an assembler to create a dependency graph from the dependency data; and
- an ordering agent to actively request current dependency data from one or more network elements so as to update the dependency graph.
16. The inference engine of claim 15, wherein the inference engine is part of a network element.
17. The inference engine of claim 15, wherein the inference engine is distributed over multiple network elements.
18. The inference engine of claim 15, wherein the dependency data is received from one or more dependency agents in the network.
19. The inference engine of claim 15, wherein the assembler, in creating the dependency graph, is configured to batch experience reports to determine performance of the network.
20. The inference engine of claim 15 further comprising an interface to a user allowing the user to manage the network.
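Claims 1, 6 and 7 recite creating a dependency graph from dependency relationships and determining problem probabilities with a diagnosis algorithm incorporating Bayesian inference. The Python sketch below shows one possible reading of those steps under stated assumptions. The client and element names, the prior fault probabilities, and the failure model (independent faults; a client sees a problem if and only if at least one element it depends on is faulty) are all hypothetical and do not come from the application, which does not specify its diagnosis algorithm. Exact inference by enumeration is used only because it is easy to check by hand.

```python
from itertools import product

# Hypothetical dependency relationships (illustrative data, not from the
# application): each network client maps to the network elements it uses.
DEPENDENCIES = {
    "client_a": ["dns", "proxy", "web_server"],
    "client_b": ["dns", "file_server"],
}

# Hypothetical prior probability that each network element is faulty.
PRIOR_FAULT = {"dns": 0.01, "proxy": 0.05, "web_server": 0.02, "file_server": 0.02}


def build_dependency_graph(relationships):
    """Invert client -> elements into element -> set of dependent clients."""
    graph = {}
    for client, elements in relationships.items():
        for element in elements:
            graph.setdefault(element, set()).add(client)
    return graph


def fault_posteriors(relationships, priors, observations):
    """Exact Bayesian inference by enumerating every fault assignment.

    `observations` maps a client to True (problem seen) or False (worked).
    Assumed model: faults are independent, and a client sees a problem
    if and only if at least one element it depends on is faulty.
    """
    elements = sorted(priors)
    evidence = 0.0                          # P(observations)
    mass = dict.fromkeys(elements, 0.0)     # P(element faulty, observations)
    for state in product([False, True], repeat=len(elements)):
        faulty = dict(zip(elements, state))
        # Prior probability of this joint fault assignment.
        p = 1.0
        for e in elements:
            p *= priors[e] if faulty[e] else 1.0 - priors[e]
        # Deterministic likelihood: 1 if the assignment explains every
        # observation, 0 otherwise (so inconsistent states are skipped).
        if not all(
            any(faulty[e] for e in relationships[c]) == seen
            for c, seen in observations.items()
        ):
            continue
        evidence += p
        for e in elements:
            if faulty[e]:
                mass[e] += p
    if evidence == 0.0:
        raise ValueError("observations are impossible under this model")
    # Bayes' rule: P(faulty | obs) = P(faulty, obs) / P(obs).
    return {e: mass[e] / evidence for e in elements}


if __name__ == "__main__":
    graph = build_dependency_graph(DEPENDENCIES)
    print("clients depending on dns:", sorted(graph["dns"]))
    posteriors = fault_posteriors(
        DEPENDENCIES, PRIOR_FAULT, {"client_a": True, "client_b": False}
    )
    for element, prob in sorted(posteriors.items(), key=lambda kv: -kv[1]):
        print(f"{element}: {prob:.3f}")
```

In this example client_a reports a problem while client_b works. The working client rules out dns and file_server. The remaining probability mass therefore falls on proxy (about 0.72) and web_server (about 0.29). Because enumeration visits 2^n fault assignments, it suits only small graphs. A larger network would likely need an approximate inference method instead.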
Type: Application
Filed: Nov 1, 2006
Publication Date: Jan 17, 2008
Applicant: Microsoft Corporation (Redmond, WA)
Inventors: Paramvir Bahl (Sammamish, WA), Ranveer Chandra (Kirkland, WA), David A. Maltz (Bellevue, WA), Suman Nath (Redmond, WA), Ming Zhang (Redmond, WA)
Application Number: 11/555,571
International Classification: G06F 17/00 (20060101);