Method and System for Visualizing Network Performance Characteristics
Techniques for visualizing and monitoring the quality of service for a computer network are disclosed. Herein, a method and system monitor network transactions and behaviors for the computing network, which includes one or more client subnets accessing one or more servers. The monitoring may be independent of client site monitors. Statistical data is gathered relating to at least the network, the server, and the applications for generating a plurality of measurements. The measurements assess at least one quality of service indicator associated with the performance of the computer network. The method and system graphically display the plurality of measurements of the quality of service indicator according to the date and time of gathering the statistical data and further display graphically the degree by which each of the measurements of the quality of service indicator varies from a predetermined threshold quality of service level for the computing network. Also, dynamic quality of service indicators are monitored against average quality of service indicators.
This application is a continuation of U.S. Non-Provisional patent application Ser. No. 11/110,973, entitled “METHOD AND SYSTEM FOR VISUALIZING NETWORK PERFORMANCE CHARACTERISTICS,” by Peter Mullarkey, et al. filed on Apr. 20, 2005, and is incorporated herein by reference in its entirety for all purposes.
FIELD
The disclosed subject matter relates to computer and related electronic systems networks. More particularly, this disclosure relates to a novel and improved method and system for visualizing network performance characteristics.
DESCRIPTION OF THE RELATED ART
Most network engineers are very familiar with tools that report statistics on individual components such as links, routers, and servers. These infrastructure monitors have been around for a long time. Newer to the market are performance monitoring appliances that report end-to-end statistics and the end-user experience. These appliances provide a comprehensive view of the enterprise, without the need for desktop or server agents. They measure how well response time Service Level Agreements (SLAs) are being met. They also help solve a wide variety of problems with solutions that lead to significant reductions in operating costs.
End-to-end performance monitoring can be extremely useful as a proactive method for both rapid troubleshooting and performance management of enterprise networks and server aggregations. Such monitoring has been successfully implemented to quickly identify and resolve the myriad of performance issues associated with networks, servers, and applications. The use of end-to-end performance monitoring appliances has uncovered serious inefficiencies with load balancers, poorly designed applications, bypassed proxy servers, ineffective cache servers, aggressive active agents, and badly designed “redundant” networks. They can provide the “big-picture” view of networks and applications to answer questions that are critical for the end-user experience. These questions may include what impact server consolidation will have on users and whether a thick-client or thin-client configuration will work better on a particular network. Also, performance monitoring applications can help identify which sites are in greatest need of upgrades or downgrades, and which web pages are the slowest to download.
Drill-down troubleshooting capabilities can reveal metrics that can save weeks or months of time in identifying and resolving issues. Analyses that previously required six weeks to complete with packet sniffing tools may be accomplished in minutes when end-to-end performance monitoring appliances are properly configured. Because they continuously monitor applications, such appliances notice and report even difficult intermittent issues that cannot readily be reproduced. If a problem occurred at 3:00 a.m. the previous morning, their stored reports can be used for a post-mortem analysis. There is no need to wait for a recurrence in order to capture the behavior the way legacy troubleshooting tools require.
End-to-end performance monitoring appliances with intelligent thresholds can alert a network performance management team to a developing problem before the problem severely impacts customers. Such proactive management and high-level views allow network managers to discover new ways to optimize the network.
Unfortunately, known end-to-end performance monitoring and management systems fail to provide completely satisfactory operation. There are several existing response-time monitoring tools (e.g., NetIQ's Pegasus and Compuware's Ecoscope) that require a hardware and/or software agent be installed near each client site from which end-to-end or total response times are to be computed. The main problem with this approach is that it can be difficult or impossible to get the agents installed and keep them operating. For a global network, the number of agents can be significant; installation can be slow and maintenance painful. For an eCommerce site, installation of the agents is not practical; asking potential customers to install software on their computers probably would not meet with much success. A secondary issue with this approach is that each of the client-site agents must upload their measurements to a centralized management platform; this adds unnecessary traffic on what may be expensive wide-area links. A third issue with this approach is that it is difficult to accurately separate the network from server delay contributions.
To overcome the issue with numerous agent installs, some companies (e.g., KeyNotes and Mercury Interactive) offer a subscription service whereby one may use their preinstalled agents for response-time monitoring. There are two main problems with this approach. One is that the agents are not monitoring “real” client traffic but are artificially generating a handful of “defined” transactions. The other is that the monitoring does not generally cover the full range of client sites—the monitoring is limited to where the service provider has installed agents.
Developers continue to improve methods and systems for testing networks, servers, and services for availability and performance. What is needed, among other things, is the ability to visualize the operations of a computer network for identifying performance management issues and problems, together with probable causes of related problems.
SUMMARY
Techniques for visualizing network performance characteristics are disclosed, which techniques both improve the operation of the associated networks and support more associated performance management functions.
According to one aspect of the disclosed subject matter, there is here provided a method and system for visualizing and monitoring quality of service of a computing network. The method includes the steps and the system includes the structures for monitoring application network transactions and behaviors for the computing network. The computing network includes one or more client subnets accessing one or more servers. The monitoring may be independent of client site monitors. The method and system gather statistical data relating to at least one network, a server, and associated applications and generate a plurality of measurements of at least one quality of service indicator. The quality of service indicators relate to the performance of the computer network. The method and system further display graphically the plurality of measurements of the at least one quality of service indicator according to the date and time of gathering the statistical data and the degree by which each of said plurality of measurements of the quality of service indicator varies from a predetermined threshold quality of service level for the computing network.
According to another aspect of the disclosed subject matter, here is disclosed a method and system for visualizing and monitoring the performance of a computer network that include the steps and structures for displaying graphically a plurality of averaged network quality of service indicators. The averaged network quality of service indicators are associated on a radial plot and visually interlinked to form a nominal performance polygon. The nominal performance polygon includes a plurality of corners. Each of said corners corresponds to a separate one of the plurality of averaged network quality of service indicators. The method and system furthermore dynamically measure a plurality of network quality of service indicators. Each of the plurality of network quality of service indicators corresponds to one of the plurality of averaged network quality of service indicators. The method and system display graphically the dynamically measured plurality of network quality of service indicators as a radial plot point on the radial plot and visually interlink the radial plot points for forming a dynamic performance polygon. The dynamic performance polygon relates to the dynamic performance of the computer network. The disclosed subject matter allows monitoring of the dynamic performance of the computer network by dynamically comparing variations in said dynamic performance polygon with said nominal performance polygon.
A technical advantage of the disclosed subject matter includes the ability to directly compare metrics or measurements of different network quality of service indicators, regardless of the particular units of measure that are associated with the different indicators. Because the method and system here disclosed compare normalized indicator measurements to averaged values of network quality of service indicators, the indicators may be in milliseconds, percents, counts, or other measurement units.
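As a rough illustration of how indicators in different units might share one radial plot, the following Python sketch divides each measurement by its averaged value, so that every corner of the nominal performance polygon sits at radius 1.0. The ratio-to-average normalization, the function names, the indicator names, and the sample values are all assumptions made for illustration; the disclosure does not specify a particular normalization formula. Averaged values are assumed to be nonzero.

```python
import math

def normalize(measured, averaged):
    """Return each indicator as a unitless ratio to its averaged value.

    Assumption: normalization is a simple ratio; averaged values are nonzero.
    """
    return {name: measured[name] / averaged[name] for name in averaged}

def polygon_vertices(ratios):
    """Place one corner per indicator on evenly spaced radial axes."""
    names = list(ratios)
    step = 2 * math.pi / len(names)
    return [
        (ratios[name] * math.cos(i * step), ratios[name] * math.sin(i * step))
        for i, name in enumerate(names)
    ]

# Hypothetical values; the units differ per indicator (ms, percent, count).
averaged = {"nrtt_ms": 40.0, "byte_loss_pct": 0.5, "users": 120.0}
measured = {"nrtt_ms": 60.0, "byte_loss_pct": 0.5, "users": 240.0}

# Nominal polygon: every corner lies at radius 1.0.
nominal_polygon = polygon_vertices(normalize(averaged, averaged))
# Dynamic polygon: corners move outward where a measurement exceeds its average.
dynamic_polygon = polygon_vertices(normalize(measured, averaged))
```

Because each corner is expressed as a ratio, a round trip time in milliseconds and a user count can be compared on the same plot by how far each corner departs from the nominal polygon.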
These and other aspects of the disclosed subject matter, as well as additional novel features, will be apparent from the description provided herein. The intent of this summary is not to be a comprehensive description of the claimed subject matter, but rather to provide a short overview of some of the subject matter's functionality. Other systems, methods, features and advantages here provided will become apparent to one with skill in the art upon examination of the following FIGUREs and detailed description. It is intended that all such additional systems, methods, features and advantages that are included within this description, be within the scope of the accompanying claims.
The features, nature, and advantages of the disclosed subject matter will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
Performance management functions 22 to which the disclosed subject matter relates provide (a) proactive, measurement-based management functions for permitting root cause/routing bottleneck diagnosis using fault management functions 24; (b) capacity planning and design, server location decisions, technology evaluations, and requirement predictions using configuration management functions 26; (c) cost/performance trade-off analyses using accounting management functions 28; and (d) privacy and intrusion detection and prevention policies and procedures using security management functions 30. In addressing these characteristics of performance management environment 20, the disclosed subject matter provides a method and system for visualizing network performance characteristics.
In network optimization system 40, router 42 collects flow-based statistics on network traffic, such as protocols used, ports used, and other information. Application statistics functions may be dedicated services that gather application-level information and statistics through protocols, such as SNMP, that may, for example, follow the RMON2 standard. In addition, network optimization system 40 uses network devices 46 to acquire statistics that may be gathered through SNMP. Response time data may be collected from individual program databases, such as an end-to-end database. Network specific databases 50 may receive the information from these network resources and generate a highly scalable network-specific database that uses data-mining techniques and data pre-processing functions to process a large set of statistical information. At workstation 52, computer network 12 analysis and reporting functions occur. In essence, workstation 52 provides dynamic incident tracking and investigation, supporting various performance management functions 22. From workstation 52, the disclosed subject matter provides report visualization functions 54 as an output of the present method and system for visualizing network performance characteristics.
One aspect of network optimization system 40 includes application response time functions that quickly track and measure end-user response time. The application response time functions operate without desktop or server agents and separate response time into application, network, and server delay components. The application response time functions, therefore, enable rapid troubleshooting of application performance bottlenecks. Network optimization system 40 also includes automated processes to measure and analyze application response time for all user transactions. Report visualization functions 54 permit comparisons of response times and other computer network 12 performance indicators against intelligent baselines. Moreover, network optimization system 40 may automatically investigate the cause of problems as they occur.
Network optimization system 40 also provides report analyzer functions that operate on workstation 52 and in conjunction with report visualization functions 54. The result is a flexible analysis engine that enables network managers to understand how application traffic impacts computer network 12 performance. Using report visualization functions 54 and the associated analysis tools of workstation 52, the present embodiment allows for identification of which applications are using excessive bandwidth, the location of such users, and when such applications are being used. Report analyzer functions of workstation 52 cooperate with network specific databases 50 to store and report enterprise-wide router 42 data and application statistics 44, for extended periods of time (e.g., an entire year). Such storage in network specific databases 50 allows network managers to make important cost reduction, troubleshooting, capacity planning, and traffic analysis decisions.
Network optimization system 40 separates application response times into network, server, and application delays, and generates alarms based on customer defined thresholds. However, network optimization system 40 does not require the deployment of agents on workstations within computer network 12. Network optimization system 40 collects large amounts of data from multiple sources and presents them as meaningful information and can aggregate data for reporting and analysis. Network optimization system 40 provides custom exception reporting and may drill down from the enterprise level to individual hosts and conversations occurring on computer network 12.
In addition, network optimization system 40 enables a variety of computer network 12 advisory services to occur. That is, network optimization system 40 permits analysis of application response times without deploying client-side agents. Using the disclosed subject matter, network optimization system 40 permits the analysis of huge volumes of data from multiple sources for rapid identification by application of network traffic and congestion sources. As such, network optimization system 40 enables advisory service for making recommendations that translate into lower network costs and improved response, thus making advantageous use of the visualizations herein described.
Violations of various service agreements, for example, may relate to the degree by which response time plot 68 exceeds threshold bar 66 time limits, e.g., one second. For example, peak 74 may be viewed as a major violation, since the response time exceeds threshold bar 66 by approximately two seconds. Peak 76, which exceeds threshold bar 66 by slightly more than one second, may be viewed as an intermediate violation. Finally, peak 78 may be considered a minor violation, since the one-second threshold bar 66 is exceeded by less than 0.25 seconds.
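The grading of violations by how far a response time exceeds the threshold can be sketched as follows. This is a minimal Python sketch, not the disclosed implementation: the function name and the cut points of 0.25 seconds and 1.5 seconds are assumptions chosen only so that the three example peaks above fall into the minor, intermediate, and major groups. The one-second default threshold follows the example in the text.

```python
def classify_violation(response_time_s, threshold_s=1.0):
    """Grade a response time by how far it exceeds the threshold.

    The 0.25 s and 1.5 s cut points are illustrative assumptions.
    """
    excess = response_time_s - threshold_s
    if excess <= 0:
        return "none"
    if excess < 0.25:
        return "minor"
    if excess < 1.5:
        return "intermediate"
    return "major"

# Hypothetical response times, in seconds, against a one-second threshold.
grades = [classify_violation(t) for t in (0.8, 1.2, 2.1, 3.0)]
```

A display could then map each grade to a color intensity, so that the degree of violation is visible at a glance.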
Response time data plot 60 shows a specific server. The present embodiment may also provide information for a single interface. However, other embodiments may also provide for multiple servers, i.e., at a next level of granularity. Thus, using one diagram, it is possible to determine the aggregate violations of a set of servers. This potentially provides additional valuable information that may be useful for managing the operation of a network.
Particularly attractive features of violation intensity charts include workday regions 96 and weekend regions 98. Workday regions 96 bracket the hours during which a company generally works. Weekend regions 98 highlight the weekend days. By identifying these time and day regions, violation intensity chart 80 allows a network manager to focus attention on specific violation periods. Thus, for example, in the event that an excessive number of red tick marks 90 arise in workday regions 96 and outside of weekend regions 98, response time violations may be a major concern for computer network 12 that requires immediate attention. On the other hand, if red tick marks 90 occur only during weekend regions 98 and outside workday regions 96, then immediate action may not be appropriate.
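One way to separate violations by the time and day regions described above is sketched below in Python. The working hours of 08:00 through 17:59, the Saturday and Sunday weekend, the "off-hours" label, and the function names are assumptions for illustration; the disclosure does not fix these values. In this sketch a weekend timestamp is counted as weekend even when it falls within working hours.

```python
from datetime import datetime

# Assumed working hours: 08:00 through 17:59 local time.
WORKDAY_HOURS = range(8, 18)

def region(timestamp):
    """Assign a violation timestamp to a chart region (assumed definitions)."""
    if timestamp.weekday() >= 5:          # Saturday = 5, Sunday = 6
        return "weekend"
    if timestamp.hour in WORKDAY_HOURS:
        return "workday"
    return "off-hours"

def count_by_region(violation_times):
    """Count violations falling in each region."""
    counts = {"workday": 0, "weekend": 0, "off-hours": 0}
    for timestamp in violation_times:
        counts[region(timestamp)] += 1
    return counts

# Hypothetical violation timestamps.
violations = [
    datetime(2005, 4, 20, 10, 15),  # Wednesday, working hours
    datetime(2005, 4, 20, 3, 0),    # Wednesday, 3:00 a.m.
    datetime(2005, 4, 23, 11, 0),   # Saturday
]
counts = count_by_region(violations)
```

A high count in the workday bucket would suggest a problem that needs immediate attention, while violations confined to the weekend bucket may be lower priority.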
The shadings in color intensity provide the ability to determine utilization, as well as violations. The shadings in color intensity also provide the ability to determine the degree of the utilization and/or violation of a particular network.
In addition to violation intensity chart 80, the disclosed subject matter provides meaningful visualizations of associated and interdependent network quality of service indicators.
Server print plot 100 displays key indicators for a network problem solution in one diagram. Prior approaches may have required up to three browsers and many different plots at a single time to see all of the information appearing in server print plot 100.
Dynamic performance polygon 134 of
The example of
In
Network print plot 170 portrays computer network 12 quality of service indicators along network round trip time (NRTT) axis 184, percent (%) byte loss axis 186, volume(to) axis 188, volume(from) axis 190, total sessions axis 192, retransmission axis 194, and users axis 196.
Dynamic performance polygons 174 through 180 provide information in reverse order from the violation. This allows a view of dynamic performance polygon 180, which occurs only five minutes before dynamic performance polygon 182. Dynamic performance polygon 180 shows a large percentage byte loss. Another out of specification indicator is the retransmission indicator. There was also more volume to the server. Going back one more frame to dynamic performance polygon 178, it is possible to see that the only indicator that is out of specification is the number of users.
By continuing to back up the measurements, it is possible to isolate the first out of range indicator. This may assist in determining the root cause of the network malfunctions or mis-configurations. In dynamic performance polygon 174, the total sessions and users indicators are high. Thus, what caused the network to malfunction was the presence of too many sessions and users. This situation, however, is not at all apparent from the measurement, i.e., dynamic performance polygon 182 that resulted in the service agreement violation. That is, the violation was an effect, and certainly not a cause of the network malfunction. This demonstrates the dynamic, interrelated nature of computer network 12 and how a network degradation may affect different quality of service indicators.
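The backward walk through successive measurements can be sketched as follows. This Python sketch assumes that each frame is a mapping of indicator names to values, that an indicator is out of range when it exceeds its averaged value by a fixed factor, and that the walk stops at the first frame with no out of range indicator. The function names, the 1.5 tolerance factor, and the sample data are hypothetical and are not taken from the disclosure.

```python
def out_of_range(frame, nominal, tolerance=1.5):
    """List the indicators exceeding tolerance times their averaged value."""
    return sorted(
        name for name, value in frame.items() if value > tolerance * nominal[name]
    )

def earliest_out_of_range(frames, nominal, tolerance=1.5):
    """Walk back from the newest frame to the earliest consecutive frame
    that still shows an out of range indicator.

    frames is ordered oldest to newest; the last frame is the violation.
    Returns (frame index, out of range indicator names), or None.
    """
    earliest = None
    for index in range(len(frames) - 1, -1, -1):
        flagged = out_of_range(frames[index], nominal, tolerance)
        if not flagged:
            break                     # the disturbance began after this frame
        earliest = (index, flagged)
    return earliest

# Hypothetical averaged values and five successive frames.
nominal = {"nrtt": 40, "byte_loss": 0.5, "sessions": 100, "users": 50}
frames = [
    {"nrtt": 40, "byte_loss": 0.5, "sessions": 100, "users": 50},   # normal
    {"nrtt": 40, "byte_loss": 0.5, "sessions": 200, "users": 110},  # sessions, users high
    {"nrtt": 40, "byte_loss": 0.5, "sessions": 100, "users": 110},  # users high
    {"nrtt": 40, "byte_loss": 2.0, "sessions": 100, "users": 50},   # byte loss high
    {"nrtt": 120, "byte_loss": 0.5, "sessions": 100, "users": 50},  # response violation
]
first = earliest_out_of_range(frames, nominal)
```

In this hypothetical data the final frame shows only a round trip time problem, while the walk back points to elevated sessions and users as the earliest out of range indicators, mirroring the distinction between effect and cause drawn above.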
Thus, using the combination of dynamic performance polygons and nominal performance polygons in server and network print plots, there is the potential for indicating correlations and causalities.
The disclosed subject matter may provide the ability to determine a network violation at some period before it occurs. In such case, there may be the ability to respond to an indicator change and, thereby, take preemptive action that could reduce or eliminate serious network effects. Such preemptive action may include avoiding over-use of network resources or timing of excessive network loading to occur at more optimal times.
In yet a further embodiment of the disclosed subject matter, there is the ability to associate a plurality of server or network print plots. It may be possible to categorically identify the different violations that occur by viewing a broad array of server or network print plots. Upon categorically identifying such violations, based on the server or network print plots, the disclosed server or network print plots may provide insights into how to categorically eliminate network violations or out of range conditions. By categorically eliminating problems, based on the characteristic server or network print plots that such problems generate, the disclosed subject matter may very significantly improve overall network operations.
Moreover, by creating and diagnosing categories of server or network print plots, the present embodiment may suggest correlations between different categories of network conditions that generate characteristic server print plots. By responding to server or network print plot data, even prior to an out of range condition arising, the disclosed subject matter may even more significantly improve overall network performance.
On an even larger scale, by associating categories of server or network print plots from various points of a network system, the disclosed subject matter may provide real-time data for assisting in the diagnosis of network problems at many different levels. Thus, in addition to providing comparisons of real time visualizations of computer network 12 performance, the disclosed subject matter allows for the aggregation of statistics over longer periods of time. Such aggregations enable trend analyses for both longer term and larger scale performance management functions. For example,
In contrast,
While
In summary, therefore, the disclosed subject matter provides a method and system for visualizing and monitoring quality of service of a computing network. The method includes the steps and the system includes the structures for monitoring application network transactions and behaviors for the computing network. The computing network includes one or more client subnets accessing one or more servers. The monitoring may be independent of client site monitors. The method and system gather statistical data relating to at least one network, a server, and associated applications and generate a plurality of measurements of at least one quality of service indicator. The quality of service indicators relate to the performance of the computer network. The method and system further display the plurality of measurements of the at least one quality of service indicator according to the date and time of gathering the statistical data and display graphically the degree by which each of said plurality of measurements of the quality of service indicator varies from a predetermined threshold quality of service level for the computing network.
In further summary, the disclosed subject matter provides a method and system for visualizing and monitoring the performance of a computer network that include the steps and structures for displaying graphically a plurality of averaged network quality of service indicators. The averaged network quality of service indicators are associated on a radial plot and visually interlinked to form a nominal performance polygon. The nominal performance polygon includes a plurality of corners. Each of said corners corresponds to a separate one of the plurality of averaged quality of service indicators. The method and system furthermore dynamically measure a plurality of network quality of service indicators. Each of the plurality of network quality of service indicators corresponds to one of the plurality of averaged network quality of service indicators. The method and system display graphically the dynamically measured plurality of network quality of service indicators as radial plot points on the radial plot and visually interlink the radial plot points for forming a dynamic performance polygon. The dynamic performance polygon relates to the dynamic performance of the computer network. The disclosed subject matter allows monitoring the dynamic performance of the computer network by dynamically comparing variations in said dynamic performance polygon with said nominal performance polygon.
The foregoing description of the preferred embodiments, therefore, is provided to enable any person skilled in the art to make or use the claimed subject matter. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the innovative faculty. Thus, the claimed subject matter is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for visualizing and monitoring quality of service of a computing network, the method comprising:
- monitoring application network transactions and behaviors for the computing network, the computing network including one or more client subnets accessing one or more servers, the monitoring capable of being independent of client site monitors;
- gathering statistical data relating to at least network, server, and application for generating a plurality of measurements of at least one quality of service indicator associated with the performance of said computer network;
- graphically displaying said plurality of measurements of said at least one quality of service indicator according to the date and time of gathering said statistical data; and
- further displaying graphically the degree by which each of said plurality of measurements of said quality of service indicator varies from a predetermined threshold quality of service level for the computing network.
2. The method of claim 1, further comprising the step of identifying a subset of said plurality of measurements of said at least one quality of service indicator according to the week day on which said subset of plurality of measurement occurs.
3. The method of claim 1, further comprising the step of identifying a subset of said plurality of measurements of said at least one quality of service indicator according to the hour during a day in which said subset of plurality of measurement occurs.
4. A system for visualizing and monitoring quality of service of a computing network, the system comprising:
- network monitoring circuitry for monitoring application network transactions and behaviors for the computing network, the computing network including one or more client subnets accessing one or more servers, the monitoring capable of being independent of client site monitors;
- statistical data gathering circuitry for gathering statistical data relating to at least network, server, and application for generating a plurality of measurements of at least one quality of service indicator associated with the performance of said computer network;
- a display for graphically displaying said plurality of measurements of said at least one quality of service indicator according to the date and time of gathering said statistical data; and
- said display further comprising graphical display circuitry for further displaying graphically the degree by which each of said plurality of measurements of said quality of service indicator varies from a predetermined threshold quality of service level for the computing network.
5. The system of claim 4, further comprising instructions for associating said predetermined threshold quality of service level with a service level agreement associated with the computing network.
6. The system of claim 5, further comprising instructions for grouping said plurality of measurements of said at least one quality of service indicator according to the degree by which each of said measurements violates said service level agreement.
7. A method for visualizing and monitoring the performance of a computer network, comprising the steps of:
- displaying graphically a plurality of averaged network quality of service indicators, said averaged network quality of service indicators associated on a radial plot and visually interlinked to form a nominal performance polygon, said nominal performance polygon comprising a plurality of corners, each of said corners corresponding to a separate one of said plurality of averaged quality of service indicators;
- dynamically measuring a plurality of network quality of service indicators, each of said plurality of network quality of service indicators corresponding to one of said plurality of averaged network quality of service indicators;
- displaying graphically said dynamically measured plurality of network quality of service indicators as a radial plot point on said radial plot;
- visually interlinking said radial plot points for forming a dynamic performance polygon, said dynamic performance polygon relating to the dynamic performance of said computer network; and
- monitoring the dynamic performance of the computer network by dynamically comparing variations in said dynamic performance polygon with said nominal performance polygon.
8. The method of claim 7, further comprising the step of graphically displaying said plurality of measurements of said at least one quality of service indicator from the group consisting essentially of network round trip time, percent (%) byte loss, volume (to) traffic, volume (from) traffic, number of total sessions, number of retransmissions, and number of users.
9. The method of claim 7, further comprising the step of determining a plurality of network quality of service problems as a result of monitoring the dynamic performance of the computer network using said dynamic performance polygon and said nominal performance polygon.
Type: Application
Filed: Feb 19, 2008
Publication Date: Jul 24, 2008
Applicant: NETQOS, INC. (Austin, TX)
Inventor: Peter Mullarkey (Austin, TX)
Application Number: 12/033,816
International Classification: G06F 15/173 (20060101);