Server Monitoring

Server monitoring has never been more critical, and it has never been more challenging. Monitoring network components and servers is an important part of any Network or Systems Administrator’s duties. Generally, monitoring is either an automated or a manual task that forms part of a routine maintenance schedule for servers and the network. The network can consist of many different components, and you’ll need to use appropriate tools and methods to undertake useful monitoring.

Server Monitoring: Why?

Once a network or server has been installed, how do you know it is working as it should?  Just like a car or any appliance, it may need maintenance or parts replaced to keep it in top working order.  Network and server monitoring allows the Network Administrator to see how hardware and software are performing.  We can look for certain signs or warnings that the system is not working efficiently and take action to fix things to prevent system degradation or failure.

Comprehensive Server Monitoring Tasks

  • A complete range of monitoring services, from basic server availability monitoring, to resource utilization reporting, and service level insights for configured applications.
  • Robust reporting and alerting that ensure administrators stay informed and get the insights they need.
  • Fast return on investment, providing expertise, fast setup, and effective service—while minimizing any effort or distraction for your organization.
  • Availability, uptime
  • CPU, memory, disk
  • Processes, services, jobs, NLMs
  • Event and application logs
  • Message queues and screens
  • Print jobs and queues
  • Directory and file systems
  • Network interfaces
  • Performance counters and statistics
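
As a rough illustration of the first item in the list (availability and uptime), the sketch below tests whether a TCP service answers and turns a series of check results into an availability percentage. The function names and the timeout value are assumptions made for this example; a real monitoring product would do considerably more.

```python
import socket
from typing import Sequence


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def availability_percent(results: Sequence[bool]) -> float:
    """Percentage of successful checks; 0.0 when no checks were recorded."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)
```

A scheduler could call is_reachable at regular intervals, keep the True/False results, and report availability_percent over the day or month.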

Server Monitoring Performance and Optimization

Network and server installations vary in requirements and configurations. Once software and equipment have been installed, they may require some tuning or optimization before the systems perform at optimum levels. You will need to undertake some sort of monitoring to determine current performance and then determine if this can be improved or optimized. There are various tools and utilities you can use for this process, which are discussed further in this reading.

Server Monitoring: Benchmarks

Network and server monitoring is required to establish benchmarks.

A benchmark is the result of an objective test used to measure the performance of a computer system relative to some known standards. Benchmarking your network and servers gives you a starting point with respect to performance and optimization. This will give you information like bandwidth and data throughput that can be compared to manufacturers’ performance specifications. You can then attempt to improve or optimize the performance of the components using these test results as indicators for improvements. Benchmarking programs are a great way to see the relative performance increase that your tweaks and changes have achieved.

Once you have optimized your network and servers the results from your final benchmark tests will become your baseline — that is, what you will compare future monitoring and test results to. The baseline is the level at which your system should perform and any future test results below this level indicate system deterioration.
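
The comparison against a baseline can be sketched as follows. This is a minimal example, not a prescribed method: the metric names, the 10% tolerance, and the assumption that higher values are better (as with throughput) are all choices made for illustration.

```python
def below_baseline(baseline: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return the metrics whose current value is more than `tolerance` below baseline.

    Assumes higher is better (for example throughput in Mbit/s).
    The result maps metric name -> (baseline value, current value).
    """
    degraded = {}
    for name, base in baseline.items():
        value = current.get(name)
        if value is None:
            continue  # metric was not measured this time
        if value < base * (1.0 - tolerance):
            degraded[name] = (base, value)
    return degraded


# Hypothetical figures for illustration only.
baseline = {"throughput_mbps": 940.0, "disk_read_mbps": 180.0}
current = {"throughput_mbps": 910.0, "disk_read_mbps": 120.0}
degraded = below_baseline(baseline, current)
```

Here only disk_read_mbps is reported, because 910 is still within 10% of 940 while 120 is well below 90% of 180.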

The Server Monitoring Process

Understanding the purpose of monitoring is the first step in developing a monitoring process. The next step is determining what to monitor and how.

What to monitor?

What are we going to monitor? Is it a network or a system?

Usually we can interchange the terms ‘network’ and ‘system’. Users and IT professionals will often use the words interchangeably but sometimes there can be a distinction. The network is normally the infrastructure that provides users access to data, information and services. Organisations may install and develop applications for the business on the network. These applications may also be known as systems. A network operating system, for example Windows Server 2008 Standard, may also be considered a system.

Most applications today will run on a network. When analyzing the performance of an application it is necessary to evaluate the way it is configured and used on the network and how network performance will impact the application. This includes the performance of the servers that run or interact with the application.

So that brings us back to our original question: what will we monitor? We will want to monitor everything that can impact network and system performance. So what can impact it?

Reflection

List all the components that make up a typical network.

Feedback

There are many components in a typical network. These include:

  • servers
  • workstations
  • printers
  • users
  • cable
  • hubs and switches
  • routers and bridges
  • server operating systems
  • desktop operating systems
  • server applications such as mail servers
  • database server hardware and software
  • applications
  • disks
  • utilities.

All of these can have an impact on performance.

Server monitoring

Servers are considered network devices; however, there are additional monitoring considerations.  Servers are usually configured to perform a specific role or provide a specific service (file server, application server, etc). Monitoring may need to look at the status of the server with respect to its role. For example, monitoring hard disk performance for a file server may be of value.

For any server, in addition to the network device monitoring options, you need to consider monitoring the following server resources:

  1. processor — the percentage of CPU time taken by various processes in the server
  2. memory — how the available memory is divided up and which processes it is used for
  3. disk — excessive disk activity, read, write and paging performance
  4. network — volume of network traffic in and out of the server.
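
A minimal sketch of sampling some of these resources with the Python standard library is shown below. It covers only disk usage and, on Unix-like systems, the processor load average; memory and network counters are left out because the standard library has no portable calls for them (third party packages are commonly used for that). The function names are assumptions for this example.

```python
import os
import shutil


def percent_used(used: float, total: float) -> float:
    """Utilisation as a percentage; 0.0 if the total is not positive."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def snapshot(path: str = ".") -> dict:
    """Collect a small set of resource readings for the volume holding `path`."""
    usage = shutil.disk_usage(path)
    reading = {"disk_used_percent": round(percent_used(usage.used, usage.total), 1)}
    # os.getloadavg exists only on Unix-like systems, so it is treated as optional.
    if hasattr(os, "getloadavg"):
        try:
            reading["load_avg_1min"] = os.getloadavg()[0]
        except OSError:
            pass  # load average could not be obtained on this system
    return reading
```

Calling snapshot() on a schedule and storing the readings gives the raw data for the analysis steps described later in this reading.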

The importance of these resources and the impact on server performance is discussed further under ‘System optimization’ in this reading.

As with network devices, the process of monitoring a server can employ manual methods or automated methods. Monitoring software and utilities can be either native to the server operating system (supplied as part of the OS) or third party utilities.

As with network devices, what you monitor and how you monitor it will be determined by your organisational requirements and policies and your need for monitoring.

 

Benchmarking and documentation

As previously discussed, a benchmark is an objective test that can be used to measure the performance of a computer system. When an installation is complete, the baseline benchmark should be documented.

In relation to change management, benchmarking programs are a great way to see the relative performance increase that your tweaks and changes have achieved. Running a benchmark before and after a change will give you a good idea of where you stand.
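
A before-and-after comparison might look like the sketch below. The benchmark helper simply times a callable several times and takes the median; real benchmarking programs measure far more than this, and the helper names and the repeat count are assumptions for illustration.

```python
import statistics
import time


def benchmark(task, repeats: int = 5) -> float:
    """Median wall-clock time in seconds over `repeats` runs of task()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the task takes less time after the change."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before
```

For example, if a task took 2.0 seconds before a change and 1.5 seconds after it, percent_change(2.0, 1.5) returns -25.0, a 25% reduction in run time. Record both figures alongside the change so the result can be traced later.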

Along with your benchmark data the monitoring process should be documented and become part of the Network and System Administrators’ procedures manual. This will ensure that all personnel involved in network and system maintenance or administration know what is being monitored and how it should be conducted.

Monitoring with data collection can produce large quantities of information. Your monitoring process should address what to do with this data, where it is stored, and how and when to analyse it. Monitoring should be meaningful and useful. It is possible to collect so much data that no one ever reviews it, at which point monitoring becomes a pointless task.
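
One common way to keep collected data manageable is to summarise raw samples before storing them long term. The sketch below reduces timestamped readings to one average per hour; the function name and the choice of hourly buckets are assumptions for this example, and your own retention policy may call for something different.

```python
from collections import defaultdict
from datetime import datetime


def hourly_averages(samples):
    """Reduce (timestamp, value) samples to one mean value per clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Sort by hour so the summary reads in time order.
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical CPU readings (percent) for illustration only.
samples = [
    (datetime(2024, 1, 1, 9, 5), 40.0),
    (datetime(2024, 1, 1, 9, 35), 60.0),
    (datetime(2024, 1, 1, 10, 0), 30.0),
]
summary = hourly_averages(samples)
```

The three raw samples become two stored values: an average of 50.0 for the 09:00 hour and 30.0 for the 10:00 hour.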

Your monitoring documentation should address:

  1. purpose of monitoring — why monitoring will be conducted (eg security, performance, network status) and what the outcomes of monitoring will be (eg optimization, review SLA, ongoing capacity planning)
  2. roles and responsibilities — who will perform what tasks and what role management plays, as well as when and how to review the documentation
  3. what will be monitored — specifically state the monitoring requirements (eg disk space usage and I/O performance on all servers, network bandwidth utilization)
  4. how monitoring is conducted — detail what utilities will be used and how to use them, including schedules and routines
  5. information management — where collected data will be stored and in what format, how long to keep data and how to archive if required
  6. analysis process — what will happen with the collected data; how it will be analysed and for what purpose (eg planning capacity, looking for security breaches of internal network)
  7. change management — if the monitoring and data analysis suggest the need for network changes to achieve required outcomes (eg improve network performance, improve network security) and how the changes will be implemented
  8. baseline data — list all relevant historical monitoring information like benchmarks and baselines.

Analysis and interpretation of monitoring results

Having run monitoring utilities and software you can start analyzing the collected data. But what are you looking for? Is the system performing or not? Is the network really slow?

How will you know if the system is performing as it should? We live in an instant world. We want things to happen in seconds, not minutes. It was not that long ago that a user could wait for days to get a report, and that report could take hours of computer time to run and print. Now we expect the same report to flash up on a screen.

To analyse system performance you need to have either performance specifications or some form of benchmark.

Ideally, documentation that includes the specifications for the performance of hardware, applications and the network should be available from vendors and manufacturers. The user requirement statement for installations also states performance criteria.

Monitoring and Optimization Tools and Utilities

Monitoring network components and servers can produce large quantities of data in the form of log files and reports. For this information to be useful we need to analyse it and come to some conclusions as to the status and performance of the network and its components. Manual checking and reading of log files is a task envied by no Network or System Administrator.  It is a slow and tedious task, open to incorrect analysis and conclusions due to human error. Fortunately, there are many tools and utilities that can be used to analyse and interpret monitoring data and present information to us in a more usable form. These tools and utilities come with the server operating system software or can be provided by third party vendors.
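
To give a feel for what such tools do, the sketch below counts log entries by severity and pulls out the error messages. The log format assumed here (date, time, an upper-case level, then the message) is hypothetical; real log formats differ between operating systems and applications, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def summarise(lines):
    """Return (count of entries per level, list of ERROR messages)."""
    counts = Counter()
    errors = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        counts[match.group("level")] += 1
        if match.group("level") == "ERROR":
            errors.append(match.group("message"))
    return counts, errors


# Hypothetical log lines for illustration only.
log_lines = [
    "2024-01-01 09:00:01 INFO backup job started",
    "2024-01-01 09:04:17 ERROR backup job failed",
    "2024-01-01 09:05:00 WARNING disk 91% full",
    "2024-01-01 09:06:30 ERROR print queue stalled",
    "not a log line",
]
counts, errors = summarise(log_lines)
```

A summary like this lets an administrator see at a glance how many errors occurred and what they were, rather than reading every line.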

Over time, monitoring tools and practices evolved and, today, network monitoring includes more proactive performance measurement of network components and servers.

Performance measurement tools may monitor such things as CPU and disk utilization, server load, memory usage, and switch, router and network utilization. This might include polling every piece of equipment on the network to determine the health of these components. Network monitoring may even measure the response time of transactions and applications that are critical to the company, or their bandwidth utilization. Measurements that fall outside of the boundaries of pre-set performance benchmarks can trigger an alert to monitoring personnel, or even activate a pre-defined action to correct the situation before a failure occurs. Such measurements can be stored in a database for trend analysis and capacity planning.
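
The idea of comparing measurements against pre-set boundaries can be sketched as below. The metric names, the threshold values, and the two severity levels are assumptions for illustration; monitoring products define their own.

```python
def evaluate(measurements: dict, thresholds: dict) -> list:
    """Return (metric, severity, value) for each measurement at or above a threshold.

    `thresholds` maps a metric name to a (warning, critical) pair.
    """
    alerts = []
    for metric, (warning, critical) in thresholds.items():
        value = measurements.get(metric)
        if value is None:
            continue  # nothing was measured for this metric
        if value >= critical:
            alerts.append((metric, "CRITICAL", value))
        elif value >= warning:
            alerts.append((metric, "WARNING", value))
    return alerts


# Hypothetical thresholds and readings for illustration only.
thresholds = {"cpu_percent": (80.0, 95.0), "disk_used_percent": (85.0, 95.0)}
measurements = {"cpu_percent": 97.0, "disk_used_percent": 88.0, "memory_percent": 50.0}
alerts = evaluate(measurements, thresholds)
```

In this example the CPU reading raises a critical alert and the disk reading a warning, while memory is ignored because no threshold was configured for it. The returned list could then be passed to whatever notifies monitoring personnel.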

Most network monitoring tools will monitor the traffic on the network and show you a graphical representation of flow and detailed statistics of the load on the network. You should be able to analyse the type of packets on it.  The network traffic data can be stored and then later you can search, sort, and filter it. You should be able to select various protocols, sending and receiving hosts, etc. This can help to identify applications and/or users that may be causing performance problems by downloading music or movies.
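
Filtering and sorting stored traffic data to find the heaviest senders might look like the sketch below. The record layout (source address, protocol, byte count) is a simplified assumption; real capture tools store many more fields.

```python
from collections import Counter


def top_talkers(records, protocol=None, limit=3):
    """Total bytes sent per source host, optionally for one protocol, largest first."""
    totals = Counter()
    for record in records:
        if protocol is not None and record["protocol"] != protocol:
            continue
        totals[record["src"]] += record["bytes"]
    return totals.most_common(limit)


# Hypothetical traffic records for illustration only.
records = [
    {"src": "10.0.0.5", "protocol": "HTTP", "bytes": 5000},
    {"src": "10.0.0.7", "protocol": "HTTP", "bytes": 120000},
    {"src": "10.0.0.5", "protocol": "DNS", "bytes": 300},
    {"src": "10.0.0.7", "protocol": "HTTP", "bytes": 80000},
    {"src": "10.0.0.9", "protocol": "SMTP", "bytes": 4000},
]
heaviest = top_talkers(records, limit=2)
```

With these records, host 10.0.0.7 stands out with 200000 bytes, which is the kind of pattern that points to a user or application worth investigating.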

Large organisations with thousands of users on a network will almost certainly invest in specialized monitoring tools that may be hardware or software products. They may even have a team of technical specialists that do nothing else but check the network.

With a large heterogeneous network it may be necessary to purchase a third party tool that can be used to monitor the many different types of equipment that make up the network. An example of this is HP OpenView software. There are also several tools that can be freely downloaded from the Internet. As a start, follow links to free tools at http://netsecurity.about.com.  Some of these can even allow you to monitor your network from an Internet connection. This is very useful if you are travelling.

We will take a simpler approach here and consider the readily available tools and techniques that are normally found within the network operating system.

Native tools and utilities

Native tools and utilities are software tools and utilities that are provided with the server operating system or network device. They are designed to monitor the required resources and present information in a usable and easily read format. The tools and utilities generally allow you to set options and monitor features specific to the operating system or network device.  These utilities also allow for the setting of alert thresholds so that some form of notification occurs when certain events occur.

Third party tools and utilities

There are many tools and utilities produced by various manufacturers that provide monitoring and optimization functions for various systems. These are generally commercial products that provide monitoring and optimization information in a more useful and functional form than native tools. In most cases, third party tools are more specialized or geared to provide specific information and functions.

For example, Oracle provides system tuning utilities that run on servers to monitor and optimize configurations to suit database applications. Sysinternals produces software that reports on which files and registry entries are being read or accessed by an operating system.

Other third party tools and utilities can be independent applications that are installed on a network to specifically monitor network and device activity. Products like OpenView, WhatsUp and CA’s Unicenter provide network and application monitoring across large networks and organisations.

Running monitoring utilities usually requires resources like memory and CPU time. As a consequence, running monitoring comes with an overhead and can itself impact on the performance of a server or network device.

An example of a network monitoring system that collects data and monitors in real time is JFFNMS. You can download this software at: http://www.jffnms.org/

JFFNMS displays network devices and applications as coloured rectangles. Red rectangles indicate that an alert has occurred for a component. The alerts are indicated in the bottom half of the screen so that alerts can be acknowledged and appropriate action taken.

On a smaller scale there are numerous products available to optimize or ‘tweak’ servers and network devices. If you intend to use these types of products ensure you test them on isolated systems prior to use on production servers and devices.