Implementing Application Monitoring

Early planning is the best way to start implementing monitoring. Planning early allows monitoring to be built into the application during development instead of being added later. Monitoring added later in the process tends to focus only on areas with known issues, which may allow other issues to go undetected.

Implementing Application Monitoring Proactively

Proactive application monitoring is the hardest kind of monitoring to implement. With proactive monitoring, problems are found and dealt with before the consumer even knows there is a problem. This requires monitoring deep inside the application and at a fine level of detail, so that if an error does happen, the system knows exactly where it occurred and what caused it. [1]

The best approach to proactive monitoring is a tools-based approach. With a tools-based approach, monitoring requires no changes to the application code. This is its biggest advantage, because data collection can be set up quickly and easily. It is done through class loader instrumentation. It also means that the tools aren’t specific to a single application and can be reused across multiple applications written in the same language. The other approach is a code-based approach, in which logging or monitoring calls are inserted directly into the code base, either during development or by the support team after the initial development. This approach is less efficient, because developers have less time to spend on program logic when they also have to worry about how to monitor it.
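As a rough illustration of the code-based approach, the Python sketch below wraps a function so that each call logs its duration and any exception it raises. The decorator name `monitored`, the logger name, and the `place_order` function are hypothetical and are not taken from any particular monitoring tool.

```python
import functools
import logging
import time

logger = logging.getLogger("app.monitoring")  # hypothetical logger name


def monitored(func):
    """Log the duration and any exception of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record where the failure happened, then let it propagate.
            logger.exception("%s failed", func.__qualname__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__qualname__, elapsed_ms)
    return wrapper


@monitored
def place_order(quantity):  # hypothetical business function
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity * 2
```

The sketch also shows the cost noted above: the monitoring logic lives in the code base and has to be written and maintained alongside the application logic.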

Implementing Application Monitoring: Create a Recovery Plan

Having a recovery plan in place is vital to resolving an issue quickly. Over time, applications may migrate to different support groups. A recovery plan that lists common issues and their fixes helps each team stay current. An example of how to create a recovery plan is shown in Figure 1, courtesy of a free spreadsheet available from the website governancesforum.com.


Figure 1: Recovery Plan Template

Implementing Application Monitoring: Create and Use Service Level Agreements

Even if the service level agreement is unofficial, it is still worthwhile to have a structure in place.

Service level agreements include items such as the following:

  • The percentage of time that the services will be up (uptime)
  • How many people can use the application at once without performance issues
  • Performance metrics and benchmarks to be used with performance monitoring alerts
  • The rules for notification announcements
  • What statistics will be monitored and when and where they will be available
  • Acceptable response time

An example of an SLA is given in Figure 2.

 


Figure 2: Example of SLA
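An uptime percentage in an SLA translates directly into a downtime budget. The sketch below shows the arithmetic in Python; the function names and the 30-day default period are assumptions chosen for illustration, not part of any SLA standard.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Return the downtime budget, in minutes, for an uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


def meets_sla(uptime_percent, observed_downtime_minutes, period_days=30):
    """True if the observed downtime stays within the budget."""
    budget = allowed_downtime_minutes(uptime_percent, period_days)
    return observed_downtime_minutes <= budget
```

For example, a 99.9% uptime target over a 30-day period leaves roughly 43.2 minutes of acceptable downtime, while a 99% target leaves 432 minutes.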

 

Application Monitoring Challenges

Different types of systems pose different challenges for application monitoring. Each system is set up differently, on or off servers. Some systems may be virtualized and lack access to a shared drive where logs can be stored. The most common types of systems are shared systems and clustered systems. Another major challenge of application monitoring is production logging. Because a system may already be running in a production environment, adding logging may come at a considerable price.

Shared Systems 

Shared systems are a major challenge for application monitoring. Monitoring an application on its own system makes it easy to track statistics such as memory usage, disk space, CPU usage, and network bandwidth. Most applications, however, run on a common system and share all of these resources. This makes it difficult to track each application’s usage and may lead to applications being impacted through no fault of their own. If the application runs on a shared system that its developer or owner does not control, it may be difficult to take corrective action.

Clustered Systems 

Clustered systems are used to avoid a single point of failure by running the application on multiple machines in different areas of the network. They are challenging to monitor because, to find which machine is causing a shortfall, the logs from every machine must be parsed to see which one is performing badly. Depending on the type of cluster, there may also be a main server that controls all the other servers, and it needs to be monitored as well. If the main server were to crash, all the others would be rendered useless.
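Parsing the logs from every machine to find the one that is performing badly can be sketched as follows. The log format (`<timestamp> <node> <response_ms>`) and the function name `slowest_node` are assumptions made for this example; real cluster logs will need their own parsing.

```python
from collections import defaultdict
from statistics import mean


def slowest_node(log_lines):
    """Return (node, mean_response_ms) for the node with the highest mean,
    or None if no line matches the assumed format."""
    samples = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        _timestamp, node, response_ms = parts
        try:
            value = float(response_ms)
        except ValueError:
            continue  # skip lines with a non-numeric response time
        samples[node].append(value)
    if not samples:
        return None
    averages = {node: mean(values) for node, values in samples.items()}
    worst = max(averages, key=averages.get)
    return worst, averages[worst]
```

Merging the lines from all nodes into one pass keeps the comparison fair, since every node is measured over the same set of log entries.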

Production Logging 

Production logging is generally limited because file input and output is very time consuming. Some high-volume transaction applications have tight performance thresholds, and custom production logging may cause them to miss those thresholds. If an error cannot be reproduced in a test environment, changes may need to be deployed to production to catch and log the error more specifically. Updating a production system requires application downtime in many cases. This downtime may not be acceptable and could cause a loss of profit for the company.
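One way to limit the file I/O cost of production logging is to buffer log records in memory and write them out only when an error occurs. The sketch below uses `logging.handlers.MemoryHandler` from the Python standard library; the function name, logger name, and default capacity are assumptions for illustration.

```python
import logging
import logging.handlers


def build_buffered_logger(target, capacity=100):
    """Return a logger that buffers records in memory and forwards them to
    `target` only when the buffer is full or an ERROR (or worse) arrives."""
    buffered = logging.handlers.MemoryHandler(
        capacity=capacity,
        flushLevel=logging.ERROR,
        target=target,
    )
    logger = logging.getLogger("app.production")  # hypothetical logger name
    logger.handlers.clear()   # avoid stacking handlers on repeated calls
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger
    logger.addHandler(buffered)
    return logger
```

In practice `target` would typically be a `logging.FileHandler`. The trade-off is that records still held in the buffer are lost if the process is killed before a flush.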

Solving Network Problems

Preventing Problems with Network-Management and Planning

  1. The two ways of solving network problems are pre-emptive troubleshooting and troubleshooting. In a perfect world, we would be able to prevent problems before they occur (pre-emptive troubleshooting); however, network administrators often find themselves repairing problems that already exist (troubleshooting).
  2. Policies and procedures should be applied during the planning stages of a network as well as throughout the network’s life. Tasks that should be covered by such policies are back-up methods, security, hardware and software standards, upgrade guidelines, and documentation.


Backing Up Network Data

  1. To formulate any back-up plan, consider the following topics and issues:
    • Determine what data should be backed up as well as how often.  Some files seldom change and may require backup only weekly or monthly.
    • Develop a schedule for backing up your data that includes the type of backup to be performed, how often, and at what time of the day.
    • Identify the person(s) responsible for performing backups.
    • Test your back-up system regularly.
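A back-up schedule of this kind can be captured as data, so that it is easy to check which backups are due. The sketch below is one possible representation; the `BackupJob` fields and the `due_jobs` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupJob:
    name: str            # what data is backed up
    kind: str            # type of backup, e.g. "full" or "incremental"
    interval: timedelta  # how often the backup should run
    owner: str           # person responsible for performing it
    last_run: datetime   # when the backup last completed


def due_jobs(jobs, now):
    """Return the jobs whose interval has elapsed since their last run."""
    return [job for job in jobs if now - job.last_run >= job.interval]
```

Files that seldom change would simply get a longer `interval`, such as `timedelta(weeks=1)`.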

Setting Security Policies

  1. Security policies vary based on sensitivity of data, network size, and the company’s security standards. Once the detailed policy has been outlined in a network plan it should be followed closely in order to be effective.
  2. A security policy should cover not only data security but hardware security as well. If a file server or other networking equipment is in a common area that anyone can access, security can easily be compromised.
  3. Special security requirements may be necessary for dial-in users.
  4. One of the most important security considerations is who should be granted administrative access to the network. The more people who have this level of access, the more likely security problems will occur.

Setting Hardware and Software Standards

  1. It is very important to implement hardware and software standards. Since administrators are responsible for supporting networks, which includes hardware and software, administrators should be involved in deciding what hardware and software will be permitted on the network.
  2. Standards should be defined for desktop computers (standards for several levels of users might be necessary), networking devices, and server configurations.
  3. Regular evaluations of standards help ensure that your network does not become outdated.

Establishing Upgrade Guidelines

  1. Upgrading computer and networking technologies is a never-ending task. Establishing upgrade guidelines will help make the process easier.
  2. Always give users advance notice so that they know changes will take place and can respond to them. Additionally, disruptive upgrades should not be performed during normal working hours. A plan for “undoing” an upgrade should also be included.

Maintaining Documentation

  1. Documentation is often viewed as the most boring and time-consuming aspect of an administrator’s duties; however, it can be one of the most important elements of troubleshooting and solving network problems. A well-documented network is much easier to troubleshoot than a network with very little documentation.
  2. If you work in a networking environment that encompasses multiple LANs, each LAN should have its own set of documentation. One of the most important lists is the address list. If a specific device is causing a problem on a network, it will most likely be identified by a MAC or IP address. If an up-to-date address list is kept, the location of a particular device is easily identified. An address list is an example of a list that can be kept as a database.
  3. Documentation should be kept in both hard copy and electronic form so that it is readily accessible. When information changes, both forms should be updated.
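As a minimal sketch of keeping an address list as a database, the example below stores MAC address, IP address, and location in an in-memory SQLite table and looks a device up by either address. The table layout and function names are assumptions for illustration; a real address list would live in a persistent file or server.

```python
import sqlite3


def create_address_list(entries):
    """Build an in-memory address list from (mac, ip, location) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices (mac TEXT PRIMARY KEY, ip TEXT, location TEXT)"
    )
    # Store MAC addresses in lower case so lookups are case-insensitive.
    rows = [(mac.lower(), ip, location) for mac, ip, location in entries]
    conn.executemany("INSERT INTO devices VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def locate_device(conn, address):
    """Return the location of the device with this MAC or IP, or None."""
    row = conn.execute(
        "SELECT location FROM devices WHERE mac = ? OR ip = ?",
        (address.lower(), address),
    ).fetchone()
    return row[0] if row else None
```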

 

Performing Pre-emptive Troubleshooting

  1. Pre-emptive troubleshooting saves time, prevents equipment problems, and ensures data security. More importantly, it can save a lot of frustration when trying to identify the cause of problems.
  2. The five ISO network-management categories that support pre-emptive troubleshooting are accounting management, configuration management, fault management, performance management, and security management.

Practicing Good Customer-Relation Skills

  1. Users are customers and your customers are your best source of information when something goes wrong. It is extremely important to build good relationships with users.

Using Network-Monitoring Utilities

  1. It is important to establish a baseline of “normal” networking activity. This baseline can be used to identify when the network is acting “abnormally”.
  2. Network monitoring utilities gather the following information: events, system usage statistics, and system performance statistics.
  3. Some of the specific network aspects that a baseline can be helpful in monitoring are: daily network utilization patterns, possible bottlenecks, heavy usage patterns, and protocol traffic patterns. This information is obviously very useful. For example, if utilization levels are constantly measured at 60% or greater, it’s time to look at an upgrade.
  4. SNMP (Simple Network Management Protocol) is part of the TCP/IP protocol suite and is used to manage and monitor devices. Agents are used to gather information that is stored in a management information base (MIB).
  5. RMON (Remote Network Monitoring) is an advanced network monitoring protocol that extends the capabilities of SNMP. The two versions of RMON are RMON1 and RMON2.
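The baseline idea above can be sketched in a few lines of Python. Utilization samples are assumed to be percentages collected by a monitoring utility. The rule of flagging readings more than three standard deviations from the mean is an assumption chosen for illustration, while the 60% upgrade threshold comes from the example in the text.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Summarize "normal" utilization (percent) as mean and std deviation."""
    return {"mean": mean(samples), "stdev": pstdev(samples)}


def is_abnormal(baseline, value, tolerance=3.0):
    """Flag a reading more than `tolerance` std deviations from the mean."""
    return abs(value - baseline["mean"]) > tolerance * baseline["stdev"]


def needs_upgrade(samples, threshold=60.0):
    """True if every utilization sample is at or above the threshold."""
    return bool(samples) and all(s >= threshold for s in samples)
```

A baseline built from a quiet period will flag normal peak traffic as abnormal, so the samples should cover the daily utilization patterns mentioned above.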

 

Solving Network Problems: Network Troubleshooting

  1. Troubleshooting is often a skill learned through experience rather than one that can be taught. However, there is a methodology that can be followed to help in troubleshooting problems.
  2. The following set of steps helps you troubleshoot most common networking problems:
    • Eliminate any potential user errors.
    • Verify that physical connections are indeed working.
    • Verify the status of any suspect NICs.
    • Restart the computer.

 

Structured Approach

  1. Sometimes even if the basic steps are followed, a more detailed approach might be necessary.
  2. The five steps of the structured troubleshooting approach are: prioritize, collect information, establish possible causes, isolate the problem, and test results.
  3. When collecting information, network administrators should find out as much information as they can by asking specific questions, such as those listed in the text.
  4. Once possible causes are identified, the most likely cause should be tested first. It is important to understand that only one change should be implemented at a time. If a technician tries several changes before testing each one, he/she will not know for sure which change solved the problem. Also, the technician may have fixed the original problem but caused another one with an unnecessary change.
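The rule of testing the most likely cause first and making only one change at a time can be expressed as a simple loop. The sketch below is illustrative only: the `isolate_fix` function and the `(likelihood, apply, revert)` tuple layout are hypothetical, and in practice the changes are carried out by a technician, not a script.

```python
def isolate_fix(candidates, problem_solved):
    """Try candidate fixes one at a time, most likely first.

    candidates: list of (likelihood, apply, revert) tuples, where apply and
    revert are callables. problem_solved: callable returning True once the
    fault is gone. Returns the index of the fix that worked, or None.
    """
    order = sorted(
        range(len(candidates)),
        key=lambda i: candidates[i][0],
        reverse=True,
    )
    for i in order:
        _likelihood, apply, revert = candidates[i]
        apply()
        if problem_solved():
            return i
        revert()  # undo the change so it cannot cause a new problem
    return None
```

Reverting each unsuccessful change before trying the next one keeps the network in a known state, so the change that finally works is the only difference from the starting point.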

 

Using Special Tools

 

  1. There is more to troubleshooting than instincts and experience. Sometimes special tools are essential in solving problems.
  2. Some of the most common problems found in networks involve the physical cabling. A digital voltmeter (DVM) can measure a cable’s resistance and determine if a cable break has occurred.
  3. A time-domain reflectometer (TDR) can also determine if a cable is broken. The difference between a TDR and a DVM is that a TDR can pinpoint the exact location of the cable break. TDRs are expensive devices, but they can be rented at minimal cost.
  4. Basic cable testers are inexpensive, but typically can only test correct termination of twisted-pair cabling. Advanced cable testers can measure impedance, resistance, attenuation, and length.
  5. Oscilloscopes can be used to identify shorts, sharp bends or crimps, cable breaks, and attenuation problems.
  6. Network monitors evaluate the overall health of a network by monitoring all traffic. These programs can generate reports and graphs based on the data collected.
  7. Protocol analyzers combine hardware and software to provide the most advanced network troubleshooting available. This tool not only monitors traffic in real time but also can capture and decode packets.
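To give a sense of what decoding a packet involves, the sketch below unpacks the fixed 20-byte portion of an IPv4 header from raw bytes using Python’s `struct` module. It is a minimal illustration, not a protocol analyzer: the function name is hypothetical, and IP options, checksum verification, and higher-layer protocols are ignored.

```python
import socket
import struct


def decode_ipv4_header(packet):
    """Decode the fixed 20-byte part of an IPv4 header from raw bytes."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 1 = ICMP, 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }
```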

 

Network Support Resources

 

  1. Microsoft TechNet is a subscription information service for supporting all aspects of networking. It can be an essential tool when supporting a Microsoft-based network.
  2. If a subscription to Microsoft TechNet is too expensive, Microsoft offers a free support service known as Microsoft Knowledge Base on their web site. It will not offer as much information as the TechNet subscription.
  3. There are also support services for Linux and Novell NetWare. Many of these sites provide articles about known problems, workarounds, and downloads of upgrades and bug fixes.
  4. There are also many online support services that allow you to tap into the knowledge of experienced networking professionals. Additionally, many of the networking periodicals are now available online as well.

 

Common Troubleshooting Situations

 

  1. Cabling and faulty NICs cause some of the more common network problems.

Application Performance and IT

APPLICATION PERFORMANCE MANAGEMENT

The rapid proliferation and adaptation of information technology (IT) have ushered in the Information Age.  Unprecedented advances in computing, networking, and related technologies are enabling information to be generated in ever-greater quantities and transmitted globally with ever-greater speed.  These technologies greatly reduce the limitations that time and space have traditionally imposed on human activities (Lee & Whitley, 2002).  Alberts, Papp, and Kemp (2002) predict that continued IT improvements will have lasting impact on information and communication flows, manifested in increased speed, greater capacity, enhanced flexibility, greater access, more types of messages and heightened demand.

Application Performance Management and IT

Although IT has penetrated all levels of society, large-scale implementation and regular demands for technological innovations are most evident at the organizational level.  Organizations have long recognized information as a strategic asset (de Heer, 1998) and invested in IT in pursuit of effective information management (Owens, Wilson, & Abell, 1995).  Information provides the raw material that organizations can sieve through to extract actionable knowledge.  Knowledge management is a growing concern within organizations that seek to maximize the usefulness of intellectual capital and information resources.  Hence, a major task of knowledge management is the building and enhancement of organizational knowledge infrastructure.  The knowledge management concepts of connect and collect call for the integration of two processes in the design and implementation of any systems to capture organizational knowledge (Weidner, 2002).  Connect focuses on promoting “communities of practice” or the collaboration of potential knowledge contributors; collect, on the other hand, refers to the creation and enrichment of knowledge repositories that can be readily accessible to address organizational needs.

It is perhaps no surprise that IT plays an integral part in the management of organizational knowledge, considering the amounts of information available and expected and the need to minimize information delays.  There is a variety of IT tools, from simple databases to elaborate IT systems, to support the development of organizational knowledge bases.  For large and complex organizations, online systems offer the convenience of single point access via the Internet, which has the added benefits of allowing virtual collaboration and real time information-sharing unimpeded by distance and location.  By consolidating knowledge and making it readily available, such applications have enormous potential in enhancing organizational communication, transparency, and planning.