Application Monitoring Challenges

Different types of systems pose different challenges for application monitoring. Each system is set up differently, on or off servers. Some systems are virtualized and may not have access to a shared drive where logs can be stored. The most common types are shared systems and clustered systems. Another major challenge of application monitoring is production logging: because the system is already running in a production environment, adding logging may come at a considerable cost.

Shared Systems 

Shared systems are a major challenge for application monitoring. Monitoring an application that runs on its own system makes it easy to track statistics such as memory usage, disk space, CPU usage and network bandwidth. Most applications, however, run on a common system where they share all of these resources. This makes it difficult to track the usage of each application, and an application may be impacted through no fault of its own. If the developer or owner of the application does not own the shared system, it may also be difficult to take corrective action.
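
As a minimal sketch of attributing shared resources to individual applications, the Python example below sums per-process samples into per-application totals. The sample records, the field names (app, pid, cpu_percent, memory_mb) and the function usage_by_application are hypothetical and chosen only for illustration; in practice the samples would come from whatever process-level statistics the shared system exposes.

```python
from collections import defaultdict


def usage_by_application(samples):
    """Sum per-process samples into per-application totals.

    Each sample is a dict with hypothetical keys:
    "app", "pid", "cpu_percent" and "memory_mb".
    """
    totals = defaultdict(lambda: {"cpu_percent": 0.0, "memory_mb": 0.0})
    for sample in samples:
        app_totals = totals[sample["app"]]
        app_totals["cpu_percent"] += sample["cpu_percent"]
        app_totals["memory_mb"] += sample["memory_mb"]
    return dict(totals)


# Illustrative data only: two applications sharing one machine.
samples = [
    {"app": "billing", "pid": 101, "cpu_percent": 12.5, "memory_mb": 300.0},
    {"app": "billing", "pid": 102, "cpu_percent": 7.5, "memory_mb": 200.0},
    {"app": "reports", "pid": 203, "cpu_percent": 60.0, "memory_mb": 1500.0},
]

totals = usage_by_application(samples)

# The application with the highest total CPU share on the shared system.
heaviest = max(totals, key=lambda name: totals[name]["cpu_percent"])
```

Grouping by application rather than by machine is what lets an owner show that a slowdown was caused by a neighbouring application and not by their own.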

Clustered Systems 

Clustered systems are used to avoid a single point of failure by spreading an application across multiple machines in different areas of the network. They are challenging to monitor because, to find which machine is causing a shortfall, the logs of every machine must be parsed to see which one is performing badly. Depending on the type of cluster, there may also be a main server that controls all the other servers, and it needs to be monitored as well. If the main server were to crash, all the others would be rendered useless.
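
The log-parsing step can be sketched in Python as follows. This is only an illustration under an assumed log format: each line is taken to hold a node name followed by a response time in milliseconds, and the names slowest_node and node-a/node-b are hypothetical. Real cluster logs would need their own parsing rules.

```python
from collections import defaultdict
from statistics import mean


def slowest_node(log_lines):
    """Return (node, averages) for the node with the worst mean response time.

    Assumes each line looks like "<node> <response_ms>"; lines that do not
    match are skipped. Returns (None, {}) when no line can be parsed.
    """
    timings = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not in the assumed format
        node, raw_elapsed = parts
        try:
            elapsed = float(raw_elapsed)
        except ValueError:
            continue  # response time is not a number
        timings[node].append(elapsed)

    averages = {node: mean(values) for node, values in timings.items()}
    if not averages:
        return None, {}
    worst = max(averages, key=averages.get)
    return worst, averages


# Illustrative lines merged from the logs of two cluster nodes.
lines = [
    "node-a 120",
    "node-b 480",
    "node-a 100",
    "unparseable line here",
    "node-b 520",
]

worst, averages = slowest_node(lines)
```

Here worst is the node whose average response time is highest, which is the machine to inspect first. The same aggregation would also have to include the main server if the cluster has one.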

Production Logging 

Production logging is generally limited because file input and output is time consuming. Some high-volume transaction applications have tight performance thresholds, and custom production logging may bring performance below those levels. If an error cannot be reproduced in a test environment, changes may need to be deployed to production to catch and log the error more specifically. In many cases, updating a production application requires downtime. This downtime may not be acceptable and could cause a loss of profit for the company.
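
One common way to limit the cost of logging I/O, shown here only as a sketch and not as a method prescribed by this article, is to hand log records to a background thread so that the transaction code never waits on the write itself. The Python example below uses the standard library's QueueHandler and QueueListener; the logger name "orders", the helper build_async_logger and the in-memory stream are assumptions for illustration, and a logging.FileHandler could take the place of the stream handler in a real deployment.

```python
import io
import logging
import logging.handlers
import queue


def build_async_logger(name, stream):
    """Create a logger whose output is written by a background thread.

    The calling thread only places records on a queue; the QueueListener
    thread performs the slower write to the stream.
    """
    log_queue = queue.Queue(-1)  # unbounded queue

    target = logging.StreamHandler(stream)
    target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    listener = logging.handlers.QueueListener(log_queue, target)

    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)  # keep routine detail out of production logs
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    listener.start()
    return logger, listener


buffer = io.StringIO()  # stands in for a log file in this sketch
logger, listener = build_async_logger("orders", buffer)

logger.info("routine detail")  # below WARNING, so it is dropped
logger.warning("payment gateway timeout for order %s", 42)

listener.stop()  # flushes queued records and joins the background thread
output = buffer.getvalue()
```

Raising the level to WARNING keeps the volume low, and the queue keeps the remaining writes off the transaction path. Neither removes the need to redeploy when new log statements are required.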