Application Log Monitoring


In a multi-server (multi-cluster) environment, whether on-premises or in the cloud, there is always a requirement to monitor for issues that have occurred or are about to occur.

In a SaaS environment, we never want downtime caused by an issue with the application or with the server environment. We need a monitoring system in place to track the health of the application as well as the servers on which it is hosted. A monitoring system not only helps in proactively preventing upcoming issues and in troubleshooting when the application runs into problems, it also helps us monitor the security of the application. For example, if an unauthorized person tries to gain access to the application, the application will write this access request to the log with details such as the person's ID, IP address and time of access. One can then investigate it and act on it proactively.

Logs generated by the application can be written to the Windows event log or to an application-specific log file.
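
As an illustration of the event log option, an application or script can write an entry with Windows PowerShell. The sketch below assumes Windows PowerShell 5.1 (the “Write-EventLog” cmdlet is not available in PowerShell 7) and an elevated session for registering the source. The source name, event ID and message values are hypothetical and only mirror the unauthorized-access example above.

```powershell
# Hypothetical source name for illustration; registering a source needs admin rights.
$source = 'MyWebApp'

# Register the event source once, if it does not exist yet.
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    New-EventLog -LogName Application -Source $source
}

# Example values only: a failed access attempt with user ID, IP address and time.
$message = 'Unauthorized access attempt. UserId=jdoe; IP=203.0.113.10; Time={0:o}' -f (Get-Date)

# Write the entry as a warning to the Application log (event ID 4001 is arbitrary).
Write-EventLog -LogName Application -Source $source -EntryType Warning -EventId 4001 -Message $message
```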

There are many ways to monitor logs. Monitoring requires not only capturing log data from the event log or any other source, but also a strong analytics tool to produce meaningful information from that data. Note that we can have gigabytes or terabytes of log data, which is very difficult to analyze manually.

In this article, I discuss two approaches:

  1. Automating it using PowerShell.
  2. Using industry-standard tools like Splunk.

     

Let’s discuss both approaches.

Automation using PowerShell –

Assume the application writes its logs to the event log. The approach is to write a PowerShell script that regularly reads the event log for error or warning messages and then either takes an action, such as restarting a service, or simply emails the log as an attachment to the application administrator.

Here is a script that can be used for this:
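
A minimal sketch of such a script, written under stated assumptions rather than as a definitive implementation: the service name, keyword, CSV path, mail addresses and SMTP server are all placeholders, and the log being read is assumed to be the Application log. It uses “Get-WinEvent” with a filter for errors and warnings from the last 48 hours, restarts a service when a keyword is found, and mails the exported CSV with “Send-MailMessage” (which still works but is marked obsolete by Microsoft).

```powershell
# All names below (service, keyword, path, mail settings) are placeholders.
$serviceName = 'MyAppService'
$keyword     = 'connection pool exhausted'
$csvPath     = Join-Path $env:TEMP 'AppLogEvents.csv'

# Errors (Level 2) and warnings (Level 3) from the Application log, last 48 hours.
$filter = @{
    LogName   = 'Application'
    Level     = 2, 3
    StartTime = (Get-Date).AddHours(-48)
}

# Get-WinEvent reports an error when nothing matches, so suppress it
# and wrap the result in @() to always get an array.
$events = @(Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue)

# Nothing to act on or report.
if ($events.Count -eq 0) { return }

# Action 1: restart the service once if any event message contains the keyword.
$matched = $events | Where-Object { $_.Message -like "*$keyword*" }
if ($matched) {
    Restart-Service -Name $serviceName -Force
}

# Action 2: export the events to CSV and mail the file to the administrator.
$events |
    Select-Object TimeCreated, Id, LevelDisplayName, ProviderName, Message |
    Export-Csv -Path $csvPath -NoTypeInformation

$mail = @{
    From        = 'monitor@example.com'
    To          = 'appadmin@example.com'
    Subject     = "Application log: $($events.Count) errors/warnings in the last 48 hours"
    Body        = 'The attached CSV lists the events found.'
    Attachments = $csvPath
    SmtpServer  = 'smtp.example.com'
}
Send-MailMessage @mail
```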

 

The above PowerShell script uses the “Get-WinEvent” cmdlet, which gets events from event logs and event tracing log files on local and remote computers. I added filters to retrieve “error” and “warning” events from the last 48 hours. The script takes the following actions:

  1. It analyzes each row for specific keywords and takes an action based on them. In the above script, it restarts a Windows service.
  2. It exports the data to a CSV file and sends it as an attachment to the application administrator using the “Send-MailMessage” cmdlet.

 

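We can now schedule this PowerShell script in Windows Task Scheduler to run it regularly. Note that “Get-Job” only reports on the script if it is registered as a PowerShell scheduled job (“Register-ScheduledJob” in Windows PowerShell); for a regular Task Scheduler task, the status can be checked with “Get-ScheduledTaskInfo”.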
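
The scheduling step can be sketched with the ScheduledTasks cmdlets that ship with Windows. The script path, task name and run time below are hypothetical, and the commands are assumed to run in an elevated session.

```powershell
# Hypothetical path to the monitoring script and task name.
$scriptPath = 'C:\Scripts\Monitor-AppLog.ps1'
$taskName   = 'AppLogMonitor'

# Run the script with powershell.exe, once a day at 06:00.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
               -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$scriptPath`""
$trigger = New-ScheduledTaskTrigger -Daily -At '6:00AM'

# Register the task to run under the SYSTEM account.
Register-ScheduledTask -TaskName $taskName -Action $action -Trigger $trigger `
    -User 'SYSTEM' -RunLevel Highest

# Check the last run time, last result code and next run time.
Get-ScheduledTaskInfo -TaskName $taskName
```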

Automation using tools like Splunk –

Suppose we have n servers (say 15) in a cluster on which the application is deployed, running behind a load balancer. While the application is running on these servers, it or the web server may encounter issues. How do we troubleshoot these issues efficiently? We could use the PowerShell approach discussed above, but it needs a complex script to visualize data from the tons of logs generated. Tools like Splunk provide automated ways to collect log data from all the servers in real time, and the data can then be queried and visualized using Splunk's analytics.
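
To give a feel for the PowerShell route across a cluster, here is a rough sketch that pulls errors and warnings from every server and counts them per server and event provider. It assumes PowerShell remoting (WinRM) is enabled on all servers; the server names are hypothetical. Even this simple summary has no real-time collection, indexing or visualization, which is the gap tools like Splunk fill.

```powershell
# Hypothetical server names APPSRV01 .. APPSRV15; replace with the cluster's hosts.
$servers = 1..15 | ForEach-Object { 'APPSRV{0:d2}' -f $_ }

# Run the same event query on every server; unreachable servers are skipped silently.
$events = Invoke-Command -ComputerName $servers -ErrorAction SilentlyContinue -ScriptBlock {
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Application'
        Level     = 2, 3
        StartTime = (Get-Date).AddHours(-48)
    } -ErrorAction SilentlyContinue
}

# Invoke-Command tags each result with PSComputerName, so we can
# count events per server and provider, busiest first.
$events |
    Group-Object PSComputerName, ProviderName |
    Sort-Object Count -Descending |
    Select-Object Count, Name
```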

Splunk has many components that need to be set up to make it work. One important component is the “forwarder”, which needs to be installed on all the servers. This can easily be done using an SCCM-based deployment. Splunk also has an indexer that indexes the data for efficient querying.

Now suppose an event has been triggered on 3 of the servers in the cluster and the application/system administrators are not aware of it. Splunk will collect the logs and analyze them for different keywords using the defined queries. It will then alert the administrators by email. This lets you proactively address upcoming issues.

Another good example: suppose the InfoSec (information security) team has mandated that only specific types of user accounts may be added to the servers. Suppose someone adds a user account that is not supposed to be on the server. As soon as it is added, the system writes a log entry about the addition of the user to the administrator group. Splunk will collect that log entry, and the predefined query will run, immediately detect the non-compliance and send an email notification to the InfoSec team.
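
On Windows, the log entry in this scenario is typically Security event ID 4732 (a member was added to a security-enabled local group). As a rough PowerShell illustration of the check such a query performs (this is not Splunk's query language), the sketch below lists recent additions to the Administrators group on one server. It assumes an elevated session, that auditing of security group management is enabled, and an English-language system, since it matches on the rendered message text.

```powershell
# Event ID 4732: a member was added to a security-enabled local group.
$added = @(Get-WinEvent -FilterHashtable @{
    LogName   = 'Security'
    Id        = 4732
    StartTime = (Get-Date).AddHours(-24)
} -ErrorAction SilentlyContinue)

# Keep only events whose message mentions the Administrators group
# (text match on the English group name; an assumption for this sketch).
$added |
    Where-Object { $_.Message -match 'Administrators' } |
    Select-Object TimeCreated, MachineName, Message
```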
