Wednesday 26 March 2014

Auto monitoring of AWS instances using Python boto

Here we discuss "monitoring automation": a generic approach that should work for anyone who uses Nagios to monitor AWS instances.

Features:
[1]: Automatically add a new host to monitoring whenever we launch a new instance in AWS.
[2]: Automatically remove a host from monitoring when we terminate its instance in AWS.
[3]: Read group information from a custom tag.

NOTE: Because the monitoring is automated, we have to follow a few rules; otherwise the monitoring will fail.

Rule1: Each AWS instance may have only two tags: [1. the default Name tag, 2. groups]. NOTE: tag keys are case sensitive, so use exactly these names.

Rule2: For now, only the following keywords may be used in the groups custom tag. [Note: if you need a new one, let me know before using it. The values are also case sensitive.] [To add a new hostgroup, update the Nagios hostgroup config file before adding the name to a groups tag.]

        hostgroup_name      hadoop
        hostgroup_name      db
        hostgroup_name      http
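Both rules can be checked before an instance is tagged or before Nagios is restarted. A minimal sketch, assuming the tags come as a plain dictionary (boto's inst.tags behaves like one) and that the groups tag holds a comma-separated list such as db,http, which is the format of a Nagios hostgroups directive; check_instance_tags is a hypothetical helper, not part of the original setup:

```python
# Rule1: the only tag keys allowed on an instance (case sensitive).
ALLOWED_TAG_KEYS = {"Name", "groups"}
# Rule2: hostgroups currently defined in the Nagios hostgroup config (case sensitive).
ALLOWED_HOSTGROUPS = {"hadoop", "db", "http"}


def check_instance_tags(tags):
    """Return a list of rule violations for one instance's tags (empty if OK).

    Assumes `tags` is a plain dict such as boto's inst.tags and that the
    groups tag is a comma-separated list like "db,http".
    """
    problems = []
    if "Name" not in tags:
        problems.append("missing required tag 'Name'")
    # Any key outside the allowed set (e.g. 'Groups' or 'env') breaks Rule1.
    for key in sorted(set(tags) - ALLOWED_TAG_KEYS):
        problems.append("unexpected tag %r" % key)
    # Every non-empty entry in the groups tag must be a defined hostgroup (Rule2).
    for group in tags.get("groups", "").split(","):
        group = group.strip()
        if group and group not in ALLOWED_HOSTGROUPS:
            problems.append("unknown hostgroup %r" % group)
    return problems


print(check_instance_tags({"Name": "web1", "groups": "db,http"}))  # []
print(check_instance_tags({"Name": "web1", "Groups": "HTTP"}))     # ["unexpected tag 'Groups'"]
```

Note that a wrongly capitalised key such as Groups is reported as unexpected, and its value is never checked, because the sketch only reads the exact key groups.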


The following Python boto script prints a Nagios host definition for each instance:

#!/usr/bin/env python
# Print a Nagios host definition for every running EC2 instance.
from __future__ import print_function

import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Only running instances: terminated ones are still returned for a while after
# termination, and leaving them out is what removes them from monitoring.
reservations = conn.get_all_instances(filters={'instance-state-name': 'running'})
for res in reservations:
    for inst in res.instances:
        name = inst.tags.get('Name')
        if not name:
            # Rule1: every instance must have a Name tag; skip the ones that don't.
            continue
        print("define host{")
        print("%s \t %s" % ("use", "generic-host"))  # \t for tab
        print("%s  %s" % ("host_name", name))
        if name == 'qa1':
            # Different check for qa1, as it is a Fedora system.
            print("%s \t%s" % ("check_command", "check_ssh"))
        # The alias holds the public DNS name and the address holds the private IP,
        # because that is more cost-effective.
        print("%s \t %s: %s" % ("alias", name, inst.public_dns_name))
        print("%s  %s" % ("address", inst.private_ip_address))
        # inst.tags is a dict, so the custom "groups" tag can be read directly.
        # If it is present, the host becomes a member of those hostgroups.
        groups = inst.tags.get('groups')
        if groups:
            print("%s  %s" % ("hostgroups", groups))
        print("}\n")
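The Nagios part of the script above does not need boto, so it can be pulled into a plain function and tried without AWS credentials. A minimal sketch; format_host is a hypothetical helper that reproduces the script's output layout, and the sample DNS name and IP address are made up:

```python
def format_host(name, public_dns, private_ip, groups=None):
    """Render one Nagios host definition in the same layout as the boto script."""
    lines = [
        "define host{",
        "use \t generic-host",
        "host_name  %s" % name,
    ]
    if name == "qa1":
        # qa1 is a Fedora system, so it gets a different host check.
        lines.append("check_command \tcheck_ssh")
    # Public DNS name in the alias, private IP as the address.
    lines.append("alias \t %s: %s" % (name, public_dns))
    lines.append("address  %s" % private_ip)
    if groups:
        # Only emitted when the instance has a groups tag.
        lines.append("hostgroups  %s" % groups)
    lines.append("}\n")
    return "\n".join(lines)


print(format_host("web1", "ec2-203-0-113-10.compute-1.amazonaws.com", "10.0.0.5", "db,http"))
```

Keeping the formatting separate from the AWS calls also makes it easy to unit-test the generated config before it reaches all_hosts.cfg.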

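NOTE: inst.tags is a dictionary, so a custom tag value can be read directly with inst.tags.get('groups'). Searching the string form of the tags (str(inst.tags)) is fragile, because the result depends on the key order and on how the dictionary is printed.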
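A small demonstration of why the string search breaks, using made-up tags. hack_groups reproduces the slicing logic of the first version of the script (the +11 offset assumes Python 2's u'...' repr, and [:-2] assumes groups is the last key); both function names are hypothetical:

```python
def hack_groups(tags):
    """Reproduce the string-search hack on str(tags)."""
    s = str(tags)
    i = s.find("groups")
    if i > 0:
        # +11 skips "groups': u'" in Python 2's repr of unicode values;
        # [:-2] drops the closing "'}" and so assumes groups is the last key.
        return s[i + 11:-2]
    return None


def dict_groups(tags):
    """Read the custom tag directly; returns None when the tag is absent."""
    return tags.get("groups")


tags = {"Name": "web1", "groups": "db,http"}
print(dict_groups(tags))  # db,http
print(hack_groups(tags))  # b,http under Python 3 (no u'' prefix, so the offset is off by one)
```

The string search would also match a Name value that happens to contain the word groups, which the dictionary lookup avoids.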
NOTE: I removed the instance type from the host groups, because the monitoring fails if a hostgroup (for example, one named after an instance type) is defined but no host belongs to it.
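One way to spot this before restarting Nagios is to list the defined hostgroups that no host uses. A minimal sketch, assuming each host's groups tag has already been collected into a dictionary (for example from inst.tags in the loop of the boto script) and that the tag is comma-separated; empty_hostgroups is a hypothetical helper:

```python
def empty_hostgroups(defined, host_groups):
    """Return defined hostgroups that no host belongs to, sorted by name.

    defined:     iterable of hostgroup names from the Nagios config.
    host_groups: dict mapping host_name -> comma-separated groups tag value.
    """
    used = set()
    for tag in host_groups.values():
        # Collect every non-empty group name mentioned by any host.
        used.update(g.strip() for g in tag.split(",") if g.strip())
    return sorted(set(defined) - used)


hosts = {"web1": "http", "db1": "db"}
print(empty_hostgroups(["hadoop", "db", "http"], hosts))  # ['hadoop']
```

If the returned list is not empty, the affected hostgroups (and any services attached to them) can be commented out until a host with that tag is launched.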


Then save the following script to a file and add it to root's crontab:

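#!/bin/bash
# Runs from root's crontab, so sudo is not needed.
# Write to a temporary file first so a failed run does not leave an empty host config.
/path/to/getInstanceDetails.py > /path/to/all_hosts.cfg.tmp && mv /path/to/all_hosts.cfg.tmp /path/to/all_hosts.cfg
sleep 2
service nagios3 restart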

## Add the above script to root's crontab (sudo crontab -e) so it runs every 15 minutes:
## */15 * * * * /path/to/aboveScriptName.sh


## Now define which checks run on which hostgroups ##

define service{
        hostgroup_name                  db ; NOTE: here you only need to give the hostgroup
        service_description             MYSQL
        check_command                   check_nrpe_1arg!check_mysql
        use                             generic-service-after-15 ; Name of service template to use
        notification_interval           0 ; set > 0 if you want to be renotified
}
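When there are many hostgroups, the service blocks can be generated the same way as the host file. A minimal sketch; the CHECKS table is hypothetical apart from the MYSQL entry, which copies the block above, and generic-service-after-15 is the template name used there:

```python
# Hypothetical table: hostgroup -> list of (service_description, check_command).
CHECKS = {
    "db": [("MYSQL", "check_nrpe_1arg!check_mysql")],
}


def service_blocks(checks, template="generic-service-after-15"):
    """Render one Nagios service definition per (hostgroup, check) pair."""
    blocks = []
    for group in sorted(checks):
        for description, command in checks[group]:
            # Same fields as the hand-written MYSQL block above.
            blocks.append(
                "define service{\n"
                "        hostgroup_name                  %s\n"
                "        service_description             %s\n"
                "        check_command                   %s\n"
                "        use                             %s\n"
                "        notification_interval           0\n"
                "}\n" % (group, description, command, template)
            )
    return "\n".join(blocks)


print(service_blocks(CHECKS))
```

Because each block names only a hostgroup, every host tagged with that group picks up the check automatically.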

NOTE: you can create your own generic-service-xxx templates, each with its own properties, and use them here.
