How to get Nagios output into a file


I have set up the Nagios monitoring tool on one Linux machine (the server) to monitor another Linux host (so far I am monitoring only one machine). I followed the official documentation and installed the Nagios server on the server side and the NRPE daemon on the client side. As described in the documentation, Nagios is running successfully and performing its periodic checks for all the services I configured; I have also installed some additional plugins.

But I would like to know how to get the monitoring history written to a specific file in a proper format. I have not installed the web interface through Apache; is there still a solution to my problem?

Below is the log file I am getting from the Nagios monitoring:

[1349064000] LOG ROTATION: DAILY
[1349064000] LOG VERSION: 2.0
[1349064000] CURRENT HOST STATE: localhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.03 ms
[1349064000] CURRENT HOST STATE: remotehost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.17 ms
[1349064000] CURRENT SERVICE STATE: localhost;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1349064000] CURRENT SERVICE STATE: localhost;Current Users;OK;HARD;1;USERS OK - 7 users currently logged in
[1349064000] CURRENT SERVICE STATE: localhost;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 1889 bytes in 0.001 seconds
[1349064000] CURRENT SERVICE STATE: localhost;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.04 ms
[1349064000] CURRENT SERVICE STATE: localhost;Root Partition;CRITICAL;HARD;100;DISK CRITICAL - free space: / 108 MB (1% inode=61%):
[1349064000] CURRENT SERVICE STATE: localhost;SSH;OK;HARD;1;SSH OK - OpenSSH_5.1 (protocol 2.0)
[1349064000] CURRENT SERVICE STATE: localhost;Swap Usage;OK;HARD;1;SWAP OK - 97% free (841 MB out of 870 MB)
[1349064000] CURRENT SERVICE STATE: localhost;Total Processes;OK;HARD;1;PROCS OK: 79 processes with STATE = RSZDT
[1349064000] CURRENT SERVICE STATE: remotehost;CPU Load;OK;HARD;1;OK - load average: 0.08, 0.02, 0.01
[1349064000] CURRENT SERVICE STATE: remotehost;Current Users;WARNING;HARD;3;USERS WARNING - 3 users currently logged in
[1349064000] CURRENT SERVICE STATE: remotehost;File Size;WARNING;HARD;3;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349064000] CURRENT SERVICE STATE: remotehost;Swap Usage;OK;HARD;1;SWAP OK - 100% free (869 MB out of 870 MB)
[1349064000] CURRENT SERVICE STATE: remotehost;Total Processes;OK;HARD;1;PROCS OK: 106 processes
[1349064000] CURRENT SERVICE STATE: remotehost;Zombie Processes;OK;HARD;1;PROCS OK: 0 processes with STATE = Z
[1349064028] SERVICE NOTIFICATION: nagiosadmin;remotehost;Current Users;WARNING;notify-service-by-email;USERS WARNING - 3 users currently logged in
[1349064988] Auto-save of retention data completed successfully.
[1349065258] SERVICE NOTIFICATION: nagiosadmin;remotehost;File Size;WARNING;notify-service-by-email;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349065938] SERVICE NOTIFICATION: nagiosadmin;localhost;Root Partition;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 109 MB (1% inode=61%):
[1349067628] SERVICE NOTIFICATION: nagiosadmin;remotehost;Current Users;WARNING;notify-service-by-email;USERS WARNING - 3 users currently logged in
[1349068588] Auto-save of retention data completed successfully.
[1349068858] SERVICE NOTIFICATION: nagiosadmin;remotehost;File Size;WARNING;notify-service-by-email;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349069538] SERVICE NOTIFICATION: nagiosadmin;localhost;Root Partition;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 109 MB (1% inode=61%)

Please tell me if I am doing something wrong here. If any more information about my Nagios setup is needed, let me know and I will gladly share it.

Thanks in advance.

    
by Manroop 01.10.2012 / 08:52

1 answer

First, let me say I am very sorry for not answering your question sooner; I have been rather busy these past few days.

Here I will give you two answers to your question.

First answer (plain and not innovative):

#!/bin/sh
#
# Log file pattern detector plugin for Nagios
#
# Usage: ./check_log -F <log_file> -O <old_log_file> -q <pattern>
#
# Description:
#
# This plugin will scan a log file (specified by the <log_file> option)
# for a specific pattern (specified by the <pattern> option).  Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to <old_log_file>.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized".  On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file.  If the plugin detects any 
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
#    1.  The "max_attempts" value for the service should be 1, as this
#        will prevent Nagios from retrying the service check (the
#        next time the check is run it will not produce the same results).
#
#    2.  The "notify_recovery" value for the service should be 0, so that
#        Nagios does not notify you of "recoveries" for the check.  Since
#        pattern matches in the log file will only be reported once and not
#        the next time, there will always be "recoveries" for the service, even
#        though recoveries really don't apply to this type of check.
#
#    3.  You *must* supply a different <old_log_file> for each service that
#        you define to use this plugin script - even if the different services
#        check the same <log_file> for pattern matches.  This is necessary
#        because of the way the script operates.
#
# Examples:
#
# Check for login failures in the syslog...
#
#   check_log -F /var/log/messages -O ./check_log.badlogins.old -q "LOGIN FAILURE"
#
# Check for port scan alerts generated by Psionic's PortSentry software...
#
#   check_log -F /var/log/messages -O ./check_log.portscan.old -q "attackalert"
#

# Paths to commands used in this script.  These
# may have to be modified to match your system setup.
# TV: removed PATH restriction. Need to think more about what this means overall
#PATH=""

ECHO="/bin/echo"
GREP="/bin/egrep"
DIFF="/bin/diff"
TAIL="/bin/tail"
CAT="/bin/cat"
RM="/bin/rm"
CHMOD="/bin/chmod"
TOUCH="/bin/touch"

# Command substitution: the extracted text had lost the original backticks.
PROGNAME=$(/bin/basename "$0")
PROGPATH=$(echo "$0" | sed -e 's,[\/][^\/][^\/]*$,,')
REVISION="@NP_VERSION@"

. $PROGPATH/utils.sh

print_usage() {
echo "Usage: $PROGNAME -F logfile -O oldlog -q query"
echo "Usage: $PROGNAME --help"
echo "Usage: $PROGNAME --version"
}

print_help() {
print_revision $PROGNAME $REVISION
echo ""
print_usage
echo ""
echo "Log file pattern detector plugin for Nagios"
echo ""
support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 1 ]; then
print_usage
exit $STATE_UNKNOWN
fi

# Grab the command line arguments

#logfile=$1
#oldlog=$2
#query=$3
exitstatus=$STATE_WARNING #default
while test -n "$1"; do
case "$1" in
    --help)
        print_help
        exit $STATE_OK
        ;;
    -h)
        print_help
        exit $STATE_OK
        ;;
    --version)
        print_revision $PROGNAME $REVISION
        exit $STATE_OK
        ;;
    -V)
        print_revision $PROGNAME $REVISION
        exit $STATE_OK
        ;;
    --filename)
        logfile=$2
        shift
        ;;
    -F)
        logfile=$2
        shift
        ;;
    --oldlog)
        oldlog=$2
        shift
        ;;
    -O)
        oldlog=$2
        shift
        ;;
    --query)
        query=$2
        shift
        ;;
    -q)
        query=$2
        shift
        ;;
    -x)
        exitstatus=$2
        shift
        ;;
    --exitstatus)
        exitstatus=$2
        shift
        ;;
    *)
        echo "Unknown argument: $1"
        print_usage
        exit $STATE_UNKNOWN
        ;;
esac
shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
$ECHO "Log check error: Log file $logfile does not exist!\n"
exit $STATE_UNKNOWN
elif [ ! -r $logfile ] ; then
$ECHO "Log check error: Log file $logfile is not readable!\n"
exit $STATE_UNKNOWN
fi

# If the old log file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
$CAT $logfile > $oldlog
$ECHO "Log check data initialized...\n"
exit $STATE_OK
fi

# The old log file exists, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
if [ -x /bin/mktemp ]; then
tempdiff=$(/bin/mktemp /tmp/check_log.XXXXXXXXXX)
else
tempdiff=$(/bin/date '+%H%M%S')
tempdiff="/tmp/check_log.${tempdiff}"
$TOUCH $tempdiff
$CHMOD 600 $tempdiff
fi

$DIFF $logfile $oldlog | $GREP -v "^>" > $tempdiff

# Count the number of matching log entries we have
count=$($GREP -c "$query" $tempdiff)

# Get the last matching entry in the diff file
lastentry=$($GREP "$query" $tempdiff | $TAIL -1)

$RM -f $tempdiff
$CAT $logfile > $oldlog

if [ "$count" = "0" ]; then # no matches, exit with no error
$ECHO "Log check ok - 0 pattern matches found\n"
exitstatus=$STATE_OK
else # Print total match count and the last entry we found
$ECHO "($count) $lastentry"
exitstatus=$STATE_CRITICAL
fi

exit $exitstatus

But be warned: I have not run this script myself, so if it reports errors you will have to fix them on your own.

You have to add this definition to commands.cfg:

define command{
      command_name    check_log
      command_line    $USER1$/check_log -F $ARG1$ -O $ARG2$ -q $ARG3$

}

Define the service in localhost.cfg:

define service{

    use  local-service           ; Inherit default values from a template
    host_name      localhost
    service_description   check_log
    check_command check_log!/var/log/secure!/usr/local/nagios/libexec/secure.my!"Failed password"
 }
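Before scheduling the check, it can help to run the plugin by hand and look at its exit code, since Nagios derives the service state from it. The sketch below maps the standard plugin return codes (0, 1, 2, 3) to their state names. The function name nagios_state_name is made up for illustration, and the STATE_* variables used by the script above are assumed to come from utils.sh with these same values.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to the state name Nagios assigns to it.
# nagios_state_name is a hypothetical helper, not part of Nagios.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID ($1)" ;;
    esac
}
```

For example, after running check_log by hand with -F /var/log/secure, -O pointing at a scratch file and -q "Failed password", calling nagios_state_name $? prints the state Nagios would record for that run.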

Second answer (slightly more innovative):

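The Nagios log file is whatever the log_file directive in nagios.cfg points to; on a default source install that is usually /usr/local/nagios/var/nagios.log (not /var/log/httpd/access_log, which is the Apache access log).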

Now, your log file, like every log, contains timestamp information. So we need to record the system time at which the server is started. From my experience, when we start WAS it spawns a java.exe process. I do not know what the equivalent process is called for Nagios, so let us assume it is LNT.exe. We therefore need to find the spawn time of LNT.exe.

Start the server now and the logs will be generated. You then need to read only the entries in the log file written after that time, so that you see just the current records.

First, get the process ID (for example with ps -ef | grep LNT.exe) and store it in a variable such as processID. Then run ls -ld /proc/${processID} and store the time it shows in a variable startedTime.
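A minimal sketch of obtaining the process start time, assuming a Linux system with /proc mounted. Instead of parsing the output of ls -ld, it reads the start time from /proc/PID/stat and the boot time from /proc/stat, which yields seconds since the epoch, the same unit as the bracketed timestamps in the Nagios log. The function name proc_start_epoch is hypothetical.

```shell
#!/bin/sh
# Print the start time of a process as seconds since the epoch (Linux only).
# proc_start_epoch is a hypothetical helper name.
proc_start_epoch() {
    pid=$1
    # Clock ticks per second; fall back to 100 when getconf is unavailable.
    ticks=$(getconf CLK_TCK 2>/dev/null || echo 100)
    # System boot time in epoch seconds.
    btime=$(awk '/^btime / {print $2}' /proc/stat)
    # Field 22 of /proc/PID/stat is the start time in ticks since boot.
    # Strip everything up to the last ") " first, because the command
    # name (field 2) may itself contain spaces; field 22 then becomes $20.
    start_ticks=$(sed 's/^.*) //' "/proc/$pid/stat" | awk '{print $20}')
    echo $(( btime + start_ticks / ticks ))
}
```

If the daemon process is named nagios (an assumption; check with ps on your system), startedTime=$(proc_start_epoch "$(pgrep -o -x nagios)") would store its start time.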

Now read the log file line by line and compare each line's timestamp (timeRead) with startedTime. While startedTime > timeRead, the line predates the server start and can be skipped; the first line for which that is no longer true is your reference point, and you start reading the file from there.
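The filtering step can be sketched as follows, assuming log lines of the form [epoch] text as in the sample log in the question, and GNU date (for the -d @epoch form). The function name nagios_log_since is hypothetical; it prints only entries at or after the given epoch, with the timestamp converted to a readable form, so the output can be redirected to any file.

```shell
#!/bin/sh
# Print Nagios log lines at or after a given epoch, with readable timestamps.
# Usage: nagios_log_since <epoch> <logfile>
# nagios_log_since is a hypothetical helper; requires GNU date.
nagios_log_since() {
    since=$1
    while IFS= read -r line; do
        # Only handle lines that start with "[digits] ".
        case $line in
            \[[0-9]*\]\ *) ;;
            *) continue ;;
        esac
        ts=${line#\[}        # drop the leading "["
        ts=${ts%%\]*}        # keep only the digits before "]"
        rest=${line#*\] }    # message text after "] "
        # Skip entries older than the reference time (or malformed stamps).
        [ "$ts" -ge "$since" ] 2>/dev/null || continue
        printf '%s %s\n' "$(date -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
    done < "$2"
}
```

A call such as nagios_log_since "$startedTime" /usr/local/nagios/var/nagios.log > /tmp/nagios_history.txt would write the filtered history to a file; both paths here are assumptions and should be adjusted to your installation.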

    
by 09.10.2012 / 17:34