Fluentd plugin to send NSCA / Nagios service checks

I wrote a Fluentd plugin that sends service checks to an NSCA / Nagios monitoring server. You can use it to detect anomalies in logs and send alerts through Nagios.

Installation

Install the fluent-plugin-nsca gem, for example by running gem install fluent-plugin-nsca.

You don't have to install the send_nsca command, because this plugin uses a pure-Ruby NSCA client library.

Use case: “too many server errors” alert

Assume you have

  • a “web” server (192.168.42.123) that runs Apache HTTP Server and Fluentd, and
  • a “monitor” server (192.168.42.210) that runs Nagios and NSCA.

You want to be notified when Apache returns too many server errors: for example, WARNING at 5 or more errors per minute and CRITICAL at 50 or more errors per minute.
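
As a minimal sketch, the thresholds above can be written as a small Ruby function. The name severity_for and the two constants are hypothetical, chosen only for illustration; they are not part of any plugin.

```ruby
# Hypothetical thresholds taken from the example in this post.
WARNING_THRESHOLD = 5    # errors per minute
CRITICAL_THRESHOLD = 50  # errors per minute

# Hypothetical helper: map an errors-per-minute count to a Nagios-style level name.
def severity_for(errors_per_minute)
  if errors_per_minute >= CRITICAL_THRESHOLD
    'CRITICAL'
  elsif errors_per_minute >= WARNING_THRESHOLD
    'WARNING'
  else
    'OK'
  end
end
```

The same comparison appears later as a one-line ternary expression in the Fluentd configuration.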

The following figure shows how this alert can be implemented.

http://d.hatena.ne.jp/miyakawa_taku/files/2015-03-07_deployment.png?d=.png

Nagios configuration on “monitor” server

Create a web.cfg file with the following contents under the Nagios configuration directory.

# File: web.cfg

# "web" server definition
define host {
  use generic-host
  host_name web
  alias web
  address 192.168.42.123
}

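# Server errors service definition
# (host_name and service_description must match the values sent by Fluentd)
define service {
  use generic-service
  name server_errors
  host_name web
  service_description server_errors
  active_checks_enabled 0
  passive_checks_enabled 1
  flap_detection_enabled 0
  max_check_attempts 1
  check_command check_dummy!0
}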

# Delete this section if check_dummy command is defined elsewhere
define command {
  command_name check_dummy
  command_line $USER1$/check_dummy $ARG1$
}

Fluentd configuration on “web” server

This setup uses fluent-plugin-datacounter, fluent-plugin-record-reformer, and of course fluent-plugin-nsca. So, first of all, install the gems for those plugins.

Next, add these lines to the Fluentd configuration file.

# Parse Apache access log
<source>
  type tail
  tag access
  format apache2

  # The paths vary by setup
  path /var/log/httpd/access_log
  pos_file /var/log/fluentd/httpd-access_log.pos
</source>

# Count 5xx errors per minute
<match access>
  type datacounter
  tag count.access
  unit minute
  aggregate all
  count_key code
  pattern1 error ^5\d\d$
</match>

# Calculate the severity level
<match count.access>
  type record_reformer
  tag server_errors
  enable_ruby true
  <record>
    level ${error_count < 5 ? 'OK' : error_count < 50 ? 'WARNING' : 'CRITICAL'}
  </record>
</match>

# Send checks to NSCA
<match server_errors>
  type nsca
  server 192.168.42.210
  port 5667
  password peng!

  host_name web
  service_description server_errors
  return_code_field level
</match>
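
To make the counting step in the configuration above more concrete, here is a rough Ruby sketch of what "count 5xx errors per minute" means. It is a simplification for illustration, not the code of fluent-plugin-datacounter: the helper name count_errors_per_minute and the record layout (a Hash with :time and :code) are assumptions, and it groups by calendar minute, whereas the plugin may count over its own intervals.

```ruby
# Same pattern as "pattern1 error ^5\d\d$" in the configuration.
ERROR_PATTERN = /^5\d\d$/

# Hypothetical helper: count 5xx responses per calendar minute.
# Each record is assumed to be a Hash with a Time under :time
# and an HTTP status code under :code.
def count_errors_per_minute(records)
  counts = Hash.new(0)
  records.each do |record|
    # Skip records whose status code is not a 5xx error.
    next unless ERROR_PATTERN.match?(record[:code].to_s)

    # Truncate the timestamp to the minute and use it as the bucket key.
    minute = record[:time].strftime('%Y-%m-%d %H:%M')
    counts[minute] += 1
  end
  counts
end
```

Each per-minute count plays the role of the error_count field that the record_reformer step turns into a level.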

The next figure shows the data flow.

http://d.hatena.ne.jp/miyakawa_taku/files/2015-03-07_fluentd.png?d=.png
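
At the end of this flow, the value of the level field has to become a Nagios status code. Nagios uses 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below shows one way such a mapping could look in Ruby. The helper name return_code_for is hypothetical, and falling back to UNKNOWN for unrecognized values is an assumption of this sketch, not a statement about how fluent-plugin-nsca is implemented.

```ruby
# Standard Nagios status codes.
RETURN_CODES = {
  'OK' => 0,
  'WARNING' => 1,
  'CRITICAL' => 2,
  'UNKNOWN' => 3
}.freeze

# Hypothetical helper: convert a level value into a numeric status code.
# Integers 0..3 pass through unchanged; known level names are looked up;
# anything else becomes 3 (UNKNOWN), which is an assumption of this sketch.
def return_code_for(level)
  return level if level.is_a?(Integer) && (0..3).cover?(level)

  RETURN_CODES.fetch(level.to_s, 3)
end
```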

On Fluentd 0.12.0 and above, you can use the record_transformer filter instead of fluent-plugin-record-reformer.

If you are concerned about scalability, fluent-plugin-norikra may be a better option than datacounter and record_reformer.

Contributing

Please submit an issue or a pull request on the GitHub repository.

Feedback to @miyakawa_taku on Twitter is also welcome.