I wrote a Fluentd plugin that sends service checks to an NSCA / Nagios monitoring server. You can use the plugin to detect anomalies in logs and send alerts through Nagios.
Installation
Install the fluent-plugin-nsca gem (for example, with gem install fluent-plugin-nsca).
You don't have to install the send_nsca command, because this plugin uses a pure-Ruby NSCA client library.
Use case: “too many server errors” alert
Assume you have
- a “web” server (192.168.42.123) which runs Apache HTTP Server and Fluentd, and
- a “monitor” server (192.168.42.210) which runs Nagios and NSCA.
You want to be notified when Apache returns too many server errors: for example, WARNING at 5 or more errors per minute and CRITICAL at 50 or more errors per minute.
This can be implemented as the following figure shows.
Nagios configuration on “monitor” server
Create the web.cfg file shown below under the Nagios configuration directory.
# File: web.cfg

# "web" server definition
define host {
  use generic-host
  host_name web
  alias web
  address 192.168.42.123
}

# Server errors service definition
define service {
  use generic-service
  name server_errors
  # host_name and service_description must match the Fluentd nsca settings
  host_name web
  service_description server_errors
  active_checks_enabled 0
  passive_checks_enabled 1
  flap_detection_enabled 0
  max_check_attempts 1
  check_command check_dummy!0
}

# Delete this section if check_dummy command is defined elsewhere
define command {
  command_name check_dummy
  command_line $USER1$/check_dummy $ARG1$
}
Fluentd configuration on “web” server
This setup uses fluent-plugin-datacounter, fluent-plugin-record-reformer, and of course fluent-plugin-nsca, so first install the gems for those plugins.
Next, add these lines to the Fluentd configuration file.
# Parse Apache access log
<source>
  type tail
  tag access
  format apache2

  # The paths vary by setup
  path /var/log/httpd/access_log
  pos_file /var/log/fluentd/httpd-access_log.pos
</source>

# Count 5xx errors per minute
<match access>
  type datacounter
  tag count.access
  unit minute
  aggregate all
  count_key code
  pattern1 error ^5\d\d$
</match>

# Calculate the severity level
<match count.access>
  type record_reformer
  tag server_errors
  enable_ruby true
  <record>
    level ${error_count < 5 ? 'OK' : error_count < 50 ? 'WARNING' : 'CRITICAL'}
  </record>
</match>

# Send checks to NSCA
<match server_errors>
  type nsca
  server 192.168.42.210
  port 5667
  password peng!
  host_name web
  service_description server_errors
  return_code_field level
</match>
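To make the counting and severity steps of this configuration easier to check, here is a minimal Ruby sketch that mimics them outside Fluentd. It is not how datacounter or record_reformer are implemented. The method names count_server_errors and severity_level and the sample status codes are made up for this illustration; only the pattern and the thresholds come from the configuration above.

```ruby
# Illustrative sketch only: mimics the datacounter and record_reformer steps
# in plain Ruby. The method names and sample data are hypothetical.

# Same pattern as "pattern1 error ^5\d\d$" in the datacounter section.
SERVER_ERROR = /^5\d\d$/

# Count the status codes from one minute of access log that look like 5xx.
def count_server_errors(codes)
  codes.count { |code| SERVER_ERROR.match?(code.to_s) }
end

# Same expression as the "level" field in the record_reformer section.
def severity_level(error_count)
  error_count < 5 ? 'OK' : error_count < 50 ? 'WARNING' : 'CRITICAL'
end

# Made-up status codes standing in for one minute of requests.
one_minute = %w[200 200 503 404 500 502 500 504 301]
error_count = count_server_errors(one_minute)
puts "#{error_count} server errors -> #{severity_level(error_count)}"
```

Running the sketch with other sample data is a quick way to confirm where the boundaries fall: exactly 5 errors in a minute already yields WARNING, and exactly 50 yields CRITICAL.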
The next figure shows the data flow.
On Fluentd 0.12.0 and above, you can use the record_transformer filter instead of fluent-plugin-record-reformer.
If you are concerned about scalability, fluent-plugin-norikra may be a better option than datacounter and record_reformer.
Contributing
Please submit an issue or a pull request on the GitHub repository.
Feedback to @miyakawa_taku on Twitter is also welcome.