I’ve just finished version 1.0 of a CPU check plugin which is sh-compliant (as always) and calculates utilization through Jiffies rather than using another third party tool like iostat or top. This means it should work on all Linux machines right out of the box without the need to install additional software.
Frankly speaking the performance of the script is slightly worse than the performance of most of the other CPU check plugins due to the sacrifice of parsing iostat or top output, but I nevertheless wanted to do this since calculating CPU utilization through Jiffies instead of parsing another tool’s output is somewhat different and interesting. I didn’t take a look into iostat’s or top’s source code yet, but I guess these tools doesn’t do it in another way than parsing /proc/stat as well. The difference in performance isn’t that huge anyway, a double-digit amount of milliseconds, but not more.
For getting along with the Jiffies, I crawled through the web, especially Paul Colby’s article about calculating CPU utilization through /proc/stat helped a lot!
The script
As always, the script is downloadable from NagiosExchange, via svn directly from here or the copy’n'paste way below. A proper PNP template is included as well.
user@host ~ $ svn co https://svn.matejunkie.com/svn/nagios-plugins/stable/check_cpu/ check_cpu/
#!/bin/sh # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA PROGNAME=`basename $0` VERSION="Version 1.0," AUTHOR="2009, Mike Adolphs (http://www.matejunkie.com/)" ST_OK=0 ST_WR=1 ST_CR=2 ST_UK=3 interval=1 print_version() { echo "$VERSION $AUTHOR" } print_help() { print_version $PROGNAME $VERSION echo "" echo "$PROGNAME is a Nagios plugin to monitor CPU utilization. It makes" echo "use of /proc/stat and calculates it through Jiffies rather than" echo "using another frontend tool like iostat or top." echo "When using optional warning/critical thresholds all values except" echo "idle are aggregated and compared to the thresholds. There's" echo "currently no support for warning/critical thresholds for specific" echo "usage parameters." echo "" echo "$PROGNAME [-i/--interval] [-w/--warning] [-c/--critical]" echo "" echo "Options:" echo " --interval|-i)" echo " Defines the pause between the two times /proc/stat is being" echo " parsed. Higher values could lead to more accurate result." echo " Default is: 1 second" echo " --warning|-w)" echo " Sets a warning level for CPU user. Default is: off" echo " --critical|-c)" echo " Sets a critical level for CPU user. Default is: off" exit $ST_UK } while test -n "$1"; do case "$1" in --help|-h) print_help exit $ST_UK ;; --version|-v) print_version $PROGNAME $VERSION exit $ST_UK ;; --interval|-i) interval=$2 shift ;; --warning|-w) warn=$2 shift ;; --critical|-c) crit=$2 shift ;; *) echo "Unknown argument: $1" print_help exit $ST_UK ;; esac shift done val_wcdiff() { if [ ${warn} -gt ${crit} ] then wcdiff=1 fi } get_cpuvals() { tmp1_cpu_user=`grep -m1 '^cpu' /proc/stat|awk '{print $2}'` tmp1_cpu_nice=`grep -m1 '^cpu' /proc/stat|awk '{print $3}'` tmp1_cpu_sys=`grep -m1 '^cpu' /proc/stat|awk '{print $4}'` tmp1_cpu_idle=`grep -m1 '^cpu' /proc/stat|awk '{print $5}'` tmp1_cpu_iowait=`grep -m1 '^cpu' /proc/stat|awk '{print $6}'` tmp1_cpu_irq=`grep -m1 '^cpu' /proc/stat|awk '{print $7}'` tmp1_cpu_softirq=`grep -m1 '^cpu' /proc/stat|awk '{print $8}'` tmp1_cpu_total=`expr $tmp1_cpu_user + $tmp1_cpu_nice + $tmp1_cpu_sys + \ $tmp1_cpu_idle + $tmp1_cpu_iowait + $tmp1_cpu_irq + $tmp1_cpu_softirq` sleep $interval tmp2_cpu_user=`grep -m1 '^cpu' /proc/stat|awk '{print $2}'` tmp2_cpu_nice=`grep -m1 '^cpu' /proc/stat|awk '{print $3}'` tmp2_cpu_sys=`grep -m1 '^cpu' /proc/stat|awk '{print $4}'` tmp2_cpu_idle=`grep -m1 '^cpu' /proc/stat|awk '{print $5}'` tmp2_cpu_iowait=`grep -m1 '^cpu' /proc/stat|awk '{print $6}'` tmp2_cpu_irq=`grep -m1 '^cpu' /proc/stat|awk '{print $7}'` tmp2_cpu_softirq=`grep -m1 '^cpu' /proc/stat|awk '{print $8}'` tmp2_cpu_total=`expr $tmp2_cpu_user + $tmp2_cpu_nice + $tmp2_cpu_sys + \ $tmp2_cpu_idle + $tmp2_cpu_iowait + $tmp2_cpu_irq + $tmp2_cpu_softirq` diff_cpu_user=`echo "${tmp2_cpu_user} - ${tmp1_cpu_user}" | bc -l` diff_cpu_nice=`echo "${tmp2_cpu_nice} - ${tmp1_cpu_nice}" | bc -l` diff_cpu_sys=`echo "${tmp2_cpu_sys} - ${tmp1_cpu_sys}" | bc -l` diff_cpu_idle=`echo "${tmp2_cpu_idle} - ${tmp1_cpu_idle}" | bc -l` diff_cpu_iowait=`echo "${tmp2_cpu_iowait} - ${tmp1_cpu_iowait}" | bc -l` diff_cpu_irq=`echo "${tmp2_cpu_irq} - ${tmp1_cpu_irq}" | bc -l` diff_cpu_softirq=`echo "${tmp2_cpu_softirq} - ${tmp1_cpu_softirq}" \ | bc -l` diff_cpu_total=`echo "${tmp2_cpu_total} - ${tmp1_cpu_total}" | bc -l` cpu_user=`echo "scale=2; (1000*${diff_cpu_user}/${diff_cpu_total}+5)/10" \ | bc -l | sed 's/^\./0./'` cpu_nice=`echo "scale=2; (1000*${diff_cpu_nice}/${diff_cpu_total}+5)/10" \ | bc -l | sed 's/^\./0./'` cpu_sys=`echo "scale=2; (1000*${diff_cpu_sys}/${diff_cpu_total}+5)/10" \ | bc -l | sed 's/^\./0./'` cpu_idle=`echo "scale=2; (1000*${diff_cpu_idle}/${diff_cpu_total}+5)/10" \ | bc -l | sed 's/^\./0./'` cpu_iowait=`echo "scale=2; (1000*${diff_cpu_iowait}/${diff_cpu_total}+5)\\ /10" | bc -l | sed 's/^\./0./'` cpu_irq=`echo "scale=2; (1000*${diff_cpu_irq}/${diff_cpu_total}+5)/10" \ | bc -l | sed 's/^\./0./'` cpu_softirq=`echo "scale=2; (1000*${diff_cpu_softirq}/${diff_cpu_total}\\ +5)/10" | bc -l | sed 's/^\./0./'` cpu_total=`echo "scale=2; (1000*${diff_cpu_total}/${diff_cpu_total}+5)\\ /10" | bc -l | sed 's/^\./0./'` cpu_usage=`echo "(${cpu_user}+${cpu_nice}+${cpu_sys}+${cpu_iowait}+\\ ${cpu_irq}+${cpu_softirq})/1" | bc` } do_output() { output="user: ${cpu_user}, nice: ${cpu_nice}, sys: ${cpu_sys}, \ iowait: ${cpu_iowait}, irq: ${cpu_irq}, softirq: ${cpu_softirq} \ idle: ${cpu_idle}" } do_perfdata() { perfdata="'user'=${cpu_user} 'nice'=${cpu_nice} 'sys'=${cpu_sys} \ 'softirq'=${cpu_softirq} 'iowait'=${cpu_iowait} 'irq'=${cpu_irq} \ 'idle'=${cpu_idle}" } if [ -n "$warn" -a -n "$crit" ] then val_wcdiff if [ "$wcdiff" = 1 ] then echo "Please adjust your warning/critical thresholds. The warning\\ must be lower than the critical level!" exit $ST_UK fi fi get_cpuvals do_output do_perfdata if [ -n "$warn" -a -n "$crit" ] then if [ "$cpu_usage" -ge "$warn" -a "$cpu_usage" -lt "$crit" ] then echo "WARNING - ${output} | ${perfdata}" exit $ST_WR elif [ "$cpu_usage" -ge "$crit" ] then echo "CRITICAL - ${output} | ${perfdata}" exit $ST_CR else echo "OK - ${output} | ${perfdata}" exit $ST_OK fi else echo "OK - ${output} | ${perfdata}" exit $ST_OK fi
<?php # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA # PNP Template for check_cpu.sh # Author: Mike Adolphs (http://www.matejunkie.com/ $opt[1] = "--vertical-label \"CPU [%]\" -u 100 -l 0 -r --title \"CPU Usage for $hostname / $servicedesc\" "; $def[1] = "DEF:used=$rrdfile:$DS[1]:AVERAGE " ; $def[1] .= "DEF:nice=$rrdfile:$DS[2]:AVERAGE " ; $def[1] .= "DEF:sys=$rrdfile:$DS[3]:AVERAGE " ; $def[1] .= "DEF:iowait=$rrdfile:$DS[4]:AVERAGE " ; $def[1] .= "DEF:irq=$rrdfile:$DS[5]:AVERAGE " ; $def[1] .= "DEF:softirq=$rrdfile:$DS[6]:AVERAGE " ; $def[1] .= "DEF:idle=$rrdfile:$DS[7]:AVERAGE " ; $def[1] .= "COMMENT:\"\\t\\t\\tLAST\\t\\t\\tAVERAGE\\t\\t\\tMAX\\n\" " ; $def[1] .= "AREA:used#E80C3E:\"user\\t\":STACK " ; $def[1] .= "GPRINT:used:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:used:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:used:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:nice#E8630C:\"nice\\t\":STACK " ; $def[1] .= "GPRINT:nice:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:nice:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:nice:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:sys#008000:\"sys\\t\t\":STACK " ; $def[1] .= "GPRINT:sys:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:sys:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:sys:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:iowait#0CE84D:\"iowait\\t\":STACK " ; $def[1] .= "GPRINT:iowait:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:iowait:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:iowait:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:irq#3E00FF:\"irq\\t\t\":STACK " ; $def[1] .= "GPRINT:irq:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:irq:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:irq:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:softirq#1CC8E8:\"softirq\\t\":STACK " ; $def[1] .= "GPRINT:softirq:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:softirq:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:softirq:MAX:\"%6.2lf \\n\" " ; $def[1] .= "AREA:idle#FFFF00:\"idle\\t\":STACK " ; $def[1] .= "GPRINT:idle:LAST:\"%6.2lf %%\\t\\t\" " ; $def[1] .= "GPRINT:idle:AVERAGE:\"%6.2lf \\t\\t\" " ; $def[1] .= "GPRINT:idle:MAX:\"%6.2lf \\n\" " ; ?>
The License
As always this little script is ment to be sh-compliant and released under the terms of the GPL Version 2 only. Feel free to subscribe via rss to get updates on this one. More options will be added in the future.
I’ll probably add an option to make use of iostat or top and to calculate utilization per CPU soon so that you’re able to choose your own way.
Hey mike,
thx for this nice plugin. Performance and perfdata-output looks good. And on top of that you deliver a pnp template with it – great.
2 side notes anyway
Exit codes at version/help ar a bit mixed:
[--help] exit code set in subroutine, compared to [--version] and exit codes for both cases in “get-opts” should be ST_UK not STATE_UNKNOWN !?
Did you intend to let the plugin run if called with no arguments? I prefer checkin that:
if [ $# -eq 0 ]; then
echo -e "\nWarning: Not enough command line arguments.\n";
print_help
exit $ST_UK
Thanks dudley! Using $STATE_UNKNOWN rather than $ST_UK fault is one of my alltime favorite copy’n'paste mistakes, … I’ve just corrected it in all Nagios checks I’ve written so far so that this hopefully won’t happen again.
In general, I think good shell script design implies that a shell script runs without passing any arguments to it at all. For almost every option there’s a common value which is often used. It just makes it way more user-friendly since the average user might be able to use the script without further knowledge.
Although I must admit it also has its downsides, security for example. Forcing the user to provide an SNMP community string instead of setting the standard value to ‘public’ for example makes it necessary for the user to think about its environment a bit more than usual. It could lead to much safer environments when the average user is forced to spend time on verifying a script before using it in a productive environment, but I think this should be etched on every experienced system administrator’s memory anyway and average users are often overwhelmed by those principles in the first place. This is why I stick to the no-argument-required way as long as it’s possible.
Yeah copy ‘n paste, tell me about it ;-)
Interesting point of view about the no argument-thing, good explanation. Thanks for sharing your thoughts, i think i will definitely consider this in the future, probably not only for those type of scripts.
Hi Mike,
Just installed the check_cpu plugin which works fine, however, the check_cpu.php doesn’t display anything. I have put it in the /usr/local/nagios/share web folder but done nothing else.
Do I need to modify the file in some way to get it to work?
Thanks
Andrew
Hi
Thanks for your article.
Recently I have found munin that is installed on server and give many statistic about system,such as cpu usage and so on :)
munin can send own statistic to nagios server, this feature is very good for monitoring different servers:):)
I suggest to try munin.