Matejunkie

Next stop: Hamburg, Germany

For the record:
I just quit at nugg.ad after 2 1/2 years full of hard, honest and mind-opening work for which I’m really thankful. Without nugg.ad I don’t think I would have had such a good start. They’re all good people there and work like hell to make targeting successful without violating anyone’s privacy.
Specializing in a high availability system with over 12 billion PIs per month and a Ruby (and several Rails) backend(s) is still mind-blowing and taught me a lot, something I should mention (and should’ve mentioned) here more often in detailed articles.

I’ll relocate to Luebeck, Germany this weekend. After a few weeks off I’ll start at my new employer’s place on 1st February as a Rails/Linux system administrator right in the middle of Hamburg, Germany at the famous Gaensemarkt.
However, as subscribers might know, I’ll visit Berlin regularly twice a month to bring the Monkey Island Tattoo project to an end before summer (and other Tattoo projects as well). The left leg is almost finished and work on my right arm has just begun, so stay tuned.

Binary Talks

Google Wave invitations

I still have 7 Google Wave invitations which I don’t really need. If you’re interested in trying out Google Wave, leave a comment or send an e-mail to mike@matejunkie.com. Please make sure to provide a valid e-mail address. Otherwise the invitation won’t reach you.

Update: One invitation left.

Next Update: Again 10 invitations for your delight! ;-)

Binary Talks

How to find a proper monitoring solution

Like our own environment at nugg.ad, every startup/HA environment needs a proper monitoring solution to meet the minimum requirements for fulfilling your customers’ high availability demands. Without a solution you can rely on, you’re screwed. It’s as simple as that.
You need to be the first to know whether your application servers are melting, your database/NoSQL backend is about to burst or someone accidentally the whole thing (you know the meme, don’t you?). Not only to inform customers before they call you, but also to be able to plan the further growth of your environment.

Since I’m currently rewriting our whole monitoring environment based on 3 1/2 years of experience with Nagios and its competitors, I thought I’d share that knowledge with you. Just in case someone has a use for it.

Choosing a monitoring solution

First rule: Stick with the monitoring solution you’re at least a bit experienced with and which fits your environment. In huge environments Zabbix is capable of doing the monitoring work for you, and so are Zenoss and Nagios. In smaller environments MRTG or Munin might also do the job.
All big Open Source monitoring solutions are highly customizable and extensible. You just need to know how to find the right plugins or how to ask properly within the community how something can be achieved.
If you’re not experienced with Open Source monitoring solutions at all, take a first look at the feature sets of the various solutions on Wikipedia. Choose wisely afterwards and, most importantly, stick with that solution for quite a while to explore its advantages and to get better at anger management when facing its disadvantages. Sooner or later you’ll get the big picture.

It’s not that important which software you choose. It’s more important what you make out of it for your environment!

The user’s demands

What I’ve experienced over the last years is that the demands are quite comprehensive:

  1. The operations team needs to be informed almost instantly in case of a real emergency via various contact channels
    … and it’ll keep you at the office longer to fix this if they get woken up each night by false alarms
  2. The CTO needs proper escalation methods to be informed when something’s broken and not being taken care of
  3. The executives and their board need a nice and shiny visualization of the platform to present the company’s growth and its state
  4. The consulting or support team needs a simple read-only web interface to get a proper impression of whether everything’s all right in case of an unexpected customer call.

All of these demands have something in common: The basis of all operations is trust.

The operations team has to trust your monitoring solution in order to fix problems in a fast but considered way instead of ignoring them after the fourth false alarm within a week. The CTO needs to know that your escalation strategies are working and that you don’t screw him. The executives need proper graphs without spikes or even downtimes they can’t explain in meetings, and the consulting or support team needs an overview of the almost real-time state of your environment without getting confused.

My monitoring environment

I’m currently using Nagios as the basic monitoring solution for our environment with several plugins attached to it.

To reuse all the data provided by Nagios I let it write its information to a MySQL database via ndoutils. This enables you to use almost any software which understands the ndoutils database layout, e.g. nice and shiny web interfaces or visualization tools like NagVis.
For the graphing I’m currently using two solutions. pnp4nagios 0.6x with its highly recommended NPCD daemon acts as the basis for proper graphing of various system information. Since the pnp4nagios web interface is only recommended for Unix/Linux system administrators, I’m reusing the RRD databases within Cacti to provide a better overview of the whole platform. Cacti in turn is mostly used for SNMP-based checks which need no alerting.

You’ll now see the difference: System metrics which need proper alerting are handled by Nagios itself, and metrics that only need to be monitored for their statistics are handled via Cacti (and therefore mostly SNMP). This eases your configuration work, keeps system resources in balance and avoids misunderstandings within the team that needs to take care of your environment while you’re enjoying your Martini on the Keys.
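
To make that split a bit more tangible, here is a minimal Python sketch of how an inventory of metrics could be sorted into "Nagios" and "Cacti" buckets. The metric names and the `needs_alerting` flag are made up for illustration; this is not how Nagios or Cacti are actually configured, just a way to think about who owns which metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    needs_alerting: bool  # True if someone should be notified when it misbehaves


def route(metric):
    """Return the tool that should own the metric under the alerting/statistics split."""
    return "nagios" if metric.needs_alerting else "cacti"


# Hypothetical inventory; the names are examples only.
METRICS = [
    Metric("apache_requests_per_sec", needs_alerting=True),
    Metric("switch_port_traffic", needs_alerting=False),
    Metric("disk_usage", needs_alerting=True),
]


def plan(metrics):
    """Group metric names by the tool that should handle them."""
    result = {"nagios": [], "cacti": []}
    for metric in metrics:
        result[route(metric)].append(metric.name)
    return result


if __name__ == "__main__":
    for tool, names in plan(METRICS).items():
        print(f"{tool}: {', '.join(names)}")
```

Keeping such a list somewhere, even just in a wiki, makes it easier to explain to the team why a given graph will never page anyone.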

To give the consultants a good overview of the platform’s health, I’m using two different tools. First, NagVis provides a graphical overview of the system’s health without too much detail. A traffic light graphic with three states (green, yellow and red) might provide too little information, but a Google Maps based view of your various datacenter locations with green, yellow or red icons will do the job. The supporters are then able to explain to the customer that datacenter 123 is down due to a failure within the system, which is enough in most cases. All other cases that demand a clearer view will be redirected to you, no worries. They know their job.
For the consultants who are more experienced with the system itself I provide a nice and shiny Nagios interface within Cacti. They’re then able to access Cacti’s performance graphs as well as the metrics provided by Nagios.

Conclusion

Maintaining an Open Source monitoring solution means a lot of work and you have to grapple with your favorite monitoring solution for quite a while before achieving your goals. But if you do, you’re the peacekeeper that lets your colleagues sleep during the night, the magician that casts nice and shiny graphs instantly and the master who has a global overview of your platform.
I hope this gave you a basic idea of how to find the right monitoring solution for you without going into the technical details. If you’ve got any questions, let me know.

http://www.youtube.com/watch?v=JmS0Kjxs2v4

Binary Talks

SNMP LTM 9.x MIB Navigation

I recently got my hands on an F5 BigIP 3600 and, as we all know, the MIB tree can be a bit tricky to navigate. For the LTM 9.x release there’s a really great spreadsheet available at F5’s DevCentral which contains the most useful OIDs to monitor the many different aspects of the F5 BigIP series. It’s even useful if you’re already running LTM 10.x since the changes don’t seem to be that major.

Small example:

explanation		oid (numeric)				oid (alphanumeric)																data type	sample value
global ClientPktsIn	.1.3.6.1.4.1.3375.2.1.1.2.1.2.0		.iso.org.dod.internet.private.enterprises.f5.bigipTrafficMgmt.bigipSystem.sysGlobals.sysGlobalStats.sysGlobalStat.sysStatClientPktsIn.0		Counter64	220788
global ClientBytesIn	.1.3.6.1.4.1.3375.2.1.1.2.1.3.0		.iso.org.dod.internet.private.enterprises.f5.bigipTrafficMgmt.bigipSystem.sysGlobals.sysGlobalStats.sysGlobalStat.sysStatClientBytesIn.0	Counter64	23139087
global ClientPktsOut	.1.3.6.1.4.1.3375.2.1.1.2.1.4.0		.iso.org.dod.internet.private.enterprises.f5.bigipTrafficMgmt.bigipSystem.sysGlobals.sysGlobalStats.sysGlobalStat.sysStatClientPktsOut.0	Counter64	239780
global ClientBytesOut	.1.3.6.1.4.1.3375.2.1.1.2.1.5.0		.iso.org.dod.internet.private.enterprises.f5.bigipTrafficMgmt.bigipSystem.sysGlobals.sysGlobalStats.sysGlobalStat.sysStatClientBytesOut.0	Counter64	84615725
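
Since all four values are Counter64 types, a single reading tells you little; what you usually graph is the rate between two polls. Here is a minimal Python sketch of that calculation. It only does the arithmetic, the SNMP polling itself is left out, and the second sample values in the usage example are made up. A counter reset (e.g. after a reboot) can’t be told apart from a wrap this way, so treat implausibly high rates with suspicion.

```python
COUNTER64_MAX = 2 ** 64

# Numeric OIDs taken from the table above.
OIDS = {
    "sysStatClientPktsIn": ".1.3.6.1.4.1.3375.2.1.1.2.1.2.0",
    "sysStatClientBytesIn": ".1.3.6.1.4.1.3375.2.1.1.2.1.3.0",
    "sysStatClientPktsOut": ".1.3.6.1.4.1.3375.2.1.1.2.1.4.0",
    "sysStatClientBytesOut": ".1.3.6.1.4.1.3375.2.1.1.2.1.5.0",
}


def counter_rate(previous, current, interval_seconds):
    """Per-second rate between two Counter64 samples, tolerating one wrap-around."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo turns a negative difference (counter wrapped) into the real delta.
    delta = (current - previous) % COUNTER64_MAX
    return delta / interval_seconds


if __name__ == "__main__":
    # 220788 is the sample value from the table; 221388 is a hypothetical
    # second reading taken 60 seconds later.
    print(counter_rate(220788, 221388, 60))
```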

Binary Talks

Database fix for Nagios plugin for Cacti

If you’re interested in running Nagios within the Cacti interface you may be interested in the Nagios plugin for Cacti (NPC) project which does a good job overall in combining these two powerful tools within the monitoring/graphing world. Nevertheless, I had a database related problem during the initial setup due to the fact that NPC brings along its own schema of the ndo database which is outdated since the last release of ndoutils 1.4b8 (they added a new column called long_output).

Therefore, if you’d like to use the latest version of ndoutils (1.4b8) with NPC (2.0.4), you’ll want to apply the changes below to the following tables via your mysql console:

use databasename ;
alter table npc_servicechecks add column long_output VARCHAR(8192) NOT NULL default '' AFTER output;
alter table npc_servicestatus add column long_output VARCHAR(8192) NOT NULL default '' AFTER output;
alter table npc_systemcommands add column long_output VARCHAR(8192) NOT NULL default '' AFTER output;
alter table npc_statehistory add column long_output VARCHAR(8192) NOT NULL default '' AFTER output;

Afterwards NPC should work like a charm.
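
If you’d rather not type the statements by hand, a few lines of Python can generate them. This is just a sketch that prints the same four ALTER statements as above; it doesn’t connect to MySQL, and the table list is simply the one from this post, so adjust it if your NPC version differs.

```python
# Tables from the fix above that need the new long_output column.
NPC_TABLES = [
    "npc_servicechecks",
    "npc_servicestatus",
    "npc_systemcommands",
    "npc_statehistory",
]


def long_output_statements(tables=NPC_TABLES):
    """Build one ALTER TABLE statement per table, mirroring the manual fix."""
    template = (
        "ALTER TABLE {table} ADD COLUMN long_output "
        "VARCHAR(8192) NOT NULL DEFAULT '' AFTER output;"
    )
    return [template.format(table=table) for table in tables]


if __name__ == "__main__":
    for statement in long_output_statements():
        print(statement)
```

The output can be reviewed first and then pasted into the mysql console after selecting the right database.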

Binary Talks

PNP template for check_apache2.sh

This is a PNP template which works quite well for me when there’s a need to identify possible bottlenecks or unexpected behavior of the Apache webserver. It shows the CPU usage, req/sec, byte/req and the amount of busy/idle workers in a single graph. Feel free to copy the template from below or head over to Nagios Exchange.

#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

#   PNP Template for check_apache2.sh
#   Author: Mike Adolphs (http://www.matejunkie.com/)

$opt[1] = "--vertical-label \"Various stats \" -l 0 -r --title \"Apache metrics for $hostname / $servicedesc\" ";
$def[1]  = "DEF:cpu=$rrdfile:$DS[1]:AVERAGE " ;
$def[1] .= "DEF:reqpsec=$rrdfile:$DS[2]:AVERAGE " ;
$def[1] .= "DEF:bytepreq=$rrdfile:$DS[4]:AVERAGE " ;
$def[1] .= "DEF:wbusy=$rrdfile:$DS[5]:AVERAGE " ;
$def[1] .= "DEF:widle=$rrdfile:$DS[6]:AVERAGE " ;
 
$opt[2] = "--vertical-label \"Byte/sec \" -l 0 -r --title \"Apache metrics for $hostname / $servicedesc\" ";
$def[2] = "DEF:bytepsec=$rrdfile:$DS[3]:AVERAGE " ;
 
$def[1] .= "COMMENT:\"\\t\\t\\t\\tLAST\\t\\tAVERAGE\\t\\tMAX\\n\" " ;
 
$def[1] .= "LINE2:cpu#008000:\"CPU usage [%]\\t \" " ;
$def[1] .= "GPRINT:cpu:LAST:\"%6.0lf\\t\" " ;
$def[1] .= "GPRINT:cpu:AVERAGE:\" %6.0lf\\t\" " ;
$def[1] .= "GPRINT:cpu:MAX:\" %6.0lf\\n\" " ;
 
$def[1] .= "LINE2:reqpsec#0C64E8:\"Requests/sec\\t \" " ;
$def[1] .= "GPRINT:reqpsec:LAST:\"%6.0lf\\t\" " ;
$def[1] .= "GPRINT:reqpsec:AVERAGE:\" %6.0lf\\t\" " ;
$def[1] .= "GPRINT:reqpsec:MAX:\" %6.0lf\\n\" " ;
 
$def[1] .= "LINE2:bytepreq#FFA500:\"Byte/req\\t\\t \" " ;
$def[1] .= "GPRINT:bytepreq:LAST:\"%6.0lf\\t\" " ;
$def[1] .= "GPRINT:bytepreq:AVERAGE:\" %6.0lf\\t\" " ;
$def[1] .= "GPRINT:bytepreq:MAX:\" %6.0lf\\n\" " ;
 
$def[1] .= "LINE2:wbusy#1CC8E8:\"Busy workers\\t \" " ;
$def[1] .= "GPRINT:wbusy:LAST:\"%6.0lf\\t\" " ;
$def[1] .= "GPRINT:wbusy:AVERAGE:\" %6.0lf\\t\" " ;
$def[1] .= "GPRINT:wbusy:MAX:\" %6.0lf\\n\" " ;
 
$def[1] .= "LINE2:widle#E80C8C:\"Idle workers\\t \" " ;
$def[1] .= "GPRINT:widle:LAST:\"%6.0lf\\t\" " ;
$def[1] .= "GPRINT:widle:AVERAGE:\" %6.0lf\\t\" " ;
$def[1] .= "GPRINT:widle:MAX:\" %6.0lf\\n\" " ;
 
$def[2] .= "COMMENT:\"\\t\\t\\t\\tLAST\\t\\tAVERAGE\\t\\tMAX\\n\" " ;
 
$def[2] .= "LINE2:bytepsec#E80C3E:\"Byte/sec\\t\\t \" " ;
$def[2] .= "GPRINT:bytepsec:LAST:\"%6.0lf\\t\" " ;
$def[2] .= "GPRINT:bytepsec:AVERAGE:\" %6.0lf\\t\" " ;
$def[2] .= "GPRINT:bytepsec:MAX:\" %6.0lf\\n\" " ;
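
The `$DS[n]` numbers in the template depend on the order in which the plugin emits its performance data: PNP numbers the data sources in the order the labels appear. Here is a small Python sketch to double-check that mapping for a given perfdata string. The sample string is made up, but it follows the label order that check_apache2.sh prints.

```python
import shlex

# Hypothetical perfdata line in the label order used by check_apache2.sh.
SAMPLE = (
    "'cpu_load'=1.5 'req_psec'=12 'bytes_psec'=2048 "
    "'bytes_preq'=170.6 'workers_busy'=3 'workers_idle'=7"
)


def perfdata_to_ds(perfdata):
    """Map the 1-based data source index to a (label, value) tuple."""
    data_sources = {}
    # shlex removes the single quotes around the labels for us.
    for index, token in enumerate(shlex.split(perfdata), start=1):
        label, _, rest = token.partition("=")
        # Ignore optional ";warn;crit;min;max" fields after the value.
        data_sources[index] = (label, float(rest.split(";")[0]))
    return data_sources


if __name__ == "__main__":
    for index, (label, value) in perfdata_to_ds(SAMPLE).items():
        print(f"DS[{index}] = {label} ({value})")
```

If the plugin ever changes the order of its labels, the DEF lines in the template have to follow.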

Binary Talks

Update 1.3: Apache check plugin for Nagios

Long time no see. Sorry for the long downtime, but I’ve been on vacation for quite some time and spent my time out of reach of any electronic device except my phone. Anyway, strength and motivation for work during spare time is back again! Expect a few updates of various plugins I’ve written in the past within the next weeks.

To begin with, here’s the revised version of the Apache check plugin for Nagios. It doesn’t write temporary files anymore, the code is cleaner and it should be a bit faster. Note that I haven’t done much testing yet (will be done tomorrow and on Wednesday). Therefore it might not work out of the box for you. At least in my environments it works well. Let me know if you have any problems or if you stumble upon something which is complete nonsense.

Copy and paste the script below or svn co via svn.matejunkie.com/nagios-plugins/trunk/check_apache/.

#!/bin/sh
 
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 
PROGNAME=`basename $0`
VERSION="Version 1.3,"
AUTHOR="2009, Mike Adolphs (http://www.matejunkie.com/)"
 
print_version() {
    echo "$VERSION $AUTHOR"
}
 
print_help() {
    print_version $PROGNAME $VERSION
    echo ""
    echo "Description:"
    echo "$PROGNAME is a Nagios plugin to check the Apache's server status."
    echo "It monitors requests per second, bytes per second/request, "
    echo "amount of busy/idle workers and its CPU load."
    echo ""
    echo "Example call:"
    echo "./$PROGNAME -H localhost -P 80 -t 3 -b /usr/sbin -p /var/run \\"
    echo "-n apache2.pid -s status_page [-S] [-R] [-wr] 100 [-cr] 250"
    echo ""
    echo "Options:"
    echo "  -H|--hostname)"
    echo "    Sets the hostname. Default is: localhost"
    echo "  -P|--port)"
    echo "    Sets the port. Default is: 80"
    echo "  -t|--timeout)"
    echo "    Sets a timeout within which the server's status page must've been"
    echo "    accessed. Otherwise the check will go into an error state."
    echo "    Default is: 3"
    echo "  -b|--binary-path)"
    echo "    Sets the path to the apache binary. Used for getting Apache's"
    echo "    CPU load. Default is: /usr/sbin"
    echo "  -p|--pid-path)"
    echo "    Path to Apache's pid file. Default is: /var/run"
    echo "  -n|--pid-name)"
    echo "    Name of Apache's pid file. Default is: apache2.pid"
    echo "  -s|--status-page)"
    echo "    Defines the name of the status page. Default is: server-status"
    echo "  -R|--remote-server)"
    echo "    Disables the pid check so that remote Apaches can be queried."
    echo "    Default is: off"
    echo "  -S|--secure)"
    echo "    Enables HTTPS (no certificate check though). Default is: off"
    echo "  -wr|--warning-req)"
    echo "    Sets a warning level for requests per second. Default is: off"
    echo "  -cr|--critical-req)"
    echo "    Sets a critical level for requests per second. Default is: off"
    exit $ST_UK
}
 
ST_OK=0
ST_WR=1
ST_CR=2
ST_UK=3
 
hostname="localhost"
port=80
remote_srv=0
path_binary="/usr/sbin"
path_pid="/var/run"
name_pid="apache2.pid"
status_page="server-status"
timeout=3
secure=0
running=0
 
wcdiff_req=0
wclvls_req=0
 
while test -n "$1"; do
    case "$1" in
        --help|-h)
            print_help
            exit $ST_UK
            ;;
        --version|-v)
            print_version $PROGNAME $VERSION
            exit $ST_UK
            ;;
        --hostname|-H)
            hostname=$2
            shift
            ;;
        --port|-P)
            port=$2
            shift
            ;;
        --timeout|-t)
            timeout=$2
            shift
            ;;
        --remote-server|-R)
            remote_srv=1
            ;;
        --binary-path|--binary_path|-b)
            path_binary=$2
            shift
            ;;
        --pid-path|--pid_path|-p)
            path_pid=$2
            shift
            ;;
        --pid-name|--pid_name|-n)
            name_pid=$2
            shift
            ;;
        --status-page|-s)
            status_page=$2
            shift
            ;;
        --secure|-S)
            secure=1
            ;;
        --warning-req|-wr)
            warn_req=$2
            shift
            ;;
        --critical-req|-cr)
            crit_req=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_help
            exit $ST_UK
            ;;
    esac
    shift
done
 
# check functions
val_wcdiff_req() {
    if [ ! -z "$warn_req" -a ! -z "$crit_req" ]
    then
        wclvls_req=1
 
        if [ ${warn_req} -gt ${crit_req} ]
        then
            wcdiff_req=1
        fi
    elif [ ! -z "$warn_req" -a -z "$crit_req" ]
    then
        wcdiff_req=2
    elif [ -z "$warn_req" -a ! -z "$crit_req" ]
    then
        wcdiff_req=3
    fi
}
 
check_pid() {
    if [ -f "$path_pid/$name_pid" ]
    then
        retval=0
    else
        retval=1
    fi
}
 
check_processes() {
    if [ $1 -lt 1 ]
    then
        echo "UNKNOWN - Your Apache server seems not to run. Is your Nagios \
privileged to run 'ps ax' and is the Apache2 binary really located in \
$path_binary?"
        exit $ST_UK
    fi
}
 
check_output() {
    stat_output=`stat -c %s ${output_dir}/server-status`
    if [ "$stat_output" = 0 ]
    then
        echo "UNKNOWN - Local copy of server-status is empty. Are we \
allowed to access http://${hostname}:${port}/server-status?"
        exit $ST_UK
    fi
}
 
# get functions
get_status() {
    if [ "$secure" = 1 ]
    then
        server_status1=`wget -qO- --no-check-certificate -t 3 \
-T ${timeout} "https://${hostname}:${port}/${status_page}?auto"`
        sleep 1
        server_status2=`wget -qO- --no-check-certificate -t 3 \
-T ${timeout} "https://${hostname}:${port}/${status_page}?auto"`
    else
        server_status1=`wget -qO- -t 3 -T ${timeout} \
"http://${hostname}:${port}/${status_page}?auto"`
        sleep 1
        server_status2=`wget -qO- -t 3 -T ${timeout} \
"http://${hostname}:${port}/${status_page}?auto"`
    fi
}
 
get_vals() {
    cpu_load="$(cpu_load=0; ps -Ao pcpu,args | grep "$path_binary/apache2" \
| awk '{print $1}' | while read line
    do
        cpu_load=`echo "scale=3; $cpu_load + $line" | bc -l`
        echo $cpu_load
    done)"
    cpu_load=`echo $cpu_load | awk '{print $NF}' | sed 's/^\./0./'`
 
    tmp1_req_psec=`echo ${server_status1} | awk '{print $3}'`
    tmp2_req_psec=`echo ${server_status2} | awk '{print $3}'`
    req_psec=`echo "scale=2; ${tmp2_req_psec} - ${tmp1_req_psec}" | bc -l \
| sed 's/^\./0./'`
 
    bytes_psec=`echo ${server_status1} | awk '{print $14}' | sed 's/^\./0./'`
    bytes_preq=`echo ${server_status1} | awk '{print $16}' | sed 's/^\./0./'`
    wkrs_busy=`echo ${server_status1} | awk '{print $18}' | sed 's/^\./0./'`
    wkrs_idle=`echo ${server_status1} | awk '{print $20}' | sed 's/^\./0./'`
}
 
do_output() {
    output="Apache serves $req_psec Requests per second with an average CPU \
utilization of $cpu_load%. Busy workers: $wkrs_busy, idle: $wkrs_idle"
}
 
do_perfdata() {
    perfdata="'cpu_load'=$cpu_load 'req_psec'=$req_psec \
'bytes_psec'=$bytes_psec 'bytes_preq'=$bytes_preq 'workers_busy'=$wkrs_busy \
'workers_idle'=$wkrs_idle"
}
 
# Let's go
val_wcdiff_req
 
if [ "$wcdiff_req" = 1 ]
then
    echo "Please adjust your warning/critical thresholds. The warning must \
be lower than the critical level!"
    exit $ST_UK
elif [ "$wcdiff_req" = 2 ]
then
    echo "Please also set a critical value when you want to use \
warning/critical thresholds!"
    exit $ST_UK
elif [ "$wcdiff_req" = 3 ]
then
    echo "Please also set a warning value when you want to use \
warning/critical thresholds!"
    exit $ST_UK
else
    if [ "$remote_srv" = 0 ]
    then
        check_pid
        if [ "$retval" = 1 ]
        then
            echo "UNKNOWN - No pid file found at $path_pid/$name_pid. Is \
Apache running?"
            exit $ST_UK
        fi
    fi
 
    get_status
    get_vals
 
    do_output
    do_perfdata
 
    if [ ${wclvls_req} = 1 ]
    then
        if [ ${req_psec} -ge ${warn_req} -a ${req_psec} -lt ${crit_req} ]
        then
            echo "WARNING - ${output} | ${perfdata}"
            exit $ST_WR
        elif [ ${req_psec} -ge ${crit_req} ]
        then
            echo "CRITICAL - ${output} | ${perfdata}"
            exit $ST_CR
        else
            echo "OK - ${output} | ${perfdata}"
            exit $ST_OK
        fi
    else
        echo "OK - ${output} | ${perfdata}"
        exit $ST_OK
    fi
fi
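
For those who find the awk field counting in the script hard to follow, here is the core logic as a minimal Python sketch: parse two `?auto` status snapshots, derive the requests per second from the difference in total accesses, and apply the warning/critical thresholds. The sample status text is made up (modeled on what mod_status prints) and fetching the page is left out, so this is an illustration, not a replacement for the plugin.

```python
# Hypothetical mod_status "?auto" output, for illustration only.
SAMPLE_1 = """Total Accesses: 1000
Total kBytes: 2048
CPULoad: .5
Uptime: 3600
ReqPerSec: .277
BytesPerSec: 582.5
BytesPerReq: 2097.1
BusyWorkers: 3
IdleWorkers: 7
"""
# Second snapshot, as if taken one second later.
SAMPLE_2 = SAMPLE_1.replace("Total Accesses: 1000", "Total Accesses: 1120")


def parse_status(text):
    """Turn 'Key: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            values[key.strip()] = value.strip()
    return values


def requests_per_second(first, second, interval=1.0):
    """Difference in total accesses between two snapshots, per second."""
    accesses = int(second["Total Accesses"]) - int(first["Total Accesses"])
    return accesses / interval


def evaluate(req_psec, warn=None, crit=None):
    """Return (exit code, label) following the Nagios plugin convention."""
    if warn is None or crit is None:
        return 0, "OK"  # thresholds are only applied when both are set
    if req_psec >= crit:
        return 2, "CRITICAL"
    if req_psec >= warn:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    rate = requests_per_second(parse_status(SAMPLE_1), parse_status(SAMPLE_2))
    code, label = evaluate(rate, warn=100, crit=250)
    print(f"{label} - Apache serves {rate} requests per second (exit code {code})")
```

Parsing by key instead of by field position has the advantage that it doesn’t break if a different Apache version adds or reorders lines on the status page.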

Matejunkie

Becoming a bone marrow donor

Something completely different this time, but since it’s quite important I thought I’d let you know that I decided to become a bone marrow donor last week because of a guy within my circle of acquaintances who recently got ill with leukemia. Although the chances of finding a donor that actually matches are minimal, it’s at least a chance for him and for others as well.

Becoming a bone marrow donor is simple. All required information can be found on the DKMS’s website. Once you’ve signed up you’ll receive a package including two Q-tips with which you collect cells from the inside of your cheek. You mail them back afterwards so that the tissue typing can be done. Then the information is anonymized (Donor ID plus the characteristics of the biological tissue) and stored in the central database of the ZKRD (Zentrales Knochenmarkspender-Register Deutschland). From then on the information can be queried to help affected people worldwide.

Be aware that becoming a bone marrow donor this way costs about 50 Euros in Germany since the DKMS can’t afford to pay for all the typing by itself, but you may also give a blood donation, which makes it completely free of charge. So thanks to nugg.ad, and especially Stephan Noller, who enabled me to do this.

Matejunkie

Urban Shooting

Every now and then it’s time to break out of the so-called nerdiness which almost every keen technician in information technology experiences. That’s why we did a few shots in the middle of Berlin several weeks ago with David Noelte, a brilliant photographer.

The other guy you’ll see within the album is Dirk, my roomie, who’s a famous barkeeper here in Berlin, Germany. Feel free to browse the photo album and visit David Noelte’s portfolio! His work highly deserves it!

Binary Talks

How to kick off Hadoop’s rack awareness

Hadoop, an Open Source framework for reliable, scalable, distributed computing and data storage, has a nice feature called rack awareness. This means nothing more than that you’re able to spread your Hadoop cluster widely over multiple machines within different racks and even different data centers that are worlds apart from each other. Sadly, like almost everything regarding Hadoop, this isn’t well documented, since it’s under heavy development and relatively few people actually work with Hadoop compared to other huge Open Source projects.

Anyway, kicking off Hadoop’s rack awareness is no big deal in general. Here’s how to achieve this goal:

Put a small script in whatever language you prefer in a location of your choice which is accessible by the local Hadoop user on the namenode. The only requirement is that the script is able to print a record to stdout. In this example I’m using a small Python script written by Vadim Zaliva, stored in the Hadoop user’s home directory under /home/hadoop:

#!/usr/bin/env python
 
'''
This script is used by Hadoop to determine the network/rack topology. It
should be specified in hadoop-site.xml via the topology.script.file.name
property:

<property>
 <name>topology.script.file.name</name>
 <value>/home/hadoop/topology.py</value>
</property>
'''

import sys

DEFAULT_RACK = '/default/rack0'

# Maps each datanode's IP address to its rack path.
RACK_MAP = { '10.72.10.1' : '/datacenter0/rack0',

             '10.112.110.26' : '/datacenter1/rack0',
             '10.112.110.27' : '/datacenter1/rack0',
             '10.112.110.28' : '/datacenter1/rack0',

             '10.2.5.1' : '/datacenter2/rack0',
             '10.2.10.1' : '/datacenter2/rack1'
    }

# Print one rack path per argument; unknown hosts fall back to the default rack.
if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    print(" ".join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]))

Then you need to add a property directive to the hadoop-site.xml you’re using for your cluster’s configuration:

<property>
 <name>topology.script.file.name</name>
 <value>/home/hadoop/topology.py</value>
</property>

Simply restart the namenode’s process. From now on the namenode runs the script and looks for a record regarding the datanode every time a new datanode tries to participate in the cluster.
Keep in mind that taking care of connections between multiple locations, via VPN or otherwise, and proper DNS resolution is your business and not Hadoop’s. Make sure that resolving the datanode’s DNS record is possible and that it’s accessible within your Hadoop environment.
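
If listing every single datanode gets tedious, the same idea also works per subnet. The following is a sketch of such a variant, not the script I’m running: it needs Python 3 for the `ipaddress` module, and the /24 subnets are assumptions loosely derived from the addresses in the example above. Anything that isn’t a known IP address, including hostnames, falls back to the default rack.

```python
#!/usr/bin/env python3
import ipaddress
import sys

DEFAULT_RACK = "/default/rack0"

# Hypothetical subnet layout; adjust to your own network.
SUBNET_RACKS = [
    (ipaddress.ip_network("10.72.10.0/24"), "/datacenter0/rack0"),
    (ipaddress.ip_network("10.112.110.0/24"), "/datacenter1/rack0"),
    (ipaddress.ip_network("10.2.5.0/24"), "/datacenter2/rack0"),
    (ipaddress.ip_network("10.2.10.0/24"), "/datacenter2/rack1"),
]


def rack_for(host):
    """Return the rack path for one IP address, or the default rack."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # hostnames or malformed input
    for network, rack in SUBNET_RACKS:
        if address in network:
            return rack
    return DEFAULT_RACK


def resolve(hosts):
    """Build the space-separated record printed to stdout."""
    if not hosts:
        return DEFAULT_RACK
    return " ".join(rack_for(host) for host in hosts)


if __name__ == "__main__":
    print(resolve(sys.argv[1:]))
```

Running it by hand with a few addresses as arguments is a quick way to check the mapping before restarting the namenode.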
