After yesterday’s fiddling with geoiplookup, I was thinking about a more useful application for looking up the location of an IP address. What could be better than visualizing the areas of the world where most spam comes from, I figured.
So I took a look at iX’s blacklist project NiX Spam and downloaded a snapshot of the blacklist’s contents (right-click, save as… you’ve been warned!), which holds up to 40,000 IP addresses. The plain text file contains the blacklisted IP addresses along with their timestamps.
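Reading such a snapshot is the first step of the batch processing. The sketch below assumes one entry per line holding an IPv4 address and a timestamp separated by whitespace; the exact NiX Spam layout is an assumption here, so the parser does not rely on column order and simply treats whatever is not the address as the timestamp. The function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple


def _as_ipv4(token: str) -> Optional[str]:
    """Return the token as a normalized IPv4 string, or None if it is not one."""
    try:
        return str(ipaddress.IPv4Address(token))
    except ValueError:
        return None


def parse_blacklist(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, ip) pairs from a blacklist snapshot.

    Assumed layout: one entry per line with an IPv4 address and a timestamp
    separated by whitespace. Blank lines and '#' comment lines are skipped.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        ips = [t for t in tokens if _as_ipv4(t) is not None]
        if not ips:
            continue  # no IPv4 address on this line, so ignore it
        ip = ips[0]
        # Everything that is not the address is treated as the timestamp.
        timestamp = " ".join(t for t in tokens if t != ip)
        entries.append((timestamp, _as_ipv4(ip)))
    return entries


sample = ["# NiX Spam snapshot", "1274265601 192.0.2.10", "1274265733 198.51.100.7"]
print(parse_blacklist(sample))
# → [('1274265601', '192.0.2.10'), ('1274265733', '198.51.100.7')]
```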
After some more or less simple batch processing, I had a file containing all the relevant GeoIP information: the latitude and longitude of each IP address. What was missing was a way to display them graphically, e.g. on a world map.
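One way to do that batch step is to run geoiplookup against a city database for each address and keep only the coordinates. This is a minimal sketch, not the author’s sed/awk pipeline: it assumes the City Edition output line looks like `GeoIP City Edition, Rev 1: US, CA, California, Mountain View, 94043, 37.4192, -122.0574, 807, 650`, with latitude and longitude as the sixth and seventh fields, and that the database path passed to `-f` is supplied by you. All function names are hypothetical, and `geoiplookup_coords` needs the geoiplookup tool installed.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple

Coords = Tuple[float, float]


def parse_city_line(line: str) -> Optional[Coords]:
    """Extract (lat, lon) from one geoiplookup City Edition line (assumed format).

    Returns None for 'IP Address not found' or anything unparseable.
    """
    _, sep, payload = line.partition(": ")
    if not sep:
        return None
    fields = [f.strip() for f in payload.split(",")]
    if len(fields) < 7:
        return None
    try:
        return float(fields[5]), float(fields[6])
    except ValueError:
        return None


def geoiplookup_coords(ip: str, db_path: str) -> Optional[Coords]:
    """Query the geoiplookup CLI with an explicit database file (-f)."""
    out = subprocess.run(["geoiplookup", "-f", db_path, ip],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        coords = parse_city_line(line)
        if coords is not None:
            return coords
    return None


def to_csv_rows(entries: Iterable[Tuple[str, str]],
                lookup: Callable[[str], Optional[Coords]]) -> List[str]:
    """Turn (timestamp, ip) pairs into 'ip,timestamp,lat,lon' rows,
    dropping addresses the lookup cannot place."""
    rows = []
    for timestamp, ip in entries:
        coords = lookup(ip)
        if coords is None:
            continue
        lat, lon = coords
        rows.append(f"{ip},{timestamp},{lat},{lon}")
    return rows
```

Passing the lookup as a function keeps the CSV step testable without the CLI, e.g. with a dictionary’s `get` method standing in for `lambda ip: geoiplookup_coords(ip, "/path/to/GeoLiteCity.dat")`.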
Couldn’t Google Maps, with its well-documented and comprehensive API, be a solution? So I dug into the Google Maps API, and after an hour I had changed the batch processing to produce the output needed to create markers on a map, including the well-known tooltip balloons that show the IP address and timestamp when you click a marker.
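The marker output can be generated as a JSON data array plus a small fixed loop, rather than one block of JavaScript per address. This sketch targets the current Maps JavaScript API (`google.maps.Marker`, `google.maps.InfoWindow`), not necessarily the API version used for this post, and assumes the page already loads the API and defines a global `map`; newer API releases recommend `AdvancedMarkerElement` instead of `Marker`. The names `spamHosts` and `markers_js` are hypothetical.

```python
import html
import json
from typing import Iterable, Tuple

Row = Tuple[str, str, float, float]  # (ip, timestamp, lat, lon)

# Fixed client-side loop; assumes a global `map` (a google.maps.Map) exists
# and the Maps JavaScript API is already loaded on the page.
_TEMPLATE = """const spamHosts = %s;
const info = new google.maps.InfoWindow();
for (const h of spamHosts) {
  const marker = new google.maps.Marker({position: {lat: h.lat, lng: h.lng}, map: map});
  marker.addListener("click", () => {
    info.setContent(h.ip + "<br>" + h.ts);
    info.open(map, marker);
  });
}"""


def markers_js(rows: Iterable[Row]) -> str:
    """Serialize rows as a JSON array and wrap them in the marker loop.

    IP and timestamp are HTML-escaped because they end up in the balloon.
    """
    data = [
        {"ip": html.escape(ip), "ts": html.escape(ts), "lat": lat, "lng": lon}
        for ip, ts, lat, lon in rows
    ]
    return _TEMPLATE % json.dumps(data)


print(markers_js([("192.0.2.10", "1274265601", 37.4192, -122.0574)]))
```

A single shared InfoWindow keeps only one balloon object alive, but the loop still creates one marker per address, which is exactly what becomes unmanageable at tens of thousands of points.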
First tests with a few dozen data sets went just fine, but at that very moment, right before crossing the finish line, my browser decided to stop working for a couple of minutes.
It turned out that 40,000 markers, each with its own tooltip, were way too much for the browser to handle. Navigating or zooming the map just wasn’t possible. Even after I reduced the number of data sets to 10,000, it was slow as hell.
Besides the performance issues, it turned out that the visualization is almost useless because of the many overlapping markers, as you can see on the left. Spammers mostly use botnets: thousands of zombie PCs distributed all over the world.
Sure, you can see areas with a much higher density than others, but these are also the regions where Internet access is available almost everywhere. It’s easy to explain why there are fewer spammers in the Australian outback or the South American jungle.
But every drawback has its advantages. Besides the fact that I once again learned more about vim, sed and awk, it won’t be too hard to finish this.
With a slightly different approach, e.g. dropping the latitude/longitude data in favor of the city name, it would be possible to show markers that shrink or grow based on the number of machines in a given city. This would drastically reduce the number of markers and is not that hard to implement.
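The per-city idea boils down to counting hosts per city and mapping each count to a marker size. This is a hypothetical sketch: it takes one city string per blacklisted IP (empty strings for unknown locations are skipped) and scales the radius with the square root of the count, so that a marker’s area, rather than its radius, grows in proportion to the number of machines. The pixel limits are arbitrary assumptions, and since city names are not unique worldwide, a real version would probably key on country plus city.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def hosts_per_city(cities: Iterable[str]) -> Counter:
    """Count blacklisted hosts per city name (one entry per IP)."""
    return Counter(c for c in cities if c)


def marker_radii(counts: Dict[str, int], max_radius: float = 40.0,
                 min_radius: float = 4.0) -> List[Tuple[str, int, float]]:
    """Return (city, count, radius) with radius proportional to sqrt(count).

    The busiest city gets max_radius; tiny cities are clamped to min_radius
    so they stay clickable. Sorted by descending count, then by name.
    """
    if not counts:
        return []
    top = max(counts.values())
    result = []
    for city, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        r = max(min_radius, max_radius * math.sqrt(n / top))
        result.append((city, n, round(r, 1)))
    return result


counts = hosts_per_city(["Seoul", "Seoul", "Seoul", "Seoul", "Berlin", "", "Lagos"])
print(marker_radii(counts))
# → [('Seoul', 4, 40.0), ('Berlin', 1, 20.0), ('Lagos', 1, 20.0)]
```

With square-root scaling a city with four times as many hosts gets a marker twice as wide, which keeps the largest clusters from swallowing the rest of the map.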