obtaining the logs

the binary logs are available from http://people.openbeacon.org/meri/openbeacon/sputnik/data/23c3/. in the text below, i assume they are unzipped in a subdirectory named 'binlog'

the xml logs are available from http://www.openbeacon.org/dl/23C3/sputnik-observations.xml.bz2.

analyzing the binary logs

all binary logs consist of 24-byte frames. one file, log-2006-12-27-13, is corrupt: at offset 0xc4ff8 there are 8 bytes of a truncated frame, and the next full frame starts at 0xc5000.
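
the offsets can be checked with a little arithmetic: 0xc4ff8 is a multiple of 24, so every frame before the truncated one is intact, and 0xc5000 - 0xc4ff8 = 8 bytes. below is a minimal python sketch of walking the frames while skipping such a gap. the function name and the 'gaps' parameter are my own illustration, not how rdlog handles it.

```python
FRAME_SIZE = 24  # frame size as stated in the text

def iter_frames(data, gaps=()):
    """Yield (offset, frame) for every whole 24-byte frame in data.

    gaps is a hypothetical list of (start, end) byte ranges to skip,
    for example the 8 bytes of a truncated frame.
    """
    gaps = sorted(gaps)
    pos = 0
    while pos + FRAME_SIZE <= len(data):
        # a frame that would overlap a gap is not a real frame: jump past the gap
        hit = next((g for g in gaps if pos < g[1] and pos + FRAME_SIZE > g[0]), None)
        if hit is not None:
            pos = hit[1]
            continue
        yield pos, data[pos:pos + FRAME_SIZE]
        pos += FRAME_SIZE

# the gap reported for log-2006-12-27-13
GAP_13 = (0xc4ff8, 0xc5000)
```

usage would be something like iter_frames(open('binlog/log-2006-12-27-13', 'rb').read(), [GAP_13]).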

getting a plain hexdump

dump -x -w 24 binlog/log-2006-12-27-13 -l 0xc5000
dump -x -w 24 binlog/log-2006-12-27-13 -o 0xc5000
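
if the dump tool is not at hand, a similar one-frame-per-line hexdump can be sketched in python. this is only an approximation: i am assuming that -w is the line width, -o a start offset and -l a length, and the output format below is made up rather than copied from dump.

```python
def hexdump(data, width=24, start=0, length=None):
    """Return hexdump lines for data[start:start+length], width bytes per line."""
    end = len(data) if length is None else min(len(data), start + length)
    lines = []
    for off in range(start, end, width):
        chunk = data[off:min(off + width, end)]
        # offset in hex, then the bytes of this line as two-digit hex values
        lines.append("%08x: %s" % (off, " ".join("%02x" % b for b in chunk)))
    return lines

# example (path as used in the text):
# with open("binlog/log-2006-12-27-13", "rb") as f:
#     print("\n".join(hexdump(f.read(), start=0xc5000)))
```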

my analysis tool

i wrote a little tool to parse, analyze and convert the binary logs to textfiles suitable for processing with gnuplot.

compile rdlog.cpp with

g++ -O3 rdlog.cpp -o rdlog

general info about the logs

first get an idea of what type of data is in the logs:
find binlog -type f | xargs ./rdlog -v -stats -bytes
this outputs statistics per byte and per field for each binary logfile. when you pipe the output through sort and uniq, like this:
find binlog -type f | xargs ./rdlog -v -stats -bytes | sort | uniq -c | sort -r | head -20
you get a list of the fields that are constant across all log frames.
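
the idea behind the per-byte statistics can be sketched in a few lines of python: count which values occur at each byte position, and report the positions that only ever hold one value. this is an illustration of the idea, not the code of rdlog; the function names are hypothetical.

```python
from collections import Counter

FRAME_SIZE = 24  # frame size as stated in the text

def byte_stats(frames):
    """Count, per byte position, how often each byte value occurs."""
    stats = [Counter() for _ in range(FRAME_SIZE)]
    for frame in frames:
        for pos, value in enumerate(frame):
            stats[pos][value] += 1
    return stats

def constant_positions(stats):
    """Return {position: value} for positions that held exactly one value."""
    return {pos: next(iter(c)) for pos, c in enumerate(stats) if len(c) == 1}
```

positions that show up in constant_positions() for all logfiles are candidates for fixed header or padding bytes.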

at the bottom, under 'src ip', you will find a list of the station ip addresses in use. save this list to 'stationlist.txt'; the next steps use it to make sure every logfile gets the same translation of ip address to station id.
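
why a shared list matters: if each logfile numbered its stations in order of appearance, the same ip could get a different id in every file. a minimal sketch of a stable mapping, assuming one ip address per line (the real format expected by rdlog may differ, and the function name is hypothetical):

```python
def load_station_ids(lines):
    """Map each ip address to a small integer id, in order of first appearance."""
    ids = {}
    for line in lines:
        ip = line.strip()
        if ip and ip not in ids:  # skip blank lines and duplicates
            ids[ip] = len(ids)
    return ids

# example:
# with open("stationlist.txt") as f:
#     station_ids = load_station_ids(f)
```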

generating a plot of the plain logs

first create a text version of the binary logs:
mkdir txt
find binlog -type f | xargs ./rdlog -stations stationlist.txt -full txt 
then use gengp.pl to generate a gnuplot script:
perl gengp.pl txt pngfull > txt2png.gp
gnuplot txt2png.gp
now have a look at pngfull/lo/

note that some graphs have easily isolated lines. split those off using:

mkdir horz
find binlog -type f | xargs ./rdlog -stations stationlist.txt -split horz
perl gengp.pl horz pnghorz > horz2png.gp
gnuplot horz2png.gp
this generates 571 separate files

now check pnghorz/lo/...

i manually selected several files, for which i calculated the linear least-squares curve fit using lsq; see tstlsq.sh

with 1 line

with 2 lines

from this i concluded that an average slope of 0.396 was best
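
for reference, an ordinary least-squares line fit is short enough to write out. this sketch is not the lsq tool itself; x and y simply stand for the two columns being plotted, and the function name is hypothetical.

```python
def lsq_line(points):
    """Fit y = slope*x + intercept to a list of (x, y) points by least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope = covariance(x, y) / variance(x)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

fitting each selected file this way and averaging the slopes is one way to arrive at a single number like the one above.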

the above graphs rotated would look like this:

now split using slope:

mkdir rotated
find binlog -type f | xargs ./rdlog -stations stationlist.txt -split rotated -slope 0.396
perl gengp.pl rotated pngrot > rot2png.gp
gnuplot rot2png.gp
this generates 17849 separate files
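
a rough python sketch of why a known slope helps with splitting: subtracting slope*x from every y turns lines with that slope into horizontal ones, which can then be separated by their level alone. this is my guess at the principle, not a description of what rdlog does; the function names and the 'bin_width' parameter are hypothetical.

```python
from collections import defaultdict

def detrend(points, slope):
    """Subtract slope*x from every y, so lines with that slope become horizontal."""
    return [(x, y - slope * x) for x, y in points]

def split_lines(points, slope, bin_width=1.0):
    """Group the original points by the level of their detrended y value."""
    groups = defaultdict(list)
    for (x, y), (_, flat_y) in zip(points, detrend(points, slope)):
        # points on the same sloped line land in the same bin
        groups[round(flat_y / bin_width)].append((x, y))
    return dict(groups)
```

with the slope set to 0 this falls back to splitting on plain horizontal lines.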


  • http://cakelab.org/~kaner/sputnik_01/
  • http://www.openbeacon.org/ccc-sputnik.0.html
  • http://www.bogomips.w.tkb.pl/sputnik.html
  • http://pmeerw.net/23C3_Sputnik/
  • http://wiki.openbeacon.org/wiki/Datamining